Blog

Page 90

12 articles

The Model Bill Is 30% of Your Inference Cost
Token spend is one line in a six-line budget. A real decomposition of retrieval, observability, retries, and human review shows why model-swap savings usually lie.
llm-cost, finops
Apr 22 · 8 min
The Model-of-the-Week Roadmap: When Vendor Promises Become Committed Dependencies
Treating unreleased vendor model capabilities as committed roadmap dependencies turns twelve-month plans into thirty-month rebuilds. A field guide to slip, gate, and re-scope risk — and the discipline of planning against available-today models.
insider, ai-strategy
Apr 22 · 9 min
Multi-Model Reliability Is Not 2x: The Non-Linear Cost of a Second LLM Provider
Teams adopt a second LLM provider expecting 2x cost for near-perfect uptime. In production the operational math is 4–5x, correlated failures attenuate the uptime gain, and a well-designed degraded mode on one provider usually wins.
insider, llm
Apr 22 · 13 min
No Results Is Not Absence: Why Agents Treat Retrieval Failure as Proof
Agents that say 'no results' are rarely making a claim about the world. They are narrating an empty array as if it were proof — and that is how quiet production incidents get manufactured.
ai-agents, rag
Apr 22 · 10 min
Your OAuth Tokens Expire Mid-Task: The Silent Failure Mode of Long-Running Agents
OAuth was designed for short requests; agent loops outlive their tokens. Walk through the failure modes, refresh patterns, and credential-lifecycle architecture that hold up at agent timescales.
oauth, ai-agents
Apr 22 · 11 min
The Orphan Adapter Problem: When Your Fine-Tune Outlives Its Base Model
Fine-tuned adapters pinned to deprecated base models turn into production zombies — load-bearing and unreproducible. A durable adapter lifecycle needs base-model-synced retraining cadence, behavioral fingerprint tests, and institutional memory that survives team changes.
insider, fine-tuning
Apr 22 · 12 min
The Output Commitment Problem: Why Streaming Self-Correction Destroys User Trust More Than the Original Error
Mid-stream revisions read as incompetence even when the final answer is correct. The fix is a plan-first-then-commit protocol, a clear taxonomy of refinement surfaces, and deliberate choices about when to hide thinking.
insider, ai-ux
Apr 22 · 10 min
Pattern-Matching Failures: When Your LLM Solves the Wrong Problem Fluently
Fluent, on-topic LLM answers that solve the wrong problem are the hardest bug class in production. A practical playbook for detecting surface-feature overfitting and designing prompts that expose it.
insider, llm
Apr 22 · 11 min
Plan-and-Execute Is Marketing, Not Contract: Plan Adherence as a First-Class SLI
Plan-and-execute agents emit plans that look like contracts but behave like forecasts. Treat plan adherence as an SLI with measurement, enforcement, and bounded re-planning budgets — not a quality nice-to-have you grade once a quarter.
insider, ai-agents
Apr 22 · 9 min
Your Planner Knows About Tools Your User Can't Call
Scoping the tools list at execution time is too late. If the planner sees the full catalog, its refusals, clarifying questions, and reasoning trace leak capability existence to users who aren't authorized to know.
ai-agents, security
Apr 22 · 9 min
Popularity Bias in Vector Retrieval: Why the Same Five Chunks Dominate Every Query
Why a few chunks dominate every RAG query — how high-dimensional hubness and ANN graph structure silently collapse retrieval diversity, and the diagnostics plus mitigations that keep the long tail alive.
rag, vector-search
Apr 22 · 10 min
The Prompt Ownership Problem: When Conway's Law Comes for Your Prompts
Prompts live in four teams at once — authors, evaluators, deployers, and support. When no single role owns the whole loop, Conway's law guarantees silent quality leaks. The RACI gaps, shared-library traps, and steward role that actually keep behavior coherent.
prompt-engineering, ai-governance
Apr 22 · 11 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
