Blog

Page 89

12 articles

Human-in-the-Loop Is a Queue, and Queues Have Dynamics
Approval steps in agent workflows behave like production queues — with backlog growth, staleness, fatigue, and priority inversion. Here's how to design HITL that survives scale.
insider, ai-engineering
Apr 22 · 11 min
Your P99 Is Following a Stranger's Traffic: The Noisy-Neighbor Tax in Hosted LLM Inference
Hosted LLM APIs share GPUs, batches, and KV-cache budgets across tenants you never see, so your tail latency moves with strangers. Here is how to prove it, mitigate it, and decide when to flip to dedicated capacity.
llm, infrastructure
Apr 22 · 10 min
Inference Is Faster Than Your Database Now
The model's share of request latency has collapsed. Your own feature store, auth, and Postgres calls are now the long tail — and most AI architectures haven't noticed.
insider, ai-infrastructure
Apr 22 · 10 min
Interview Mode vs. Task Mode: The Unspoken Contract Your Agent Keeps Breaking
Most 'asks too many questions' and 'didn't ask enough questions' complaints are the same bug — your agent picked the wrong contract. Here is how to detect and surface it.
agents, ux
Apr 22 · 11 min
LLM-as-Compiler Is a Metaphor Your Codebase Can't Survive
Framing LLMs as compilers quietly cancels the disciplines — review, refactoring, architectural judgment — that keep AI-generated codebases maintainable past the six-month wall.
ai-engineering, ai-assisted-development
Apr 22 · 10 min
LLM-as-Judge Drift: When Your Evaluator Upgrades and All Your Numbers Move
When a regression suite flips red without any prompt change, the culprit is usually the judge, not the candidate. How evaluator drift fakes wins and losses, why pinned judges and calibration cadence matter, and what to log in eval metadata to stop the dashboard from lying.
llm-evaluation, llm-as-judge
Apr 22 · 11 min
Your LLM Span Is Lying: What APM Tools Don't Show About Inference Latency
Standard APM treats an LLM call as one opaque span — but prefill, decode, cache misses, and batch position all hide inside that duration. Here is the tracing surface you actually need.
insider, llm
Apr 22 · 8 min
Markdown Beats JSON: The Output Format Tax You're Paying Without Measuring
Strict JSON mode quietly shaves reasoning accuracy on many tasks. Here's the decoding-time mechanism, the measured gap across markdown, XML, and JSON, and a decision tree for picking a format that fits the job.
insider, llm
Apr 22 · 11 min
The MCP Server Graveyard: When Your Agent's Dependencies Stop Shipping
Third-party MCP servers are the new long-tail dependency risk for AI agents. Abandoned maintainers, stale shims, and inherited CVEs create silent failures that bypass every supply chain alert — here's how to spot an orphan before adoption, and when to fork, vendor, or build your own.
mcp, ai-agents
Apr 22 · 10 min
Mid-Flight Steering: Redirecting a Long-Running Agent Without Killing the Run
Most agent UIs turn every course correction into a full restart. The fix is an architectural one — checkpoint-and-inject, plan revision hooks, and soft-interrupt tokens — plus a three-verb UX vocabulary that separates correction from override from cancellation.
insider, agents
Apr 22 · 10 min
The Missing Arm: Your AI Experiment Has No 'AI-Off' Control
Most AI experiments compare better AI to worse AI and skip the comparison that actually matters — against no AI at all. The null arm is the missing discipline keeping teams from knowing whether their inference spend earns anything.
experimentation, ai-product
Apr 22 · 9 min
Eval Passed, With All Tools Mocked: Why Your Agent's Hardest Failures Never Reach the Harness
Mocked-tool evals make CI green while production burns. The three assumptions every mock silently makes, why the eval pass rate diverges from the incident rate, and the three-rung ladder (mocks, cassettes, live smoke) that finally closes the gap.
ai-engineering, evaluation
Apr 22 · 9 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
