Blog

Page 146

12 articles

The LLM Request Lifecycle Your try/catch Is Missing
Wrapping LLM calls in try/catch only catches the easy failures. A state machine approach makes retry, fallback, validation, and escalation paths first-class observable states — and surfaces the failure modes that return HTTP 200.
insider, llm
Apr 9 · 10 min
The Long-Horizon Evaluation Gap: Why Your Agent Passes Every Benchmark and Still Fails in Production
Single-turn benchmarks give a false sense of security for production AI agents. A model scoring 75% on SWE-Bench Verified collapses to under 25% on real engineering tasks—here's why the gap is structural and how to build evals that catch it.
ai-agents, evaluation
Apr 9 · 11 min
MCP Server Supply Chain Risk: When Your Agent's Tools Become Attack Vectors
Third-party MCP servers are the new npm left-pad problem for AI agents. Real breaches — from Postmark email exfiltration to mcp-remote command injection — reveal five attack vectors and the layered defense patterns that reduce exposure without killing composability.
insider, mcp
Apr 9 · 9 min
MoE Models in Production: The Serving Quirks Dense-Model Benchmarks Hide
Sparse MoE models need 8.6× more GPU memory than their active-parameter count implies, exhibit latency variance that dense-model monitoring misses, and break naive batching assumptions. Here's the serving analysis that benchmarks skip.
insider, moe
Apr 9 · 10 min
Model Fingerprinting: Detecting Silent Provider-Side LLM Swaps Before They Wreck Your Evals
When your LLM provider silently updates the model behind a stable API endpoint, your evals keep passing while your users notice the difference. Here's the fingerprinting and drift-detection stack that catches it first.
insider, llm
Apr 9 · 10 min
The Model Migration Playbook: How to Swap Foundation Models Without Breaking Production
A step-by-step playbook for safely migrating foundation models in production — shadow testing, embedding reindexing, prompt adaptation, canary rollouts, and the organizational coordination that separates a two-week swap from a two-month one.
insider, llm-migration
Apr 9 · 13 min
The Model Migration Playbook: How to Swap Foundation Models Without a Feature Freeze
A phased production playbook for swapping LLM foundation models — covering shadow deployments, prompt re-engineering across providers, embedding reindexing strategies, and why your eval suite alone won't catch the regressions that matter.
insider, llm-ops
Apr 9 · 11 min
Multimodal LLMs in Production: The Cost Math Nobody Runs Upfront
How vision, audio, and video inputs change your LLM token budget — a breakdown of per-modality cost formulas, the multipliers that silently inflate production bills, and the architectural patterns teams use to control costs.
multimodal, llm
Apr 9 · 11 min
The N+1 Query Problem Has Infected Your AI Agent
The N+1 query problem from the ORM era has re-emerged at the AI agent tool call layer — sequential single-item fetches, redundant re-fetches, and over-fetching are silently inflating your latency and token costs. Here's how to diagnose it and fix it.
ai-agents, tool-use
Apr 9 · 10 min
The Non-Determinism Tax: Building Reliable Pipelines on Probabilistic Infrastructure
Temperature=0 doesn't make LLMs deterministic. Batch composition, tensor parallelism, and floating-point non-associativity drive up to 72 percentage-point performance swings. Here's how to measure the variance and build application logic that's stable despite it.
llm, production
Apr 9 · 9 min
Non-Deterministic CI for Agentic Systems: Why Binary Pass/Fail Breaks and What Replaces It
Binary pass/fail CI breaks down when every test run is non-deterministic. Statistical verdicts, graduated thresholds, trajectory fingerprinting, and sequential analysis catch real agent regressions without drowning teams in false failures.
insider, ai-agents
Apr 9 · 9 min
Parallel Tool Calls in LLM Agents: The Coupling Test You Didn't Know You Were Running
Enabling parallel tool execution in LLM agents exposes hidden coupling in your tool design — the three silent failure modes, how to classify tools for safe parallelism, and when to consolidate instead of parallelize.
insider, llm-agents
Apr 9 · 10 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
