Blog

Page 44

12 articles

Your Tool-Result Cache Is a Stale-Data Contract You Never Wrote
Cached tool results that look clean in the trace are quietly producing confidently-wrong agent answers. Treat the cache as a per-tool freshness contract — TTLs by volatility, freshness metadata in the result, bypass tiers, and a stale-cache eval slice.
ai-engineering, agents
Apr 27 · 11 min
Tool Schemas Are Prompts, Not API Contracts
Auto-generating LLM tool schemas from your OpenAPI spec ships your API documentation as a prompt, and your agents pay the cost in misuse you never see in tests.
insider, llm-tools
Apr 27 · 11 min
Translation Is Not Localization: The Cultural-Calibration Debt Your Multilingual AI Just Defaulted On
Shipping translated prompts and translated evals is not a multilingual launch. The failure modes are cultural, not linguistic, and your dashboards cannot see them.
insider, multilingual
Apr 27 · 12 min
The 12-Month AI Feature Cliff: Why Your Production Models Decay on a Calendar Nobody Marked
AI features ship at a 92% pass rate and slide to 78% twelve months later with no single change to blame. Five compounding clocks (model deprecations, weight rotations, input drift, prompt-patch debt, and judge calibration) produce a cliff most teams discover only at the deprecation deadline. Here is the maintenance cadence that has to be on the calendar before launch.
ai-engineering, llm-ops
Apr 27 · 11 min
The Two-Language Problem: Why Type Safety Stops at the Prompt Boundary
Static type systems go blind at the prompt boundary. Three failure modes — interpolation, schema-as-prose, output parsing — and the disciplines that close the gap when the compiler can't see the seam.
ai-engineering, type-safety
Apr 27 · 10 min
The Two-PM Problem: When Prompt Ownership and Product Ownership Drift Apart
Most AI teams split prompt ownership from product ownership and pay the coordination tax in regressions nobody owns. Here is the failure pattern and the rituals — shared release calendar, single dashboard, joint incident channel, and a four-artifact RACI — that make the split survivable.
insider, ai-product
Apr 27 · 11 min
Your Vector Store Has Hot Keys: Why ANN Indexes Lie About Production Cost
Public ANN benchmarks run uniform query workloads, but production retrieval is Zipfian — and the gap shows up as melted shards, wasted RAM, and a p99 nobody planned for.
insider, vector-database
Apr 27 · 10 min
Vendor Benchmarks Are Your Ceiling, Not Your Forecast
Vendor benchmark numbers describe a controlled harness, not your stack. The realized lift on your product is structurally smaller — and the only forecast worth signing budget against is your own shadow eval.
llm, evaluation
Apr 27 · 10 min
The 80-Question Wall: What Enterprise AI Security Questionnaires Actually Demand
Enterprise CISOs now run AI-specific security reviews with 80+ questions on training data, prompt logs, tenant isolation, and refusal behavior. A field guide to what they actually want.
ai-security, enterprise
Apr 26 · 11 min
Variance Eats the Experiment: Why A/B Power Math Breaks for LLM Features
Classical A/B math assumes deterministic per-user behavior. LLM features break that assumption twice over, and the standard sample-size template ships wrong calls in both directions — here are the four shifts that fix it.
insider, experimentation
Apr 26 · 11 min
The Agent Finished Into an Empty Room: Stale-Context Delivery for Async Background Tasks
Async agents that finish 90 seconds late often deliver answers to questions the user no longer has. A delivery-time relevance gate, not faster models, is the fix.
insider, agents
Apr 26 · 10 min
The Agent Flight Recorder: Capture These Fields Before Your First Incident
When an agent goes off the rails, the forensic record most teams have is useless. Here are the fields a flight recorder must capture before the first incident — and the storage, sampling, and privacy disciplines that have to land alongside it.
insider, ai-engineering
Apr 26 · 13 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
