Blog

Page 130

12 articles

Schema Entropy: Why Your Tool Definitions Are Rotting in Production
Tool definitions in production AI systems degrade silently over months. Here's how schema entropy forms, why agents can't self-correct, and the versioning and contract-testing practices that catch rot before it breaks live agents.
insider · ai-engineering
Apr 14 · 10 min
The Selective Abstention Problem: Why AI Systems That Always Answer Are Broken
Most AI product design optimizes for better answers. The harder, more valuable capability is principled non-answering — and almost no team builds it deliberately.
insider · ai-engineering
Apr 14 · 10 min
The Semantic Validation Layer: Why JSON Schema Isn't Enough for Production LLM Outputs
Constrained decoding guarantees your LLM outputs are valid JSON. It cannot guarantee they make sense. Here's the two-layer validation architecture that catches the failures schema can't see.
insider · llm
Apr 14 · 10 min
Silent Async Agent Failures: Why Your AI Jobs Die Without Anyone Noticing
Async AI jobs fail silently and confidently — HTTP 200, dashboards green, customers eventually complaining. Here's how dead letter queues, idempotency keys, and saga logs translate from conventional distributed systems to fix the problem.
insider · ai-agents
Apr 14 · 9 min
Staffing AI Engineering Teams: Who Owns What When Every Feature Has an AI Component
How the skills split between ML engineers, data engineers, and product engineers shifts when LLMs commoditize modeling—and how to staff, structure, and assign ownership when every feature has an AI component.
ai-engineering · team-structure
Apr 14 · 11 min
Stale Retrieval: The Data Quality Problem Your RAG Pipeline Is Hiding
RAG pipelines fail silently when their retrieval corpus drifts — outdated facts, deleted documents, and stale embeddings that pass every faithfulness metric. Here's how to detect it, propagate deletions, and build freshness into your pipeline from the start.
rag · retrieval
Apr 14 · 10 min
Your LLM Eval Is Lying to You: The Statistical Power Problem
Most LLM eval suites run on 50–200 examples and claim significance they don't have. Here's the math that shows why your evals can't detect the improvements you're making — and what to do about it.
insider · llm
Apr 14 · 9 min
The AI Adoption Paradox: Why the Highest-Value Domains Get AI Last
Healthcare sits at 39% AI adoption while software companies hit 92% — yet healthcare has more to gain. The gap isn't risk aversion. It's a structural mismatch between accuracy thresholds, compliance timing, and deployment architecture.
insider · ai
Apr 14 · 8 min
The AI Rollback Ritual: Post-Incident Recovery When the Damage Is Behavioral, Not Binary
Behavioral regressions in LLM systems don't fail your tests or trigger your alerts. Here's how to detect, diagnose, and recover from the failure mode that looks like success.
llmops · observability
Apr 14 · 11 min
The Curriculum Trap: Why Fine-Tuning on Your Best Examples Produces Mediocre Models
Curating only high-quality, confident outputs as fine-tuning data creates distribution mismatch, destroys uncertainty awareness, and produces models that are confidently wrong. Here's why—and what to do instead.
insider · fine-tuning
Apr 14 · 10 min
The Integration Test Mirage: Why Mocked Tool Outputs Hide Your Agent's Real Failure Modes
Agents built against mocks never encounter the failures that bite in production: pagination loops, rate limits mid-sequence, partial success responses, and schema ambiguity. Here's what to do instead.
ai-engineering · testing
Apr 14 · 11 min
The Overclaiming Trap: When Being Right for the Wrong Reasons Destroys AI Product Trust
When AI systems produce correct answers via fabricated reasoning chains, power users who check the work lose trust permanently — faster than if the system had simply been wrong.
insider · ai-engineering
Apr 14 · 10 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
