Blog

Page 141

12 articles

Structured Outputs and Constrained Decoding: Eliminating Parsing Failures in Production LLMs
Constrained decoding guarantees schema-valid LLM output at the token level, removing retry logic and parsing heuristics from production pipelines — but research shows a 17% creativity cost that demands a clear decision framework.
insider, structured-outputs
Apr 11 · 9 min
Synthetic Data Pipelines That Don't Collapse: Generating Training Data at Scale
Model collapse silently degrades LLMs trained on their own output. Learn the pipeline architecture — accumulative mixing, multi-source generation, verification stacks, and diversity monitoring — that keeps synthetic training data productive instead of poisonous.
insider, synthetic-data
Apr 11 · 8 min
The AI Wrapper Trap: When Your Moat Is Someone Else's API Call
Why thin-wrapper AI startups face existential risk every model release cycle — and the three defensibility layers (proprietary data flywheels, domain-specific evals, workflow integration) that separate survivors from cautionary tales.
insider, ai-product-strategy
Apr 11 · 10 min
The Autonomy Dial: Five Levels for Shipping AI Features Without Betting the Company
A five-level framework for graduating AI features from suggestion to full autonomy, with concrete metrics at each transition, leading indicators for dialing back, and the bounded autonomy pattern that maps decision risk to oversight level.
ai-autonomy, human-in-the-loop
Apr 11 · 11 min
The Calibration Gap: Your LLM Says 90% Confident but Is Right 60% of the Time
LLM confidence scores routinely overstate accuracy by 30–80 percentage points. How to measure the calibration gap with reliability diagrams and ECE, fix it with temperature scaling and adaptive recalibration, and design production systems that stay reliable when confidence lies.
insider, llm-calibration
Apr 11 · 10 min
The Forgetting Problem: When Unbounded Agent Memory Degrades Performance
Unbounded agent memory stores silently degrade performance as stale facts, cross-context contamination, and error propagation accumulate. Practical forgetting strategies — time-based decay, access-frequency reinforcement, selective addition, and active consolidation — plus the eval methodology to measure whether memory is helping or hurting.
insider, agent-memory
Apr 11 · 9 min
The Instruction-Following Cliff: Why Adding One More Rule to Your System Prompt Breaks Three Others
LLM compliance doesn't degrade linearly — it hits a cliff where adding one more rule destabilizes others. Research shows even frontier models cap at 68% accuracy under high instruction density. Here's why rules fight each other and how decomposition patterns keep your system prompt reliable.
llm, prompt-engineering
Apr 11 · 7 min
The Observability Tax: When Monitoring Your AI Costs More Than Running It
AI workloads generate 10–50x more telemetry than traditional services, pushing monitoring bills past inference costs. A practical guide to tiered sampling, retention policies, and tool consolidation that cuts observability spend by 50–90% without losing signal.
observability, llm-ops
Apr 11 · 8 min
The Planning Tax: Why Your Agent Spends More Tokens Thinking Than Doing
LLM agents burn 40–70% of their token budget on planning before executing a single tool call. A breakdown of where reasoning tokens go, why more thinking doesn't always mean better outcomes, and the architectural patterns (ReWOO, plan caching, hierarchical decomposition) that reclaim your budget.
insider, ai-agents
Apr 11 · 10 min
The Second System Effect in AI: Why Your Agent v2 Rewrite Will Probably Fail
Fred Brooks warned about the second system effect in 1975 — and it's now the leading cause of failed AI agent rewrites. 68% of multi-agent deployments would have performed equally well as single-agent systems, yet teams keep reaching for architectural complexity they don't need.
insider, ai-agents
Apr 11 · 9 min
The Trust Calibration Curve: How Users Learn to (Mis)Trust AI
The over-trust → failure → over-correction lifecycle that kills AI product adoption. Why single high-salience errors collapse trust disproportionately, and the design patterns that build durable, calibrated user trust.
ai-engineering, product
Apr 11 · 9 min
Vision Inputs in Production AI Pipelines: The Preprocessing Decisions Nobody Documents
How image resolution, compression artifacts, OCR preprocessing, and aspect-ratio handling silently degrade vision model accuracy in production — and the normalization pipeline that separates model failures from input failures.
vision, multimodal
Apr 11 · 10 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
