Blog

Page 99

12 articles

Reasoning Model Economics: When Chain-of-Thought Earns Its Cost
Extended thinking models cost 10–50x more per query. Here's the task taxonomy that tells you when that premium pays off — and the routing architecture that applies it automatically.
insider, llm
Apr 19 · 9 min
The Reranker Gap: Why Most RAG Pipelines Skip the Most Important Layer
Most RAG pipelines stop at vector similarity search and wonder why accuracy plateaus. The reranker is the missing layer — here's what it costs to skip it and how to decide when the tradeoff is worth it.
rag, retrieval
Apr 19 · 8 min
Sequential Tool Call Waterfalls: The Hidden Latency Tax in Agent Loops
Agent frameworks default to sequential tool execution even when calls are logically independent, creating latency cascades identical to the N+1 query problem. Here's how to identify and fix them.
insider, ai-agents
Apr 19 · 10 min
Shadow to Autopilot: A Readiness Framework for AI Feature Autonomy
Moving AI from shadow mode through advisory, co-pilot, and autopilot stages requires explicit quality gates and monitoring, not just organizational courage. Here's the engineering framework.
insider, ai-engineering
Apr 19 · 11 min
The Share-Nothing Agent: Designing AI Agents for Horizontal Scalability
Most AI agents can't scale horizontally because they accumulate implicit state that ties them to a single machine. Here's the architectural discipline that fixes it.
insider, agent-architecture
Apr 19 · 12 min
The Six-Month Cliff: Why Production AI Systems Degrade Without a Single Code Change
Your AI feature shipped green and performed well at launch. Six months later it's quietly 20–40% worse — and your dashboards never flagged it. Here's why this happens and how to stop it.
llm, production
Apr 19 · 9 min
What 99.9% Uptime Means When Your Model Is Occasionally Wrong
Traditional SLAs are meaningless for AI features where success is probabilistic. Here's the contract language and internal SLO design that lets engineering teams ship AI without open-ended liability.
insider, ai-engineering
Apr 19 · 10 min
Structured Output Reliability in Production: Why JSON Mode Is Not a Contract
JSON mode guarantees valid syntax — not correct answers. A breakdown of the three failure modes that kill production AI pipelines and the three-layer validation architecture that actually catches them.
insider, llm
Apr 19 · 8 min
Subgroup Fairness Testing in Production AI: Why Aggregate Accuracy Lies
Aggregate accuracy hides systematic failures for specific demographic and linguistic subgroups. The subgroup eval methodology, disparity SLOs, and production monitoring patterns that catch bias before it reaches users at scale.
ai-engineering, evaluation
Apr 19 · 11 min
The Sycophancy Trap: Why AI Validation Tools Agree When They Should Push Back
RLHF-trained models have a systematic agreement bias that makes them dangerous for code review, fact-checking, and decision support. How to measure it and restore appropriate pushback.
insider, llm
Apr 19 · 12 min
Synthetic Eval Bootstrapping: How to Build Ground-Truth Datasets When You Have No Labeled Data
How to build a working LLM evaluation pipeline from zero labeled data using synthetic test generation, human-validated anchors, cross-model disagreement, and behavioral invariants — plus the failure modes that synthetic evals share with the models they test.
evaluation, llm
Apr 19 · 10 min
System Prompt Sprawl: When Your AI Instructions Become a Source of Bugs
As system prompts grow from hundreds to thousands of tokens, internal contradictions accumulate and model behavior becomes unpredictable. Here's how to detect, contain, and restructure before it costs you.
insider, prompt-engineering
Apr 19 · 9 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
