Blog

Page 133

12 articles

Spec-to-Eval: Translating Product Requirements into Falsifiable LLM Criteria
Most AI features are specified in prose and evaluated in prose — which is why teams agree at standup and disagree at launch. A practical methodology for converting English requirements into concrete, falsifiable LLM evaluation criteria before writing a single prompt.
llm · evaluation
Apr 13 · 9 min
Stakeholder Prompt Conflicts: When Platform, Business, and User Instructions Compete at Inference Time
Every production LLM system has at least three instruction authors. When they conflict, the model makes an unaudited priority call. Here's how to make the hierarchy explicit and govern it before it governs you.
llm · prompt-engineering
Apr 13 · 10 min
The Ambient AI Coherence Problem: When Every Feature Is AI-Powered, Nothing Feels Like One Product
Deploying AI across search, summaries, chat, and recommendations simultaneously creates cross-feature contradictions that damage user trust more than any single wrong answer. Here's how to build systems that feel like one coherent product.
insider · ai-engineering
Apr 13 · 9 min
The Anthropomorphism Tax: Why Treating Your Agent Like a Colleague Breaks Production Systems
The reason 88% of AI agent projects fail in production has less to do with model quality than with a cognitive bias engineers rarely notice: treating their agent like a smart colleague. Here are the failure modes this produces (missing retry logic, no output validation, confidence-blind escalation) and the mechanistic mental model that fixes them.
ai-agents · reliability
Apr 13 · 10 min
The Context Window Cliff: What Actually Happens When Your Agent Hits the Limit Mid-Task
AI agents don't crash when they hit context limits — they silently make wrong decisions. Here's how context overflow actually fails in production and the architectural patterns that prevent it.
ai-agents · context-window
Apr 13 · 9 min
The Enterprise API Impedance Mismatch: Why Your AI Agent Wastes 60% of Its Tokens Before Doing Anything Useful
Enterprise APIs burn through AI agent token budgets with verbose formats, semantic mismatches, and implementation-leaked tool schemas — here's how outcome-oriented adapters, dynamic toolsets, and semantic metadata layers fix it.
insider · ai-agents
Apr 13 · 8 min
The Good Enough Model Selection Trap: Why Your Team Is Overpaying for AI
Most teams run every AI feature on their most expensive model because the demo was built that way. A task-complexity audit, a three-tier routing policy, and the right A/B testing approach can cut your AI spend in half without users noticing.
insider · llm
Apr 13 · 9 min
The Inference Cost Paradox: Why Your AI Bill Goes Up as Models Get Cheaper
Per-token LLM prices have dropped 1,000x in three years. Enterprise AI spending surged 320% in 2025. Both facts are true simultaneously — here's the mechanism and what to do about it.
insider · ai-engineering
Apr 13 · 10 min
The Inference-Time Personalization Trap: When User Context Costs More Than It Earns
Adding user history to every LLM prompt feels like an obvious win — until you measure the cost per token of quality gained. Here's where inference-time personalization stops paying and what production architectures do instead.
llm · personalization
Apr 13 · 9 min
The Instruction Position Problem: Where You Place Things in Your Prompt Is an Architecture Decision
Where you place instructions in your LLM prompt determines whether the model follows them. Primacy and recency effects cause mid-prompt rules to lose 30–50% compliance — and most teams discover this only in production.
prompt-engineering · llm
Apr 13 · 9 min
The LLM Forgery Problem: When Your Model Builds a Convincing Case for the Wrong Answer
LLMs don't just hallucinate facts — they also fabricate reasoning. The forgery problem is when a model decides first and explains second, producing a plausible-sounding synthesis built on selectively ignored evidence.
insider · llm
Apr 13 · 10 min
The Metered AI Pricing Death Spiral: Why Per-Token Billing Punishes Your Best Features
Per-token billing creates perverse incentives where your most valuable AI features cost the most to run. Hybrid and outcome-based pricing models realign cost with delivered value.
ai-engineering · pricing
Apr 13 · 8 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
