Blog

Page 37

12 articles

Prompt Portfolios: Manage a Basket, Not a Single Best Prompt
Production prompt management treats prompts as singular winners. Treat them as a portfolio instead: weighted variants, segment-aware allocation, and weekly rebalancing.
insider, prompt-engineering
May 12 · 10 min
Prompts Don't Roll Back Like Code: Why git revert Is the Wrong Primitive
git revert restores a deterministic past state. Prompt rollback has to reconcile with caches, conversation histories, eval baselines, and A/B cohorts the bad prompt already shaped — most teams find that out the hard way.
insider, llmops
May 12 · 9 min
Quantization Slippage: The Capability Tax Your Eval Set Was Never Built to Catch
Quantizing an LLM from fp16 to int4 ships a different model under the same name. The eval suite calibrated to the original silently grades the new one wrong; here is the capability slippage to budget for before your customers notice it.
insider, llm-quantization
May 12 · 11 min
Reasoning-Model Arbitrage: The Slow Expensive Model Is Cheaper on the Hard Prompts
Per-token pricing reports the cost of the median request, not the all-in cost of the distribution your product actually serves. Routing the hard prompts to a reasoning model beats workhorse-by-default once retries, escalations, and trust damage land on the P&L.
insider, llm-routing
May 12 · 10 min
The Rerun Antipattern: Why Rolling Again Doesn't Find Bugs
Rerunning a failed AI prompt feels like a variance probe but acts like survivorship bias — masking deterministic bugs while burning unbudgeted tokens. Trace-first debugging and N-of-K discipline replace it.
insider, ai-engineering
May 12 · 10 min
The Self-Critique Tax: When Asking the Model to Check Its Own Work Costs Double for Modest Wins
Self-Refine, Chain-of-Verification, and reflection prompts promise big quality lifts on benchmarks — but in production they triple costs, balloon latency, and deliver a fraction of the advertised gain. Here is how to price the self-critique tax before shipping it.
insider, llm
May 12 · 11 min
The Sliding-Window Tax: Why a 30-Turn Conversation Costs More Than 30x a Single Turn
Multi-turn AI features are measured on per-call dashboards but billed along per-conversation curves. Cost per conversation grows super-linearly with turn count, and the long tail is where the bill comes from.
insider, ai-engineering
May 12 · 9 min
Snapshot Eval Decay: When Green CI Stops Meaning Your Product Still Works
A green eval suite that ran for six months may already be testing yesterday's product against yesterday's reality — here is how snapshot eval decay hides in plain sight and how to keep an eval set alive.
insider, evals
May 12 · 11 min
The Streamed-Response Trace Schema Gap: Why Your APM Lies About LLM Latency
Streaming LLM responses break the request/response span model. The duration field lies; failures live between the boundaries — TTFT regressions, mid-stream stalls, content loops — and the fix is checkpointed token-time events with a real tail-event taxonomy.
llm-observability, streaming
May 12 · 10 min
Tenancy Leaks Through Few-Shot Examples: When Your Prompt Library Becomes a Cross-Customer Data Store
Mining production traces for few-shot examples quietly turns your system prompt into an unaudited multi-tenant data store. Here is how the leak happens, why it is a contract breach, and the discipline that catches it before a customer does.
insider, ai-engineering
May 12 · 11 min
The Agentic Stamp: When Marketing Names It and Engineering Pays the Operational Bill
Marketing calls a workflow an agent, and engineering inherits the observability, tool-budget, and escalation work nobody scoped — a leadership decision dressed up as a naming choice.
ai-agents, product-management
May 12 · 10 min
Token Accounting Drift: When Your Trace Logs Don't Match the Provider Invoice
Every team building on a hosted LLM eventually finds the token counts in their traces don't match the monthly invoice. The gap is rarely fraud — it's a structural measurement problem with six compounding causes.
llm, finops
May 12 · 9 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
