Blog

Page 42

12 articles

Model Rollback Velocity: The Seven-Hour Gap Between 'This Upgrade Is Wrong' and 'Old Model Fully Restored'
Rolling back an LLM upgrade isn't a button press — it's a partial, hysteretic operation closer to a database migration. Here's the control plane your incident playbook needs before the next bad model rolls out.
llmops, mlops
Apr 27 · 12 min
LLM Model Routing Is Market Segmentation Disguised As A Cost Optimization
Routing 60% of LLM traffic to a cheaper model bends the cost graph — and silently splits your AI feature into two products. The aggregate accuracy metric averages over the segment that gets hurt, two failure modes show up as one bug report, and customers experience two assistants with no release notes.
insider, llm
Apr 27 · 10 min
Multilingual Eval Cost Amplification: Why Seven Locales Doesn't Cost 7×
Your English eval suite cost $40K. The seven-locale international launch will not cost $280K — the real curve is closer to N×L^1.3 because cross-locale comparison is a meta-eval that doesn't decompose.
insider, ai-engineering
Apr 27 · 14 min
Your On-Call Rotation Needs an AI-Literacy Prerequisite Before It Pages Anyone at 2am
Shared on-call rotations break the moment one of the services is an LLM-backed feature. Here is the literacy prerequisite, dashboard hygiene, and shadow-period playbook that keeps the AI team out of bed at 2am.
insider, on-call
Apr 27 · 12 min
On-Device AI Needs a Fleet Manager, Not a Model Card
Shipping one on-device model to every user means you're either burning battery on flagships or shipping a degraded product on the long tail. The discipline that fixes it looks more like a CDN than a model registry.
insider, on-device-ai
Apr 27 · 12 min
Pagination Is a Tool-Catalog Discipline: Why Agents Burn Context on List Returns
Tools that return unbounded lists turn agents into the SELECT * antipattern of the function-calling era. Pagination is a load-shedding primitive — make it a convention in your tool catalog, not a per-tool decision.
ai-agents, tool-design
Apr 27 · 11 min
Per-Vector Version Tags: The Missing Column Behind Every Embedding Migration
Vector stores ship without the migration tooling Postgres has had for two decades — no ALTER TABLE, no online schema change, no per-row version. The discipline that makes embedding upgrades survivable starts with a single column most teams forget to add.
insider, rag
Apr 27 · 10 min
Prompt Cache Thrashing: When Your Largest Tenant's Launch Triples Everyone's Bill
Prompt caching's discount is real until one tenant's launch evicts everyone else's prefixes. The shared inference cache is a tenant-coupling surface, and the bill lands weeks after the incident.
llm, prompt-caching
Apr 27 · 10 min
Prompt Deprecation Contracts: Why a Wording Cleanup Is a Breaking Change
A four-word edit to a system prompt can break parsers, judges, and chained agents that pinned the old wording. Prompts are APIs with silent consumers — and the discipline that keeps them stable looks a lot like REST endpoint deprecation.
insider, prompt-engineering
Apr 27 · 9 min
Prompt Linting Is the Missing Layer Between Eval and Production
Behavioral evals catch what your model says; they cannot catch what your prompt is. A prompt linter (fast, deterministic, structural) closes the gap that ships eval-green and surfaces as an 11pm production incident.
prompt-engineering, llm-ops
Apr 27 · 11 min
Prompt Position Is Policy: The Silent Merge Conflict When Three Teams Co-Own a System Prompt
When a system prompt grows past 2K tokens, position bias makes a moved instruction as load-bearing as a rewritten one — and line-based diffs hide it. How three teams silently overwrite each other's intent, and the section-ownership and eval discipline that catches it.
prompt-engineering, ai-infrastructure
Apr 27 · 11 min
The Customer Record Hiding in Your Few-Shot Prompt Template
The 'representative customer' you pasted into your few-shot prompt six months ago is still in production — re-identifiable, re-shipped, and invisible to DLP.
insider, ai-engineering
Apr 27 · 11 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
