Blog

Page 49

12 articles

Load Shedding Was Built for Humans. Agents Amplify the Storm You're Shedding
Agents replan around 503s and retry faster than any human ever will, turning a small upstream wobble into a correlated outage. A practitioner's view of the load-shedding primitives platforms need next, and the disciplines agents have to adopt to stop being the storm.
insider, agents
Apr 26 · 12 min
Long-Context vs RAG in 2026: Why It Is a Per-Feature Decision, Not an Architecture Religion
Long-context vs RAG is not a product-wide architecture choice in 2026 — it is a per-feature decision driven by four axes (freshness, attribution, tail-risk, cost). A breakdown of the discipline that keeps your AI surfaces on the right side of math that keeps moving.
insider, llm
Apr 26 · 13 min
The Model Deprecation Treadmill: Discipline That Has to Exist Before the Sunset Email
Provider sunset emails arrive on a 60-day clock. The registry, calendar, n+1 evals, and contract terms that turn each migration into mechanical work — built before the email lands, not after.
llm-ops, model-migration
Apr 26 · 13 min
Your Model Router Was Trained on Your Eval Set, Not Your Traffic
Benchmark-trained routers ship a quiet quality regression: the cheap path looks fine in aggregate, then fails on a small, loud cluster of users your eval suite never sampled. Why a router is a control system, not a classifier — and what closing the loop actually requires.
insider, llm-routing
Apr 26 · 10 min
Multimodal Eval Drift: Why Your Image and Audio Paths Regress While Text Stays Green
Most teams shipped multimodal as a thin extension of their text product and inherited an eval discipline that systematically can't see image or audio regressions. The fix is per-modality rubrics, modality-specific gold sets, and a release gate that refuses to aggregate quality across input types.
insider, multimodal
Apr 26 · 11 min
Per-Tenant Inference Isolation: When Shared Cache, Fine-Tunes, and Embeddings Leak Across Customers
AI features quietly broke the multi-tenant isolation playbook at four new layers — prompt cache, fine-tune, embedding index, KV-cache reuse. What changed and the discipline production teams need to put back.
multi-tenancy, ai-security
Apr 26 · 12 min
The 30-Day Prompt Apprenticeship: Onboarding Engineers When 'Read the Code' Doesn't Work
A 200-line system prompt has no signature, no tests, and a diff history that says nothing about why each line is there. A 30-day curriculum — failure gallery, ablation, PR reconstruction, gated edit — that teaches new engineers to read a prompt by interrogating its behavior.
insider, prompt-engineering
Apr 26 · 12 min
Prompt Asset Depreciation: The Maintenance Schedule Your AI Team Doesn't Keep
Production prompts decay silently as models, tokenizers, and product rules shift underneath them. Treat every prompt as a depreciating asset with an owner, a revalidate-by date, and an eval delta — or accept the quality regression nobody on the team intentionally shipped.
prompt-engineering, llmops
Apr 26 · 9 min
Prompt Bisect: Binary-Searching the Edit That Broke Your Eval
An overnight two-point eval drop and a prompt PR with seventeen edits is a binary search problem, not a guessing game. Here is how to bisect a prompt the way kernel maintainers bisect a kernel — and the commit-granularity discipline it forces on the team.
llm, prompt-engineering
Apr 26 · 10 min
Prompt-Eligibility: The Missing Column in Your Data Classification
Most data classification schemes never modeled the prompt layer as a vendor egress channel. Adding a prompt-eligibility tier — and the template audit that fills it — closes a compliance gap your DLP scheme silently denies.
insider, privacy
Apr 26 · 11 min
Your System Prompt Will Leak: Designing for Prompt Extraction
Prompt extraction is the quiet attack on LLM products. Treat the system prompt as public, move secrets out of context, and build an eval for it.
insider, llm-security
Apr 26 · 10 min
Prompt-Version Skew Across Regions: The Unintended A/B Test Your CDN Ran for Six Hours
Pushing prompts through CDN-style rollout systems creates silent geography-split A/B tests when one region drifts ahead of another. Here is the rollout discipline, observability dimension, and rollback model that keep prompt versions globally coherent.
llm, mlops
Apr 26 · 10 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
