Blog

Page 54

12 articles

The Local-Maximum Trap in Prompt Iteration: How to Tell You're Tweaking the Wrong Thing
Six weeks of prompt tweaks and the eval score still oscillates in a four-point band? You're at a local maximum. Here's how to diagnose which ceiling you've actually hit — prompt, retrieval, model, spec, or data — and pick the lever that breaks it.
llmprompt-engineering
Apr 2510 min
Persona Drift: When Your Agent Forgets Who It's Supposed to Be
Long conversations quietly erode the system prompt. Why agent personas drift after 8–12 turns, how to measure the half-life, and the reinforcement patterns that hold.
insideragents
Apr 2511 min
Sovereignty Collapse: Logging Where Your Prompt Actually Went
Application logs say the request went to eu-west-1. The provider routed it through US and Singapore during a failover. Build the per-request sovereignty path that turns an audit into a query.
insiderllm
Apr 259 min
The Query Rewriting Layer Your RAG Pipeline Skipped
Most retrieval failures are query-shape failures, not embedding failures. A walkthrough of HyDE, decomposition, multi-query fan-out, and rank fusion — and how to diagnose which one your RAG pipeline actually needs before you reach for an encoder swap.
insiderrag
Apr 2510 min
The AI Off-Switch That Doesn't Exist: Retiring Features After Users Co-Author the Archive
Retiring an AI feature is a migration, not a toggle. A field guide to grandfathering co-authored artifacts, separating writers from readers, and writing exit criteria at launch.
insiderai-product
Apr 2511 min
Your Span Names Are an Undocumented API: Telemetry Contracts Between Agent Teams
Span names and attribute keys are an undocumented API with consumers across cost dashboards, eval pipelines, and SLO alerts. Treat telemetry like a versioned schema or keep getting paged for breaks that fail silently.
observabilityopentelemetry
Apr 2510 min
User-Side Concept Drift: When Your Prompt Held but Your Users Moved
A frozen golden set scores green while production satisfaction quietly collapses — because the users moved, not the model. Detection patterns and an eval-set shelf-life discipline.
llm-evalsdrift-detection
Apr 2510 min
Policy-as-Code for Agents: OPA, Rego, and the Decision Point Your Tool Loop Doesn't Have
System prompts and YAML manifests stop scaling once tool catalogs grow. A dedicated policy engine — OPA with Rego, or AWS Cedar — sitting in front of every tool call gives agents the auditable Policy Decision Point that prompt engineering can't provide.
securityai-agents
Apr 2412 min
Trace Sampling for Agents: Which of 10 Million Daily Spans Are Worth Keeping
Why uniform sampling defaults break for agent workloads, and the four decisions — start, end, stratification, retention — that determine your trace bill and your incident MTTR.
agentsobservability
Apr 2311 min
Accept Rate Is a Vanity Metric: Your Copilot ROI Hides in the 90 Seconds After the Keystroke
Copilot accept rates measure the frictionless half of the loop. The ROI lives in the validation tax that follows — here are the metrics that actually tell the truth.
ai-engineeringdeveloper-productivity
Apr 2211 min
You Accidentally Built a Feature-Flag System for Prompts — Without the Governance
Prompt config repos behave like feature-flag services at runtime but lack exposure tracking, audit logs, rollback telemetry, and per-user gating. Here is the governance gap and how to close it.
llmopsprompt-engineering
Apr 2210 min
Your Accuracy Went Up and Your Calibration Collapsed
A prompt refactor that lifts accuracy three points can quietly destroy the hedging signal users trust. Measure calibration with ECE, reliability diagrams, and refusal rates — not just accuracy.
calibrationevals
Apr 2210 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.

Page 54

The Local-Maximum Trap in Prompt Iteration: How to Tell You're Tweaking the Wrong Thing

Persona Drift: When Your Agent Forgets Who It's Supposed to Be

Sovereignty Collapse: Logging Where Your Prompt Actually Went

The Query Rewriting Layer Your RAG Pipeline Skipped

The AI Off-Switch That Doesn't Exist: Retiring Features After Users Co-Author the Archive

Your Span Names Are an Undocumented API: Telemetry Contracts Between Agent Teams

User-Side Concept Drift: When Your Prompt Held but Your Users Moved

Policy-as-Code for Agents: OPA, Rego, and the Decision Point Your Tool Loop Doesn't Have

Trace Sampling for Agents: Which of 10 Million Daily Spans Are Worth Keeping

Accept Rate Is a Vanity Metric: Your Copilot ROI Hides in the 90 Seconds After the Keystroke

You Accidentally Built a Feature-Flag System for Prompts — Without the Governance

Your Accuracy Went Up and Your Calibration Collapsed

About Tian Pan