Blog

Page 36

12 articles

Conversation History Is a Trust Boundary, Not a Text Blob
Conversation history is a multi-source feed, not append-only state. Tag each turn's origin, anchor user turns with HMACs, and wrap tool output in trust zones — or your agent's attack surface grows linearly with every turn.
insider, ai-security
May 12 · 10 min
The Demo-to-Dogfood Gap: Why Your AI Feature Dies Between the Launch Slide and Monday Morning
Most enterprise AI pilots leave a great demo and a dead Slack channel. The dogfood phase is the cheapest production-grade eval you will ever run — here is what a real gate looks like and why the demo is not evidence of readiness.
insider, ai-engineering
May 12 · 11 min
The Embedding Migration Black Hole: How a Vector Model Bump Silently Rewrites Your Business Rules
An embedding model upgrade is sold as an infra swap but ships as a recalibration event. Here's the parallel system of thresholds, clusters, and gold labels you have to rebuild — and the migration plan that survives production.
embeddings, rag
May 12 · 11 min
The Eval Backfill Tax: Why Every Model Capability Launch Costs More Than You Budgeted
New model capabilities introduce failure modes your historical eval suite was never designed to catch — and the work to backfill it is the unbudgeted critical path on every capability launch.
insider, evals
May 12 · 9 min
The Eval Bus Factor: When the Person Who Defined 'Correct' Walks Out the Door
Eval suites stay green long after the person who knew what they were testing has left. The damage is silent, the recovery is expensive, and the fix is organizational, not technical.
evals, ai-engineering
May 12 · 10 min
Eval Triage Queues: Why FIFO Misses the Failures That Matter
A FIFO queue of eval failures wastes the most expensive thing in the loop — reviewer time. Score failures by traffic, severity, and recency, batch by cluster, and protect an adversarial quota.
evals, llmops
May 12 · 11 min
The MCP Capability Disclosure Tax: When Every Connected Server Bills Your Context Window
MCP tool definitions reload on every planning turn, quietly burning 15-66K tokens per call and degrading tool-selection accuracy as servers stack. Here's how to price the disclosure tax and contain it with progressive disclosure, per-server attribution, and stable schemas.
insider, mcp
May 12 · 11 min
When Your Forbidden List Becomes a Recipe: The Hidden Cost of Negative Examples in Prompts
Mature production prompts grow a list of don'ts that quietly works against itself — both leaking attack surface and increasing the rate of the very outputs it forbids.
prompt-engineering, llm-security
May 12 · 10 min
The Off-Hours Cost Curve: Why Your AI Feature Spends Differently on Saturday Than on Tuesday
Weekly rolling cost averages hide a cohort-mix problem every AI feature has — and the off-hours users paying 3–5x cost per active user are a structural shape, not an edge case.
insider, ai-engineering
May 12 · 10 min
Per-Customer Cost Concentration: Why AI Cost Dashboards Hide the Power Law
Aggregated AI cost dashboards hide a power-law distribution where the top 1% of customers drive 30–50% of token spend. Build per-customer attribution, slope-based anomaly detection, and reservation-based budget enforcement before one runaway agent loop becomes a margin event.
insider, ai-cost
May 12 · 12 min
Per-Tenant Prompt Compilation: When Your System Prompt Becomes a Build Artifact
Multi-tenant AI teams accidentally become compiler engineers the moment per-tenant prompt variance lands — and the operational bill arrives at month six. A look at why prompts at scale are build targets, not config files.
insider, prompt-engineering
May 12 · 10 min
Prompt Edits Without PRs: The Velocity Metric Your AI Team Is Failing
Behavior change in AI products no longer routes through PRs. The dashboards leadership trusts miss the dominant source of product change, and the misdiagnosis is reshaping how AI teams get measured.
ai-engineering, prompt-management
May 12 · 9 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
