907 posts tagged with "insider"

System Prompt Sprawl: When Your AI Instructions Become a Source of Bugs

April 20, 2026 · 9 min read

Software Engineer

Most teams discover the system prompt sprawl problem the hard way. The AI feature launches, users find edge cases, and the fix is always the same: add another instruction. After six months you have a 4,000-token system prompt that nobody can fully hold in their head. The model starts doing things nobody intended — not because it's broken, but because the instructions you wrote contradict each other in subtle ways the model is quietly resolving on your behalf.

Sprawl isn't a catastrophic failure. That's what makes it dangerous. The model doesn't crash or throw an error when your instructions conflict. It makes a choice, usually fluently, usually plausibly, and usually in a way that's wrong just often enough to be a real support burden.

Temperature Governance in Multi-Agent Systems: Why Variance Is a First-Class Budget

April 20, 2026 · 11 min read

Tian Pan

Software Engineer

Most production multi-agent systems apply a single temperature value—copied from a tutorial, set once, never revisited—to every agent in the pipeline. The classifier, the generator, the verifier, and the formatter all run at 0.7 because that's what the README said. This is the equivalent of giving every database query the same timeout regardless of whether it's a point lookup or a full table scan. It feels fine until you start debugging failure modes that look like model errors but are actually sampling policy errors.

Temperature is not a global dial. It's a per-role policy decision, and getting it wrong creates distinct failure signatures depending on which direction you miss in.

The Token Economy of Multi-Turn Tool Use: Why Your Agent Costs 5x More Than You Think

April 20, 2026 · 10 min read

Tian Pan

Software Engineer

Every team that builds an AI agent does the same back-of-the-envelope math: take the expected number of tool calls, multiply by the per-call cost, add a small buffer. That estimate is wrong before it leaves the whiteboard — not by 10% or 20%, but by 5 to 30 times, depending on agent complexity. Forty percent of agentic AI pilots get cancelled before reaching production, and runaway inference costs are the single most common reason.

The problem is structural. Single-call cost estimates assume each inference is independent. In a multi-turn agent loop, they are not. Every tool call grows the context that every subsequent call must pay for. The result is a quadratic cost curve masquerading as a linear one, and engineers don't discover it until the bill arrives.

Tokenizer Blindspots That Break Production LLM Systems

April 20, 2026 · 10 min read

Tian Pan

Software Engineer

Most engineers who build on LLMs eventually learn the rough conversion: one token is about 0.75 English words, so a 4,000-token context window fits roughly 3,000 words. That number is fine for back-of-napkin estimates when your input is casual English prose. It is quietly wrong everywhere else — and "everywhere else" turns out to be most of the interesting production workloads.

Token miscalculations don't fail loudly. They show up as cost overruns that don't match any line item, as context windows that silently truncate the last few paragraphs of a document, or as multilingual pipelines that work fine in English testing and go 4x over budget the first week they hit real traffic. By the time you trace the issue back to tokenization, the damage is done.

Tool Output Compression: The Injection Decision That Shapes Context Quality

April 20, 2026 · 10 min read

Tian Pan

Software Engineer

Your agent calls a database tool. The query returns 8,000 tokens of raw JSON — nested objects, null fields, pagination metadata, and a timestamp on every row. Your agent needs three fields from that response. You just paid for 7,900 tokens of noise, and you injected all of them into context where they'll compete for attention against the actual task.

This is the tool output injection problem, and it's the most underrated architectural decision in agent design. Most teams discover it the hard way: the demo works, production degrades, and nobody can explain why the model started hedging answers it used to answer confidently.

What Your Vendor's Model Card Doesn't Tell You

April 20, 2026 · 10 min read

Tian Pan

Software Engineer

A model card will tell you that the model scores 88.7 on MMLU. It will not tell you that the model systematically attributes blame to whichever technology appears first in a list of possibilities, causing roughly 10% of its attribution answers to be semantically wrong even when factually correct. It will not tell you that adding "you are a helpful assistant" to your system prompt degrades performance on structured reasoning tasks compared to leaving the system prompt blank. It will not tell you that under load the 99th-percentile latency is 4x the median, or that the model's behavior on legal and financial queries changes measurably depending on whether you include a compliance disclaimer.

None of this is in the model card. You will learn it by shipping to production and watching things break.

Agent Protocol Fragmentation: Designing for A2A, MCP, and What Comes Next

April 19, 2026 · 9 min read

Tian Pan

Software Engineer

Most teams picking an agent protocol are actually making three separate decisions at once — and treating them as one is why so many integrations break the moment a second framework enters the picture.

The three decisions are: how your agent talks to tools and data (vertical integration), how your agent collaborates with other agents (horizontal coordination), and how your agent surfaces state to a human interface (interaction layer). Google's A2A, Anthropic's MCP, and OpenAPI-based REST solve for different layers of this stack. When engineers conflate them, they either over-engineer a single-agent setup with multi-agent machinery, or under-engineer a multi-agent workflow with single-agent tooling. Both failures are expensive to refactor once in production.

The Cascade Problem: Why Agent Side Effects Explode at Scale

April 19, 2026 · 12 min read

Tian Pan

Software Engineer

A team ships a document-processing agent. It works flawlessly in development: reads files, extracts data, writes results to a database, sends a confirmation webhook. They run 50 test cases. All pass.

Two weeks after deployment, with a hundred concurrent agent instances running, the database has 40,000 duplicate records, three downstream services have received thousands of spurious webhooks, and a shared configuration file has been half-overwritten by two agents that ran simultaneously.

The agent didn't break. The system broke because no individual agent test ever had to share the world with another agent.

The Agent Specification Gap: Why Your Agents Ignore What You Write

April 19, 2026 · 12 min read

Tian Pan

Software Engineer

You wrote a careful spec. You described the task, listed the constraints, and gave examples. The agent ran — and did something completely different from what you wanted.

This is the specification gap: the distance between the instructions you write and the task the agent interprets. It's not a model capability problem. It's a specification problem. Research on multi-agent system failures published in 2025 found that specification-related issues account for 41.77% of all failures, and that 79% of production breakdowns trace back to how tasks were specified, not to what models can do.

The majority of teams writing agent specs are committing the same category of mistake: writing instructions the way you'd write an email to a competent colleague, then expecting an autonomous system with no shared context to execute them correctly across thousands of runs.

AI Coding Agents on Legacy Codebases: Why They Fail Where You Need Them Most

April 19, 2026 · 9 min read

Tian Pan

Software Engineer

The teams that most urgently need AI coding help are usually not the ones building new greenfield services. They're the ones maintaining 500,000-line Rails monoliths from 2012, COBOL payment systems that have processed billions of transactions, or microservice meshes where the original architects left three acquisitions ago. These are the codebases where a single misplaced refactor can introduce a silent data corruption bug that surfaces three weeks later in production.

And this is exactly where current AI coding agents fail most spectacularly.

The frustrating part is that the failure mode is invisible until it isn't. The agent produces code that compiles, passes existing tests, and looks reasonable in review. The problem surfaces in staging, in the nightly batch job, or in the edge case that only one customer hits on a specific day of the month.

Why Users Ignore the AI Feature You Spent Three Months Building

April 19, 2026 · 10 min read

Tian Pan

Software Engineer

Your team spent three months integrating an LLM into your product. The model works. The latency is acceptable. The demo looks great. You ship. And then you watch the usage metrics flatline at 4%.

This is the typical arc. Most AI features fail not at the model level but at the adoption level. The underlying cause isn't technical — it's a cluster of product decisions that were made (or not made) around discoverability, trust, and habit formation. Understanding why adoption fails, and what to actually measure and change, separates teams that ship useful AI from teams that ship impressive demos.

When Your AI Feature Ages Out: Knowledge Cutoffs and Temporal Grounding in Production

April 19, 2026 · 10 min read

Tian Pan

Software Engineer

Your AI feature shipped in Q3. Evals looked good. Users were happy. Six months later, satisfaction scores have dropped 18 points, but your dashboards still show 99.9% uptime and sub-200ms latency. Nothing looks broken. Nothing is broken — in the traditional sense. The model is responding. The infrastructure is healthy. The feature is just quietly wrong.

This is what temporal decay looks like in production AI systems. It doesn't announce itself with errors. It accumulates as a gap between what the model knows and what the world has become — and by the time your support queue reflects it, the damage has been running for months.

About Tian Pan