320 posts tagged with "ai-agents"

Prompt Injection Is a Confused Deputy, Not a Content-Filtering Problem

· 10 min read
Tian Pan
Software Engineer

The most common post-incident finding for a prompt injection breach is some variation of "the model got tricked." A retrieved document contained hidden instructions, the agent followed them, customer data left the building. The fix that follows is almost always a content filter: scan the input, classify the malicious instruction, strip it out before it reaches the model. Ship the filter, close the ticket.

That finding is wrong, and the filter is a treadmill. "The model got tricked" describes the symptom, not the vulnerability. The vulnerability is that an agent holding real privileges (a database token, a send-email capability, filesystem write access) accepted instructions from a source that should never have been allowed to command those privileges. That is not a new class of bug. It is a confused deputy, a problem that operating-systems security named almost forty years ago and has largely solved with capability-based design.

If you treat prompt injection as a detection problem, you are signing up for an arms race against every attacker who can phrase a sentence. If you treat it as an authority problem, you get to reuse decades of security engineering that already works.
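
A minimal sketch of the authority framing, assuming the agent runtime can tag every tool request with the provenance of the instruction that produced it. The `Source` values, the tool names, and the `ALLOWED` table are hypothetical, and tracking provenance reliably is the hard part this sketch takes for granted.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    USER = "user"            # the principal the agent acts for
    RETRIEVED = "retrieved"  # documents, web pages, tool output


@dataclass(frozen=True)
class ToolRequest:
    tool: str
    requested_by: Source  # provenance of the instruction behind this call


# Hypothetical policy: which sources may command which tools.
ALLOWED = {
    "send_email": {Source.USER},
    "search_docs": {Source.USER, Source.RETRIEVED},
}


def authorize(req: ToolRequest) -> bool:
    """Allow a call only if the instruction's source holds authority for it."""
    # Tools missing from the table are denied for every source.
    return req.requested_by in ALLOWED.get(req.tool, set())
```

The check never inspects the text of the instruction. A retrieved document can say anything it likes; it simply holds no authority over `send_email`.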

Your System Prompt Grows After Every Incident — and Nobody Deletes a Line

· 8 min read
Tian Pan
Software Engineer

Open the system prompt of any agent that has been in production for a year. Scroll to the bottom. You will find a sediment layer of sentences that read like apologies: "Never invent order numbers." "Do not promise refunds you cannot confirm." "If the user is in Germany, do not mention the legacy plan." Each one is a fossil. Each one marks the exact moment something went wrong in production, someone got paged, and the fastest available fix was to add a sentence.

Nobody deletes those sentences. Not because they are still earning their place, but because deleting one means proving a negative — proving the model will not regress on a bug that may have been fixed three model versions ago. No one can prove that, so the line stays. The system prompt becomes an append-only log of past incidents, and it costs you tokens on every single call, forever.

This is the quietest form of technical debt in an AI system, because it does not look like debt. It looks like diligence.

Task Completion Goes Green While Users Quietly Suffer

· 8 min read
Tian Pan
Software Engineer

Your agent dashboard says 94% task completion. Leadership is happy. The roadmap gets funded. And yet support tickets are climbing, power users have gone quiet, and the one engineer who actually watches traces keeps muttering that something is wrong. Both things are true at once. The agent is completing tasks. It is also taking twelve minutes and four thousand tokens to do a two-step job, backtracking three times, and asking the user to confirm a fact it could have inferred from the first message.

Task completion is a binary that hides a distribution. "The agent finished" tells you nothing about the path it took to finish, and the path is most of what users actually experience. A completion-rate dashboard is structurally incapable of seeing a slow, expensive, annoying agent. It will stay green right up until users churn.

This is not a measurement gap you can patch with a better prompt. It is a category error in what you chose to measure. Completion is the easiest thing to instrument and the least of what people are paying for.
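
As a rough illustration, here is what reporting the path alongside the binary might look like. The `Run` fields and the choice of median and maximum are assumptions made for this sketch, not a prescribed metric set.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Run:
    completed: bool
    seconds: float
    tokens: int
    backtracks: int


def summarize(runs: list[Run]) -> dict[str, float]:
    """Report completion rate next to the cost of the runs that completed."""
    done = [r for r in runs if r.completed]
    if not done:
        raise ValueError("need at least one completed run")
    return {
        "completion_rate": len(done) / len(runs),
        "median_seconds": median([r.seconds for r in done]),
        "median_tokens": median([r.tokens for r in done]),
        "max_backtracks": max(r.backtracks for r in done),
    }
```

Two runs that both "completed", one in thirty seconds and one in twelve minutes with three backtracks, produce a completion rate of 1.0. Only the other three numbers show that anything is wrong.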

The Agent That Remembers What You Took Back: Deletion as a First-Class Memory Operation

· 10 min read
Tian Pan
Software Engineer

In March, a user told your agent to stop recommending restaurants with outdoor seating — they had moved to an apartment with a baby and late nights were over. In September, the agent suggests a rooftop bar for their anniversary. The user is annoyed, and you are confused, because you watched the March correction land. It got written to memory. It is still there. The problem is that it is sitting next to the original preference, which is also still there, and retrieval surfaced the older one because it had a slightly better embedding match for "anniversary dinner."

This is the failure mode nobody designs for. Teams spend weeks on memory writes — extraction, summarization, embedding, namespacing — and treat deletes as a someday problem. Long-term memory makes adding a fact almost free, so facts accumulate. But a memory store is not a diary. A diary is allowed to contain things that used to be true. A memory store that an agent reads from to make decisions is not, because the agent cannot tell the difference between a fact and a fossil.
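
One way to make retraction part of the write path is to have each new fact retire whatever it supersedes. The sketch below assumes memories carry a topic `key` that identifies what a fact is about; the key scheme and the class names are hypothetical, and deciding that two memories share a key is the genuinely hard part.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    key: str   # hypothetical topic key, e.g. "dining.outdoor_seating"
    text: str
    active: bool = True


@dataclass
class MemoryStore:
    items: list[Memory] = field(default_factory=list)

    def write(self, key: str, text: str) -> None:
        """Retire every earlier fact under the same key, then add the new one."""
        for m in self.items:
            if m.key == key:
                m.active = False
        self.items.append(Memory(key, text))

    def retrievable(self) -> list[Memory]:
        """Only active facts are eligible for retrieval."""
        return [m for m in self.items if m.active]
```

The superseded fact stays in `items` for audit purposes but can no longer win a retrieval on embedding similarity, because it is never a candidate.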

Token Budgets Are a Scheduling Problem, Not a Prompt Problem

· 9 min read
Tian Pan
Software Engineer

When an agent gives a worse answer than it did last week, the first instinct is to blame the prompt. Someone reworks the system instructions, trims a few sentences, adds an example, and ships. Sometimes it helps. Often it does nothing, because the prompt was never the problem. The problem is that a single verbose tool result quietly consumed 18,000 tokens, pushed the actual task instructions into the low-attention middle of the context window, and left the model reasoning over a transcript that is 70% noise.

That is not a wording problem. That is a resource-allocation problem. And resource allocation has a name in systems engineering: scheduling. The context window is a fixed-size resource, multiple consumers compete for it, and right now most agent stacks "schedule" it the way a 1960s batch system scheduled memory — first come, first served, until it runs out.
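
A minimal sketch of the alternative, assuming each consumer of the window declares a priority and a token request. The consumer names, the priorities, and the 20,000-token window used below are illustrative assumptions.

```python
def allocate(window: int, requests: list[tuple[str, int, int]]) -> dict[str, int]:
    """Grant tokens in priority order instead of arrival order.

    Each request is (consumer, priority, tokens_wanted); a lower priority
    number is served first. A grant smaller than the request means the
    caller must truncate or summarize before adding content to the context.
    """
    remaining = window
    grants: dict[str, int] = {}
    for name, _priority, wanted in sorted(requests, key=lambda r: r[1]):
        grants[name] = min(wanted, remaining)
        remaining -= grants[name]
    return grants
```

With a 20,000-token window, a verbose tool result asking for 18,000 tokens at the lowest priority is cut to whatever is left after the task instructions and history have been served, rather than crowding them out.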

The Tool Default Argument Is a Policy Decision in Disguise

· 10 min read
Tian Pan
Software Engineer

Open the trace of any agent run and look at a tool call. You see the tool name and the arguments the model chose to pass. What you do not see is everything it did not pass. A search call with query set and nothing else still ran with a page size, a timeout, a result ranking, and a visibility scope. The agent decided none of those. You did, months ago, when you wrote the tool's schema and left those parameters optional with a default.

That default is not a convenience. It is a policy decision wearing the costume of a sensible blank. The default page size caps how much of the world the agent can see in one call. The default timeout decides when the agent gives up and improvises. The default visibility scope decides whether "search the docs" means the public handbook or the entire internal wiki including the unreleased roadmap. The default dry_run flag decides whether the agent's action is a rehearsal or a real, irreversible event in production.
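
A small sketch of treating those defaults as written policy. The parameter names mirror the ones discussed above, but the specific values and the `resolve_args` helper are hypothetical.

```python
# Hypothetical defaults, stated as policy instead of being left implicit.
SEARCH_DEFAULTS = {
    "page_size": 10,
    "timeout_s": 5.0,
    "scope": "public",  # narrowest visibility unless the caller widens it
    "dry_run": True,    # rehearsal unless the caller explicitly opts in
}


def resolve_args(passed: dict) -> tuple[dict, list[str]]:
    """Merge model-supplied arguments over defaults.

    Also returns the names of parameters the model did not set, so a trace
    can show what was decided by policy and not by the agent.
    """
    defaulted = sorted(k for k in SEARCH_DEFAULTS if k not in passed)
    return {**SEARCH_DEFAULTS, **passed}, defaulted
```

Logging the `defaulted` list next to each tool call makes the invisible half of the call visible in the trace.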

The Tool Schema You Changed Without Telling the Agent

· 11 min read
Tian Pan
Software Engineer

A backend engineer renames a field. user_id becomes customer_id, because the team finally standardized on the word "customer" across services. They add one more argument, region, because billing now needs it. The change ships behind a normal pull request with two approvals. Every downstream service that calls the endpoint gets updated in the same release. The integration tests are green. By every measure a backend team uses, this is a routine, well-executed API change.

A week later, support tickets start climbing. The agent that places orders is occasionally placing them with no customer attached, or attaching them to the wrong region. Nobody changed the agent. Nobody changed the prompt. The model is the same version it was last month. And yet the agent is now wrong in a way it was not wrong before.

The cause is not a bug in the model and not a bug in the backend. It is that the tool schema has two consumers, and only one of them was in the room when the change was reviewed.
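
One hedged way to put the second consumer in the room is to fingerprint the tool schema the agent was last validated against and fail the build when it drifts. The schema shape below (a dict with a `parameters` mapping) and both function names are assumptions made for illustration.

```python
import hashlib
import json


def schema_fingerprint(schema: dict) -> str:
    """Stable hash of a tool schema, independent of key order."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_fields(old: dict, new: dict) -> dict[str, list[str]]:
    """Name the parameters that disappeared or appeared between two schemas."""
    old_f, new_f = set(old["parameters"]), set(new["parameters"])
    return {"removed": sorted(old_f - new_f), "added": sorted(new_f - old_f)}
```

Storing the fingerprint alongside the agent's prompt and evals turns a rename like `user_id` to `customer_id` into a failing check that names the affected fields, instead of a support ticket a week later.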

The Tool That Worked Until Two Agents Called It At Once

· 9 min read
Tian Pan
Software Engineer

A tool passes its tests. You called it from one agent, watched it read a record, transform it, write it back, and return a clean result. It did exactly that, every time, for weeks. Then you scaled the agent fleet from one worker to twelve, and a customer reported that their subscription got upgraded twice in the same minute. The tool did not change. The number of things calling it did.

This is the failure mode that single-agent testing cannot catch, because single-agent testing never produces the condition that triggers it. One caller is, by construction, a serial workload. Every concurrency assumption your tool quietly relies on — that nobody else is mid-write when it reads, that a counter it increments is its own, that the draft it is editing will still be there when it saves — holds trivially when there is exactly one caller. The tool is not correct. It is untested. Those are different things, and the difference stays invisible until a second agent shows up.
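
The read-transform-write race can be made visible with a version check, sketched below. This is one standard technique (optimistic concurrency), not necessarily the fix the tool in question needs, and the `Store` class is a hypothetical in-memory stand-in for a real database.

```python
import threading


class VersionConflict(Exception):
    """Raised when a record changed between a caller's read and its write."""


class Store:
    def __init__(self, value: str) -> None:
        self._lock = threading.Lock()
        self._value = value
        self._version = 0

    def read(self) -> tuple[str, int]:
        with self._lock:
            return self._value, self._version

    def write_if_unchanged(self, value: str, expected_version: int) -> None:
        """Write only if nobody else has written since the caller's read."""
        with self._lock:
            if self._version != expected_version:
                raise VersionConflict("record changed since it was read")
            self._value = value
            self._version += 1
```

If two agents both read version 0 and both try to upgrade the subscription, the first write succeeds and the second raises `VersionConflict`, so the double upgrade becomes an explicit retry decision instead of a silent duplicate.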

The Distributed Trace That Goes Dark at the Agent Handoff

· 11 min read
Tian Pan
Software Engineer

You open the trace for a failed run. The span tree is beautiful: the user request, the planner agent's reasoning, three tool calls, token counts, latencies, all of it nested cleanly. Then the planner hands off to a specialist agent — and the trace ends. Not with an error span. It just stops. The next thing you have is a separate, rootless trace from the specialist agent that begins mid-thought, with no parent, no inputs you can see, and no connection to the request that caused it.

The bug lives in that gap. It always does. The handoff is where one agent's assumptions meet another agent's interpretation, and it is the single place your trace cannot follow.

This is not a logging problem. Your agents are probably emitting spans correctly on both sides. The problem is that the trace context (the trace ID and parent span ID that stitch spans into one story) did not survive the jump from caller to callee. Instrumented HTTP clients and gRPC stubs in your stack propagate that context for you. Your agent handoff does not, because nobody told it to.
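
A minimal sketch of carrying the context across the handoff by hand, using the W3C Trace Context `traceparent` string format (`version-traceid-parentid-flags`). The payload shape and function names are hypothetical; in a real stack a tracing library's propagator would normally do this work.

```python
import secrets


def new_span_id() -> str:
    """Random 8-byte span ID as 16 hex characters."""
    return secrets.token_hex(8)


def make_traceparent(trace_id: str, span_id: str) -> str:
    """Format: version 00, 32-hex trace ID, 16-hex parent ID, sampled flag 01."""
    return f"00-{trace_id}-{span_id}-01"


def handoff_payload(task: str, trace_id: str, caller_span_id: str) -> dict:
    """The caller attaches its own span as the parent of whatever comes next."""
    return {"task": task, "traceparent": make_traceparent(trace_id, caller_span_id)}


def start_child_span(payload: dict) -> dict:
    """The callee continues the caller's trace instead of starting a rootless one."""
    _version, trace_id, parent_id, _flags = payload["traceparent"].split("-")
    return {"trace_id": trace_id, "parent_id": parent_id, "span_id": new_span_id()}
```

The specialist's first span now shares the planner's trace ID and names the planner's span as its parent, which is all a trace viewer needs to draw one tree.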

Halted Is Not a Status: Why Agents Need a Typed Terminal-Reason Protocol

· 10 min read
Tian Pan
Software Engineer

Open the dashboard for an agent fleet and you will see a clean number: completion rate, 94%. Below it, a list of runs, each tagged with one of two states: running, or not running. The runs that are "not running" all look identical. Some of them finished the task perfectly. Some of them hit a step limit two actions short of done. Some of them caught a tool error and gave up. Some of them decided the task was impossible, and were right. And some of them simply lost the thread and stopped emitting tokens.

Your monitoring cannot tell these apart. It knows the process is no longer running. It does not know why, and "why" is the only thing that matters when you are deciding whether to page someone.
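
A sketch of what a typed terminal reason could look like, using the five outcomes listed above. The enum member names and the choice of which reasons page a human are assumptions for illustration.

```python
from enum import Enum


class TerminalReason(Enum):
    COMPLETED = "completed"
    STEP_LIMIT = "step_limit"
    TOOL_ERROR = "tool_error"
    DECLARED_IMPOSSIBLE = "declared_impossible"
    STALLED = "stalled"  # stopped emitting tokens without reaching a decision


# Hypothetical paging policy: only failures of the system itself wake someone up.
PAGE_ON = {TerminalReason.TOOL_ERROR, TerminalReason.STALLED}


def should_page(reason: TerminalReason) -> bool:
    return reason in PAGE_ON
```

Once every run must end with exactly one `TerminalReason`, "not running" stops being a state and the paging decision becomes a lookup.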

The Undo Button Your Agent Assumes Exists

· 9 min read
Tian Pan
Software Engineer

Watch an agent reason through a multi-step task and you will notice something familiar: it plans the way you debug. Try an approach, look at the result, and if it is wrong, back out and try another. The agent talks about its plan as a tree of options it can explore, prune, and revisit. That mental model is correct inside a code sandbox, where every action has an implicit undo. It is dangerously wrong the moment the agent touches the world.

A sent email does not unsend. A charged card does not uncharge without a refund flow, a fee, and a customer who already saw the notification. A deleted row is gone unless someone wired up soft deletes. A posted Slack message has already been read. The agent's planning model has no native concept of the one-way door — the action that, once taken, removes the option of pretending it never happened.

This is not a model intelligence problem. A smarter model still does not know which of your tools is reversible, because reversibility is not a property of the action. It is a property of the system the action lands in. You have to tell it.
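
Telling it can be as simple as annotating each tool with how its effects can be undone in your system. The three-level classification, the tool names, and the confirmation rule below are hypothetical choices for this sketch.

```python
from enum import Enum


class Reversibility(Enum):
    REVERSIBLE = "reversible"      # an undo exists and leaves no trace
    COMPENSABLE = "compensable"    # can be offset (a refund) but not erased
    IRREVERSIBLE = "irreversible"  # a one-way door


# Hypothetical registry; the values describe the system, not the action.
TOOLS = {
    "write_scratch_file": Reversibility.REVERSIBLE,
    "charge_card": Reversibility.COMPENSABLE,
    "send_email": Reversibility.IRREVERSIBLE,
}


def needs_confirmation(tool: str) -> bool:
    """Gate anything that is not cleanly reversible; unknown tools count as one-way."""
    level = TOOLS.get(tool, Reversibility.IRREVERSIBLE)
    return level is not Reversibility.REVERSIBLE
```

Treating unregistered tools as irreversible is a deliberate design choice here: a missing annotation should fail toward caution, not toward exploration.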

The Vector Index Has a Staleness SLO Nobody Set

· 10 min read
Tian Pan
Software Engineer

A user asks your agent what the current price tier is for an enterprise plan. The agent retrieves a chunk, reads it, and answers: "$2,000 per month." Confident, sourced, formatted nicely. The problem is that pricing changed four days ago. The number the agent quoted was true last week. The chunk it retrieved was embedded before the change, and the index has not caught up.

Nobody decided this would happen. There was no design review where someone said "the agent may answer from data up to four days old." There is just a re-indexing job that runs nightly, or weekly, and a content team that edits the source whenever they feel like it, and a gap between those two clocks that nobody measures. That gap is a service level objective. It exists whether or not you wrote it down. The only question is whether you set it on purpose or inherited it by accident.
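
Setting it on purpose starts with measuring the gap. The sketch below assumes each chunk records when its source was last edited and when it was last embedded; the function names and the one-day objective in the usage note are illustrative.

```python
from datetime import datetime, timedelta


def staleness(source_updated_at: datetime, embedded_at: datetime,
              now: datetime) -> timedelta:
    """How long the index has been serving an outdated chunk.

    Zero if the chunk was embedded at or after the latest source edit.
    """
    if embedded_at >= source_updated_at:
        return timedelta(0)
    return now - source_updated_at


def violates_slo(source_updated_at: datetime, embedded_at: datetime,
                 now: datetime, slo: timedelta) -> bool:
    return staleness(source_updated_at, embedded_at, now) > slo
```

For the pricing example, a source edited four days ago and a chunk embedded before that edit yield a staleness of four days, which violates a one-day objective and would stay invisible without the measurement.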