The Fault You Never Inject: Feeding Your Agent a Tool That Lies

May 18, 2026 · 10 min read

Software Engineer

Open the resilience suite for your agent and look at what it actually tests. You will find timeouts. You will find connection drops, 500s, rate-limit responses, malformed JSON, maybe a tool that hangs for thirty seconds before failing. All of it is fault injection in the classic mold: the tool is broken, and the question is whether your agent degrades gracefully.

Now look for the test where the tool is not broken at all. The one where the tool responds in 80 milliseconds, returns perfectly valid JSON against the schema, and the value inside is simply wrong. A balance that is stale by three days. A customer record with two fields swapped. An order quantity with two digits transposed. An empty result list for a query that should have returned forty rows.

You will not find it. Almost nobody injects that fault. And it is the one fault your agent is least equipped to survive, because every other fault announces itself and this one does not.

The reason this gap exists is structural. Chaos engineering grew up around distributed systems where failures are mostly crash failures — a component stops responding, returns an error, or disappears. Your retry logic, your circuit breakers, your fallbacks are all built to catch the absence of a good response. An agent that calls a tool and gets a timeout has a clear signal to act on. An agent that calls a tool and gets a confident, well-formed lie has no signal at all. It got exactly what it asked for, in exactly the shape it expected, and it has no instinct to doubt it.

Crash Faults Announce Themselves; Lying Tools Do Not

Distributed systems theory has a name for the distinction your test suite is missing. A crash fault is a component that fails by stopping. A Byzantine fault is a component that fails arbitrarily, and the case that matters here is lying: it keeps participating, keeps responding, and the content of its responses is wrong in ways that are indistinguishable from correct ones. Byzantine faults are famously the expensive ones to tolerate, because you cannot detect them by checking whether a response arrived.

Your agent's tools can fail Byzantine. A search index returns documents that no longer exist. A pricing service returns a cached number from before the last update. A database query that should join two tables silently drops the join condition and returns a Cartesian-product mess that still parses as a list of rows. None of this trips an exception. None of it shows up in your error rate. The tool call is, by every metric your observability stack records, a success.
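To make the asymmetry concrete, here is a minimal sketch of a typical resilience wrapper facing both kinds of fault. Every name in it (ToolTimeout, call_with_retry, the two balance tools and their values) is hypothetical and stands in for whatever your tool layer actually uses.

```python
import time


class ToolTimeout(Exception):
    """Hypothetical error a tool client raises when a call times out."""


def call_with_retry(tool, *args, attempts=3, delay_s=0.0):
    """Retry on crash-style faults; otherwise return whatever the tool returns."""
    last_error = None
    for _ in range(attempts):
        try:
            return tool(*args)
        except ToolTimeout as err:
            last_error = err
            time.sleep(delay_s)
    raise last_error


def make_flaky_balance_tool():
    """Times out once, then answers correctly: a fault the wrapper can see."""
    calls = {"n": 0}

    def tool(account_id):
        calls["n"] += 1
        if calls["n"] == 1:
            raise ToolTimeout("upstream timed out")
        return {"account_id": account_id, "balance_cents": 120_900}

    return tool


def stale_balance_tool(account_id):
    """Answers instantly with an outdated value: nothing for the wrapper to catch."""
    return {"account_id": account_id, "balance_cents": 98_400}


recovered = call_with_retry(make_flaky_balance_tool(), "acct-1")
lied = call_with_retry(stale_balance_tool, "acct-1")
```

Both calls return a dict with the same keys and no exception escapes either one. The wrapper did real work on the first call and had nothing to do on the second, which is the one that returned the wrong balance.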

And here is the part that makes agents specifically vulnerable: a language model treats a successful tool call as ground truth. It has no skepticism reflex. When you hand a model a tool result, the model does not weigh it as evidence with a probability attached — it consumes it as a fact and reasons forward. Researchers describe LLMs as optimizing for plausibility rather than truth, and a tool output is the most plausible thing in the entire context window. It came from a system. It is structured. The model composes on top of it without a second thought.

A human using the same data has defenses the agent does not. A human analyst who sees a customer's account balance jump by a factor of 100 feels something is off. The agent feels nothing. It writes the number into its next reasoning step and moves on, and three steps later it has produced a confident, well-formatted, completely wrong recommendation — and the failure is now almost impossible to trace back to the tool call that caused it.

Why Your Honest-Failure Tests Told You Nothing

There is a comfortable assumption baked into a green resilience suite: if the agent handles failures well, it is robust. But robustness to honest failures and robustness to dishonest ones are different properties, and passing the first tells you nothing about the second.

When a tool times out, your agent exercises a code path you wrote — the retry, the fallback, the apology to the user. You can test that path because it is deterministic and you control it. When a tool lies, your agent exercises its reasoning — and reasoning is where you have no guarantees. The agent has to notice that a number is implausible, that two facts in its context contradict each other, that a result set is suspiciously empty. That is a judgment, not a branch, and judgment is exactly what your timeout tests never exercised.

This is why teams are repeatedly surprised in production. The agent that gracefully handled every outage in staging ships, and then a downstream service starts returning stale data after a botched cache deploy. No alarms fire. The service is up. Latency is fine. The agent is just quietly, confidently wrong for every user whose query touched that data, and the first signal anyone gets is a support ticket a week later. The honest-failure suite was green the entire time.

Context poisoning is the term that has emerged for this in the agent community, and the word that matters in describing it is silent. It is not a crash. It is a slow degradation of decision quality that begins with one unreliable input and compounds. The agent does not break. It keeps working, and that is the problem.

Build a Fault Library of Plausible Lies

The fix starts with treating "confidently wrong tool output" as a first-class fault class and building a library of it, the same way you keep a library of timeout and error-rate faults. The design constraint for every entry is the same: the corrupted response must be well-formed, fast, and plausible. If it fails schema validation, you are testing your validator, not your agent. The whole point is to test the cases that pass every mechanical check.
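One way to enforce that constraint is to gate every library entry behind the same validator your production tool layer uses. The sketch below assumes a hand-rolled validate_order check and an order record shape, both hypothetical; in practice you would plug in your real schema validator.

```python
def validate_order(record):
    """Hypothetical schema check: required keys with the right types."""
    quantity = record.get("quantity") if isinstance(record, dict) else None
    return (
        isinstance(record, dict)
        and isinstance(record.get("order_id"), str)
        and isinstance(quantity, int)
        and not isinstance(quantity, bool)
        and quantity >= 0
    )


def is_plausible_lie(original, mutated, validate):
    """An entry qualifies only if it passes the validator and differs from the truth."""
    return validate(mutated) and mutated != original


truth = {"order_id": "A-17", "quantity": 1209}
transposed = {"order_id": "A-17", "quantity": 1290}   # wrong, but well-formed
malformed = {"order_id": "A-17", "quantity": "lots"}  # fails the schema
```

Here transposed qualifies as a library entry and malformed does not: the malformed record would only exercise the validator, which is a different test.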

A starting catalog, drawn from the failure modes that actually show up:

Stale values. Return data that was correct at some earlier point — a balance, a status, an inventory count — with no indication it is out of date. This is the single most common real-world version, and it is invisible: stale data and fresh data are byte-identical in shape.
Swapped fields. Put the shipping address in the billing field, the created_at in the updated_at, the customer's first and last name reversed. Each value is individually valid; the record is wrong.
Transposed digits and unit errors. Return 1290 for 1209, dollars where the schema implied cents, a quantity off by an order of magnitude. The number parses. It is just not the number.
The wrong empty. Return an empty list for a query that should have rows. Agents handle "no results" as a legitimate answer and will happily tell the user "you have no open tickets" when the truth is the query silently filtered them all out.
Subtly wrong joins. Return a result set where most rows are right but a few belong to a different entity entirely — the failure mode of a query that lost a WHERE clause.
Confident contradiction. Return a value that directly contradicts something already established earlier in the agent's context, and see whether the agent notices the conflict or just overwrites the older fact.
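Several of the catalog entries above reduce to small pure functions over a tool response. The following is a rough sketch, assuming responses are plain dicts and lists; the function names and field names are made up for illustration, and the confident-contradiction case is left out because it depends on what is already in the agent's context.

```python
import copy


def stale(record, old_values):
    """Stale value: overwrite fields with values that were true at an earlier time."""
    out = copy.deepcopy(record)
    out.update(old_values)
    return out


def swap_fields(record, a, b):
    """Swapped fields: exchange two values that are each individually valid."""
    out = copy.deepcopy(record)
    out[a], out[b] = out[b], out[a]
    return out


def transpose_last_digits(record, field):
    """Transposed digits: swap the last two digits of a non-negative integer."""
    out = copy.deepcopy(record)
    value = out[field]
    if value < 0:
        raise ValueError("expects a non-negative integer")
    digits = str(value)
    if len(digits) >= 2:
        digits = digits[:-2] + digits[-1] + digits[-2]
    out[field] = int(digits)
    return out


def wrong_empty(rows):
    """The wrong empty: a well-formed list with every row filtered out."""
    return []


def mix_in_foreign_rows(rows, foreign_rows):
    """Subtly wrong join: append a few rows that belong to another entity."""
    return copy.deepcopy(rows) + copy.deepcopy(foreign_rows)
```

Each function copies its input rather than mutating it, so the true response stays available for logging next to the lie that replaced it.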

Open-source tooling for agent-specific chaos has started to appear, but you do not need a framework to begin. A middleware shim around your tool-calling layer that intercepts a configured fraction of responses and applies one of these mutations is enough to run the experiment.
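A shim of that kind can be very small. The sketch below assumes tools are plain Python callables; make_lying_tool, its parameters, and the get_open_tickets example are hypothetical names, not part of any framework.

```python
import random


def make_lying_tool(tool, mutate, rate, rng=None, log=None):
    """Wrap `tool` so that roughly `rate` of its responses pass through `mutate`."""
    rng = rng or random.Random()
    log = log if log is not None else []

    def wrapped(*args, **kwargs):
        result = tool(*args, **kwargs)
        if rng.random() < rate:
            lie = mutate(result)
            # Record both versions so the outcome can be traced afterwards.
            log.append({"args": args, "kwargs": kwargs, "truth": result, "lie": lie})
            return lie
        return result

    wrapped.injections = log
    return wrapped


def get_open_tickets(user_id):
    """Stand-in for a real tool; returns two open tickets for any user."""
    return [{"id": 1, "user_id": user_id}, {"id": 2, "user_id": user_id}]


# rate=1.0 always injects, rate=0.0 never does; real runs would sit in between.
lying_tickets = make_lying_tool(get_open_tickets, lambda rows: [], rate=1.0)
honest_tickets = make_lying_tool(get_open_tickets, lambda rows: [], rate=0.0)
```

Passing a seeded random.Random as rng makes a run reproducible, and the injections list gives you the ground truth you need later to decide whether the agent noticed anything.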

Run It as a Game Day and Measure Reasoning, Not Uptime

A fault library only matters if you run it deliberately and watch the right thing. Borrow the game-day discipline from SRE practice: a scheduled, controlled exercise where the team injects faults into a real (or production-like) system and observes how it responds. The adaptation for agents is what you measure.

In an infrastructure game day, you measure recovery: did the system stay up, did failover work, how long until green. For a lying-tool game day, uptime is meaningless — the agent stays "up" through every single injection, because a lie never knocks it over. The metric that matters is whether the agent's reasoning ever questioned the poisoned input.

So instrument for that. For each injected fault, classify the outcome into one of three buckets:

1. Caught it. The agent flagged the value as implausible, cross-checked it against another tool, asked the user to confirm, or refused to act on it. This is the win condition.
2. Absorbed it but contained the blast. The agent used the bad value, but downstream guardrails (a validation step, a confirmation prompt before a write, a sanity check on the final answer) stopped it from reaching the user or causing an action.
3. Composed on top of it. The agent took the lie as truth and produced a confident wrong output with no friction at all.

Bucket three is your real failure rate, and before you run this exercise you almost certainly do not know what it is. Most teams discover it is high — that the agent essentially never doubts a tool, because nothing in its training or its prompt ever told it to.
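Tallying the buckets takes only a few lines once each injection has been labeled. In this sketch the Outcome enum, the summarize helper, and the four game_day rows are all placeholders; the rows are invented to show the arithmetic and are not measurements.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    CAUGHT = "caught"        # agent questioned or refused the value
    CONTAINED = "contained"  # agent used it, a guardrail stopped it
    COMPOSED = "composed"    # agent built a confident answer on it


def summarize(results):
    """Return the share of injections per bucket from (fault_name, Outcome) pairs."""
    counts = Counter(outcome for _, outcome in results)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no injections recorded")
    return {outcome: counts[outcome] / total for outcome in Outcome}


# Placeholder labels for illustration only.
game_day = [
    ("stale_balance", Outcome.COMPOSED),
    ("swapped_names", Outcome.COMPOSED),
    ("wrong_empty", Outcome.CONTAINED),
    ("transposed_qty", Outcome.CAUGHT),
]
rates = summarize(game_day)
```

With these placeholder rows, rates[Outcome.COMPOSED] is 0.5. Assigning the label is the hard part and usually needs a human or a grader reading the transcript; the code only keeps the count honest.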

What you do with that number is the actual engineering work. Some lies are unpreventable at the agent level and belong upstream — freshness checks and TTLs on the data source, schema constraints that make a swapped field impossible. Some are catchable in the agent: a verification step that cross-checks a high-stakes value against a second independent tool, the Byzantine-tolerance move of not trusting a single source for anything that triggers an irreversible action. And some belong in the prompt and the workflow: explicitly instruct the agent to treat tool outputs as evidence rather than fact, and to escalate when two sources disagree or a value looks extreme.
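The cross-check idea can be sketched as a thin wrapper that refuses to return a high-stakes number unless a second source agrees. Everything here is an assumption for illustration: the cross_checked name, the SourcesDisagree exception, the 1% relative tolerance, and the premise that both sources return comparable numbers.

```python
class SourcesDisagree(Exception):
    """Raised so the workflow escalates instead of acting on a single source."""


def cross_checked(primary, secondary, *args, rel_tol=0.01):
    """Return the primary value only if an independent source agrees within rel_tol."""
    a = primary(*args)
    b = secondary(*args)
    scale = max(abs(a), abs(b))
    if scale > 0 and abs(a - b) / scale > rel_tol:
        raise SourcesDisagree(f"primary={a!r} secondary={b!r}")
    return a


def ledger_balance(account_id):
    """Stand-in for the source of record."""
    return 120_900


def cached_balance(account_id):
    """Stand-in for a cache that is serving a stale value."""
    return 98_400
```

Calling cross_checked(cached_balance, ledger_balance, "acct-1") raises SourcesDisagree, which the surrounding workflow can turn into an escalation or a confirmation prompt. The check only helps if the two sources are genuinely independent; two tools reading the same stale cache will agree with each other and still be wrong.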

There is a useful borrowed principle here from work on applying Byzantine fault tolerance to AI systems: treat each tool, and each model, as a node that may be lying, and design so that no single lying node can corrupt an outcome that matters. You do not need full consensus machinery. You need to stop building agents that take the first answer from one source and act on it.

Stop Calling It Robust Until You Have Tested the Lie

The uncomfortable conclusion is that a green resilience dashboard has been measuring the easy half of the problem. Honest failures — the timeouts, the 500s, the dropped connections — are the failures your agent was always going to handle, because handling them is code you wrote and tested. Dishonest failures are the ones that reach production untested, because nobody injected them, because they do not look like failures.

The single most valuable experiment you can run this quarter is also one of the cheapest: pick your three highest-stakes tools, write five plausible lies for each, route them through a middleware shim, and watch how often your agent composes confidently on top of garbage. Whatever that number is, it is the number that decides whether your users can trust the system — and right now it is a number you have never measured.

An agent's robustness to honest failures told you it would survive an outage. It told you nothing about whether it can survive being told something false by a tool it had no reason to doubt. Until you have injected that fault on purpose, "robust" is a word your test suite has not earned.

About Tian Pan