
Context Poisoning in Long-Running AI Agents

· 9 min read
Tian Pan
Software Engineer

Your agent completes step three of a twelve-step workflow and confidently reports that the target API returned a 200 status. It didn't — that result was from step one, still sitting in the context window. By step nine, the agent has made four downstream calls based on a fact that was never true. The workflow "succeeds." No error is logged.

This is context poisoning: not a security attack, but a reliability failure mode where the agent's own accumulated context becomes a source of wrong information. As agents run longer, interact with more tools, and manage more state, the probability of this failure climbs sharply. And unlike crashes or exceptions, context poisoning is invisible to standard monitoring.

Why Context Degrades in Long-Running Sessions

The transformer attention mechanism has a dirty secret: it doesn't treat all tokens equally over long sequences. Research on frontier models shows a consistent U-shaped pattern — models attend well to tokens at the start and end of a context window, and poorly to content buried in the middle. Once a context exceeds roughly 50% of its window capacity, performance starts to degrade measurably, and this happens well before the stated maximum context length.

For agents, four compounding mechanisms turn this quirk into a reliability crisis:

Context poisoning occurs when a hallucinated fact gets embedded in the context and subsequently treated as ground truth. The agent asked a tool whether a file existed, misread the response, and wrote "file confirmed present" into its running notes. Every downstream step that touches that file now reasons on false premises.

Context distraction appears at high token counts. As the session history grows, models begin to pattern-match against past steps rather than synthesize new plans. An agent processing its hundredth tool call starts to look like its previous fifty, with the system reinforcing historical patterns instead of responding to current state.

Context confusion emerges from tool definition sprawl. Each tool definition added to the system prompt consumes tokens and introduces decision overhead. Studies show accuracy degrades measurably when agents have more than one tool, with the failure rate worsening on smaller models and growing non-linearly as the tool count rises past thirty.

Context clash is the most damaging. When an agent takes a wrong turn midway through a task, the incorrect reasoning remains in the context window and continues to influence all subsequent steps. Benchmark data shows a 39% average performance drop when earlier errors persist in context — the model doesn't detect the contradiction; it rationalizes around it.

The Cascading Failure Pattern

The dangerous property of context poisoning isn't the initial wrong fact. It's the downstream amplification.

An inventory agent hallucinates a nonexistent SKU. This wrong value gets passed to a pricing tool, which computes a cost. The cost gets passed to a shipping estimator. The shipping estimate triggers a customer notification. By the time a human sees an anomalous order, the original error has been laundered through four systems and presented with the full appearance of legitimate work. The agent didn't fail catastrophically — it succeeded at each step, operating on corrupted inputs.

This pattern is especially sharp in multi-agent pipelines. When Agent A's output feeds Agent B, which feeds Agent C, a single poisoned fact in step three of Agent A's workflow can produce a subtly wrong final output that looks completely plausible. The error doesn't trigger any exception handler because nothing technically malfunctioned.

The implication: standard error monitoring catches binary failures. It's entirely blind to the category of failures where the agent runs to completion and returns a wrong answer.

What Session Health Actually Looks Like

Most teams think of session health as "did the agent finish without crashing." Production experience reveals a different set of signals worth watching:

Tool output staleness: When an agent fetches a resource in step two and references it in step eleven, is it re-querying or relying on a nine-step-old value? Stale reads are especially dangerous in workflows that span minutes or hours, where external state can change underneath an in-flight agent.

Reasoning coherence drift: Intermediate reasoning steps that progressively contradict earlier steps are an early indicator. An agent that says "the file does not exist" in step four but "processing file contents" in step seven has experienced context corruption, not goal completion.

Tool call redundancy: Agents under context pressure frequently re-invoke tools they've already called, often arriving at slightly different results. A spike in duplicate tool calls signals the agent has lost track of what it already knows.

Instruction compliance decay: The initial system prompt is typically the most important input, but it becomes the least attended content as the session grows. Agents that initially follow formatting and constraint rules will gradually drift away from them as context fills. Compliance monitoring against fixed invariants is a cheap proxy for overall context health.
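Of these signals, tool call redundancy is the cheapest to instrument. A minimal Python sketch (the `ToolCallMonitor` class and its threshold are illustrative, not tied to any particular framework):

```python
import hashlib
import json
from collections import Counter

class ToolCallMonitor:
    """Tracks tool invocations and flags duplicates that suggest
    the agent has lost track of what it already knows."""

    def __init__(self, duplicate_threshold=2):
        self.calls = Counter()
        self.duplicate_threshold = duplicate_threshold

    def _fingerprint(self, tool_name, args):
        # Hash the tool name plus canonicalized arguments so that
        # semantically identical calls collide on the same key.
        payload = json.dumps({"tool": tool_name, "args": args}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def record(self, tool_name, args):
        """Record a call; return True once it exceeds the duplicate threshold."""
        key = self._fingerprint(tool_name, args)
        self.calls[key] += 1
        return self.calls[key] > self.duplicate_threshold

monitor = ToolCallMonitor(duplicate_threshold=2)
monitor.record("get_inventory", {"sku": "A-100"})            # first call: fine
monitor.record("get_inventory", {"sku": "A-100"})            # second: still fine
flagged = monitor.record("get_inventory", {"sku": "A-100"})  # third: flagged
```

A spike in flagged calls is the trigger for intervention, not an error in itself; the monitor only surfaces the pattern.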

Checkpointing: Designing for Recovery, Not Just Success

The checkpoint-restore pattern, borrowed from distributed systems, is the most reliable structural defense against context poisoning. The idea: at defined milestones, persist enough agent state to restart from a known-good position rather than replaying from the beginning.

For agentic workflows, effective checkpointing involves three elements:

State snapshots at decision points: Before any irreversible action — sending a message, writing to a database, calling an external API — capture the full agent state. This isn't just context; it's the structured representation of what the agent believes to be true. If subsequent steps reveal corruption, you can rewind to the last clean snapshot.

Fact verification at boundaries: Rather than trusting intermediate context, have the agent re-query authoritative sources at phase boundaries. An agent that's been running for forty steps should re-verify the current state of key entities rather than relying on readings from step two. This is expensive, but less expensive than detecting corruption after an irreversible action.

Hybrid stateless/stateful checkpointing: For short interruptions (tool timeouts, transient errors), stateful restoration — resuming exactly where the agent left off — preserves execution fidelity. For major failures or suspected corruption, stateless restoration from a higher-level checkpoint is safer, even if it means some rework.
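The snapshot-and-rewind mechanics can be sketched in a few lines. This is an illustration under simplifying assumptions (in-memory storage, JSON-serializable state); the `Checkpointer` class and the validity check are hypothetical:

```python
import copy
import time

class Checkpointer:
    """Persists agent state snapshots at decision points so the
    workflow can rewind to a known-good position."""

    def __init__(self):
        self._snapshots = []  # in production: durable storage, not memory

    def snapshot(self, step, state):
        # Deep-copy so later mutation of live state can't corrupt history.
        self._snapshots.append({
            "step": step,
            "time": time.time(),
            "state": copy.deepcopy(state),
        })

    def restore_latest_valid(self, is_valid):
        # Stateless restoration: walk back to the newest snapshot that
        # still passes verification against ground truth.
        for snap in reversed(self._snapshots):
            if is_valid(snap["state"]):
                return copy.deepcopy(snap["state"])
        raise RuntimeError("no valid checkpoint found")

cp = Checkpointer()
cp.snapshot(step=2, state={"file_exists": True, "cost": 10})
cp.snapshot(step=5, state={"file_exists": True, "cost": 999})  # poisoned value

# The validity predicate stands in for re-querying an authoritative
# source (here: a stubbed budget check).
restored = cp.restore_latest_valid(lambda s: s["cost"] <= 100)
```

The key design choice is that `is_valid` consults ground truth, not the context window; a checkpoint is only "known-good" relative to an external check.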

Selective Context Pruning

Checkpointing handles recovery. Selective pruning handles prevention.

The naive approach — keeping the full conversation history — guarantees eventual context rot. The better approach is active hygiene: removing or compressing stale content before it becomes a liability.

Summarize-then-prune: Before dropping content from the context window, compress it into a structured summary. After five conversation turns, condense steps one through three into a summary block. This preserves decision rationale without retaining the full token-expensive transcript. Done well, this reduces context size by 30–40% while preserving more useful signal than raw truncation.
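The mechanics look roughly like this, with the summarization step stubbed out (in a real agent, `summarize` would be an LLM call producing a structured summary; the names and thresholds here are illustrative):

```python
def summarize(turns):
    # Placeholder: in a real agent this would be a model call that
    # compresses the turns into a structured summary block.
    facts = "; ".join(t["content"][:40] for t in turns)
    return {"role": "system", "content": f"[summary of {len(turns)} turns] {facts}"}

def prune_history(history, keep_recent=2, trigger=5):
    """Once history exceeds `trigger` turns, compress everything but
    the most recent `keep_recent` turns into a single summary block."""
    if len(history) <= trigger:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent

history = [{"role": "assistant", "content": f"step {i} result"} for i in range(6)]
pruned = prune_history(history)
# Six turns collapse into one summary block plus the two most recent turns.
```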

Tool result externalization: Large tool outputs — API responses, database query results, document contents — don't belong in the context window beyond their immediate use. The memory pointer pattern stores them in external state and substitutes a short reference key. Downstream tools retrieve the full data on demand rather than operating on context-embedded copies that age and mislead.
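A minimal sketch of the memory pointer pattern (the `ExternalStore` class and key format are illustrative; a production system would back this with a durable store such as Redis or a database):

```python
import uuid

class ExternalStore:
    """Holds large tool outputs outside the context window."""

    def __init__(self):
        self._data = {}

    def put(self, payload):
        key = f"ref:{uuid.uuid4().hex[:8]}"
        self._data[key] = payload
        return key  # only this short key enters the context

    def get(self, key):
        # Downstream tools fetch the full data on demand.
        return self._data[key]

store = ExternalStore()
big_response = {"rows": list(range(10_000))}  # e.g. a database query result
ref = store.put(big_response)

context_entry = f"query_result stored at {ref}"  # what the agent actually sees
rows = store.get(ref)["rows"]                    # retrieved only when needed
```

The context carries a few dozen tokens instead of the full payload, and every retrieval is a fresh read rather than a context-embedded copy that ages.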

Recency-weighted retention: Not all context ages equally. Newer tool outputs are more likely to reflect current state. A simple decay policy reduces the effective weight of older interactions, making recent inputs more influential in the agent's reasoning without discarding historical context entirely.
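One way to implement this is an exponential decay over entry age; the half-life value below is an arbitrary illustration, not a recommendation:

```python
def retention_weights(ages_in_steps, half_life=10):
    """Weight each context entry by how many steps old it is, so that
    newer tool outputs carry more influence than stale ones."""
    return [0.5 ** (age / half_life) for age in ages_in_steps]

# An entry from the current step keeps full weight; a 10-step-old
# entry drops to half; a 30-step-old entry to one-eighth.
weights = retention_weights([0, 10, 30])
```

The weights can drive ordering (move high-weight entries toward the attended ends of the window) or pruning priority (evict the lowest-weight entries first).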

Periodic session resets: For very long-running agents, no pruning strategy fully eliminates accumulated noise. The most reliable intervention is a controlled reset: generate a structured summary of current state, start a fresh context with that summary as the system prompt, and continue from there. This is analogous to a connection pool recycle — not free, but prevents slow degradation from becoming catastrophic failure.

Invariant Checking: Grounding the Agent in Reality

Beyond context management, agents that handle high-stakes workflows benefit from explicit invariant checking. These are assertions about the state of the world that should be verifiable at any point in the workflow:

  • The file being processed still exists at the expected path
  • The API returned a 200 within the last N steps, not only back at the start of the workflow
  • The accumulated cost of operations remains within the authorized budget
  • No contradictory facts appear in the structured state object

Invariant violations don't have to halt the workflow — they can trigger re-verification, escalate to a human checkpoint, or initiate a controlled rollback to the last valid snapshot. The critical property is that they're evaluated against external ground truth, not against the agent's own context.
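Expressed as code, invariants are plain predicates evaluated against re-queried ground truth rather than the agent's context. The specific checks and `ground_truth` values below are illustrative:

```python
def check_invariants(ground_truth, invariants):
    """Evaluate each invariant against external ground truth and
    return the names of any that fail, without halting the workflow."""
    return [name for name, check in invariants.items() if not check(ground_truth)]

# Ground truth comes from re-querying authoritative sources, not from
# the agent's context window.
ground_truth = {"file_exists": True, "accumulated_cost": 120, "budget": 100}

invariants = {
    "file_present": lambda g: g["file_exists"],
    "within_budget": lambda g: g["accumulated_cost"] <= g["budget"],
}

violations = check_invariants(ground_truth, invariants)
# Each violation name can be mapped to a policy: re-verify, escalate
# to a human checkpoint, or roll back to the last valid snapshot.
```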

The Chandy-Lamport distributed snapshot principle applies here: a consistent global state means all agents in a pipeline agree on the same version of shared facts. Running invariant checks at merge points in multi-agent workflows catches the divergence that context poisoning creates before it produces downstream damage.

Putting It Together: A Practical Architecture

An agent architecture resilient to context poisoning combines these layers:

First, keep the context lean. Externalize large tool outputs, prune content that's no longer action-relevant, and summarize older steps rather than accumulating them verbatim. A focused 300-token context outperforms a 100,000-token context full of noise — this has been measured, not theorized.

Second, checkpoint before consequences. Every action with external side effects should be preceded by a snapshot of agent state and a verification of the key facts underpinning that action. This is the agent equivalent of a database pre-commit check.

Third, monitor for health signals, not just completion. Track tool call redundancy, instruction compliance drift, and reasoning coherence across steps. These signals surface context corruption long before the workflow produces a wrong answer.

Fourth, design for resets. Very long sessions should have planned reset points where a fresh context is started from a structured state summary. This isn't a failure — it's deliberate lifecycle management.

Why This Doesn't Get Enough Attention

Context poisoning is underweighted as a production concern because demos don't surface it. A ten-step workflow in a showcase rarely runs long enough to accumulate corrupted state. A production agent that runs hundreds of steps daily, over weeks, in a changing environment — that agent will eventually hallucinate a fact, fail to detect it, and propagate it through the system.

The teams that build agents designed to run in the background, autonomously, for extended periods will encounter this. The ones who've designed for it — with checkpoints, pruning, invariant checks, and session health monitoring — will catch it early. The ones who haven't will discover it after a downstream system has acted on a fact that was never true.

Long-running agentic reliability is, ultimately, a context hygiene problem dressed up as a model problem. The model is working exactly as designed. It's reasoning faithfully over a context that no longer reflects reality.
