
Intent Drift in Long Conversations: Why Your Agent's Goal Representation Goes Stale

9 min read
Tian Pan
Software Engineer

Most conversations about context windows focus on what the model can hold. The harder problem is what the model does with what it holds — specifically, how it tracks the evolving goal of the person it's talking to.

Intent isn't static. Users start vague, refine iteratively, contradict themselves, digress, and revise. What they need at message 40 is not necessarily what they expressed at message 2. An agent that treats context as a flat append log will accumulate all of that — and still get the current intent wrong.

The symptom usually surfaces as hallucination. The agent confidently pursues something the user stopped wanting three turns ago. But when you trace the failure, no single step is wrong. The agent followed each instruction logically. The problem is that the logic was being applied against a goal that had already shifted.

Why Static Context Freezes Intent

The standard approach to agent context is a mutable string buffer: a system prompt that stays fixed, with every prior message appended after it. When intent is simple and the conversation is short, this works. In longer sessions, it quietly breaks.
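
To make the baseline concrete, here is a minimal sketch of that pattern. The class and method names are mine, not from any particular framework, and `llm` stands in for any call that maps a message list to a reply:

```python
class AppendOnlyAgent:
    """Baseline agent: a fixed system prompt plus every prior message, replayed verbatim."""

    def __init__(self, system_prompt: str, llm):
        self.system_prompt = system_prompt
        self.llm = llm        # any callable that takes a list of messages and returns a reply string
        self.messages = []    # flat append log; nothing is ever revised, demoted, or retired

    def send(self, user_message: str) -> str:
        self.messages.append({"role": "user", "content": user_message})
        context = [{"role": "system", "content": self.system_prompt}, *self.messages]
        reply = self.llm(context)
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

Nothing in this loop distinguishes a constraint from a digression, or a revision from an elaboration. Every turn carries the same structural weight, and only the model's attention decides what still matters.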

The failure mode has three root causes.

Pattern-matching inertia. As context grows, attention dilutes across a wider token span. Models increasingly mirror the behavioral patterns established earlier in the session: an agent that has spent 8,000 tokens on code refactoring keeps reinforcing that pattern even after the user pivots to a documentation goal. The most recent instructions are syntactically present but contextually overwhelmed.

Token position bias. Transformer attention mechanisms give non-uniform weight to tokens based on relative position. Instructions from message 2 are still in context at message 40, but their effective influence has decayed. The model isn't ignoring the original goal — it just weights it less than the pattern of recent exchange.

Training rationality. Models trained on population-level preferences learn to be maximally helpful for the average intent. When a user's signal is ambiguous, the model defaults to what's most commonly helpful — not what's helpful for this specific user in this specific evolved state. Corrections that diverge from the established trajectory get re-interpreted as clarifications of the original goal rather than revisions to it.

The combined result: an agent that appears to remember everything but understands nothing about how the user's intent has evolved.

What Intent Drift Actually Looks Like in Production

The failure pattern isn't loud. It doesn't crash. The agent produces output — often syntactically valid, formally responsive, plausibly correct. What's missing is alignment with what the user actually wanted when they asked the last question.

Research on multi-turn degradation puts numbers on this. Performance across complex generation tasks drops roughly 30% in multi-turn sessions versus single-turn baselines. When conversations involve actual corrections — user explicitly revising what they want — models handle those revisions accurately only about 10-14% of the time. The rest of the time, they treat the correction as elaboration rather than replacement.

A 2% goal misalignment early in an execution chain compounds to roughly a 40% failure rate by the end. The errors don't stay contained: they propagate through tool calls, stored results, and downstream reasoning steps.
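
One way to read that number, assuming the 2% applies per step and the chain runs to roughly 25 dependent steps (both assumptions mine, for illustration):

```python
# Back-of-the-envelope: p is the per-step chance of acting on a stale goal,
# n is the number of dependent steps (tool calls, stored results, reasoning hops).
p, n = 0.02, 25
chain_failure = 1 - (1 - p) ** n
print(f"{chain_failure:.0%}")   # ~40%
```

The point is not the exact constants but the shape: small, persistent misalignment multiplied across a long chain of dependent actions.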

In practice this shows up as:

  • Scope creep in coding agents: An agent tasked with modifying specific files gradually expands its actions to forbidden directories because the behavioral pattern of "code modification" becomes self-reinforcing. The constraint was stated at the start; it's textually present; it's no longer effectively enforced.
  • Stale optimization targets: A data analysis agent initially tasked with maximizing recall silently reoptimizes for precision after the user mentions false positives twice. No explicit instruction was given to change the objective.
  • Resumption errors: After a context compression or session pause, the agent reconstructs a prior intent from summary artifacts that don't reflect the last round of refinement. The user sees the agent restart from an earlier version of what they wanted.

The Revision vs. Clarification Gap

The hardest category of intent drift to catch involves corrections that look like clarifications.

When a user says "Wait, I actually meant X," the utterance is structurally similar to "To clarify, I also mean X." Agents trained to be agreeable and helpful default to the latter interpretation — they fold the new signal into the existing intent rather than replacing part of it.

This is especially acute in agentic tasks with long action sequences. By message 20, the agent has built substantial context around what it understood the goal to be. Introducing a correction requires not just registering the new signal, but reweighting all prior context around the revised interpretation. The model typically manages the first and not the second: it acknowledges the correction, then continues executing against the original interpretation.
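
One way to surface the distinction, rather than leave it to the model's default merge behavior, is a separate classification step before a turn is folded into the working goal. A minimal sketch; the prompt wording, labels, and function names are illustrative, and `llm` stands in for any text-in, text-out call:

```python
# Illustrative labels and prompt wording; not a validated taxonomy.
CLASSIFY_PROMPT = """The agent currently believes the user's goal is:
{goal}

The user's latest message is:
{message}

Label the message as exactly one of:
REVISION      - it replaces or narrows part of the current goal
CLARIFICATION - it adds detail without changing the goal
DIGRESSION    - it is unrelated to the current goal

Label:"""

def classify_turn(llm, goal: str, message: str) -> str:
    """Run a separate, narrow call whose only job is the revision-vs-clarification decision."""
    label = llm(CLASSIFY_PROMPT.format(goal=goal, message=message)).strip().upper()
    # Anything unrecognized is surfaced as UNCLEAR so the agent can ask a follow-up question
    # instead of silently folding the message into the existing goal.
    return label if label in {"REVISION", "CLARIFICATION", "DIGRESSION"} else "UNCLEAR"
```

On its own the label fixes nothing; it only makes the decision explicit enough that something downstream can act on it.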

The failure is structural, not just a matter of capability. No amount of instruction-following ability repairs a system that models intent as immutable.

Mutable Intent Representation

The core design change needed is treating intent as a structured state variable rather than an emergent property of raw context.
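
Concretely, that could be an explicit intent object the agent reads from and writes to on every turn, instead of re-deriving intent from the transcript. A hedged sketch, with field and method names of my own choosing:

```python
from dataclasses import dataclass, field

@dataclass
class IntentState:
    """Explicit, mutable representation of what the user currently wants."""
    goal: str                                                        # current best statement of the objective
    constraints: list[str] = field(default_factory=list)             # hard limits that survive every revision
    revisions: list[tuple[str, str]] = field(default_factory=list)   # (old_goal, new_goal) history

    def revise(self, new_goal: str) -> None:
        """Replace the goal; the old goal survives only as history, not as live context."""
        self.revisions.append((self.goal, new_goal))
        self.goal = new_goal

    def clarify(self, detail: str) -> None:
        """Fold extra detail into the goal without replacing it."""
        self.goal = f"{self.goal} ({detail})"

    def render(self) -> str:
        """The text that actually goes into the prompt: current goal and constraints, not the drift."""
        lines = [f"Current goal: {self.goal}"]
        lines.extend(f"Constraint: {c}" for c in self.constraints)
        return "\n".join(lines)
```

Wired up with the classification step sketched earlier, a REVISION triggers revise(), a CLARIFICATION triggers clarify(), and render() is injected near the end of the prompt on every turn, so the current goal never has to compete with thousands of tokens of earlier pattern.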
