Persona Drift in Long-Running Agent Sessions: Why Your Agent Forgets Who It Is
Most production agent failures look like model errors. The agent starts a session responding correctly to the system prompt — maintaining the right tone, respecting tool constraints, following the defined workflow. Then somewhere around turn 30 or 40, things subtly shift. The agent starts hedging where it should be direct. It calls tools it was told to avoid. It contradicts a decision it made 15 turns earlier. The system prompt hasn't changed, but the agent's behavior has.
This is persona drift: the progressive divergence between an agent's actual behavior and its original system instructions, caused by how transformers attend to increasingly buried context. Research quantifies it precisely — after 8–12 dialogue turns, persona self-consistency metrics degrade by more than 30%. Single-turn agents achieve roughly 90% task accuracy; multi-turn agents running the same tasks fall to around 65%. That 25-point drop isn't a model quality problem you can prompt your way around. It's an architectural property of how attention works over long sequences, and most teams discover it only after they've shipped a feature that degrades silently for hours before a user finally notices.
