Context Bloat: The AI Memory Leak You Cannot Grep For
A long-running agent session that opened with a 2K context is now paying for 40K tokens of mostly-dead state. The retrieval results from turn three, the directory listing the agent already navigated past, the JSON dump from a tool call whose answer was a single integer — all of it is still riding shotgun on every subsequent inference call, billed in full, dragging on attention. The pattern is structurally identical to a memory leak: unbounded growth of unreferenced data. But no profiler will surface it, because the leak does not live in process memory. It lives inside the conversation history, and most agent frameworks ship without a collector.
The cost shows up in two places at once. The token bill grows quadratically — a 20-step loop where each step contributes 1,000 tokens produces roughly 210,000 cumulative input tokens, not 20,000, because every prior turn is rebilled on every subsequent call. And the model itself starts to degrade: by 50K tokens of accumulated noise, even a model with a 1M-token window has already lost double-digit points of accuracy on the actual task. You are paying more, to think worse, about a problem the model was already past three turns ago.
