The Context Window Is a Commons, and Every Team Is Grazing It
Open a production agent and count what is in the context window before the user has typed a single character. There is a system prompt the platform team owns. There are tool definitions — forty of them, maybe more — each carrying a name, a description, a JSON schema, field-level docs, and a handful of enums. There is a block of retrieved examples that the search team added because few-shot helped one eval. There are six lines of safety instructions from trust and safety, four lines of formatting rules from the design team, and a paragraph of domain glossary that someone added during an incident and nobody removed.
Add it up and the agent boots with 30,000 tokens of overhead. On a connected setup with three MCP servers, that number is routinely far worse: one widely cited measurement put three servers at 143,000 tokens of a 200,000-token budget, roughly 72% of the window consumed before the conversation starts. None of it is wrong. Every line was added by someone solving a real problem. And that is exactly why the context window is being destroyed.
The context window is a commons. It is a shared, finite, depletable resource that many independent teams draw from, and — like every commons — it has no allocator. Each team makes a decision that is locally rational and collectively ruinous. This is not a prompt-engineering problem. It is a governance problem, and it needs governance tooling, not a cleverer prompt.
A Commons Has No Allocator
The tragedy of the commons is not a story about bad actors. It is a story about good actors and missing structure. A shared pasture supports a fixed number of animals. Each herder gains the full benefit of adding one more animal, while the cost of overgrazing is spread across everyone. So every herder, acting reasonably, adds animals until the pasture collapses. Nobody intended the collapse. Nobody was greedy. The structure simply had no mechanism to price the shared cost back to the individual decision.
The agent context window has the identical shape. The window is finite — that is true even with million-token models, because context rot means effective capacity is far smaller than nominal capacity. The teams drawing from it are independent: the retrieval team, the tools team, the safety team, the team that owns the product surface, the platform team that owns the base prompt. Each gets the full benefit of what it adds — its eval score goes up, its feature works. And the cost is diffuse: slightly slower responses, slightly higher spend, slightly worse recall, spread across every other team's feature and invisible in any single team's dashboard.
There is no allocator. No component of the system says "the retrieval team gets 8,000 tokens, the tools team gets 12,000, and the rest is reserved for the conversation." So the window fills the way a pasture empties: one reasonable decision at a time.
Every Addition Is Locally Rational
The dangerous part is that you cannot stop this by telling people to be disciplined. They already are. Walk through the actual decisions.
The tools team integrates a new MCP server. It exposes 40 tools. Each tool definition costs somewhere between 550 and 1,400 tokens once you count the schema and field descriptions. Across 40 tools that is 22,000 to 56,000 tokens, so a single server can plausibly add 50,000. But the team is not being careless: those tools are genuinely useful, and the default MCP client behavior is to load every definition upfront. The team did not choose bloat; it chose a working integration, and bloat was the default.
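The arithmetic behind that estimate is worth writing down once. This is a minimal sketch using only the figures quoted above (40 tools, 550 to 1,400 tokens each) and the 200,000-token window mentioned earlier; the function name is invented for illustration.

```python
def tool_overhead_range(n_tools, min_per_tool, max_per_tool):
    """Return the (low, high) token cost of loading every tool definition upfront."""
    return n_tools * min_per_tool, n_tools * max_per_tool


low, high = tool_overhead_range(n_tools=40, min_per_tool=550, max_per_tool=1_400)
print(low, high)  # 22000 56000

# Share of a 200,000-token nominal window taken by the worst case.
worst_case_share = high / 200_000
print(f"{worst_case_share:.0%}")  # 28%
```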
The retrieval team adds three few-shot examples to the prompt because a benchmark improved four points. Three examples, maybe 1,200 tokens. Completely defensible — they have the eval delta to prove it.
The safety team adds a paragraph after a near-miss in production. Nobody is going to argue against the safety paragraph.
The product team adds a glossary so the agent stops confusing two internal terms. Reasonable.
Every one of these is correct in isolation. Each team measured a benefit and paid a cost it could see — a few thousand tokens — against a budget it assumed was effectively free. The flaw is not in any decision. It is that the sum of locally optimal decisions is globally pessimal, and no single team is positioned to see the sum. The retrieval team does not know the tools team just added 50,000 tokens last sprint. The system that should integrate those costs does not exist.
And the cost is not linear. Chroma's research found degradation at every input-length increment, with the effect clearly present well before 50,000 tokens, so adding context can make the agent worse, not just slower. The model's attention budget is finite. Information in the middle of a long context gets 15-40% lower recall than the same information at the edges. So the 50th tool definition does not just cost tokens; it pushes the user's actual question deeper into a region the model attends to poorly. The commons does not degrade gracefully. It degrades, then it collapses.
You Cannot Govern What You Cannot Attribute
The first move is not to cut anything. It is to make the invisible cost visible, attributed to the team that incurred it.
Right now, most teams cannot answer a basic question: how many tokens does your feature add to the context window? They know their eval score. They do not know their footprint. That asymmetry is the entire problem — benefits are measured per team, costs are pooled — and it is fixable with instrumentation.
Build a context manifest. Every block of standing context — the base prompt, each tool group, each retrieved-example set, each instruction paragraph — gets an owner and a token count, computed on every build. The output is a single table: section, owning team, token count, percentage of the standing budget. When the tools team adds an MCP server, the manifest shows tool definitions jumped from 18,000 to 68,000 tokens, and it shows whose line item moved.
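A manifest can be very small. The sketch below is one possible shape, not a reference implementation: `ContextBlock`, `build_manifest`, and `render` are hypothetical names, and `estimate_tokens` is a crude four-characters-per-token placeholder that you would replace with the tokenizer for the model you actually deploy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextBlock:
    section: str  # e.g. "tool definitions"
    owner: str    # owning team
    text: str     # the exact text placed in the window


def estimate_tokens(text):
    # ASSUMPTION: roughly 4 characters per token (rounded up).
    # Swap in your model's real tokenizer for accurate counts.
    return (len(text) + 3) // 4


def build_manifest(blocks, standing_budget):
    """Return one row per block: section, owner, tokens, percent of budget."""
    rows = []
    for block in blocks:
        tokens = estimate_tokens(block.text)
        rows.append({
            "section": block.section,
            "owner": block.owner,
            "tokens": tokens,
            "pct_of_budget": round(100 * tokens / standing_budget, 1),
        })
    # Largest line items first, so the biggest consumers top the table.
    return sorted(rows, key=lambda row: row["tokens"], reverse=True)


def render(rows):
    """Format the manifest as a fixed-width text table."""
    lines = [f"{'section':<20}{'owner':<16}{'tokens':>8}{'% budget':>10}"]
    for row in rows:
        lines.append(
            f"{row['section']:<20}{row['owner']:<16}"
            f"{row['tokens']:>8}{row['pct_of_budget']:>10.1f}"
        )
    return "\n".join(lines)


# Toy inputs; real blocks would be read from the prompt assembly code.
blocks = [
    ContextBlock("base prompt", "platform", "x" * 400),
    ContextBlock("tool definitions", "integrations", "x" * 2000),
]
manifest = build_manifest(blocks, standing_budget=1000)
print(render(manifest))
```

Running this on every build and committing the output gives reviewers a diff of the table itself, which is where a jump in one team's line item becomes hard to miss.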
This changes the conversation from "the agent feels slow" to "tool definitions are 34% of the window and grew 50,000 tokens this month, owned by the integrations team." That is a sentence someone can act on. It is the equivalent of cloud cost attribution: nobody optimizes spend until the bill is broken down by team. Until the context window has the same breakdown, every conversation about bloat is a conversation about a number nobody owns.
Track it over time, too. A single snapshot tells you today's state; a trend line tells you the rate of grazing. The number you actually want on a dashboard is standing-context tokens per week, by team.
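The trend line is a small aggregation over stored manifest snapshots. The sketch below assumes each snapshot is a `(date, owner, tokens)` tuple holding that team's total standing footprint on that day; that storage format and both function names are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def weekly_footprint(snapshots):
    """Collapse daily snapshots to one value per owner per ISO week.

    The latest snapshot within each week wins.
    """
    latest = {}  # (owner, (iso_year, iso_week)) -> (day, tokens)
    for day, owner, tokens in snapshots:
        iso = day.isocalendar()
        key = (owner, (iso[0], iso[1]))
        if key not in latest or day >= latest[key][0]:
            latest[key] = (day, tokens)

    series = defaultdict(dict)
    for (owner, week), (_, tokens) in latest.items():
        series[owner][week] = tokens
    # Sort weeks chronologically for each owner.
    return {owner: dict(sorted(weeks.items())) for owner, weeks in series.items()}


def weekly_growth(series):
    """Week-over-week change in standing tokens for each owner."""
    growth = {}
    for owner, weeks in series.items():
        values = list(weeks.values())
        growth[owner] = [after - before for before, after in zip(values, values[1:])]
    return growth


snapshots = [
    (date(2025, 1, 6), "integrations", 18_000),
    (date(2025, 1, 8), "integrations", 20_000),
    (date(2025, 1, 13), "integrations", 68_000),
    (date(2025, 1, 6), "retrieval", 1_200),
    (date(2025, 1, 13), "retrieval", 1_200),
]
series = weekly_footprint(snapshots)
growth = weekly_growth(series)
print(growth)
```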
A Budget With Quotas, Not a Free-for-All
Attribution tells you who is grazing. Quotas decide how much each team is allowed to graze.
Once you have a manifest, the next step is to convert the implicit free-for-all into an explicit allocation. Decide what the standing context — everything before the user's first message — is allowed to cost. A useful default: standing context should leave at least 70-80% of the effective window free for conversation, retrieval at runtime, and the agent's own working tokens. Note "effective," not nominal. If context rot makes 200,000 nominal tokens behave like 80,000 useful ones, your standing budget is a fraction of 80,000, not of 200,000.
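As a worked example of that default, here is the budget arithmetic for the illustrative 80,000-token effective window. The function name and the integer-percent parameter are choices made for this sketch; the effective-window figure itself is something you would have to measure for your own model and workload.

```python
def standing_budget(effective_window, free_percent):
    """Tokens available to standing context if free_percent of the
    effective window must stay free for the conversation."""
    if not 0 <= free_percent <= 100:
        raise ValueError("free_percent must be between 0 and 100")
    # Integer math avoids floating-point surprises such as 15999.999...
    return effective_window * (100 - free_percent) // 100


effective = 80_000  # illustrative figure from the text, not a measurement
print(standing_budget(effective, 80))  # 16000
print(standing_budget(effective, 70))  # 24000

# The same rule applied to the nominal window gives a far looser number.
print(standing_budget(200_000, 80))  # 40000
```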
Then split that standing budget into quotas. The platform team gets N tokens for the base prompt. The tools team gets a quota for tool definitions. The retrieval team gets a quota for standing examples. The quotas are enforced in CI: a build that pushes a section over its line item fails, the same way a failing test fails the build.
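The CI check itself can be a few lines. This sketch assumes the manifest has already been reduced to a `{section: tokens}` mapping and that quotas live in a similar mapping; both shapes, the function names, and the choice to treat a section with no quota as a failure are assumptions, not an established convention.

```python
import sys


def check_quotas(manifest, quotas):
    """Return a list of human-readable violations (empty means the build passes)."""
    violations = []
    for section, tokens in sorted(manifest.items()):
        quota = quotas.get(section)
        if quota is None:
            # Design choice in this sketch: unowned context fails the build.
            violations.append(f"{section}: no quota assigned ({tokens} tokens)")
        elif tokens > quota:
            violations.append(
                f"{section}: {tokens} tokens exceeds quota of {quota} by {tokens - quota}"
            )
    return violations


def run_check(manifest, quotas):
    """Print violations to stderr and return a process exit code."""
    violations = check_quotas(manifest, quotas)
    for violation in violations:
        print(f"QUOTA VIOLATION {violation}", file=sys.stderr)
    return 1 if violations else 0


manifest = {"base prompt": 3_000, "tool definitions": 68_000, "glossary": 400}
quotas = {"base prompt": 4_000, "tool definitions": 40_000}
exit_code = run_check(manifest, quotas)
```

In a CI script you would end with `sys.exit(run_check(manifest, quotas))` so that a non-zero code fails the build.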
A quota does not say no. It says trade. If the tools team needs 60,000 tokens and its quota is 40,000, it has three honest options. Drop low-value tools. Move to a model where tools are not all loaded upfront — tool-search approaches let the agent discover tools on demand instead of carrying all definitions, and code-execution patterns present tool surfaces as APIs the agent calls rather than schemas it must hold in context. Or make the case to the allocation owner for a larger quota, which means some other team's quota shrinks.
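To picture the on-demand option, imagine the definitions living in a registry outside the window, with the agent holding only a single search function. The sketch below is a toy keyword match, not the API of any MCP client or vendor tool-search feature; the registry contents and `search_tools` are hypothetical.

```python
def search_tools(registry, query, k=3):
    """Return up to k tool definitions whose name or description
    shares the most words with the query."""
    query_terms = set(query.lower().split())
    scored = []
    for name, definition in registry.items():
        text = f"{name} {definition['description']}".lower().replace("_", " ")
        score = len(query_terms & set(text.split()))
        if score > 0:
            scored.append((score, name))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [{"name": name, **registry[name]} for _, name in scored[:k]]


# Hypothetical registry; only matching definitions ever enter the context.
registry = {
    "create_invoice": {"description": "Create an invoice for a customer"},
    "refund_invoice": {"description": "Refund a paid invoice"},
    "get_weather": {"description": "Get the weather forecast for a city"},
}

matches = search_tools(registry, "refund invoice")
print([tool["name"] for tool in matches])  # ['refund_invoice', 'create_invoice']
```

The standing cost becomes one search-tool definition instead of every definition in the registry; the trade is an extra round trip whenever the agent needs a tool.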
That last option is the point. A quota forces the trade-off into the open and gives it an owner. Without it, the trade-off still happens — it just happens silently, distributed across every user's degraded experience, decided by nobody.
The Context Audit: Every Clause Re-Justifies Its Seat
Quotas stop the window from growing without bound. They do not reclaim what is already there, and they do not catch the slow rot of context that was justified and no longer is. For that you need a recurring audit.
The principle: standing context is not a ledger you append to. It is a portfolio you periodically liquidate. Every clause, every example, every tool definition should have to re-justify its seat on a schedule — quarterly is reasonable.
A real audit asks specific questions of each block. Why was this added — is there a linked eval, incident, or ticket? The glossary paragraph added during an incident eighteen months ago: is the model still confused without it, or has a model upgrade made it redundant? The three few-shot examples: do they still move the benchmark, or did a newer model internalize the pattern? Each of the 40 tools: when was each one last actually called in production? A tool that has not been invoked in 90 days is not a capability — it is 1,000 tokens of tax on every request, degrading every other tool's recall.
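The tool-usage question is the easiest one to automate. A minimal sketch, assuming you can export a last-invoked date per tool from production logs and a token count per tool from the manifest; both input mappings and the function name are hypothetical.

```python
from datetime import date, timedelta


def stale_tools(last_called, tool_tokens, today, max_idle_days=90):
    """Return (stale tool names, tokens reclaimed by removing them).

    A tool is stale if it was never called or was last called
    more than max_idle_days before today.
    """
    cutoff = today - timedelta(days=max_idle_days)
    stale = [
        name
        for name in sorted(tool_tokens)
        if last_called.get(name) is None or last_called[name] < cutoff
    ]
    reclaimable = sum(tool_tokens[name] for name in stale)
    return stale, reclaimable


tool_tokens = {"search_orders": 900, "legacy_export": 1_000, "beta_import": 1_200}
last_called = {
    "search_orders": date(2025, 6, 1),
    "legacy_export": date(2025, 1, 15),
    # "beta_import" has never been called in production.
}

stale, reclaimable = stale_tools(last_called, tool_tokens, today=date(2025, 6, 30))
print(stale, reclaimable)  # ['beta_import', 'legacy_export'] 2200
```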
The audit's default should be removal. Anything without a current, demonstrable justification comes out, and the eval suite catches it if removal hurt. This inverts the usual bias. Today, adding context requires a small argument ("it helped four points") and removing it requires a large one ("prove nothing breaks"). The audit flips that: keeping context requires the argument, and removal is the default. Run it as a regression test — re-run the full eval after each removal and you will usually find the agent is unchanged or slightly better, because you also removed noise the model was spending attention on.
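Removal-by-default can be expressed as a loop around the eval suite. In this sketch `evaluate` stands in for your real eval harness (here it is a fake scoring function so the example runs), and the block names are invented; the baseline is held fixed so that repeated removals cannot drift the score downward step by step.

```python
def prune(blocks, candidates, evaluate, tolerance=0.0):
    """Try removing each candidate block; keep the removal unless
    the eval score falls more than tolerance below the original baseline."""
    kept = list(blocks)
    baseline = evaluate(kept)  # fixed reference score
    removed = []
    for name in candidates:
        trial = [block for block in kept if block != name]
        if evaluate(trial) >= baseline - tolerance:
            kept = trial
            removed.append(name)
    return kept, removed


def fake_evaluate(blocks):
    # Stand-in for a real eval run: the safety block matters a lot,
    # and the stale glossary slightly hurts by adding noise.
    score = 0.8 if "safety" in blocks else 0.5
    if "old_glossary" not in blocks:
        score += 0.01
    return score


blocks = ["base", "safety", "old_glossary", "few_shot"]
kept, removed = prune(blocks, ["old_glossary", "few_shot", "safety"], fake_evaluate)
print(kept)     # ['base', 'safety']
print(removed)  # ['old_glossary', 'few_shot']
```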
Pair the audit with a simple cultural rule: any pull request that adds standing context links the eval or incident that justifies it, states the token cost from the manifest, and names the quota it draws against. Context becomes a reviewed resource, not a free one.
Treat It Like the Shared Resource It Is
The fix for a tragedy of the commons is never "ask people to be more careful." It is structure: make the shared cost visible to the individual decision, give the resource an allocator, and price every withdrawal. Context engineering at the organizational level is exactly this.
Three moves, in order. Attribution — a context manifest that breaks the window down by owning team and tracks it over time, so the bill is itemized. Quotas — an explicit standing-context budget against the effective window, split into per-team line items and enforced in CI, so growth forces a trade instead of a silent tax. Audits — a quarterly liquidation where every clause re-justifies its seat and removal is the default, so context that stopped earning its keep is reclaimed.
The teams adding context are not the problem; they are doing their jobs well. The missing piece is the allocator. Build it, and the context window stops being a pasture that every team grazes until it collapses, and starts being what it should have been: a budgeted resource, with an owner, in which the user's actual question is the most important thing.
- https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
- https://research.trychroma.com/context-rot
- https://www.anthropic.com/engineering/code-execution-with-mcp
- https://www.anthropic.com/engineering/advanced-tool-use
- https://redis.io/blog/context-rot/
- https://www.getmaxim.ai/articles/context-engineering-for-ai-agents-production-optimization-strategies/
