
The Hidden Token Tax: How Overhead Silently Drains Your LLM Context Window

· 8 min read
Tian Pan
Software Engineer

Most teams know how many tokens their users send. Almost none know how many tokens they're spending before a user says anything at all.

In a typical production LLM pipeline, system prompts, tool schemas, chat history, safety preambles, and RAG prologues silently consume 30–60% of your context window before the actual user query arrives. For agentic systems with dozens of registered tools, that overhead can reach 45% of a 128k window — roughly 55,000 tokens — on tool definitions that never get called.

This is the hidden token tax: inflated costs, higher latency, and degraded model attention — none of which surface in any user-facing metric.

The Anatomy of a Taxed Request

To see the tax in action, consider what happens when a user sends "What meetings do I have today?" Here's what actually ships alongside that 8-token query:

  • System prompt (behavior rules, persona, guardrails): 1,500–3,000 tokens
  • Tool/function definitions (names, descriptions, parameter schemas): 5,000–55,000 tokens
  • Chat history (prior turns for conversational context): 2,000–10,000 tokens
  • RAG context (retrieved documents or knowledge base chunks): 1,000–5,000 tokens
  • Safety preambles and output format instructions: 500–1,000 tokens
  • The actual user message: 8 tokens

That's over 10,000 tokens of overhead for an 8-token question — and with large tool registries, easily 60,000+. Every token is billed at your input rate and competes for the model's finite attention budget.
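The tally above is easy to check in a few lines. A minimal sketch, using the low end of each range as a stand-in for real tokenizer counts (component names are illustrative; measure with your provider's tokenizer in practice):

```python
# Token accounting for the request above, using the low end of each range.
# These constants are illustrative stand-ins for measured tokenizer counts.
COMPONENTS = {
    "system_prompt": 1_500,
    "tool_definitions": 5_000,
    "chat_history": 2_000,
    "rag_context": 1_000,
    "safety_and_format": 500,
    "user_message": 8,
}

def overhead_ratio(components: dict) -> float:
    """Fraction of input tokens that are structural overhead."""
    total = sum(components.values())
    return 1 - components["user_message"] / total

# Even at the low end: 10,000 overhead tokens for an 8-token question.
overhead = sum(COMPONENTS.values()) - COMPONENTS["user_message"]
ratio = overhead_ratio(COMPONENTS)
```

Even with every component at its minimum, more than 99% of the input is overhead by construction.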

Multi-turn conversations compound the problem. A 20-turn conversation accumulates 5,000–10,000 tokens of history, yet only the most recent 500–1,000 tokens are typically relevant.

Most implementations naively append the full transcript on every call, paying for the full history every single turn.

Tool Schemas: The Biggest Silent Offender

Tool definitions are the single largest source of hidden token overhead. Each definition carries a surprising cost:

  • Tool name: 5–10 tokens
  • Description: 50–150 tokens
  • Argument schema (types, required fields): 100–300 tokens
  • Field descriptions and constraints: 50–200 tokens
  • Few-shot examples for reliable invocation: 200–500 tokens

Summing those ranges gives roughly 400–1,200 tokens per tool. A modest integration with GitHub, Slack, and a monitoring stack easily registers 50+ tools. Real-world measurements from agents connecting to multiple MCP servers reveal the scale:

  • GitHub (35 tools): ~26,000 tokens
  • Slack (11 tools): ~21,000 tokens
  • Observability tools: ~8,000 tokens

That's 45% of a standard 128k context window gone before the developer types a single character. Every token is billed whether or not any tool gets called — a simple "summarize this document" still pays the full tax of every registered tool schema.
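To see where the per-tool cost comes from, here is a rough audit sketch. The `list_meetings` definition and the 4-characters-per-token heuristic are both assumptions for illustration; use your provider's tokenizer for real numbers:

```python
import json

def estimate_tokens(text: str) -> int:
    """Crude heuristic: ~4 characters per token. Only good for
    order-of-magnitude audits; use a real tokenizer for billing math."""
    return max(1, len(text) // 4)

# A hypothetical tool definition in the common function-calling shape.
tool = {
    "name": "list_meetings",
    "description": "List the user's calendar meetings for a given day, "
                   "including title, start time, attendees, and location.",
    "parameters": {
        "type": "object",
        "properties": {
            "date": {"type": "string", "description": "ISO-8601 date, e.g. 2024-05-01"},
            "calendar_id": {"type": "string", "description": "Target calendar"},
        },
        "required": ["date"],
    },
}

per_tool = estimate_tokens(json.dumps(tool))
registry_cost = 50 * per_tool  # 50 registered tools ride on every request
```

Run this over your actual registry and the "easily 60,000+" figure stops being abstract.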

The cost compounds at invocation time too. The full function set ships with both the original request and the tool-result submission, so the schema library rides along twice per tool call.

Worse, tool selection accuracy degrades as the registered tool count grows:

  • 5–10 tools: over 90% selection accuracy
  • 50+ tools: drops to around 49% — coin-flip territory

More tokens, worse results.

The Compounding Problem in Chained Calls

The token tax doesn't just hit individual calls — it multiplies across pipelines. An agentic workflow chaining three LLM calls (intent classification, database query, response formatting), each carrying 20,000 tokens of overhead, burns 60,000 tokens of structural overhead for what might be a 200-token answer.

This compounding is especially brutal in agent loops. An agent that takes 10 steps to complete a task, with each step carrying the full system prompt and tool definitions, can easily burn 200,000–500,000 tokens on overhead alone. At $3 per million input tokens, that's $0.60–$1.50 per task just for the tax, before counting the tokens that actually do useful work.
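The per-task arithmetic is worth scripting so it tracks your actual rates. A minimal sketch, assuming the $3 per million input-token rate used above:

```python
PRICE_PER_M_INPUT = 3.00  # $/million input tokens (illustrative rate)

def task_tax(steps: int, overhead_per_step: int) -> float:
    """Dollar cost of structural overhead for one agent task."""
    return steps * overhead_per_step * PRICE_PER_M_INPUT / 1_000_000

low = task_tax(10, 20_000)   # 200k overhead tokens per task
high = task_tax(10, 50_000)  # 500k overhead tokens per task
```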

At enterprise scale, these numbers become impossible to ignore. Consider a customer support system handling 10,000 tickets per day, with 40 registered tools and 5-turn average conversations:

  • Naive implementation: ~2 billion tokens/day → $2.19 million/year
  • Optimized implementation: ~70 million tokens/day → $76,650/year

That's a 30x cost difference from structural overhead alone.
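The enterprise figures follow from the same assumed rate. A quick check, at $3 per million input tokens:

```python
PRICE_PER_M_INPUT = 3.00  # $/million input tokens (illustrative rate)

def annual_cost(tokens_per_day: float) -> float:
    """Annualized input-token spend at the assumed rate."""
    return tokens_per_day * PRICE_PER_M_INPUT / 1_000_000 * 365

naive = annual_cost(2e9)        # ~2 billion tokens/day
optimized = annual_cost(70e6)   # ~70 million tokens/day
```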

Auditing Your Token Budget

Before optimizing, you need visibility. Most teams discover their token waste only after implementing granular tracking. Here's how to audit your pipeline:

  • Measure your overhead ratio. For every API call, calculate what percentage of input tokens come from structural overhead versus user content. If overhead consistently exceeds 50%, you have a significant optimization opportunity.
  • Profile by component. Break down token consumption into system prompt, tool definitions, chat history, RAG context, and user content. In most systems, tool schemas and chat history are the top two offenders.
  • Track across the pipeline. If you chain multiple LLM calls, measure total tokens consumed end-to-end. A call that looks efficient in isolation might be devastating when multiplied across a 10-step agent loop.
  • Monitor output token waste. Output tokens typically cost 4–5x more than input tokens. If your model is generating 500-token responses when 100 tokens would suffice, that's a 5x multiplier on the more expensive token type.
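The checks above can hang off a small per-call audit object. A minimal sketch, with hypothetical component names and the 50% threshold from the first check:

```python
from dataclasses import dataclass, field

@dataclass
class TokenAudit:
    """Accumulates per-component token counts for one API call."""
    counts: dict = field(default_factory=dict)

    def record(self, component: str, tokens: int) -> None:
        self.counts[component] = self.counts.get(component, 0) + tokens

    def overhead_ratio(self, user_component: str = "user_content") -> float:
        """Fraction of input tokens that are not user content."""
        total = sum(self.counts.values())
        return 1 - self.counts.get(user_component, 0) / total

audit = TokenAudit()
audit.record("system_prompt", 2_000)
audit.record("tool_definitions", 26_000)
audit.record("chat_history", 6_000)
audit.record("user_content", 40)
needs_work = audit.overhead_ratio() > 0.5  # flag calls past the 50% mark
```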

Six Strategies to Cut the Tax

With measurements in hand, here are the highest-leverage optimizations, roughly ordered by impact:

Dynamic tool selection. Instead of registering all tools on every call, select only the ones relevant to the current query. A lightweight classifier or embedding-based filter picks the 3–5 tools most likely needed, and only those ship with the request. This alone cuts tool-related overhead by 85% while actually improving accuracy — one benchmark showed selection jumping from 49% to 74% after filtering from 50+ tools down to 5.
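A minimal sketch of the filter, using naive word overlap in place of an embedding model or classifier (registry names and descriptions are hypothetical):

```python
def score(query: str, tool_description: str) -> float:
    """Toy relevance score via word overlap. In production you'd use
    embedding cosine similarity or a lightweight classifier."""
    q = set(query.lower().split())
    d = set(tool_description.lower().split())
    return len(q & d) / (len(q) or 1)

def select_tools(query: str, registry: dict, k: int = 3) -> list:
    """Ship only the top-k most relevant tools with the request."""
    ranked = sorted(registry, key=lambda name: score(query, registry[name]),
                    reverse=True)
    return ranked[:k]

# Hypothetical registry: tool name -> description
registry = {
    "list_meetings": "list calendar meetings for a given day",
    "create_issue": "create a github issue in a repository",
    "post_message": "post a message to a slack channel",
    "query_metrics": "query observability metrics and dashboards",
}
picked = select_tools("what meetings do I have today", registry, k=2)
```

Only the schemas for `picked` go into the request; the other definitions never leave your server.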

Prompt compression. Audit your system prompt ruthlessly. Most prompts grow organically — every edge case, bug fix, and new feature adds another paragraph. A 3,000-token prompt can often compress to 1,000 tokens without losing behavioral fidelity. Remove verbose examples, use terse instruction syntax, and consolidate redundant constraints.

Conversation history management. Instead of appending full chat history, implement sliding windows or summary-based compression. Summarize older turns into a compact context block and keep only the most recent 2–3 turns verbatim. A 20-turn conversation consuming 10,000 tokens drops to 1,500 with negligible quality impact.
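A minimal sketch of summary-based compression; `summarize` is a stand-in for a call to a small, cheap model:

```python
def summarize(turns: list) -> str:
    """Stand-in summarizer. A real implementation calls a cheap model
    to compress older turns into a short context block."""
    return f"{len(turns)} earlier turns covering prior discussion."

def compact_history(turns: list, keep_recent: int = 3) -> list:
    """Keep the last `keep_recent` turns verbatim; collapse the rest
    into a single summary message."""
    if len(turns) <= keep_recent:
        return turns
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = {"role": "system",
               "content": f"Summary of earlier turns: {summarize(older)}"}
    return [summary] + recent

history = [{"role": "user", "content": f"turn {i}"} for i in range(20)]
compact = compact_history(history)  # 20 messages -> 4
```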

Prompt caching. Most LLM providers now support prompt caching — storing and reusing computed representations of static prompt prefixes. Since your system prompt and tool definitions are identical across calls, caching avoids reprocessing them on every request. This won't reduce your token count, but it cuts latency by up to 85% and cost by up to 90% for the cached portion.
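As a concrete shape, here is what marking a static prefix cacheable looks like in Anthropic's documented request format, shown as a plain dict with no network call. The model name is a placeholder, field names should be verified against current provider docs, and note that OpenAI-style APIs instead cache long static prefixes automatically with no markup:

```python
# Sketch of an Anthropic Messages API request body with prompt caching.
# `cache_control` on the last static block tells the API to cache
# everything up to and including that block.
request = {
    "model": "claude-sonnet",  # placeholder model name
    "system": [
        {
            "type": "text",
            "text": "You are a scheduling assistant...",  # long static prompt
            "cache_control": {"type": "ephemeral"},       # cache this prefix
        }
    ],
    "messages": [
        {"role": "user", "content": "What meetings do I have today?"}
    ],
}
```

The key design point: keep static content (system prompt, tool schemas) in a byte-identical prefix so the cache actually hits, and put anything dynamic after it.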

Semantic caching. For high-traffic systems with repetitive queries, cache LLM responses keyed by semantic similarity. If "What's the weather today?" and "How's the weather right now?" produce the same answer, serve the cached version. Savings reach up to 73% in high-repetition workloads.
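A minimal sketch of the idea, using stdlib string similarity as a stand-in for embedding cosine similarity (threshold and example queries are illustrative):

```python
import difflib

class SemanticCache:
    """Toy semantic cache keyed by string similarity. A production
    version would match on embedding cosine similarity instead."""

    def __init__(self, threshold: float = 0.6):
        self.threshold = threshold
        self.entries = []  # list of (query, response) pairs

    def get(self, query: str):
        for cached_query, response in self.entries:
            ratio = difflib.SequenceMatcher(
                None, query.lower(), cached_query.lower()).ratio()
            if ratio >= self.threshold:
                return response  # close-enough query: serve cached answer
        return None

    def put(self, query: str, response: str) -> None:
        self.entries.append((query, response))

cache = SemanticCache()
cache.put("What's the weather today?", "Sunny, 72F.")
hit = cache.get("whats the weather today")  # near-duplicate query hits
```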

Model routing. Not every call needs your most expensive model. Route simple classification, extraction, or formatting tasks to smaller, cheaper models and reserve the frontier models for complex reasoning.
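A minimal routing sketch; the task categories and model names are placeholders for whatever your stack uses:

```python
# Task types cheap enough for a smaller model (illustrative set).
CHEAP_TASKS = {"classification", "extraction", "formatting"}

def route_model(task_type: str) -> str:
    """Send simple task types to a cheaper model; reserve the frontier
    model for complex reasoning. Model names are placeholders."""
    return "small-fast-model" if task_type in CHEAP_TASKS else "frontier-model"
```

In practice the router itself can be a rules table, a tiny classifier, or the cheap model asked to triage, as long as its cost stays well below the savings.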

Why the Tax Hurts Quality, Not Just Cost

The token tax isn't just expensive — it actively degrades output quality by consuming your model's finite attention budget.

LLMs process context through attention mechanisms where every token attends to every other token. As context grows, the model spreads its attention thinner. Research consistently shows that models retrieve information best from the beginning and end of their context — accuracy drops over 30% for information buried in the middle.

When 55,000 tokens of tool definitions sit between your system prompt and the user's actual question, you're pushing the user's content into a lower-attention zone. The model is literally paying less attention to the thing that matters most — the user's request — because it's spending attention budget on tool schemas for services the user didn't ask about.

Dynamic tool selection isn't just a cost optimization — it's a quality optimization. Fewer irrelevant tokens means more attention allocated to the tokens that actually matter.

Building a Token-Conscious Architecture

The most effective long-term fix isn't optimizing individual calls — it's designing your architecture with token economics as a first-class concern, the way you'd design for CPU or memory budgets.

Treat tokens as a managed resource. Track consumption per request, per user, per feature. Set budgets and alerts the same way you would for database queries or API rate limits. Teams that implement granular tracking typically discover that 20–50% of their spend delivers little or no business value.

Design for minimal context. Every piece of information in your context window should earn its place. Before including anything, ask:

  • Does this system prompt paragraph actually change model behavior?
  • Does this tool definition get used more than 1% of the time?
  • Does this chat history turn matter for the current query?

If the answer is no, it's tax.

Version and test your prompts. System prompts should be versioned artifacts with measurable performance characteristics, not wiki pages that grow indefinitely. When you add a paragraph to handle an edge case, measure whether it actually changes outcomes. If it doesn't, it's pure overhead.

Start With One Endpoint

The hidden token tax never causes a visible failure. Your system still works — just slower, more expensive, and slightly less accurate than it should be. At production scale, "slightly worse" multiplied across millions of requests becomes the difference between a sustainable AI product and one that quietly bleeds margin.

Pick your highest-traffic endpoint. Measure its overhead ratio. The number will surprise you, and that surprise is the leverage you need to start cutting.
