The Planner That Treated Every Tool as O(1)
Your planner emits five tool calls. On paper, it reads like a clean solution: lookup_user, search_documents, call_external_api, spawn_sub_agent, request_human_approval. The trace looks elegant, the logic is sound, and the agent will arrive at the right answer. In production, those five steps take 12 milliseconds, 800 milliseconds, 4 seconds, 2 minutes, and 6 hours respectively. The planner never noticed that its five-step plan spans more than six orders of magnitude in latency.
This is not a hallucination. The model picked the right tools. It picked them in a sensible order. What it could not do — what the tool schema gave it no way to do — was reason about the fact that the last step in its plan is qualitatively different from the first one. To the planner, a tool is a tool. Every node in the plan graph has weight one.
The structural failure is that planning models reason about tool sequences the way a programmer reasons about pseudocode. Each step is one unit of work. The plan is correct if the sequence is logically valid. Whether the sequence is affordable is a question the planner doesn't know it should ask, because the input it sees — the tool catalog — is a vocabulary stripped of price tags. Function signatures expose types. They don't expose latency tiers, dollar costs, blast radius, or whether the step suspends the workflow indefinitely waiting on a human.
Tools Live in Different Cost Regimes
The simplest way to feel the gap is to lay out the actual latency floor of common agent tools, in the order a planner might emit them:
- An in-memory lookup or cache hit: sub-millisecond.
- A local database query against an indexed key: tens of milliseconds.
- A vector search or full-text search over a moderate corpus: hundreds of milliseconds.
- An external HTTP API call to a third-party service: low seconds, with a tail that can hit tens of seconds.
- A sub-agent invocation that itself runs a multi-step plan: tens of seconds to minutes.
- A workflow that requires human approval: indefinite, bounded only by SLA.
The ratio between the cheapest and the most expensive of these can reach roughly nine orders of magnitude: on the order of 100 microseconds for a cache hit against a day or more in an approval queue. There is little to gain from "optimizing" the cheap steps of a plan that mixes these tiers in the same step graph, because the plan-level cost is dominated by the worst tier in it. A plan with four sub-millisecond tools and one human-approval step has an end-to-end latency effectively equal to that of the human-approval step. The other four steps are, for cost purposes, free.
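The dominance claim can be checked with a few lines of arithmetic. The sketch below assumes a sequential plan of four sub-millisecond reads followed by one approval step; the tool names other than request_human_approval and all latency values are illustrative assumptions, not measurements.

```python
import math

# Hypothetical sequential plan: (tool name, latency in seconds).
# The six-hour approval figure is an assumption for illustration.
plan = [
    ("read_cache", 0.0004),
    ("read_flag", 0.0002),
    ("read_session", 0.0005),
    ("read_quota", 0.0003),
    ("request_human_approval", 6 * 3600.0),
]


def latency_shares(steps):
    """Fraction of total sequential latency contributed by each step."""
    total = sum(seconds for _, seconds in steps)
    return {name: seconds / total for name, seconds in steps}


def span_in_orders_of_magnitude(steps):
    """log10 of the ratio between the slowest and the fastest step."""
    latencies = [seconds for _, seconds in steps]
    return math.log10(max(latencies) / min(latencies))


shares = latency_shares(plan)
slowest = max(shares, key=shares.get)
span = span_in_orders_of_magnitude(plan)

print(f"{slowest} accounts for {shares[slowest]:.6%} of sequential latency")
print(f"span: {span:.1f} orders of magnitude")
```

Under these assumed numbers, the four fast steps together add about 1.4 milliseconds to a plan that takes six hours, which is why shaving them further does not change what the user experiences.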
The planning model is not given any of this. When it reads its tool catalog, every entry looks like a JSON Schema fragment with a name, a description, and a parameter list. The model selects tools by matching descriptions to intent. There is no field in the standard schema for "this call typically takes 4 seconds," and there is no field for "this call may suspend for hours." The agent's vocabulary is a flat namespace.
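One low-effort way to put a price tag into that flat namespace is to append cost hints to the one free-text field the planner already reads, the description. The following is a minimal sketch under that assumption; `annotate`, `format_latency`, and the hint wording are hypothetical and not part of any standard function-calling schema.

```python
def format_latency(seconds):
    """Render a latency in the largest unit that keeps the number readable."""
    if seconds < 1:
        return f"{seconds * 1000:g} ms"
    if seconds < 60:
        return f"{seconds:g} s"
    return f"{seconds / 60:g} min"


def annotate(tool, typical_latency_s, may_suspend=False):
    """Return a copy of a tool entry with cost hints appended to its description.

    The original entry is left unmodified. The hint text is a hypothetical
    convention; the planner only benefits if it is prompted to respect it.
    """
    hint = f" Typical latency: {format_latency(typical_latency_s)}."
    if may_suspend:
        hint += " May suspend the workflow for hours awaiting a human."
    annotated = dict(tool)
    annotated["description"] = tool["description"] + hint
    return annotated


# A plain catalog entry as the planner usually sees it: name, description, parameters.
search_documents = {
    "name": "search_documents",
    "description": "Full-text search over the document corpus.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

request_human_approval = {
    "name": "request_human_approval",
    "description": "Ask a human reviewer to approve the pending action.",
    "parameters": {"type": "object", "properties": {}},
}

annotated_search = annotate(search_documents, typical_latency_s=0.8)
annotated_approval = annotate(
    request_human_approval, typical_latency_s=6 * 3600, may_suspend=True
)
print(annotated_search["description"])
print(annotated_approval["description"])
```

Putting the hint in the description is a design choice made here only because it needs no schema change; a structured field would be easier to validate but requires the planner's prompt or runtime to surface it.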
What "Cost" Actually Means in an Agent Plan
When teams first encounter this problem, they reach for token cost. Tokens are easy to measure, easy to bill against, easy to surface in a dashboard. But token cost is the wrong frame, because the most expensive part of an agent plan is often the part that consumes zero tokens — the workflow suspension while a human reviews a payout, the long-poll while a downstream batch job completes, the retry loop against a flaky external vendor.
A more honest decomposition of agent cost has at least four axes:
- Latency: how long the call takes to return, including p99 tail behavior. This is what end users feel.
- Money: the dollar cost of the call itself, including downstream API fees, compute, and the tokens spent reasoning about the call.
- Blast radius: what the call irreversibly changes about the world. A refund is more expensive than a read.
- Reversibility: whether the call can be undone if the plan turns out to be wrong. A database write you can roll back is cheaper than an email you can't unsend.
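The four axes above can be sketched as a small per-tool record with a plan-level aggregate. Everything here is an illustrative assumption: the `ToolCost` and `BlastRadius` names, the three blast-radius levels, the aggregation rules, and the example numbers.

```python
from dataclasses import dataclass
from enum import IntEnum


class BlastRadius(IntEnum):
    """Hypothetical ordering of how much a call changes the world."""
    READ_ONLY = 0
    INTERNAL_WRITE = 1
    EXTERNAL_EFFECT = 2


@dataclass(frozen=True)
class ToolCost:
    p99_latency_s: float       # latency axis, tail included
    dollars: float             # money axis: API fees, compute, reasoning tokens
    blast_radius: BlastRadius  # what the call changes
    reversible: bool           # whether the change can be undone


def plan_cost(steps):
    """Aggregate per-step costs for a non-empty sequential plan.

    Summing p99 latencies is a rough pessimistic estimate, not the true
    p99 of the whole plan. Blast radius takes the worst step, and the plan
    counts as reversible only if every step is.
    """
    return ToolCost(
        p99_latency_s=sum(s.p99_latency_s for s in steps),
        dollars=sum(s.dollars for s in steps),
        blast_radius=max(s.blast_radius for s in steps),
        reversible=all(s.reversible for s in steps),
    )


# Assumed example values for a read, a database write, and an outbound email.
steps = [
    ToolCost(0.02, 0.0, BlastRadius.READ_ONLY, True),
    ToolCost(0.05, 0.0, BlastRadius.INTERNAL_WRITE, True),
    ToolCost(1.5, 0.001, BlastRadius.EXTERNAL_EFFECT, False),
]

total = plan_cost(steps)
print(total)
```

Note that a single irreversible step makes the whole plan irreversible in this sketch, mirroring the way a single slow step dominates latency.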
