Latency-Aware Tool Selection: When 'Good Enough Now' Beats 'Best Available Later'
The tool description in your agent's system prompt is a six-month-old eval artifact. It says search_pricing returns "fresh inventory data with structured pricing" and the planner believes it, because nothing in the prompt has updated since the day the description was tuned. The actual search_pricing endpoint has been sitting at p95 of 11 seconds for the last forty minutes because the upstream vendor is rate-limiting your account, and the cheaper search_cache tool — which the prompt describes as "may be slightly stale" — would return the same answer in 200ms. The planner picks search_pricing anyway, because the description still reads like it did during eval, and the planner has no signal about what either tool costs to call right now.
This is the structural failure of static tool descriptions. The planner is making routing decisions on a snapshot of a world that has moved on. Tool selection isn't really a capability question — most production agents have two or three tools that overlap heavily in what they can answer — it's a cost-of-waiting question, and the cost of waiting is the thing your prompt template doesn't see.
