
The Agent Wall-Clock Budget That Raced Your Tool's Own Timeout

· 11 min read
Tian Pan
Software Engineer

There is a class of agent bug that does not appear in any single component when you look at it in isolation. The model is fine. The tool is fine. The retry policy is fine. On paper, even the timeout values are generous. And yet a tool that consistently completes in eight seconds keeps returning its result to an agent that already declared it a failure at seven point nine seconds, replanned around an "error" that never happened, and started a second call that the first call's result is about to collide with.

The bug is not in any of the boxes. It is in the gap between two clocks that nobody agreed should be the same clock.

This is the agent-engineering equivalent of a distributed-systems classic, clock drift between cooperating nodes, except that both nodes sit inside the same agent step, and the drift is not measured in milliseconds of NTP skew. It is measured by whichever event each side decided to call "t-zero." The agent's budget tends to start ticking when the harness dispatches the request to the model. The tool's budget tends to start ticking when the tool process actually receives the call. Between those two moments sit the entire prefill stage, the streaming pipeline, the tool-router hop, and any queueing in front of the tool worker. None of that time is shared between the two stopwatches, yet both of them think they are timing the same thing.

The Two Clocks Nobody Agreed Were the Same Clock

The orthodox view of an agent step looks like a single bar on a timeline: prompt goes in, model thinks, tool runs, result comes back. The orthodox view of a timeout is that you draw a vertical line somewhere on that bar and call it the deadline.

The reality has at least four clocks running concurrently and almost never synchronized:

  • The agent harness clock, which usually starts the budget when the request is dispatched to the model.
  • The model's effective clock, which the harness frequently learns about only at the first streamed token (TTFT). Time-to-first-token in 2026 is reported in the hundreds-of-milliseconds-to-low-seconds range for chatty completions and can be much longer for long contexts. That whole interval may or may not count against your "agent budget" depending on which library you used.
  • The tool client clock, which starts when the harness emits the tool call and stops when the tool returns. Most frameworks expose this as a per-tool timeout.
  • The tool server clock, which only starts when the tool process actually receives the request. Anything sitting in front of it — a queue, an MCP router, a reverse proxy with its own idle timeout — adds latency the tool process cannot see and cannot bill against itself.
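To make the four clocks concrete, here is a minimal sketch that places one agent step on a single shared reference timeline and asks how much time each clock believes has passed when the tool finishes. The `StepTimeline` class and its field names are hypothetical, the numbers are illustrative (1.1 s to first token, a 300 ms router hop, a tool that runs for its full 8 s), and the return hop from tool to harness is assumed to be instantaneous.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepTimeline:
    """Instants (ms) on ONE shared reference clock for a single agent step.

    Hypothetical field names; real harnesses record these differently, if at all.
    Assumes the result reaches the harness the instant the tool finishes.
    """
    dispatched: int         # harness sends the request to the model
    first_token: int        # first streamed token reaches the harness (TTFT)
    tool_call_emitted: int  # harness hands the tool call to the tool client
    tool_received: int      # tool process actually receives the request
    tool_finished: int      # tool returns its result

    def elapsed_ms(self) -> dict[str, int]:
        # Same end instant, four different t-zeros.
        end = self.tool_finished
        return {
            "harness": end - self.dispatched,
            "model (from first token)": end - self.first_token,
            "tool client": end - self.tool_call_emitted,
            "tool server": end - self.tool_received,
        }


step = StepTimeline(
    dispatched=0,
    first_token=1_100,        # illustrative TTFT
    tool_call_emitted=1_100,  # simplification: tool call emitted at first token
    tool_received=1_400,      # 300 ms router/queue hop
    tool_finished=9_400,      # tool used its full 8 s budget
)
for clock, ms in step.elapsed_ms().items():
    print(f"{clock:>26}: {ms} ms")
```

The tool server honestly reports 8,000 ms, while the harness has watched 9,400 ms go by for the very same step.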

In a well-behaved RPC stack, these clocks are reconciled by deadline propagation. The caller computes an absolute deadline, attaches it to the request as a wall-clock instant (not a duration), and every downstream hop inherits it. The classic gRPC formulation is explicit about this: a deadline is "5 seconds from now" expressed as an absolute timestamp, and the context is propagated through every child call so that the whole subtree dies at the same moment. Frameworks like userver describe the same idea — chain the deadline so a slow upstream cannot consume the budget the downstream still expected to have.
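A minimal sketch of that idea, assuming a hypothetical `Deadline` type and an injectable `now` so the arithmetic is easy to follow (this is not the gRPC or userver API). The caller fixes one absolute instant, and every hop derives its budget from that instant instead of starting a fresh stopwatch. One caveat about the real protocol: across processes, gRPC sends the remaining time on the wire (the `grpc-timeout` header), and each hop converts it back into a local absolute deadline, which keeps machine clock skew out of the picture.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Deadline:
    """An absolute instant (Unix epoch seconds), not a duration.

    Hypothetical helper for illustration only.
    """
    at: float

    @classmethod
    def in_seconds(cls, seconds: float, now: Optional[float] = None) -> "Deadline":
        now = time.time() if now is None else now
        return cls(now + seconds)

    def remaining(self, now: Optional[float] = None) -> float:
        now = time.time() if now is None else now
        return max(0.0, self.at - now)


def hop_budget(deadline: Deadline, local_cap: float, now: Optional[float] = None) -> float:
    """Budget for a downstream hop: its own cap, but never past the inherited deadline."""
    return min(local_cap, deadline.remaining(now))


# The harness fixes ONE deadline at dispatch time (t = 1000.0 on a fake clock).
deadline = Deadline.in_seconds(8.0, now=1000.0)

# The tool worker receives the call 1.4 s later and inherits the same instant,
# so its local 8 s cap is clipped to what the caller actually has left.
print(round(hop_budget(deadline, local_cap=8.0, now=1001.4), 3))
```

The design choice that matters is that the tool never grants itself a budget the caller no longer has; with propagation, the "8 s vs 6.6 s" mismatch becomes a single, shared 6.6 s.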

Agent stacks, in practice, do not do this. The agent's budget is a duration counted in the harness process. The tool's budget is a separate duration counted in the tool process. There is no shared deadline. There is no propagation.

So when the model's first token finally arrives, the harness clock is already eleven hundred milliseconds in. When the tool call routes through the MCP server, three hundred more milliseconds vanish. When the tool worker dequeues the request and starts its own eight-second budget, the harness has been counting against an eight-second budget that started 1.4 seconds ago. From the harness's perspective, the tool has 6.6 seconds. From the tool's perspective, it has 8. The numbers look equal. They are not.

Why This Looks Like a Successful Tool Call That the Agent Refuses to Use

The pathological case is not when the agent gives up before the tool starts. That is loud — you see a cancelled call and an angry log. The pathological case is when the agent gives up at 7.9 seconds, the tool finishes at 8.0 seconds, and both sides write structured success logs to their respective traces.

The agent's trace shows: started call, waited, timeout expired at 7.9s, replanned. From the agent's point of view, the tool failed. It enters a recovery branch — picks a different tool, asks the user a clarifying question, or, worst of all, retries the same call.

The tool's trace shows: received call, executed, returned at 8.0s with a clean 200. From the tool's point of view, everything worked. It even billed the user for the work.
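The race is easy to reproduce in miniature. This sketch uses Python's `asyncio` with the 7.9 s / 8.0 s scenario compressed tenfold; `slow_tool` and `agent_step` are hypothetical stand-ins, and `asyncio.shield` plays the role of a remote tool process that cannot see its caller give up.

```python
import asyncio

# Scenario from the text, compressed 10x so it runs quickly:
# the agent gives up at 7.9 s, the tool finishes cleanly at 8.0 s.
SCALE = 0.1
AGENT_TIMEOUT_S = 7.9 * SCALE
TOOL_DURATION_S = 8.0 * SCALE


async def slow_tool(query: str) -> str:
    """Hypothetical tool that always succeeds, just slightly too slowly."""
    await asyncio.sleep(TOOL_DURATION_S)
    return f"answer for {query!r}"


async def agent_step(query: str) -> tuple[str, asyncio.Task]:
    tool_task = asyncio.create_task(slow_tool(query))
    try:
        # shield() keeps the tool running after the agent stops waiting,
        # like a remote tool process that cannot see its caller give up.
        result = await asyncio.wait_for(asyncio.shield(tool_task), AGENT_TIMEOUT_S)
        return f"used {result}", tool_task
    except asyncio.TimeoutError:
        # The agent's trace: "timeout expired, replanned".
        return "replanned: tool failed", tool_task


async def main() -> tuple[str, str]:
    decision, tool_task = await agent_step("q")
    # The tool's trace: it still finishes with a clean result
    # that nothing in the agent's plan is waiting for any more.
    late_result = await tool_task
    return decision, late_result


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Each side's record is internally consistent: the agent logged a timeout and replanned, and the tool logged a success ten simulated milliseconds later.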

The race shows up downstream as three distinct symptoms, all of which look like different bugs:

  1. A "successful" tool result that nothing is waiting for. The agent's coroutine has been cancelled. The result lands in a closed channel or an orphan future. Some harnesses log this as "tool_use without matching tool_result" — a shape that should be impossible if both sides agree on what happened.
  2. A model that confidently reports "the tool failed, I'll try a different approach" while a perfectly correct answer is sitting in the tool's response queue.
  3. A duplicate call, because the agent's replan picked the same tool with the same arguments — and now the tool worker is doing the same work twice, with the second invocation racing the first stale result back into a state machine that does not know which one to trust.
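The first and third symptoms are mechanically detectable if the harness and the tool write to one merged trace. Below is a minimal audit sketch, assuming a flat list of events in arrival order whose field names are loosely modelled on `tool_use` / `tool_result` blocks; the `tool_timeout` event type and the sample trace are invented for illustration.

```python
import json
from collections import Counter

# Hypothetical merged trace (harness + tool), in arrival order.
# "tool_timeout" is an invented harness event marking when the agent gave up.
trace = [
    {"type": "tool_use", "id": "call_1", "name": "search", "input": {"q": "x"}},
    {"type": "tool_timeout", "tool_use_id": "call_1"},
    {"type": "tool_use", "id": "call_2", "name": "search", "input": {"q": "x"}},
    {"type": "tool_result", "tool_use_id": "call_1"},
]


def orphaned_tool_uses(events: list[dict]) -> list[str]:
    """tool_use ids that never got a matching tool_result."""
    answered = {e["tool_use_id"] for e in events if e["type"] == "tool_result"}
    return [e["id"] for e in events if e["type"] == "tool_use" and e["id"] not in answered]


def late_results(events: list[dict]) -> list[str]:
    """Symptom 1: results that arrived after the harness declared a timeout."""
    timed_out: set[str] = set()
    late = []
    for e in events:
        if e["type"] == "tool_timeout":
            timed_out.add(e["tool_use_id"])
        elif e["type"] == "tool_result" and e["tool_use_id"] in timed_out:
            late.append(e["tool_use_id"])
    return late


def duplicate_calls(events: list[dict]) -> list[tuple[str, str]]:
    """Symptom 3: the same tool invoked more than once with identical arguments."""
    counts = Counter(
        (e["name"], json.dumps(e["input"], sort_keys=True))
        for e in events
        if e["type"] == "tool_use"
    )
    return [key for key, n in counts.items() if n > 1]


print("orphaned:", orphaned_tool_uses(trace))
print("late results:", late_results(trace))
print("duplicates:", duplicate_calls(trace))
```

In this sample, `call_2` is flagged as orphaned only because the retry is still in flight when the trace ends; a real audit would restrict that check to calls whose step has closed.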