The Agent That Could Not Say Wait

· 10 min read
Tian Pan
Software Engineer

Pick any production agent built in the last two years and inventory the things it can actually do on a given turn. The list is short: emit a tool call, return a final answer, or ask the user a clarifying question. That is the entire action vocabulary. Notice what is missing. There is no verb for "I would like more time before deciding." There is no verb for "I am uncertain enough that I want to pause and reconsider without committing." There is no verb for "I want to dwell on this for a moment before I do anything." The agent literally cannot say wait. The grammar does not contain the word.

This is not a polish problem. It is a structural one. The moment the agent's only outputs are actions, every internal state has to be expressed through an action. Hesitation becomes a redundant tool call. Doubt becomes a confident commitment. The team that designed only the action verbs has shipped an agent whose only language is doing, and then they wonder why it never seems to think.

The Speech Act Gap

Speech act theory has a useful distinction here. An utterance has illocutionary force — what the speaker is trying to do with it — and locutionary form — the surface shape of the words. Humans have a rich illocutionary vocabulary for non-action states: hedging, deferring, requesting time, signaling discomfort, registering an objection without acting on it. Most of human deliberation is conducted in these speech acts before any externally observable action occurs.

Now look at the agent protocol. The illocutionary forces it can express are bounded by the locutionary forms the harness exposes. There is call_tool(name, args) and there is return(final_answer) and, if you're lucky, there is ask_user(question). That is roughly three illocutionary forces: commit-to-action, commit-to-answer, request-information. The space of human-style "I want to suspend judgment for one more turn" has no expressible form. The agent has the cognitive state. It does not have the verb.
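One way to make the gap concrete is to write the action vocabulary as a closed type. The sketch below is illustrative only: the class names CallTool, ReturnAnswer, and AskUser and the helper illocutionary_force are hypothetical stand-ins for the three forms named above, not any particular framework's API.

```python
from dataclasses import dataclass, field
from typing import Any, Union


@dataclass(frozen=True)
class CallTool:
    """call_tool(name, args): commit to an action."""
    name: str
    args: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class ReturnAnswer:
    """return(final_answer): commit to an answer."""
    final_answer: str


@dataclass(frozen=True)
class AskUser:
    """ask_user(question): request information."""
    question: str


# The whole vocabulary. There is no variant for "wait".
Action = Union[CallTool, ReturnAnswer, AskUser]


def illocutionary_force(action: Action) -> str:
    """Map each surface form to the one force it can express."""
    if isinstance(action, CallTool):
        return "commit-to-action"
    if isinstance(action, ReturnAnswer):
        return "commit-to-answer"
    if isinstance(action, AskUser):
        return "request-information"
    # Anything else, including hesitation, is simply not expressible.
    raise TypeError(f"not in the action vocabulary: {action!r}")
```

Under these assumptions, a hesitant model has to pick one of the three constructors; passing anything else, such as the string "wait", is rejected by the harness rather than understood.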

The consequence is that internal hesitation gets squeezed into the closest available verb. If the model is uncertain whether to act, it cannot say "let me wait one turn," so it does the next best thing the protocol allows: it calls a tool it doesn't really need, manufacturing an excuse to think during the tool's latency. Or it returns a final answer with more confidence than it has, because the alternative — calling something redundant — feels worse than committing. The action vocabulary shapes the cognition, not the other way around.

Why "ask_user" Is Not Enough

The standard response when teams notice this gap is to bolt on a clarification tool: ask_user, request_more_context, defer_decision. These help, but they are partial fixes at the wrong layer. They live in the action vocabulary alongside read_file and send_email. The agent has to commit to calling them, and the moment a hesitation is expressed as a tool call, it has been promoted from internal state to external action, with all the consequences that entails.

Three things break when "I am uncertain" has to be expressed as a tool call.

First, the agent is biased against using it for the same reason it is biased against any tool call: the tool has a cost in tokens and latency, and a well-instructed model is encouraged to pick the most direct path to the goal. Asking for clarification is rarely the most direct path, especially when the prompt has framed the task as a thing to be completed. The model that should pause instead commits, because committing is cheaper.

Second, when the agent does call ask_user, the interaction surface treats it as an action with consequences: a user must respond, a turn must be consumed, a notification fires. That weight is appropriate for genuine blockers but heavy-handed for low-grade uncertainty. The agent that is 70% confident has no protocol-level way to express "I am 70% confident" without either acting as though it were at 100% or escalating as though it were at 0%. The space in between is unreachable.

Third, "ask_user" only lets the agent ask the user. It does not let the agent ask itself. There is no first-person primitive — no let_me_think_about_this — that the harness recognizes as a legitimate state distinct from doing nothing and from acting. The agent that wants to dwell either calls a tool or returns a hedged answer; it cannot just be.
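To show what "a legitimate state distinct from doing nothing and from acting" could mean at the harness level, here is a minimal sketch. Everything in it is an assumption made for illustration: the Deliberate step, the run_turns loop, and the max_deliberations budget are hypothetical and do not describe an existing protocol.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass(frozen=True)
class Act:
    """Any externally visible step: a tool call, an answer, or ask_user."""
    description: str


@dataclass(frozen=True)
class Deliberate:
    """Hypothetical first-person step: take another turn, commit to nothing."""
    note: str


Step = Union[Act, Deliberate]


def run_turns(policy: Callable[[List[str]], Step],
              max_deliberations: int = 3) -> List[str]:
    """Drive a policy until it acts or runs out of deliberation budget."""
    transcript: List[str] = []
    used = 0
    while True:
        step = policy(transcript)
        if isinstance(step, Act):
            transcript.append(f"act: {step.description}")
            return transcript
        if used >= max_deliberations:
            # Assumed safeguard so the loop cannot dwell forever.
            transcript.append("halt: deliberation budget exhausted")
            return transcript
        # Internal only: no tool runs and no user is notified.
        transcript.append(f"deliberate: {step.note}")
        used += 1


def cautious_policy(transcript: List[str]) -> Step:
    """Toy policy: dwell for two turns, then commit."""
    if len(transcript) < 2:
        return Deliberate("not sure this is the right file")
    return Act("call_tool(read_file)")


print(run_turns(cautious_policy))
```

The design choice worth noting is that the deliberation branch has no side effects outside the transcript, which is what separates it from ask_user or a redundant tool call. The budget is an assumption added here to keep the loop bounded.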

How the Bias Shows Up in Production

Watch a competent agent fail on a task that called for patience. The failure mode is not random.

The agent will often call a read-only tool it does not actually need. It will list a directory it already listed, re-fetch a URL it already read, dump a schema it already has cached. This is not an error. It is the model expressing "I would like another turn of thinking" through the only channel available: a benign tool call that buys it another transcript step without committing. You can spot the pattern in traces: a flat region of tool calls that produce no new information, followed by a decision. The redundant calls were the deliberation, externalized as actions because there was nowhere else for them to go.
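The trace pattern described above can be searched for mechanically. The following is a rough sketch, not a production detector. It assumes a hypothetical TraceStep record, and it approximates "produces no new information" as "repeats an earlier tool name with identical arguments", which will misfire when the underlying resource has changed between calls.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class TraceStep:
    kind: str          # "tool_call" or "final_answer" (hypothetical labels)
    name: str = ""
    args: Any = None


def call_key(step: TraceStep) -> Tuple[str, str]:
    """Identity of a call: tool name plus canonicalized arguments."""
    return (step.name, json.dumps(step.args, sort_keys=True, default=str))


def flat_regions(trace: List[TraceStep], min_len: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of consecutive repeated tool calls."""
    regions: List[Tuple[int, int]] = []
    seen = set()
    start = None
    for i, step in enumerate(trace):
        is_repeat = False
        if step.kind == "tool_call":
            key = call_key(step)
            is_repeat = key in seen
            seen.add(key)
        if is_repeat:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                regions.append((start, i - 1))
            start = None
    # A run that reaches the end of the trace is still reported.
    if start is not None and len(trace) - start >= min_len:
        regions.append((start, len(trace) - 1))
    return regions


trace = [
    TraceStep("tool_call", "list_dir", {"path": "."}),
    TraceStep("tool_call", "read_url", {"url": "https://example.com"}),
    TraceStep("tool_call", "list_dir", {"path": "."}),
    TraceStep("tool_call", "read_url", {"url": "https://example.com"}),
    TraceStep("tool_call", "send_email", {"to": "ops"}),
]
print(flat_regions(trace))  # [(2, 3)]
```

In this toy trace, steps 2 and 3 repeat earlier calls and are immediately followed by the committing send_email call, which is the flat-region-then-decision shape the paragraph describes.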
