The Streaming Response That Committed Before the User Said Yes

May 23, 2026 · 12 min read

Software Engineer

The user is reading the agent's reasoning as it streams in. Around token 1200, the model decides to call send_email, then create_ticket, then kick_off_deploy. The user, watching the partial output and realizing the agent has misread the request, hits the stop button half a second too late. The email is already sent. The ticket is already filed. The deploy is already running. The stop button cancelled the next token, not the consequences of the last one.

The bug is not in the cancel handler. The bug is the assumption — borrowed from every other streaming UI on the team's roadmap — that an incrementally rendered output is an incrementally reversible one. Tool calls do not honor that contract. They are point-in-time commits that the streaming layer happily fires while the rest of the response is still being generated, and the cancel button has no way to chase them down the wire.

This is one of those failure modes that nobody owns because it lives in the seam between two teams that each shipped their half cleanly. The UX team shipped streaming because it tested better in user studies. The platform team shipped tool calls because the framework supports them. Neither team had a meeting where someone asked: what is "stop" supposed to mean when the response has already left the building?

Streaming UX inherits expectations that tool calls do not honor

The mental model the user brings to a streaming response is the mental model they have from every chat UI of the last three years: text appears, you read along, if you don't like where it's going you hit stop and the rest never appears. The implicit contract is that the part you have not seen yet has not happened yet. For pure text, this is true. The output is just bytes accumulating in a buffer, and cancelling the stream cancels the rest of the buffer.

Tool calls break that contract without redrawing the screen. From the model's perspective, emitting a tool_use block is structurally identical to emitting a sentence — same stream, same token-by-token delivery. From the system's perspective, the moment the orchestrator parses a completed tool-call block, it dispatches the call to the runtime, and the runtime does what runtimes do: it executes. The send-email tool sends an email. The create-ticket tool creates a ticket. There is no holding pen between the model deciding to call a tool and the side effect landing in the world.
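A minimal sketch of that eager dispatch loop may make the ordering concrete. It is a toy, not any framework's actual orchestrator: `ToolCall`, `World`, `eager_orchestrator`, and the `stop_after` parameter (which stands in for the user pressing stop) are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class World:
    """Stand-in for external systems; records side effects that have landed."""
    effects: list = field(default_factory=list)


def eager_orchestrator(events, world, stop_after):
    """Consume stream events and dispatch each completed tool call immediately.

    `stop_after` simulates the user pressing stop after that many events.
    """
    rendered = []
    for i, event in enumerate(events):
        if i >= stop_after:
            break  # stop pressed: nothing further is consumed
        if isinstance(event, ToolCall):
            world.effects.append(event.name)  # the side effect lands right here
        else:
            rendered.append(event)
    return rendered


events = [
    "I'll notify the customer. ",
    ToolCall("send_email", {"to": "customer"}),
    "Now filing a ticket. ",
    ToolCall("create_ticket", {}),
]
world = World()
rendered = eager_orchestrator(events, world, stop_after=3)
```

In this run the stop lands before `create_ticket` is parsed, so the ticket is never filed, but `send_email` has already been recorded in `world.effects` and nothing in the loop can take it back.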

Multiple agent frameworks have surfaced this gap as bugs that look small until you read them in context. The OpenAI Agents SDK has an open issue that RunResultStreamed.stream_events() cannot be cancelled with the standard asyncio primitives, which means timeouts don't actually stop the work. The Ruby LLM library has the same complaint filed against it: there is no mechanism to abort a streaming request, only to stop consuming the chunks while the upstream call keeps running and the bill keeps accruing. The OpenAI Python client has had a long-running thread about cancellation for streaming responses. These are not exotic edge cases. They are the default behavior.

So the user clicks stop. The frontend stops rendering. The connection to the model server may or may not actually close — that depends on the framework — but even if it does, the tool calls that already dispatched are running in their own processes, talking to their own APIs, on their own clocks. The cancel button is a sound effect for the parts of the system the user can see.
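The independence of dispatched calls can be reproduced with nothing but the standard library. The sketch below assumes an orchestrator that spawns each tool call as its own `asyncio` task; `send_email`, `stream_response`, and the timings are hypothetical stand-ins, not any real framework's behavior.

```python
import asyncio


async def send_email(log):
    await asyncio.sleep(0.05)  # pretend network round trip
    log.append("email sent")


async def stream_response(log, background):
    # The model "emits" a tool call early; the orchestrator spawns it as its own task.
    task = asyncio.create_task(send_email(log))
    background.add(task)  # keep a reference so the task is not garbage collected
    # Prose tokens keep streaming afterwards.
    for i in range(100):
        await asyncio.sleep(0.01)
        log.append(f"token {i}")


async def main():
    log = []
    background = set()
    stream = asyncio.create_task(stream_response(log, background))
    await asyncio.sleep(0.02)  # the user watches briefly
    stream.cancel()            # the user hits stop
    try:
        await stream
    except asyncio.CancelledError:
        pass
    # Cancelling the stream did not cancel the tool task it spawned.
    await asyncio.gather(*background)
    return log


log = asyncio.run(main())
```

Cancelling `stream` halts token delivery, yet the email task still runs to completion, because cancellation only propagates to what the cancelled task is currently awaiting, not to tasks it spawned earlier.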

The impedance mismatch is a protocol gap, not a bug

It is tempting to file this as a cancel-handler bug and ask the platform team to make the stop button "really" stop. That framing is wrong, and it leads to fixes that don't work. The real issue is that the streaming response and the tool-call dispatcher are speaking two different protocols about what an "event" is, and the UI layer is pretending they are the same.

The streaming response protocol is best-effort and reversible. Each chunk is a hint about where the output is heading; the next chunk may contradict the last; the whole thing can be thrown away without consequence because nothing outside the display buffer has been persisted or acted on.

The tool-call protocol is exactly-once and committing. Each call is an authorized action against an external system that the agent has been told to perform, and once it leaves the orchestrator, the external system owns the outcome. There is no chunk-level rollback for "we sent an email," and there is no provider on earth that supports "actually scratch that, the user changed their mind, please uninvoice the customer."

The mismatch is structural. As long as the model is allowed to emit tool calls inside the same stream as the prose, and the dispatcher is allowed to fire those calls eagerly, the cancel button cannot mean what users think it means. Recent design-pattern roundups for agentic UX make this explicit: if the AI is going to do something irreversible, you don't let it do it without a preview, and the undo for reversible things needs to be obvious enough that users don't have to discover it under stress.

The honest version of the fix is to stop pretending the protocols are interchangeable and to put a buffering boundary between them.

Speculative tool-call mode as the default for blast-radius-bearing actions

The pattern that actually fixes the failure mode is to mark tool calls as either eager or speculative at the framework layer, and to default the speculative mode to "buffer until the stream completes cleanly." Eager tools — lookup_record, search_docs, anything read-only and idempotent — can fire mid-stream because nothing bad happens if the stream gets cancelled afterward; the worst case is a wasted read and some latency. Speculative tools — send_email, charge_card, delete_row, post_message, anything that touches the world — are recorded into a buffer that the dispatcher only flushes once the stream terminates with a clean stop reason and (depending on policy) an explicit user acknowledgment.
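One way to picture the buffering boundary is a dispatcher that tags each tool with a mode and holds speculative calls until the stream ends. This is a sketch under stated assumptions rather than a reference implementation: `Mode`, `Tool`, `BufferingDispatcher`, the `require_ack` policy flag, and the `"end_turn"` and `"cancelled"` stop-reason labels are all hypothetical and not taken from any particular framework.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Mode(Enum):
    EAGER = "eager"              # read-only, idempotent: safe to fire mid-stream
    SPECULATIVE = "speculative"  # touches the world: hold until the stream ends cleanly


@dataclass
class Tool:
    name: str
    mode: Mode
    run: Callable[[dict], None]


@dataclass
class BufferingDispatcher:
    tools: dict
    require_ack: bool = True
    pending: list = field(default_factory=list)

    def on_tool_call(self, name, args):
        tool = self.tools[name]
        if tool.mode is Mode.EAGER:
            tool.run(args)                    # fire immediately
        else:
            self.pending.append((tool, args))  # record, do not execute

    def on_stream_end(self, stop_reason, user_ack=False):
        """Flush buffered calls only on a clean stop (and an ack, if policy requires)."""
        flushed = 0
        if stop_reason == "end_turn" and (user_ack or not self.require_ack):
            for tool, args in self.pending:
                tool.run(args)
                flushed += 1
        self.pending.clear()  # a cancelled or unacknowledged stream discards its buffer
        return flushed


log = []
tools = {
    "search_docs": Tool("search_docs", Mode.EAGER, lambda a: log.append("search_docs")),
    "send_email": Tool("send_email", Mode.SPECULATIVE, lambda a: log.append("send_email")),
}
dispatcher = BufferingDispatcher(tools)

# First stream: the user hits stop, so the buffered email is dropped.
dispatcher.on_tool_call("search_docs", {})
dispatcher.on_tool_call("send_email", {})
dropped = dispatcher.on_stream_end("cancelled")

# Second stream: clean finish plus acknowledgment, so the email is sent.
dispatcher.on_tool_call("send_email", {})
flushed = dispatcher.on_stream_end("end_turn", user_ack=True)
```

The design choice worth noting is that the buffer is cleared on every stream end, clean or not, so a cancelled stream cannot leak a pending call into the next one.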



About Tian Pan
