
The Streaming Abort That Left the Side Effect Billable

· 11 min read
Tian Pan
Software Engineer

A user is watching your agent stream a response. Two hundred milliseconds in, they hit stop. The UI clears the bubble, the spinner disappears, and the product behaves as if the request never happened. It did happen. The agent already called send_invoice_email. The vendor's mail relay returned 250 OK. The customer received a draft invoice the user never approved. Your billing meter charged the user for the tokens that streamed before the abort. It cannot bill back the email.

This is the failure mode every team with streaming tool use ships at least once, and most teams never detect it. The stream layer reports "cancelled". The tool layer reports "succeeded". Your customer-facing log picks one of them based on whichever subsystem flushes last, and the two halves of the same request now disagree about whether it occurred.

The temptation is to treat client-side abort as a UI concern — clear the bubble, drop the connection, move on. But a stream that has called any side-effecting tool is no longer a request that can be silently dropped. It is a partially committed transaction, and the team that did not design for partial commits is shipping a UX whose visible state and actual state drift on every cancellation.

The abort is a partial-commit event, not a cancel

Most engineers reach for the word "cancel" because the browser API is called AbortController and the HTTP layer reports the connection as closed. That vocabulary lies. A cancel implies the action did not happen. An abort that arrives after a tool call has already mutated a third-party system is closer to a database client disconnecting after COMMIT but before reading the acknowledgement — the write is real, the client just never saw the confirmation.

The agentic loop makes this worse than the database analogy suggests. A single streamed turn can:

  • Call a read-only tool that returns context the model uses to draft the next step.
  • Call a side-effecting tool that mutates an external system the moment the request is dispatched.
  • Stream tokens to the user describing what it did.

If the user aborts between steps two and three, the side effect is real and the user's mental model of what occurred is whatever fraction of the description finished rendering. The two are independent. The token stream does not know what the tool layer did, and the tool layer does not know which tokens the user actually saw. The "result" of the request is now a Cartesian product of two timelines.
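The independence of the two timelines can be shown in a few lines. This is a minimal sketch, not any framework's real loop: `sendInvoiceEmail` is a hypothetical synchronous stub standing in for the mail relay, `customer@example.com` is a placeholder address, and the "user" aborts from inside the token callback after two tokens.

```typescript
// Hypothetical stand-in for a third-party mail relay. Once called, the send is real.
const relayLog: string[] = [];

function sendInvoiceEmail(to: string): string {
  relayLog.push(to); // the side effect lands at dispatch time
  return "250 OK";
}

// One streamed turn: dispatch the tool, then describe the result token by token.
function runTurn(signal: AbortSignal, emit: (token: string) => void): void {
  const relayStatus = sendInvoiceEmail("customer@example.com"); // placeholder address
  for (const token of ["I", " sent", " the", " invoice", ` (${relayStatus}).`]) {
    if (signal.aborted) return; // stops the description, not the email
    emit(token);
  }
}

const controller = new AbortController();
const seen: string[] = [];
runTurn(controller.signal, (token) => {
  seen.push(token);
  if (seen.length === 2) controller.abort(); // the user hits stop after two tokens
});

console.log(seen.join(""));   // "I sent"
console.log(relayLog.length); // 1
```

The signal stops the loop that forwards tokens. Nothing in it reaches back into `relayLog`, which is the whole problem in miniature.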

This is the part most frameworks paper over. The Vercel AI SDK's consumeStream and onAbort callbacks acknowledge that an aborted stream still needs cleanup, but cleanup is framed as a UI concern — persist partial results, free resources, release the model. None of that reconciles an email that has already been sent. None of it tells the user "we sent an invoice, do you want us to recall it." The framework's mental model ends at the connection boundary; the user's mental model ends at the side effect.

The streaming layer and the tool layer keep separate truths

Walk through what each subsystem actually observes when a user hits stop 200ms into a turn that has already called a tool:

The HTTP layer sees the connection close. It writes client_disconnect to the access log and increments the abort counter.

The streaming layer sees the AbortSignal fire. It calls onAbort, emits whatever telemetry it was configured to emit, and stops forwarding tokens to the client.

The agent runtime sees the AbortSignal forwarded into its execute loop. Depending on the framework, it either (a) raises a cancellation up the call stack the next time the loop yields, (b) waits for the current tool's execute function to return before propagating, or (c) silently swallows the AbortError and emits no event at all, which is the failure mode the Claude Code team has filed bugs against.

The tool layer sees its own HTTP request to the third party complete with a 200 OK, and it returns a payload describing the side effect. From the tool layer's perspective, nothing went wrong: it ran the function, the function succeeded, and it dutifully returned the result. The cancellation signal arrived after the tool already returned, or arrived while the tool was waiting on a remote system that does not support cancellation.

The third-party system, finally, has no idea any of this happened. It received a request, it executed the side effect, it acknowledged. Its log is the only one that reflects the truth on the ground.

Five subsystems, five different logs, each accurate from its own vantage point and collectively inconsistent. Pick any one as "the truth" and you have either a customer who was billed for an email that was not sent, or a customer who was not billed for an email that was sent, or a customer-facing transcript that claims an action the underlying system never performed.

The framework's optimistic execution is the bug

Most agent frameworks dispatch tool calls eagerly. The model emits a tool-use block, the runtime extracts the arguments, and the tool's execute function runs the moment the block is parsed. The result feeds back into the next model call. This is fast, it is composable, and for read-only tools it is correct.

For side-effecting tools, eager execution is a contract you did not write. It says: the model's decision to call this tool is the commit point. The moment the runtime sees the tool-use block, the action is irrevocable from the user's perspective, because the runtime is going to dispatch it before the user even has a chance to read what the model decided to do.

If the model is fast and the user is reading, the user sees the assistant message describing the tool call after the side effect already landed. The user's only opportunity to abort is after the irrevocable action has occurred. The stop button is, at that point, a placebo.

The fix is structural: side-effecting tools should not commit at dispatch time. They should commit at one of two later points — either when the user has had a chance to confirm, or when the stream completes cleanly. Either way, the runtime has to distinguish between intent and commit, and treat the tool-use block as the former rather than both at once.
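One way to separate intent from commit is sketched below, under stated assumptions: the `Tool` shape, the `sideEffecting` flag, and the `TurnRuntime` class are all hypothetical names for illustration, and `lookup_customer` is an invented read-only tool. Read-only tools still run eagerly; side-effecting tool-use blocks are recorded as intents and only executed when `commit()` is called, which a real runtime would tie to user confirmation or clean stream completion.

```typescript
type Tool = {
  name: string;
  sideEffecting: boolean;
  execute: (args: string) => string;
};

type Intent = { tool: Tool; args: string };

class TurnRuntime {
  private pending: Intent[] = [];

  // Called when the model emits a tool-use block.
  onToolUse(tool: Tool, args: string): string {
    if (!tool.sideEffecting) return tool.execute(args); // reads stay eager
    this.pending.push({ tool, args });                  // record intent only
    return `deferred: ${tool.name} awaits confirmation`;
  }

  // Called on explicit user confirmation or clean stream completion.
  commit(): string[] {
    const results = this.pending.map(({ tool, args }) => tool.execute(args));
    this.pending = [];
    return results;
  }

  // Called when the abort signal fires: no deferred call has been dispatched yet.
  abort(): number {
    const dropped = this.pending.length;
    this.pending = [];
    return dropped;
  }
}

const sent: string[] = [];
const lookupCustomer: Tool = {
  name: "lookup_customer",
  sideEffecting: false,
  execute: (id) => `customer ${id}`,
};
const sendInvoice: Tool = {
  name: "send_invoice_email",
  sideEffecting: true,
  execute: (to) => { sent.push(to); return "250 OK"; },
};

const runtime = new TurnRuntime();
runtime.onToolUse(lookupCustomer, "42");                // runs immediately
runtime.onToolUse(sendInvoice, "customer@example.com"); // queued, not sent
const droppedOnAbort = runtime.abort();                 // stop now means something
```

The trade-off is that the model receives a "deferred" placeholder instead of a real tool result, so any step that depends on the outcome has to wait for the commit.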

What a two-phase tool protocol looks like

Borrow the vocabulary from distributed transactions, because that is what this actually is. A side-effecting tool's lifecycle has three states:

  • Prepared. The arguments have been validated, the request payload has been constructed, the precondition checks have run. No external state has changed. A tool_prepared event is emitted into the stream.
  • Committed. The actual mutation has been applied to the third-party system. A tool_committed event is emitted, with an idempotency key the client can use to look up the canonical record.
  • Compensated. The mutation has been reversed (refund issued, email recalled, row restored). This state exists because the prepare → commit gap is not always observable by the abort handler, and some commits will land before the abort signal reaches the tool layer.

The abort handler's job is to walk a per-stream side-effect ledger and decide what to do for each entry. Prepared but not committed? Cancel the prepared action. Committed before the abort signal arrived? Run the compensating action — or, if no compensation is possible (the email is sent, the SMS has been delivered, the post is published), at minimum record the irreversible side effect against the customer's account so it surfaces in the next session.

The ledger is the part that closes the gap between the streaming layer's cancelled log line and the tool layer's succeeded log line. A single ledger entry per side effect, written before the call, updated as the call progresses, and reconciled by the abort handler. Without it, the two halves of the request keep separate truths forever.
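The reconciliation pass described above might look roughly like this. It is a sketch with assumed names: `LedgerEntry`, `reconcileOnAbort`, the extra `cancelled` state, and the `needsUserAttention` flag are not from any particular runtime, and the ledger here is an in-memory array where a real one would be durable storage.

```typescript
// "cancelled" is an added bookkeeping state for prepared entries the abort discarded.
type EntryState = "prepared" | "committed" | "compensated" | "cancelled";

interface LedgerEntry {
  key: string;                 // idempotency key for the side effect
  tool: string;
  state: EntryState;
  compensate?: () => void;     // absent when the effect cannot be undone
  needsUserAttention: boolean; // surfaced in the user's next session
}

function reconcileOnAbort(ledger: LedgerEntry[]): void {
  for (const entry of ledger) {
    if (entry.state === "prepared") {
      entry.state = "cancelled";        // nothing left the process
    } else if (entry.state === "committed") {
      if (entry.compensate) {
        entry.compensate();             // reverse the mutation
        entry.state = "compensated";
      } else {
        entry.needsUserAttention = true; // irreversible: record it, do not hide it
      }
    }
  }
}

const refunds: string[] = [];
const ledger: LedgerEntry[] = [
  { key: "k1", tool: "create_draft", state: "prepared", needsUserAttention: false },
  { key: "k2", tool: "charge_card", state: "committed", needsUserAttention: false,
    compensate: () => { refunds.push("k2"); } },
  { key: "k3", tool: "send_invoice_email", state: "committed", needsUserAttention: false },
];

reconcileOnAbort(ledger);
console.log(ledger.map((e) => `${e.tool}:${e.state}`).join(", "));
// "create_draft:cancelled, charge_card:compensated, send_invoice_email:committed"
```

The sent email stays `committed` because there is nothing to reverse; the flag is what keeps it from disappearing from the user's view.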

This isn't theoretical. Atomix and similar runtimes treat tool calls as transactional effects, tagging each call with an epoch and tracking per-resource frontiers to determine when to commit, with saga-style compensations on cancel. The pattern works; the part most teams skip is acknowledging they need it before the first incident.

Idempotency is the contract that makes recovery survivable

Every side-effecting tool the agent can call must carry an idempotency key generated by the runtime at prepare time, not at commit time. The key is shaped from the tool name, the resolved arguments, the user identity, and a per-turn nonce — stable enough that a retry within the same logical turn deduplicates, unique enough that a deliberate re-issue starts a new transaction.
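A key with those properties can be derived deterministically. The sketch below is one possible shape, assuming SHA-256 over a canonical serialization; `canonicalize` and `idempotencyKey` are hypothetical helpers, and sorting object keys is there so that argument order does not change the key.

```typescript
import { createHash } from "node:crypto";

// Serialize with sorted object keys so {a, b} and {b, a} produce the same string.
function canonicalize(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// Generated at prepare time from tool name, resolved arguments, user, and turn nonce.
function idempotencyKey(
  tool: string,
  args: unknown,
  userId: string,
  turnNonce: string,
): string {
  const material = canonicalize([tool, args, userId, turnNonce]);
  return createHash("sha256").update(material).digest("hex");
}

const retryKey = idempotencyKey(
  "send_invoice_email", { to: "customer@example.com", amount: 120 }, "user-1", "turn-a");
const sameTurnKey = idempotencyKey(
  "send_invoice_email", { amount: 120, to: "customer@example.com" }, "user-1", "turn-a");
const reissueKey = idempotencyKey(
  "send_invoice_email", { to: "customer@example.com", amount: 120 }, "user-1", "turn-b");
```

Here `retryKey` and `sameTurnKey` match, so a retry within the turn deduplicates, while `reissueKey` differs because the nonce changed.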

The contract has three obligations:

  1. The third-party system, or your wrapper around it, must respect the key — a second call with the same key returns the original result rather than re-executing.
  2. The runtime must persist the key alongside the ledger entry before dispatching the call, so a crash between dispatch and acknowledgement does not lose track of which side effect is in flight.
  3. The compensating action, if one exists, must also be keyed, so an abort handler that double-fires does not produce two refunds for one charge.
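The three obligations can be sketched as a small wrapper. This assumes an in-memory `Map` where a real system would use durable storage, and `IdempotentExecutor` and `KeyedRecord` are hypothetical names. The key is recorded before the action runs, a completed key replays its stored result, and a key left in flight (the action threw or the process died) is refused until someone reconciles it.

```typescript
type KeyedRecord<T> = { status: "in_flight" } | { status: "done"; result: T };

class IdempotentExecutor<T> {
  private records = new Map<string, KeyedRecord<T>>();

  run(key: string, action: () => T): T {
    const existing = this.records.get(key);
    if (existing?.status === "done") return existing.result; // replay, do not re-execute
    if (existing?.status === "in_flight") {
      throw new Error(`outcome unknown for ${key}; reconcile before retrying`);
    }
    this.records.set(key, { status: "in_flight" }); // persist the key before dispatch
    const result = action();
    this.records.set(key, { status: "done", result });
    return result;
  }
}

// A keyed compensation: the abort handler fires twice, the refund is issued once.
let refundsIssued = 0;
const compensations = new IdempotentExecutor<string>();
const issueRefund = (): string => {
  refundsIssued += 1;
  return "refund-1";
};

const first = compensations.run("refund:charge-123", issueRefund);
const second = compensations.run("refund:charge-123", issueRefund);
console.log(refundsIssued, first === second); // 1 true
```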

Without keys, you cannot safely roll forward (retry the call to confirm it landed) and you cannot safely roll back (issue a compensation without risking a double-compensation). With keys, an abort handler can choose its policy: optimistically retry, optimistically compensate, or defer to a human. Without keys, the only safe policy is "do nothing and hope," and "do nothing and hope" is what every team running on eager dispatch has shipped by default.

The customer-facing log is the place to start, not the runtime

The pragmatic step is not to rewrite the agent runtime. It is to build the per-stream side-effect ledger first, point every existing side-effecting tool at it, and make the customer-facing transcript a view over the ledger rather than over the token stream.

Once the transcript is sourced from the ledger, the asymmetry between "what the user thinks happened" and "what the system did" becomes a visible diff rather than an invisible drift. The user who aborts and sees their transcript end mid-sentence can also see "we sent an invoice to [recipient address]" with an undo control next to it. That single affordance, backed by the ledger and an idempotent compensation path, recovers most of the trust the silent commit cost you.
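As a rough illustration of a transcript rendered from the ledger rather than from the token stream alone, consider the sketch below. The `LedgerEntry` fields, the `renderTranscript` function, and the bracketed labels are all assumptions made for the example, not a prescribed UI.

```typescript
type EntryState = "prepared" | "committed" | "compensated";

interface LedgerEntry {
  summary: string;     // human-readable description of the side effect
  state: EntryState;
  reversible: boolean; // whether a compensation path exists
}

// The transcript is the partial streamed text plus one line per landed side effect.
function renderTranscript(streamedText: string, ledger: LedgerEntry[]): string[] {
  const lines = [streamedText];
  for (const entry of ledger) {
    if (entry.state === "committed") {
      lines.push(
        entry.reversible
          ? `${entry.summary} [Undo]`
          : `${entry.summary} (cannot be undone)`,
      );
    } else if (entry.state === "compensated") {
      lines.push(`${entry.summary} (undone)`);
    }
    // Prepared entries never reached the third party, so there is nothing to report.
  }
  return lines;
}

const transcript = renderTranscript("I've drafted the invoice and", [
  { summary: "Created a draft invoice", state: "prepared", reversible: true },
  { summary: "Charged the card on file", state: "committed", reversible: true },
  { summary: "Sent an invoice email", state: "committed", reversible: false },
]);
// [
//   "I've drafted the invoice and",
//   "Charged the card on file [Undo]",
//   "Sent an invoice email (cannot be undone)",
// ]
```

The mid-sentence text and the list of landed effects sit side by side, which is the visible diff the paragraph above describes.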

The deeper architectural step is to redesign side-effecting tools as two-phase actors with explicit confirm-or-commit semantics, and to make the abort signal a first-class event that the runtime reconciles rather than a connection-layer detail that the runtime forwards. That work is larger, but the ledger is the wedge that makes it tractable — once you can see the divergence, you can choose where to close it.

The architectural realization

A stream is a transaction the user can unilaterally abort. The user does not need your permission, does not need to coordinate with your backend, and does not need to wait for any state to flush. They close the tab and the transaction is over from their perspective. The question your system has to answer is what happens to every side effect that has already landed at the moment of abort, and the only honest answers are "we compensated it," "we surfaced it to the user," or "we shipped a UX whose visible state and actual state diverge on every cancellation."

Most teams ship the third option, discover it during an incident review when a customer points at an email they did not approve, and then add a feature flag to disable streaming for tools that mutate. That works until the next side-effecting tool ships without the flag set. The structural fix — a ledger, two-phase tools, keyed compensations — is more work and it is also the only fix that survives the next tool the team adds without remembering the lesson from the last one.

Treat the abort as a commit event. Build the ledger. Write the keys. Then turn streaming on for the tools that can survive it, and leave it off for the ones that cannot. The cancellation button on the UI is a contract with the user about reversibility. Honor it deliberately, or accept that you are shipping a placebo.
