The Downstream API That Kept Writing After the User Cancelled the Conversation

· 10 min read
Tian Pan
Software Engineer

The user hits stop. The browser closes the SSE connection. Your AI SDK fires onAbort. The agent runtime sees the signal, stops requesting more tokens from the model, and tears down its loop. From inside your codebase, the cancellation looks crisp. Every subsystem you can see is doing the right thing.

Meanwhile, two seconds earlier, the model emitted a tool call. The runtime dispatched it. The tool's execute function opened a TCP connection to a third-party API and posted a payload. That HTTP request is still in flight, the third party's server is still processing it, and the third party has no way of knowing that the conversation it is serving no longer exists. The write commits. The user's mental model says they escaped the action by hitting stop. The downstream system's database says otherwise.

This is the failure mode that lives in the gap between in-process cancellation and remote cancellation. Most engineers reason about AbortController as if it propagates the way Go's context.Context does — a single token that fans out through every goroutine in the call graph and trips the cancellation channel at every blocking operation simultaneously. That reasoning is correct inside your process. It is mostly wrong across the network. A cancellation that traverses two HTTPS hops, an L7 load balancer, and a vendor's queue is no longer a cancellation. It is a hope.

AbortSignal stops your code, not your dependencies' code

AbortController is a Web Platform primitive designed to interrupt asynchronous work that runs inside your runtime. When you wire it into fetch, you are asking your runtime to tear down the request (closing the TCP socket, or resetting the stream on HTTP/2) and reject the promise. That is what it does. It is genuinely useful: the provider's serving stack notices the disconnect, typically within a few hundred milliseconds, and stops generating tokens, which is why streaming LLM cancellation works as well as it does for the inference itself.
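A minimal sketch of that in-process contract, using a timer as a stand-in for fetch so it runs without a network. The helper name `abortableDelay` is hypothetical; only `AbortController`, `AbortSignal`, and the timer functions are platform APIs.

```typescript
// Hypothetical helper: an in-process wait that honours an AbortSignal,
// the same way fetch rejects its promise when the signal fires.
function abortableDelay(ms: number, signal: AbortSignal): Promise<string> {
  return new Promise((resolve, reject) => {
    if (signal.aborted) {
      reject(signal.reason);
      return;
    }
    function onAbort(): void {
      clearTimeout(timer); // stop the local work
      reject(signal.reason); // surface the cancellation to the caller
    }
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve("finished");
    }, ms);
    signal.addEventListener("abort", onAbort, { once: true });
  });
}

const controller = new AbortController();
const outcome = abortableDelay(60_000, controller.signal).catch(
  (reason: unknown) => `interrupted: ${String(reason)}`,
);

controller.abort("user pressed stop");
outcome.then((message) => console.log(message));
```

Everything the signal touches here lives in one runtime: the timer is cleared and the promise rejects. Nothing in this mechanism reaches a second machine.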

But the moment your tool's execute function dispatches an HTTP request to a third party that is not the LLM provider — Stripe, Mailgun, Salesforce, your own internal service, anything that performs a side effect — the cancellation contract changes. Closing the connection to a write endpoint does one of three things, and which one depends on the server's implementation, not yours:

  • The server detects the closed socket before the handler reaches its commit point and aborts. This is the case you are subconsciously assuming.
  • The server detects the closed socket after commit, attempts to flush the response, fails to flush, and writes a log entry saying "client gone." The write already landed.
  • The server does not detect the closed socket at all because the request was enqueued for asynchronous processing. A worker downstream picks the message off a queue ten seconds later and executes it against a conversation that closed a long time ago.

Three different outcomes, only one of which matches the user's expectation. The runtime cannot distinguish between them because the signal stops at the socket. There is no equivalent of ctx.Done() that the third party can listen to.
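The three outcomes can be written down as a toy model. This is an illustrative sketch, not a description of any real server: the `ServerKind` labels and the `handleWrite` function are assumptions made up for this example.

```typescript
// Hypothetical model of the three server behaviours described above.
type ServerKind = "checks-before-commit" | "notices-on-flush" | "queued-worker";

interface WriteResult {
  committed: boolean; // did the side effect land?
  serverNoticedAbort: boolean; // did the server ever learn the client left?
}

// clientGone: the client closed the socket before the server's commit point.
function handleWrite(kind: ServerKind, clientGone: boolean): WriteResult {
  switch (kind) {
    case "checks-before-commit":
      // The handler inspects the connection first and bails out if it is closed.
      return { committed: !clientGone, serverNoticedAbort: clientGone };
    case "notices-on-flush":
      // The handler commits, then discovers the closed socket while responding.
      return { committed: true, serverNoticedAbort: clientGone };
    case "queued-worker":
      // The request was enqueued; the worker never looks at the socket.
      return { committed: true, serverNoticedAbort: false };
  }
}

const serverKinds: ServerKind[] = [
  "checks-before-commit",
  "notices-on-flush",
  "queued-worker",
];
for (const kind of serverKinds) {
  console.log(kind, handleWrite(kind, true));
}
```

In all three cases the caller observes the same thing, a rejected promise, so the `committed` field is exactly the information the runtime does not have.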

Cancellation tokens do not cross process boundaries unless you design them in

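Go practitioners learn this lesson the hard way the first time they wire context.Context across a service boundary. Inside a process, cancelling the parent context immediately closes the Done() channel on every derived context, and every goroutine that selects on it returns within microseconds. Across a service boundary, the context value itself does not travel. Plain HTTP has no standard field that carries "this request has been cancelled by an upstream client"; the only signal is the connection going away. gRPC does better, carrying deadlines in the grpc-timeout header and cancellation as an HTTP/2 stream reset, but that only helps when both sides speak gRPC and the server handler actually checks its context before committing.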

You can approximate it. You can propagate a deadline header that downstream services check on each operation. You can issue an out-of-band DELETE /jobs/{id} after the original POST. You can include a cancellation token in the original request that the server polls before each commit point. All of these are explicit protocols you have to design, document, and enforce on both sides of the wire.
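As a rough sketch of what the server side of such a protocol might look like, the following combines two of the options above: a cancellation token checked before the commit point, and a deadline carried in the request. Every name here (`CancellationRegistry`, `WriteRequest`, `commitIfStillWanted`) is hypothetical, and the in-memory set stands in for whatever shared store a real service would need.

```typescript
// Hypothetical server-side registry of cancelled tokens.
class CancellationRegistry {
  private cancelled = new Set<string>();

  // Would be called by an out-of-band request such as DELETE /jobs/{id}.
  cancel(token: string): void {
    this.cancelled.add(token);
  }

  isCancelled(token: string): boolean {
    return this.cancelled.has(token);
  }
}

// Assumed request shape: the client sends the token and deadline with the write.
interface WriteRequest {
  cancellationToken: string;
  deadlineEpochMs: number;
  payload: string;
}

type CommitOutcome = "committed" | "cancelled" | "expired";

// Checks both signals immediately before the commit point.
function commitIfStillWanted(
  request: WriteRequest,
  registry: CancellationRegistry,
  nowMs: number,
  database: string[],
): CommitOutcome {
  if (registry.isCancelled(request.cancellationToken)) return "cancelled";
  if (nowMs > request.deadlineEpochMs) return "expired";
  database.push(request.payload); // the commit point
  return "committed";
}

const registry = new CancellationRegistry();
const database: string[] = [];
const request: WriteRequest = {
  cancellationToken: "tok-123",
  deadlineEpochMs: 10_000,
  payload: "charge $20",
};

const first = commitIfStillWanted(request, registry, 5_000, database);
registry.cancel("tok-123");
const second = commitIfStillWanted(request, registry, 6_000, database);
const late = commitIfStillWanted(
  { ...request, cancellationToken: "tok-456" },
  registry,
  11_000,
  database,
);
console.log(first, second, late, database);
```

The check and the commit are not atomic in this sketch, so a cancel that arrives between them still loses the race. It narrows the window; it does not close it.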

LLM tool-use frameworks ship with none of these protocols. The fetch inside the tool is identical to a fire-and-forget HTTP call. The AI SDK's abortSignal lives entirely on the client side of that fetch. When the SDK passes the signal into the tool's execute function, the runtime knows about the cancellation, but the destination has no awareness that the work it is doing has been abandoned. Worse, the runtime's abort might fire after the request body has already been transmitted, leaving the server in a state where it is processing a request whose initiator has hung up.

The async work that outlives the conversation

The harshest variant of this failure shows up when the tool's downstream API is asynchronous. The tool's execute function does not actually perform the side effect — it enqueues it. It calls something like POST /workflows/run and gets back 202 Accepted with a run ID. From the runtime's perspective, the tool returned successfully. From the third party's perspective, a workflow is now scheduled to execute, possibly minutes from now, possibly on a different machine.

If the user aborts at the moment the tool returns, the runtime cancels cleanly. The conversation closes. The user's session ends. The third party's worker queue does not know any of this. It picks up the job on its own schedule and runs it against state the user thinks they have escaped. The side effect commits minutes after the user closed the tab.
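The sequence is easy to reproduce in miniature. In this sketch, `WorkflowService` is a made-up in-memory stand-in for a third party that answers POST /workflows/run with 202 and processes the job later; none of it reflects a real vendor's API.

```typescript
interface Job {
  runId: string;
  conversationId: string;
  action: string;
}

// Hypothetical stand-in for an asynchronous third-party workflow API.
class WorkflowService {
  private queue: Job[] = [];
  private nextId = 1;
  readonly committed: string[] = [];

  // Stands in for POST /workflows/run: enqueue, then answer 202 immediately.
  accept(conversationId: string, action: string): { status: 202; runId: string } {
    const runId = `run-${this.nextId++}`;
    this.queue.push({ runId, conversationId, action });
    return { status: 202, runId };
  }

  // Stands in for the worker that drains the queue on its own schedule.
  // It has no access to the caller's AbortSignal.
  drain(): void {
    for (const job of this.queue.splice(0)) {
      this.committed.push(job.action);
    }
  }
}

const service = new WorkflowService();
const controller = new AbortController();

// The tool call returns successfully with a run ID.
const response = service.accept("conv-1", "send-invoice");

// The user hits stop; the runtime cancels cleanly.
controller.abort();

// Later, the worker runs the job anyway.
service.drain();

console.log(response.status, controller.signal.aborted, service.committed);
```

By the time `drain` runs, the signal has already fired and the side effect commits regardless, because the queue and the signal never share any state.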
