
The Async Tool Call That Resolved After the User Already Closed the Conversation

· 12 min read
Tian Pan
Software Engineer

The clearest sign that an agent's session model is broken is when a tool result has nowhere to go. The agent fired a long-running call — a render, a provisioning job, a multi-step query. The user watched the spinner for a few seconds, decided they didn't need it after all, closed the tab, and moved on. Forty seconds later the tool finishes. Its callback hits your gateway with a conversation_id that no longer points at anything. The gateway has two equally bad options: silently drop the result, or stitch it into whatever session inherits that ID next.

Most teams discover this failure mode the same way: a support ticket where a user sees an answer they did not ask for, attached to a conversation they did not start. Or a downstream system that processed the same charge twice because the gateway helpfully "retried" delivery against the next active session. Or — most commonly — nothing visible at all, just a slow drift in completion metrics that nobody can correlate to anything specific, because the failures don't fire alerts; they fire emptiness.

This is not the same failure mode as fire-and-forget, where the planner treats a job ID as a final answer and moves on without polling. That problem lives inside one agent loop. The problem in this post lives between the agent loop and the rest of your infrastructure: the tool will finish, the result will arrive, and your session boundary has already collapsed underneath it.

The Session You Designed Was Synchronous; The Tools Are Not

Most chat UIs grew up around a synchronous request-reply pattern. The user sends a message, the model answers, the turn closes. Conversation state lives in memory or in a short-lived cache; long-lived state migrates to a database when the conversation ends or after a brief idle window. The whole pipeline assumes that the time from user-input to final-output is bounded and that the user remains attached to the session for the duration of that bound.

Tools broke this assumption without anybody noticing, because the first wave of tools — search, calculators, dictionary lookups, simple API reads — were fast enough to fit inside the implicit "user is still here" budget. Then the second wave landed: rendering, transcription, provisioning, code execution, agent-to-agent dispatch, anything that calls an external system whose tail latency is measured in minutes rather than seconds. The pipeline did not change shape to match. The synchronous session still held the open turn, the planner still expected the result to come back inline, and the only thing keeping the model honest was the user staying attached.

So when the user disconnects (closes the tab, navigates away, kills the app, hits a flaky network, lets the screen lock), the agent loop stays parked waiting for a result it can no longer route. Some stacks hold the turn open server-side for a TTL and let it die quietly. Others abort the agent run on disconnect and leave the tool execution orphaned in whatever queue it was dispatched to. Either way: the work continues. The session does not.

This is the gap. A tool whose actual duration exceeds session lifetime will land its result outside the session that requested it. Asking whether this will happen is the wrong question. Designing for when it happens is the only useful one.

The Three Things That Happen When the Result Lands Late

When a tool callback arrives carrying a stale conversation_id, your routing layer takes one of three actions, and you should know which one yours takes before something forces you to.

It drops the result. The gateway looks up the conversation, finds it expired, logs a warning, and returns 200 to the tool service so it doesn't retry. The tool ran. The side effects landed wherever they were going to land — the charge cleared, the email sent, the row deleted, the document created. Nothing tells the user. Nothing tells the next session. The work happened in the world and the model has no memory of it. The next time the user starts a conversation and asks "did that thing go through?" the agent has to derive the answer from the world's state, not from its own history. Most agents are not built to do that and will confidently answer either way.

It routes to the next session. The gateway looks up the conversation, finds it expired, and helpfully grafts the result onto whatever conversation the same user opens next. The next session inherits a tool response with no matching tool call in its history. The model, faced with a hanging tool-result message, either ignores it (best case), hallucinates a justifying tool call (medium case), or treats the late result as a fresh user message and acts on it (worst case — the inherited side effect, where the next conversation's agent does additional work in response to leftover output from the previous one).

It routes to a different user entirely. This is the one that wakes the on-call. The conversation_id was reused, or the user identity was tied to the conversation rather than the auth token, or a load-balancer key collision aliased two sessions, or the GC ran and a freshly minted ID happened to collide with the expired one. The tool result lands in someone else's chat. Once is a near-miss; twice is a postmortem.

The first failure mode is invisible until you correlate completion rates with disconnect rates. The second is invisible until a user notices that an answer doesn't match their question. The third writes itself into the incident channel.
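One way to make the routing decision explicit rather than accidental is sketched below. It is a minimal illustration, not a real gateway: `Conversation`, `ToolCallback`, `route_callback`, and the `orphans` list are all hypothetical names, and the in-memory dict stands in for whatever session store you actually run. The idea is that a late result is either delivered to the exact call that requested it, parked as an orphan for a deliberate decision later, or rejected when the identity does not match.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    DELIVERED = "delivered"
    ORPHANED = "orphaned"   # session gone or call unknown: parked, not dropped
    REJECTED = "rejected"   # identity mismatch: never deliver


@dataclass
class Conversation:
    conversation_id: str
    user_id: str
    pending_tool_calls: set[str] = field(default_factory=set)
    expired: bool = False


@dataclass
class ToolCallback:
    conversation_id: str
    user_id: str        # assumed to come from the auth context at dispatch time
    tool_call_id: str
    payload: dict


def route_callback(
    cb: ToolCallback,
    conversations: dict[str, Conversation],
    orphans: list[ToolCallback],
) -> Outcome:
    conv = conversations.get(cb.conversation_id)

    # A reused or colliding conversation_id must not leak across users.
    if conv is not None and conv.user_id != cb.user_id:
        return Outcome.REJECTED

    # No live session, or no matching tool call in its history:
    # park the result instead of dropping it or grafting it elsewhere.
    if conv is None or conv.expired or cb.tool_call_id not in conv.pending_tool_calls:
        orphans.append(cb)
        return Outcome.ORPHANED

    conv.pending_tool_calls.discard(cb.tool_call_id)
    return Outcome.DELIVERED
```

The point of the sketch is the shape of the return value: every callback ends in a named outcome you can count, so the silent-drop and cross-session cases show up as numbers instead of as support tickets.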

The Idempotency Story Was Already Hard; Now It Is Two Idempotency Stories

Production retry stories assume that the system retries, not that the user does. A durable execution engine like Temporal or Restate, or a checkpointing layer like LangGraph's, guards against worker crashes and tool flakiness by journaling each step and replaying it so that completed work is not duplicated, usually with idempotency keys on the side-effecting calls. This works because the workflow run is the unit of identity, and the idempotency key derives from the workflow run ID combined with the step.

The async-tool-callback problem is the orthogonal one. It is not the system retrying the same workflow; it is the user abandoning the workflow and starting a new one before the old one finishes. The idempotency key has nothing to deduplicate against, because the new workflow run has a different ID, and the tool service has no way to know that the new run "is" the same user wanting the same outcome.

Two failure surfaces, both wearing the word "retry":

  • Engine-level retry: the worker crashed, the workflow resumes, the same step needs to either be replayed-from-journal or re-executed-with-idempotency. Solved by durable execution. Well understood.
  • User-level retry: the conversation expired, the user starts over, the tool result from the prior run is now an artifact looking for an owner. Not solved by durable execution. Often not solved by anything.

If your tool service is well-built, it has an idempotency key derived from the run ID and step. That key protects you from duplicate execution if the engine retries. It does not protect you from the user starting a new conversation that re-issues the same logical request: to the tool service, those are two different keys and two different calls, and both will execute. The second call might complete while the first is still in flight. The first might still complete after the second has already settled the outcome. The user sees one answer; the world sees two side effects.

The cleanest fix is to derive the idempotency key from something stable to the intent — the user, the tool, and the input — rather than the conversation run. That requires the tool layer to know which user is calling, which most tool layers do, and to accept that the same user calling the same tool with the same arguments within some window is the same logical request. Picking that window is the design choice. Pick it too narrow and you allow double-execution; pick it too wide and you block a user from legitimately re-doing the same work.
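A minimal sketch of the difference between the two keys, under stated assumptions: `run_scoped_key` and `intent_key` are hypothetical helpers, the 300-second default window is an arbitrary placeholder, and the window is implemented as fixed time buckets purely to keep the example short.

```python
import hashlib
import json


def run_scoped_key(run_id: str, step: int) -> str:
    """Key tied to one workflow run: protects against engine-level retries only."""
    return f"{run_id}:{step}"


def intent_key(
    user_id: str,
    tool_name: str,
    args: dict,
    now_s: float,
    window_s: int = 300,  # assumed window; choosing it is the design decision
) -> str:
    """Key tied to user + tool + input, stable across conversations in a window."""
    bucket = int(now_s // window_s)
    # Canonical JSON so that argument order does not change the key.
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    raw = f"{user_id}|{tool_name}|{canonical}|{bucket}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

Two runs started by the same user produce different `run_scoped_key` values but the same `intent_key` as long as they fall into the same bucket. Fixed buckets have an edge: two calls a few seconds apart can straddle a bucket boundary and get different keys, so a real implementation would more likely look up recent keys in a store with a sliding expiry.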

Tools Need a Reversibility Tier, Not Just an Idempotency Key

Idempotency tells you whether it is safe to execute the call twice. Reversibility tells you whether it is safe to execute the call at all once the session has detached.

Reads are trivially safe. A read whose result has nowhere to go can be dropped — the side effect is zero. Writes split into two tiers: revertible writes, where the side effect can be undone if the result has no consumer (a draft saved, an idempotent provisioning step that can be torn down), and one-way writes, where the side effect persists regardless of who is listening (a sent email, a posted message, a charged card, a deleted row).

The agent's planner has no native concept of this distinction; the function-calling schema does not name it. The runtime layer has to. Before dispatching a tool, the gateway needs to know what happens if the session is gone by the time the tool finishes:

  • Cancellable / revertible: emit a cancel-on-disconnect signal to the tool service when the session expires; ignore the late result.
  • Idempotent and durable: persist the result against a stable user-and-intent key; deliver it to the next session that matches; show the user the carry-over result as the first turn of their next conversation.
  • One-way and irreversible: do not dispatch on a session boundary that might collapse before the tool finishes; require a separate confirmation surface (notification, email, dedicated task list) so the result has a home that does not depend on the session staying open.

The third tier is the one most teams skip, because the synchronous chat UI does not have a place for it. The chat is the only surface. Adding a task list, a notification channel, or a "your render finished" surface means treating the long-running tool as the unit of state rather than the conversation as the unit of state. That is the architectural shift the durable-execution community has been pushing for two years, and most agent frontends still have not made it.
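The tiers above can be written down as data the runtime consults before dispatch. This is only a sketch: the tool names in `TOOL_TIERS`, the `OnDetach` actions, and the `may_dispatch` check are invented for illustration, and treating unregistered tools as one-way is an assumption chosen to fail safe.

```python
from enum import Enum


class Tier(Enum):
    READ = "read"
    REVERTIBLE = "revertible"
    IDEMPOTENT_DURABLE = "idempotent_durable"
    ONE_WAY = "one_way"


class OnDetach(Enum):
    DROP_RESULT = "drop_result"
    CANCEL_AND_IGNORE = "cancel_and_ignore"
    PERSIST_FOR_NEXT_SESSION = "persist_for_next_session"
    DELIVER_OUT_OF_BAND = "deliver_out_of_band"


# What the runtime does when the session is gone at completion time.
POLICY = {
    Tier.READ: OnDetach.DROP_RESULT,
    Tier.REVERTIBLE: OnDetach.CANCEL_AND_IGNORE,
    Tier.IDEMPOTENT_DURABLE: OnDetach.PERSIST_FOR_NEXT_SESSION,
    Tier.ONE_WAY: OnDetach.DELIVER_OUT_OF_BAND,
}

# Hypothetical registry; in practice this would live next to the tool schema.
TOOL_TIERS = {
    "search_docs": Tier.READ,
    "save_draft": Tier.REVERTIBLE,
    "render_video": Tier.IDEMPOTENT_DURABLE,
    "send_email": Tier.ONE_WAY,
}


def tier_of(tool_name: str) -> Tier:
    # Unregistered tools are treated as the most dangerous tier (an assumption).
    return TOOL_TIERS.get(tool_name, Tier.ONE_WAY)


def on_detach(tool_name: str) -> OnDetach:
    return POLICY[tier_of(tool_name)]


def may_dispatch(tool_name: str, has_out_of_band_surface: bool) -> bool:
    """One-way tools need a result surface that outlives the chat session."""
    return tier_of(tool_name) is not Tier.ONE_WAY or has_out_of_band_surface
```

With this in place, `may_dispatch("send_email", has_out_of_band_surface=False)` refuses the call, which is the third tier's rule expressed as a precondition instead of a convention.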

Session Lifetime Should Be a Function of the Slowest Tool the Agent Can Dispatch

The default in most stacks is that conversation TTL is set by product instinct — minutes for chat, hours for assistants, days for project-style work — and tool timeouts are set by ops instinct — whatever made the slow tool stop failing under load. These two numbers were almost never picked together. The interesting failure cases all live in the gap between them.

A useful invariant: session-state lifetime must exceed the worst-case completion time of any tool the agent can dispatch from that session. Otherwise you have a guaranteed orphan rate equal to the fraction of tool runs that exceed session TTL, and that orphan rate is invisible to your existing dashboards unless you specifically count "tool results delivered to expired conversations" — which most teams do not.
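The invariant is simple enough to check at deploy time, and the orphan rate is simple enough to compute from observed durations. A rough sketch, where `ToolSpec`, `ttl_violations`, and `orphan_rate` are hypothetical names and the numbers in the usage note are made up:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    name: str
    worst_case_s: float  # assumed to be a timeout or an observed tail latency


def ttl_violations(session_ttl_s: float, tools: list[ToolSpec]) -> list[str]:
    """Names of tools whose worst case does not fit inside the session TTL."""
    return [t.name for t in tools if t.worst_case_s >= session_ttl_s]


def orphan_rate(durations_s: list[float], session_ttl_s: float) -> float:
    """Fraction of observed tool runs that outlived the session TTL."""
    if not durations_s:
        return 0.0
    return sum(d > session_ttl_s for d in durations_s) / len(durations_s)
```

For example, with a 600-second session TTL, a tool list containing a 2-second search and a 900-second render yields `["render"]` from `ttl_violations`, and observed durations of 10, 700, 30, and 1200 seconds give an `orphan_rate` of 0.5.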

This invariant is easier to state than to enforce. A 24-hour session TTL is cheap in terms of database rows and expensive in terms of context that grows stale. Letting the agent dispatch tools that take a day to complete forces the session-state layer to outlive the user's attention by a wide margin. The honest move is to split the state model: short-lived in-memory conversation state for the chat experience, long-lived durable run state for the agent loop and its outstanding tool calls, and a delivery layer that reconciles the two when results land.

Once the two state layers are separate, the question of "what happens when the user closes the conversation" becomes a straightforward routing decision rather than a data-loss event. The agent run keeps going against durable state. The tool result lands against the run's stable ID. When the user comes back — same session or a new one — the runtime asks the agent loop whether there are outstanding completions to surface, and the answer is either "yes, here is the render you started yesterday" or "no, everything you cared about resolved while you were gone."
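A sketch of what the durable run state and the delivery check might look like, with heavy assumptions: `Run` and `RunStore` are hypothetical, and the in-memory dict stands in for a real durable store. The part worth noticing is that results are keyed by run ID and tool call ID, not by conversation, and that the "anything outstanding to surface?" question is a query by user.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Run:
    run_id: str
    user_id: str
    outstanding: set[str]                          # tool calls still in flight
    completed: dict[str, Any] = field(default_factory=dict)  # not yet surfaced


class RunStore:
    """In-memory stand-in for a durable store keyed by run ID."""

    def __init__(self) -> None:
        self._runs: dict[str, Run] = {}

    def start(self, run_id: str, user_id: str, tool_call_ids: list[str]) -> None:
        self._runs[run_id] = Run(run_id, user_id, set(tool_call_ids))

    def complete(self, run_id: str, tool_call_id: str, result: Any) -> bool:
        """Record a result against the run; False for unknown or duplicate calls."""
        run = self._runs.get(run_id)
        if run is None or tool_call_id not in run.outstanding:
            return False
        run.outstanding.remove(tool_call_id)
        run.completed[tool_call_id] = result
        return True

    def drain_for_user(self, user_id: str) -> list[tuple[str, str, Any]]:
        """Called when a session attaches: return each pending result once."""
        surfaced = []
        for run in self._runs.values():
            if run.user_id != user_id:
                continue
            for tool_call_id, result in run.completed.items():
                surfaced.append((run.run_id, tool_call_id, result))
            run.completed.clear()
        return surfaced
```

A tool callback calls `complete` whether or not anyone is watching. The chat layer calls `drain_for_user` when a session opens, and an empty list is the "everything resolved while you were gone" answer.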

The Failure Mode You Will Notice Last

The post-mortem version of this problem is almost always written about the cross-session-leak case, because that is the one that gets reported. The version that costs more money over time is the silent-drop case — the tool runs to completion, the side effect lands in the world, and the user is never told. You pay for the tool, you pay for the side effect, and you get zero conversion credit because the user never saw the answer.

The instrumentation to catch this is unglamorous: for every tool call dispatched, log when it completes and whether a session was attached at completion time. Compute the ratio. If detached-at-completion is more than a few percent of total long-running calls, you have an architecture problem, not an alerting problem. The fix is not a louder alarm; the fix is the split state model and the delivery layer described above.
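The ratio itself is a few lines once the two fields are logged. In this sketch the `Completion` record and the 10-second cutoff for "long-running" are assumptions; use whatever threshold matches your tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Completion:
    tool_call_id: str
    duration_s: float
    session_attached: bool  # was a session attached when the tool finished?


def detached_ratio(
    completions: list[Completion],
    long_running_threshold_s: float = 10.0,  # assumed cutoff
) -> float:
    """Share of long-running tool calls that completed with no session attached."""
    long_running = [c for c in completions if c.duration_s >= long_running_threshold_s]
    if not long_running:
        return 0.0
    detached = sum(not c.session_attached for c in long_running)
    return detached / len(long_running)
```

Fast calls are excluded from the denominator on purpose: they almost always finish while the user is still attached, and including them would dilute the number the metric is supposed to expose.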

The async tool call you fired is going to finish. The interesting question is whether your system has somewhere to put the result by the time it does.
