Async Agents Need an Inbox, Not a Chat
The chat metaphor has a fuse, and it burns out around thirty seconds. Past that, the spinner stops being a progress indicator and becomes a commitment device; the one making the commitment is your user, and most of them bail. You can watch it in session replays: the typing indicator appears, the user waits, tabs away at about twelve seconds, and half never come back. The product team sees a completed agent run with no human on the other end and files it as a success. It is not a success. It is an abandoned artifact that happened to finish.
This is the first contact with a structural problem that most agent products paper over with spinners and streaming text: the chat interface was designed for turn-taking humans and fast models, and it fails silently when either assumption breaks. If your agent takes minutes, you are not shipping a chat feature with a longer wait. You are shipping a different product, and it needs a different UI primitive.
The primitive is the inbox.
The 30-Second Cliff Isn't a Performance Bug
Product teams keep trying to fix long-running agents with engineering tricks — stream tokens faster, show the model's "thinking," preload a skeleton, inject progress messages. All of this helps at the margin when latency is in the single-digit seconds. None of it helps when the wall clock crosses thirty seconds, and beyond about two minutes, the UX arc inverts: the more detail you stream, the more attention you extract, the less productive your user becomes while they sit there watching. You have taken a background task and turned it into cognitive captivity.
The original UX research on latency has been repeated enough to qualify as folk wisdom: a tenth of a second to feel instantaneous, one second to keep the user's flow of thought, ten seconds to maintain attention, and beyond that the user mentally leaves. The part that doesn't get quoted as often is what happens on return: refocus cost. When a user comes back to a finished agent response that took three minutes, they don't just read it; they rebuild the context of what they asked, why, and whether the answer still matters. For trivial queries that is fine. For meaningful work, you are asking the user to re-enter a mental state they already paid to build and then chose to leave.
Chat interfaces pretend this problem doesn't exist. They keep the user "in" the conversation by freezing the viewport on a spinner. The sync-UI assumption — that the user is waiting, that the user wants to be waiting, that the user will recognize the answer when it lands — is load-bearing and usually wrong.
What an Inbox Actually Is
"Inbox" is not a skin. It is a commitment to three product guarantees that chat cannot make.
Durability. The run has an ID that outlives the session. A user can close the tab, switch devices, share a link, or get paged and come back an hour later, and the run is addressable. The ID is the artifact. Chat's thread-scoped identity means "close the tab" and "lose the work" are the same action — users learn this and stop closing tabs, which is how you end up with pinned browser tabs running multi-hour agents that nobody dares touch.
Notifiability. When the run ends, the system reaches out to the user. Push, email, Slack, system tray — the specific channel matters less than the fact that completion is an event the product fires, not a state the user is responsible for polling. Chat inverts this: the user polls by staring. An inbox treats the human as a scarce resource to be interrupted on completion, not an always-on attention pump.
Result-over-progress framing. The primary view of an inbox is a list of outcomes — "drafted PR for auth refactor," "analyzed Q1 churn cohort," "answered customer ticket #4812." Progress is available but secondary. Chat is the opposite; the transcript is the product, and the final answer is just the last line of it. This inversion is the one most teams get wrong when they "add an inbox" and it turns out to be a chat history viewer.
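To make the three guarantees concrete, here is a minimal sketch of a run record in TypeScript. Every field name is an illustrative assumption, not a reference schema; the point is that identity, notification, and the result headline are first-class fields rather than properties of a transcript.

```typescript
// Sketch of a run-as-artifact record. All names are assumptions.
type RunStatus = "running" | "succeeded" | "failed" | "cancelled";

interface Run {
  // Durability: a stable ID that outlives the session. This is the
  // shareable artifact; it must resolve to the same content later.
  id: string;                       // appears in a stable URL
  createdBy: string;                // user ID, not session ID
  createdAt: string;                // ISO 8601 timestamp

  // Result-over-progress: the outcome headline is a field of its own,
  // not the last message of a transcript.
  title: string;                    // "drafted PR for auth refactor"
  status: RunStatus;
  resultSummary?: string;           // rendered in the inbox list view
  resultDetailUrl?: string;         // stable link to the full artifact

  // Notifiability: completion is an event the product fires.
  notifyVia: Array<"push" | "email" | "slack">;
  notifiedAt?: string;              // set once, when the event fired

  // Progress is available but secondary: a pointer, not the payload.
  traceRef?: string;                // opaque handle to the raw tool trace
}
```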
You can feel the difference in how people describe the work. Users of inbox-shaped agent products talk about "kicking off a run" and "checking results." Users of chat-shaped agent products talk about "waiting for it to finish." One is delegation; the other is supervision. The product metaphor you picked decides which one your users learn to do.
The Metrics Flip
When teams make this shift, the measurement stack has to move with it. The metrics that looked good in chat become misleading, and the metrics that matter in an inbox are usually not instrumented at the start.
Chat metrics that stop meaning anything:
- Time-to-first-token. In an inbox model, the user is gone by the time the first token arrives. TTFT is still worth tuning for chat-like sub-tasks, but it stops correlating with satisfaction for the long tail.
- Session length. Longer sessions were good in chat because they meant engagement. In an inbox, a user who spends two minutes launching five runs and leaves is getting more value than one who sits through one twenty-minute stream.
- Messages per session. Chat products optimized for this are measuring captivity, not utility.
Metrics that start to matter (two of them are computed in the sketch after this list):
- Resumption rate. Of runs that completed while the user was away, what percentage did the user actually return to review? This is the real measure of whether your notifications work and whether your result framing is legible. A resumption rate under 60 percent means your inbox is a graveyard.
- Share rate. How often does a completed run get shared — link copied, sent to a teammate, pasted into a ticket? This is the cleanest signal that your run-as-artifact model is delivering value. Shared runs also bootstrap multiplayer use without you building multiplayer features.
- Repeat-task launch rate. Does the same user come back and launch a similar task in the next week? In chat, this was just "returning user." In inbox, it means the delegation contract held — the user trusted the output enough to send another one.
- Abandonment differential. Instead of a single abandonment number, track abandonment-during-run versus abandonment-on-result. The first should be near 100 percent by design — you want users to leave. The second is the one to drive down.
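Both the resumption rate and the abandonment differential fall straight out of an event log. A minimal sketch, assuming a hypothetical `RunEvent` shape; the event names are invented for illustration:

```typescript
// Sketch: computing resumption rate and the abandonment differential
// from a per-run event log. Event shape and names are assumptions.
interface RunEvent {
  runId: string;
  kind: "launched" | "user_left" | "completed" | "result_viewed";
  at: number; // epoch ms
}

function inboxMetrics(events: RunEvent[]) {
  const byRun = new Map<string, RunEvent[]>();
  for (const e of events) {
    const list = byRun.get(e.runId) ?? [];
    list.push(e);
    byRun.set(e.runId, list);
  }

  let completedAway = 0; // finished while the user was gone
  let resumed = 0;       // ...and the user came back to review
  let leftDuringRun = 0; // abandonment-during-run (fine by design)
  let total = 0;

  for (const evs of byRun.values()) {
    total++;
    const left = evs.find((e) => e.kind === "user_left");
    const done = evs.find((e) => e.kind === "completed");
    if (left) leftDuringRun++;
    if (done && left && left.at < done.at) {
      completedAway++;
      if (evs.some((e) => e.kind === "result_viewed")) resumed++;
    }
  }

  return {
    // Under ~0.6 and your inbox is a graveyard.
    resumptionRate: completedAway ? resumed / completedAway : 0,
    // Should be near 1.0 by design: you WANT users to leave mid-run.
    abandonmentDuringRun: total ? leftDuringRun / total : 0,
    // The one to drive down: completed runs nobody came back to open.
    abandonmentOnResult: completedAway ? 1 - resumed / completedAway : 0,
  };
}
```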
The shift is uncomfortable because it looks, in dashboards, like your product got less sticky. Engagement drops. Dwell time drops. That is not a regression. That is the user getting their time back. If your org reads engagement as revenue, you need to renegotiate the metric before you ship this.
The Seam You Can't Pretend Doesn't Exist
Agent products tend to have a bimodal latency distribution. A chunk of tasks finishes in seconds: a quick question, a single tool call, a summarization. Another chunk takes minutes to hours: a multi-step investigation, a code migration, a research run with dozens of page fetches. There is almost nothing in the middle, because the minute you add a tool-use loop with retries, you either stay under a few seconds or you blow past thirty.
The seam between these two regimes is where UX design is hardest and where most teams punt. The common failure mode is to pick one UI for both. Chat-only products torture users on the long tail. Inbox-only products feel sluggish on the short tail, where a simple question goes into a "run" that pops out a notification the user didn't want. Neither is right because neither acknowledges the seam.
The pattern that works is to make the seam explicit in the product. A task starts in a chat-like surface with streaming output. If the model signals — or the harness detects, through elapsed time, pending tool calls, or a planned step count — that this will cross into long-running territory, the run promotes itself. The chat thread collapses to a card: "this is now running in your inbox, we'll ping you." The user gets their cursor back. The run gets a stable ID. The result lands where results land.
This is not a fancy feature. It is the recognition that interactive tools and background workers are different products sharing a codebase, and that asking the user to guess which mode they're in was never going to work. Some products expose this as an explicit "run async" button. That's fine for power users; it's a terrible default because most users don't know, at kickoff, how long the model will take. The model often doesn't know either. Automatic promotion based on elapsed time or planned step count moves the decision into the harness, where it belongs.
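One way the harness could implement automatic promotion, as a sketch under assumptions: `Harness`, `RunUI`, and both thresholds are hypothetical names, and a real implementation would persist the promotion decision rather than hold it in a local variable.

```typescript
// Sketch of automatic chat-to-inbox promotion. All names are
// hypothetical; the thresholds are illustrative, not tuned values.
const PROMOTE_AFTER_MS = 30_000; // the thirty-second cliff
const PROMOTE_AT_STEPS = 5;      // a plan this long signals a long run

interface Harness {
  plannedSteps(taskId: string): number;
  stream(taskId: string, onToken: (t: string) => void): Promise<string>;
}

interface RunUI {
  streamToken(t: string): void;        // live chat surface
  collapseToCard(runId: string): void; // "running in your inbox" card
  notifyDone(runId: string, result: string): void;
}

async function runWithPromotion(harness: Harness, taskId: string, ui: RunUI) {
  // Promote immediately if the plan already looks long-running.
  let promoted = harness.plannedSteps(taskId) >= PROMOTE_AT_STEPS;
  if (promoted) ui.collapseToCard(taskId);

  // Otherwise promote when wall-clock time crosses the threshold.
  const timer = setTimeout(() => {
    if (!promoted) {
      promoted = true;
      ui.collapseToCard(taskId);
    }
  }, PROMOTE_AFTER_MS);

  // Tokens keep flowing either way; the UI just stops rendering them
  // live once the run is promoted. The run ID stays stable throughout.
  const result = await harness.stream(taskId, (t) => {
    if (!promoted) ui.streamToken(t);
  });
  clearTimeout(timer);

  if (promoted) ui.notifyDone(taskId, result); // completion fires an event
  return result;
}
```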
Inbox Without a Backing Store Is Theater
A working inbox requires plumbing most teams underestimate. The guarantees sound mild until you actually have to ship them. Each run needs persistent state that survives the user's session, the model's retries, and your next deploy. Each run needs a stable URL that renders the same content a week later regardless of which shard ran the job. Each run needs a notification channel that the user can configure, silence, and audit. Each run needs a reviewable, diffable representation of what the agent did: not a chain-of-thought dump, not a raw tool trace, but a summary with enough citation to let the user confirm the answer without re-doing the task.
This is the transactional inbox pattern that distributed systems people have been shipping for twenty years, dressed for a new use case. Runs are persisted with their inputs, intermediate state, and results. Completion fires a durable event. The event drives notifications and the UI refresh. If any piece of this is ephemeral, the inbox collapses back into chat the first time your pod restarts mid-run. Users have very good memory for the one time your product ate their work.
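A sketch of that shape, with in-memory stores standing in for what must, in production, be two writes inside one database transaction. All names are assumptions:

```typescript
// Sketch of the transactional-inbox shape with in-memory stand-ins.
// In production the two writes in completeRun happen in ONE database
// transaction, so a crash between them cannot lose the completion event.
interface CompletionEvent { runId: string; deliveredAt?: number }

const runs = new Map<string, { status: string; result?: string }>();
const outbox: CompletionEvent[] = []; // a durable event table in real life

function completeRun(runId: string, result: string) {
  // These two writes must commit atomically together.
  runs.set(runId, { status: "succeeded", result });
  outbox.push({ runId });
}

// A separate dispatcher drains the event table. Because events are
// persisted, a pod restart mid-run re-delivers instead of forgetting.
function dispatchNotifications(send: (runId: string) => void) {
  for (const ev of outbox) {
    if (ev.deliveredAt) continue; // at-least-once delivery, so sends
    send(ev.runId);               // must be idempotent on the receiver
    ev.deliveredAt = Date.now();
  }
}
```

The dispatcher is deliberately at-least-once: re-delivering a notification after a crash is annoying, but losing it is the failure users remember.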
Three implementation gotchas worth flagging for teams moving from chat to inbox:
- Retries must not create new runs. If your harness retries on transient failures, the user should see one run with multiple attempts, not three runs with ambiguous status (sketched after this list). This is idempotency in a new costume.
- The result schema must be versioned. You will change how agents summarize their work, and you do not want to invalidate last month's shared links. Store the raw trace and render the summary on read.
- Notifications have a cost. Users who get one "your run is done" ping for every kickoff learn to ignore them within a week. Batch, digest, or condition on importance — a run that finishes in twenty seconds probably doesn't need a push notification; a run that fails after an hour definitely does.
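A sketch covering the first two gotchas, with assumed names: retries append attempts to a single run keyed by an idempotency key, and the summary is rendered from the stored trace at read time so old shared links survive renderer changes.

```typescript
// Sketch: one run, many attempts. The idempotency key ties retries to
// the same run instead of minting new ones. All names are assumptions.
interface Attempt { n: number; status: "failed" | "succeeded"; trace: unknown }
interface StoredRun { id: string; attempts: Attempt[] }

const store = new Map<string, StoredRun>(); // keyed by idempotency key

function recordAttempt(idempotencyKey: string, attempt: Omit<Attempt, "n">) {
  const run = store.get(idempotencyKey) ?? { id: idempotencyKey, attempts: [] };
  run.attempts.push({ n: run.attempts.length + 1, ...attempt });
  store.set(idempotencyKey, run);
  return run; // the user sees ONE run with N attempts, never N runs
}

// Gotcha two: store the raw trace, render the summary on read. The
// renderer is versioned in code, so last month's shared links still
// resolve; they simply re-render under the current version.
const RENDERER_VERSION = 3; // illustrative
function renderSummary(run: StoredRun): string {
  const last = run.attempts[run.attempts.length - 1];
  return `v${RENDERER_VERSION}: ${run.attempts.length} attempt(s), ` +
         `final status ${last?.status ?? "pending"}`;
}
```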
Where Chat Still Wins
The inbox is not a replacement for chat. It is the right primitive for delegation, not for conversation. Brainstorming, explaining, iterating on a piece of writing in real time — these stay chat-shaped because the value is in the back-and-forth, not the terminal state. The mistake is assuming every agent interaction is conversation. Most productive agent use is a delegated task whose conversational shell was an artifact of the LLM API looking like a chat API.
As agent horizons lengthen (cloud-hosted development agents, research agents that run for hours, ambient agents reacting to event streams), the center of gravity moves toward the inbox. The teams who figure this out first will ship products where users delegate work and trust it to land. The teams still optimizing their spinners will ship products where users learn, one aborted run at a time, that the AI is not quite worth the wait. Both products use the same models. Only one of them is a product.
If you're building an agent today and your roadmap includes "make it faster" as the answer to long-latency complaints, stop. The complaints are not about speed. They're about the metaphor. Give users a way to send work and walk away, a way to come back and find results, and a way to share a run as an artifact. Then make it faster. In that order.
- https://www.langchain.com/blog/introducing-ambient-agents
- https://github.com/langchain-ai/agent-inbox
- https://hatchworks.com/blog/ai-agents/agent-ux-patterns/
- https://www.aiuxdesign.guide/patterns/agent-status-monitoring
- https://www.designative.info/2026/03/19/the-conversation-trap-why-defaulting-to-chat-might-be-the-biggest-interaction-design-mistake-of-the-ai-era/
- https://cognition.ai/blog/devin-annual-performance-review-2025
- https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
- https://blog.logrocket.com/ux-design/ui-patterns-for-async-workflows-background-jobs-and-data-pipelines/
- https://www.uxtigers.com/post/think-time-ux
