Async Agents Need an Inbox, Not a Chat
The chat metaphor has a fuse, and it burns out around thirty seconds. Past that, the spinner stops being a progress indicator and becomes a commitment device — the one making the commitment is your user, and most of them bail. You can watch it in session replays: the typing indicator appears, the user waits, tabs away at about twelve seconds, and half never come back. The product team sees a completed agent run with no human on the other end and files it as a success. It is not a success. It is an abandoned artifact that happened to finish.
This is the first contact with a structural problem that most agent products paper over with spinners and streaming text: the chat interface was designed for turn-taking humans and fast models, and it fails silently when either assumption breaks. If your agent takes minutes, you are not shipping a chat feature with a longer wait. You are shipping a different product, and it needs a different UI primitive.
The primitive is the inbox.
The 30-Second Cliff Isn't a Performance Bug
Product teams keep trying to fix long-running agents with engineering tricks — stream tokens faster, show the model's "thinking," preload a skeleton, inject progress messages. All of this helps at the margin when latency is in the single-digit seconds. None of it helps when the wall clock crosses thirty seconds, and beyond about two minutes, the UX arc inverts: the more detail you stream, the more attention you extract, the less productive your user becomes while they sit there watching. You have taken a background task and turned it into cognitive captivity.
The original UX research on latency has been repeated enough to qualify as folk wisdom: a tenth of a second to feel instantaneous, one second to keep the flow of thought unbroken, ten seconds to hold attention, and beyond that the user mentally leaves. The part that doesn't get quoted as often is what happens on return — refocus cost. When a user comes back to a finished agent response that took three minutes, they don't just read it; they rebuild the context of what they asked, why, and whether the answer still matters. For trivial queries that is fine. For meaningful work, you are asking the user to re-enter a mental state they already paid to build and then chose to leave.
Chat interfaces pretend this problem doesn't exist. They keep the user "in" the conversation by freezing the viewport on a spinner. The sync-UI assumption — that the user is waiting, that the user wants to be waiting, that the user will recognize the answer when it lands — is load-bearing and usually wrong.
What an Inbox Actually Is
"Inbox" is not a skin. It is a commitment to three product guarantees that chat cannot make.
Durability. The run has an ID that outlives the session. A user can close the tab, switch devices, share a link, or get paged and come back an hour later, and the run is addressable. The ID is the artifact. Chat's thread-scoped identity means "close the tab" and "lose the work" are the same action — users learn this and stop closing tabs, which is how you end up with pinned browser tabs running multi-hour agents that nobody dares touch.
Notifiability. When the run ends, the system reaches out to the user. Push, email, Slack, system tray — the specific channel matters less than the fact that completion is an event the product fires, not a state the user is responsible for polling. Chat inverts this: the user polls by staring. An inbox treats the human as a scarce resource to be interrupted on completion, not an always-on attention pump.
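Notifiability can be sketched as a fan-out that fires exactly once, on completion; the `Notifier` class and channel names here are assumptions for illustration, with lambdas standing in for real email or Slack delivery:

```python
from typing import Callable

# A channel is any delivery side effect: (user, message) -> None
Channel = Callable[[str, str], None]

class Notifier:
    """Completion is an event the product fires, not a state the user polls."""

    def __init__(self):
        self._channels: list[Channel] = []

    def register(self, channel: Channel) -> None:
        self._channels.append(channel)

    def on_run_complete(self, user: str, summary: str) -> None:
        # Interrupt the human exactly once, exactly when there is
        # something worth interrupting for.
        for send in self._channels:
            send(user, summary)

# Fake channels that record deliveries instead of sending anything
sent: list[tuple[str, str, str]] = []
notifier = Notifier()
notifier.register(lambda u, m: sent.append(("email", u, m)))
notifier.register(lambda u, m: sent.append(("slack", u, m)))

notifier.on_run_complete("dana", "Run finished: analyzed Q1 churn cohort")
# sent now holds one delivery per registered channel
```

The design choice worth noticing: the agent runtime never knows which channels exist. Push, email, Slack, system tray are all just registered callables, which is what "the specific channel matters less than the event" looks like in practice.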
Result-over-progress framing. The primary view of an inbox is a list of outcomes — "drafted PR for auth refactor," "analyzed Q1 churn cohort," "answered customer ticket #4812." Progress is available but secondary. Chat is the opposite; the transcript is the product, and the final answer is just the last line of it. This inversion is the one most teams get wrong when they "add an inbox" and it turns out to be a chat history viewer.
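The inversion is visible in the view model itself. A hedged sketch, with illustrative field names: the inbox row leads with the outcome, and the transcript is reachable but never the front page.

```python
from dataclasses import dataclass

@dataclass
class InboxItem:
    outcome: str           # "drafted PR for auth refactor"
    run_id: str            # deep link to the detail view
    transcript: list[str]  # full agent trace: one click away, never front-page

def render_inbox(items: list[InboxItem]) -> list[str]:
    # The list view shows outcomes only. A chat-history viewer dressed
    # up as an inbox would render transcripts here instead.
    return [f"* {item.outcome}" for item in items]

items = [
    InboxItem("drafted PR for auth refactor", "run_01", ["plan...", "diff..."]),
    InboxItem("analyzed Q1 churn cohort", "run_02", ["query...", "summary..."]),
]
rows = render_inbox(items)
```

If your `render_inbox` ever needs the `transcript` field, you have rebuilt chat with extra steps.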
You can feel the difference in how people describe the work. Users of inbox-shaped agent products talk about "kicking off a run" and "checking results." Users of chat-shaped agent products talk about "waiting for it to finish." One is delegation; the other is supervision. The product metaphor you picked decides which one your users learn to do.
The Metrics Flip
When teams make this shift, the measurement stack has to move with it. The metrics that looked good in chat become misleading, and the metrics that matter in an inbox are usually not instrumented at the start.
Chat metrics that stop meaning anything:
- Time-to-first-token. In an inbox model, the user is gone by the time the first token arrives. TTFT is still worth tuning for chat-like sub-tasks, but it stops correlating with satisfaction for the long tail.
- Session length. Longer sessions were good in chat because they meant engagement. In an inbox, a user who spends two minutes launching five runs and leaves is getting more value than one who sits through one twenty-minute stream.
- Messages-per-session. Chat products optimized for this are measuring captivity, not utility.
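The session-length inversion above can be made concrete with a toy computation; all numbers and the metric name are made up for illustration:

```python
# Two users from the session-length example: one launches five runs in
# two minutes and leaves; one sits through a single twenty-minute stream.
sessions = [
    {"user": "delegator",  "minutes_in_app": 2,  "runs_launched": 5},
    {"user": "supervisor", "minutes_in_app": 20, "runs_launched": 1},
]

# Chat-era metric: time in app. Rewards captivity.
by_session_length = max(sessions, key=lambda s: s["minutes_in_app"])

# Inbox-era metric (illustrative): work delegated per minute of
# attention spent. Rewards delegation.
by_delegation_rate = max(
    sessions, key=lambda s: s["runs_launched"] / s["minutes_in_app"]
)

print(by_session_length["user"])   # → supervisor
print(by_delegation_rate["user"])  # → delegator
```

Same two users, opposite winners. Whichever metric you put on the dashboard is the behavior your roadmap will optimize for.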
