Skip to main content

The Interrupt UI That Taught Your Users to Never Interrupt the Agent

· 10 min read
Tian Pan
Software Engineer

The interrupt button on your streaming agent has a 0.4% click rate. The product team reads that number and concludes the feature is working as intended — most generations don't need to be interrupted, the implementation is fine, ship it, move on. The actual reading is that the interrupt button taught your users not to press it. Within a week of using the product, they figured out that pressing stop discards the partial response, clears the context, and dumps them back at an empty input box. The lesson they learned is to wait through a bad answer rather than risk losing the thread.

That 0.4% is not a usage signal. It is an aversion signal. Your users are not happy with the answers — they are afraid of the cost of trying to redirect them, and their adaptation is to sit quietly while the agent finishes saying something they already know is wrong. The engineering team treated "stop generation" as a model-call cancellation. The user treated it as "redirect, don't restart." The two definitions never met, and the product shipped a feature that quietly drained user agency from every long-running conversation.

Stop is not a circuit breaker, it is a conversation turn

When you build streaming into an agent, the natural mental model on the engineering side is the request lifecycle. A request goes out, tokens stream in, the user can cancel the request mid-flight, and the SDK closes the connection. Stop, in that frame, is a clean technical primitive: abort the fetch, free the inference budget, return the UI to idle. The implementation that follows is honest to the primitive — discard the partial buffer, reset the input, you're done.

The user is not thinking in request lifecycles. They are thinking in conversation. The model started answering a question. Three sentences in, the user can already see the answer is going in the wrong direction — wrong assumption about their stack, wrong scope, wrong tone, wrong language. Their intent when they hit stop is not "cancel the API call." It is "redirect this — I have new information that should reshape what you say next." Those two intents have the same button label and completely different correct implementations.

The product that confuses them ships a feature whose semantics are honest to one side and brutal to the other. The user pressed stop expecting a turn in the dialogue, the way they would say "actually, wait, let me restate that" to a human. They got an empty box. The next time they see a generation heading the wrong way, they will not press stop. They will sit and watch.

How users learn the cost of your primitives

User habituation is fast and unforgiving. There is a body of work on banner blindness that shows users learn to filter UI elements within a handful of exposures, especially when the element either consistently delivers no value or consistently delivers harm. Interrupt UIs that wipe context fall into the harm bucket, and they get filtered out the same way habituated ad slots do — only the cost here is not a missed ad impression. It is that your safety primitive has been disarmed by the user's own coping behavior.

You can read this in the data if you know what to look for. The interrupt click rate flat over time, regardless of generation length or quality complaints, is the signature. So is the curious negative correlation between conversation length and interrupt usage — long conversations have more bad-answer moments per session and should mechanically produce more interrupts, but they often produce fewer, because users in long sessions have the most to lose by resetting context and have learned the hardest. A small minority of new users will press the button once or twice, get burned, and join the rest.

The metric that catches this is not click rate. It is a retention-after-interrupt metric: of users who press stop, what fraction make their next turn within sixty seconds, and what fraction abandon the session within that same window? The 0.4% number aggregates two completely different populations — the rare power user who knows the cost and is willing to pay it, and the new user who pressed once and never came back. Reporting them as one number hides the harm.

The redirect lane: what interrupt should actually open

The fix is not subtle. It is a redesign of what the button means, expressed at every layer from UI to state to model context.

When the user presses stop, the streamed response so far should not be discarded. It is conversation state, not transient buffer. Capture it. The UI should not return the user to an empty input. It should open a lane labeled something like "what would you like instead?" with the partial response visible above it as the thing being redirected. The user types their correction. The next turn to the model includes the partial response, the user's redirection, and an instruction that tells the model to treat the partial as a starting attempt that needs to change in the specified direction — not as a finished answer to react to.

This is closer to how a human conversation actually handles an interrupted thought. You do not delete the half-sentence you said when someone cuts you off. You take their correction and continue from where the redirection happened. The agent's model handles this fine when you give it the structured context; the failure was never the model's, it was the UX layer throwing away the only information the next turn needed.

A few patterns make this redirect lane work in practice:

  • The partial response is rendered above the redirect input in a slightly muted state, with a label like "stopped here." The user can read it, point at it, reference parts of it in their correction.
  • The redirect input has a different placeholder than the regular input — "tell me what to change" rather than "ask a question" — because the cognitive task is different.
  • The next assistant turn is rendered as a continuation of the same thread, not as a fresh response to a fresh prompt. The conversation tree shows the partial, the redirect, and the new response as a connected sequence. Users can see they did not lose anything.
  • Pressing stop a second time during the redirected generation does the same thing again. The cost of trying to steer never resets to "lose context."

When this is the behavior, the interrupt click rate goes up — sometimes by an order of magnitude — and the corresponding user satisfaction on long conversations goes up with it. The increase is not a regression. It is users finally trusting that the primitive does what they want.

The architectural commitment behind the redesign

This is not a UI tweak. It requires the conversation state to be the source of truth, and the request lifecycle to be a derived view of it.

Most streaming agent implementations have it backwards. The fetch request and the assistant message are coupled — the message exists only as the receiving end of a successful stream. When the stream is canceled, the message is incomplete and gets dropped because there is no place in the data model for "this assistant turn exists but is partial and the user said something next." Fixing the UX requires the data model to admit that state.

The persistence layer needs to support an assistant message in a "partial — superseded" state, with the partial content preserved, an interrupt timestamp, and a link to the user's redirection turn that follows. The streaming layer needs to flush the partial buffer to that persistent message before closing the connection, not after. The continuation prompt needs to include the partial response as structured context, distinguished from prior completed assistant turns so the model knows it should revise rather than react. The trace pipeline needs to record the partial as a first-class span linked to the redirected continuation, so debugging shows the actual conversation shape and not a hole where the canceled call used to be.

Each of these is small in isolation. They are usually missing in concert because the team treated the interrupt as a transport-layer concern instead of a conversation-layer concern, and the data model inherits whichever layer the implementation started in. Teams that build the conversation model first and let the transport be a detail of it get the redirect lane almost for free. Teams that build the transport first and try to retrofit the conversation model later discover that every layer of their stack assumes "canceled means discarded," and the retrofit becomes a multi-week project nobody had budgeted.

A metric that detects the silent training

If you take one thing from this, take the metric. Click rate on the interrupt button is a vanity number that hides the harm.

Replace it with something like this: of users who press stop in a given session, what percentage send another message within sixty seconds? Of those who do, what percentage of their next message resembles a redirection (references the prior response, includes a corrective signal) rather than an unrelated new question? Of the users who press stop and do not return, what is their seven-day retention compared to users who never pressed stop and to users who pressed and continued?

The first ratio tells you whether stop is functioning as a redirect or as an abandonment trigger. The second tells you whether your continuation prompt is actually treating the partial as context or whether users have to repeat themselves anyway. The third tells you whether the bad experience of pressing stop is bleeding into overall product churn — which is the financial number that finally gets attention when click rate cannot.

Teams that instrument this consistently see the same shape: a healthy interrupt experience produces a high redirect-within-sixty-seconds rate and roughly neutral seven-day retention. A broken one produces an abandonment-heavy distribution and worse seven-day retention than users who never tried to interrupt. The broken state is more common than people expect, even at well-funded teams, because nobody set the metric up to catch it.

The deeper lesson

The interrupt button is a small surface that exposes a large commitment. If your data model treats the conversation as the durable thing and the transport as a way of populating it, the interrupt lane is a natural addition to a system that already preserves partial context. If your data model treats the request as the durable thing and the conversation as a byproduct of completed requests, the interrupt lane requires you to reverse the polarity of your persistence layer to add it.

The choice between those two architectures is usually made implicitly, in the first week of the project, by an engineer who is not thinking about whether the user will be able to redirect a streaming response three months from now. The implicit choice becomes a contract with the user about what kinds of agency the product supports, and the contract is enforced by what users learn to expect from the primitives the system exposes.

A streaming agent without a redirect lane is teaching every user that interruption is dangerous. The 0.4% click rate is the lesson sinking in. The fix is not to make the button bigger, more prominent, or differently colored. The fix is to make pressing it preserve the thing the user came to the product for in the first place: the conversation.

References:Let's stay in touch and Follow me for more thoughts and updates