Mid-Flight Steering: Redirecting a Long-Running Agent Without Killing the Run
Watch a developer use an agentic IDE for twenty minutes and you will see the same micro-drama play out three times. The agent starts a long task. Two tool calls in, the user realizes they want a functional component instead of a class, or a v2 endpoint instead of v1, or tests written in Vitest instead of Jest. They have exactly one lever: the red stop button. They press it. The agent dies mid-edit. They copy-paste the last prompt, append the correction, and pay for the first eight minutes of work twice.
The abort button is the wrong affordance. It treats "I want to adjust the plan" and "I want to throw away the run" as the same gesture. In practice they are as different as a steering wheel and an ejector seat, and conflating them is why so many agent products feel brittle the moment a task takes longer than a single screen of output.
The fix is not a better progress bar or a more confident model. It is a set of architectural seams that let the user introduce new information without discarding correct work — and a UX vocabulary that distinguishes course correction from override from cancellation so the user can pick the right one without thinking about it.
The Abort-Only UI Is a Distributed Systems Bug in Disguise
Most agent UIs inherit their shape from request-response web apps. You submit a prompt, a stream comes back, you watch it land. The only control signal flowing the other way is the HTTP connection itself — kill the socket and you kill the run. That model is fine when the unit of work fits in a single response. It falls apart when the unit of work is a fifteen-minute plan with a dozen tool calls and the user has new information at minute seven.
The root cause is a missing channel. Streaming tokens and events go down to the client; almost nothing comes back up until the run is done. When users eventually press Escape, they hit a stateless HTTP boundary that has to tear down the stream and the inference pipeline together. There is no primitive for "keep going but listen to me." So the product ships with Stop as the only option, and every correction becomes a restart.
Teams shipping real mid-flight steering in 2026 — from GitHub Copilot's agent mode work to the Codex-style editors — have all converged on the same diagnosis: you need a persistent bidirectional channel, an explicit in-flight run identity, and a place on the server where steering inputs land so the agent can pick them up between actions. Without those three things, every interrupt is a race with cancellation semantics, and the user's correction usually arrives one tool call too late.
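The second and third of those requirements — an explicit in-flight run identity and a server-side landing spot for steering inputs — can be sketched in a few lines. This is a hypothetical shape, not any particular product's API; the names `Run`, `RunRegistry`, and `steer` are invented for illustration:

```python
import queue
import threading
import uuid

class Run:
    """A long-running agent task with explicit identity and a steering inbox."""
    def __init__(self):
        # Identity exists from the moment the run is created,
        # before any tool call fires.
        self.run_id = str(uuid.uuid4())
        self.inbox = queue.Queue()        # steering inputs land here
        self.cancelled = threading.Event()

class RunRegistry:
    """Server-side place where steering inputs land, keyed by run id."""
    def __init__(self):
        self._runs = {}

    def start(self) -> Run:
        run = Run()
        self._runs[run.run_id] = run
        return run

    def steer(self, run_id: str, message: str) -> bool:
        # Returns False instead of silently no-oping when the run is
        # unknown or already dead — the client can surface that.
        run = self._runs.get(run_id)
        if run is None or run.cancelled.is_set():
            return False
        run.inbox.put(message)
        return True
```

The agent loop reads from `run.inbox` between actions; the transport layer (WebSocket, SSE plus a POST endpoint, whatever the deployment allows) only has to deliver messages to `RunRegistry.steer`.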
Three Architectural Seams Worth Building
Once you accept that steering is a transport-and-state problem rather than a UX skin, three seams do most of the work. They are not alternatives; they compose.
Checkpoint-and-inject. The agent's state — plan, scratchpad, tool-call history, retrieval results — lives behind a checkpointer that writes after every step. A steering input becomes a structured update merged into the checkpoint before the next step reads it. LangGraph's interrupt() primitive is the reference implementation: the graph persists, pauses, and resumes from the same thread ID with the user's payload merged in. The value of this seam is that correct work survives. You do not unwind seven tool calls because the eighth one was headed somewhere wrong.
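LangGraph's `interrupt()` is the production version of this seam; the loop below is a deliberately library-free sketch of the same idea, with invented names (`CheckpointedAgent`, `inject`), showing the one invariant that matters: steering updates merge into state before the next step reads it, and a snapshot persists after every step:

```python
class CheckpointedAgent:
    """Minimal checkpoint-and-inject loop (illustrative, not a real library)."""
    def __init__(self, steps):
        self.steps = steps            # list of callables: state -> state
        self.checkpoints = []         # one snapshot per completed step
        self.pending_updates = []     # steering inputs awaiting merge

    def inject(self, update: dict):
        """Steering input: a structured update, not a restart."""
        self.pending_updates.append(update)

    def run(self, state: dict) -> dict:
        for step in self.steps:
            # Merge steering input BEFORE the step reads the state,
            # so completed work survives and only the future changes.
            while self.pending_updates:
                state = {**state, **self.pending_updates.pop(0)}
            state = step(dict(state))
            self.checkpoints.append(dict(state))   # persist after every step
        return state
```

A correction injected at any point rewrites the state the next step sees, without unwinding the checkpoints already written.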
Plan revision hooks. Somewhere in the loop, the agent commits to a plan — an explicit list of steps, a tree of subgoals, a TODO. Exposing that plan as a first-class editable object (not a log line, not a message) gives the user a natural place to intervene. Edit the step, reorder it, delete it, add a constraint to it, and let the agent re-read the plan on the next iteration. This is the difference between shouting "stop, stop, do it differently" into a chat box and pointing at step 4 and typing "use the v2 API here." The second is actionable because it lands on a structured surface the agent already consults.
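A plan-as-object surface can be as small as this — a hypothetical sketch in which the agent calls `next_step()` fresh on every iteration, so edits made between iterations take effect without any special signaling:

```python
from dataclasses import dataclass, field

@dataclass
class PlanStep:
    description: str
    constraints: list = field(default_factory=list)
    done: bool = False

class Plan:
    """The plan as a first-class editable object, not a log line."""
    def __init__(self, steps):
        self.steps = [PlanStep(s) for s in steps]

    def add_constraint(self, index: int, constraint: str):
        # "Point at step 4 and type 'use the v2 API here'."
        self.steps[index].constraints.append(constraint)

    def remove(self, index: int):
        del self.steps[index]

    def next_step(self):
        # Consulted afresh each iteration, so user edits land naturally.
        for step in self.steps:
            if not step.done:
                return step
        return None
```

The agent serializes the current `PlanStep` (description plus constraints) into its prompt on each loop, which is what makes a user edit actionable rather than advisory.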
Soft-interrupt tokens. Between tool calls — or at any natural checkpoint inside a long-running step — the agent reads from a steering inbox. If a message is waiting, it is spliced into context as a system note and the agent decides whether to replan, acknowledge, or continue. The inbox is non-blocking: the user can drop notes into it at any time, and the agent picks them up at the next safe point. Claude Code's hook events (PreToolUse, PostToolUse, UserPromptSubmit during an active run) are one concrete instantiation; asynchronous steering notes proposed for Copilot's agent mode are another. The shared insight is that the agent should be polling for user input even when it is not pausing.
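The non-blocking inbox read between tool calls is the core mechanic. A minimal sketch, with invented names (`SteeringInbox`, `agent_loop`) standing in for whatever a real harness provides:

```python
import queue

class SteeringInbox:
    """Non-blocking inbox drained at each safe point between tool calls."""
    def __init__(self):
        self._q = queue.Queue()

    def post(self, note: str):
        # User side: drop a note at any time; never blocks the agent.
        self._q.put(note)

    def drain(self) -> list:
        # Agent side: non-blocking read at a checkpoint.
        notes = []
        while True:
            try:
                notes.append(self._q.get_nowait())
            except queue.Empty:
                return notes

def agent_loop(tool_calls, inbox: SteeringInbox) -> list:
    """Splice waiting steering notes into context before each tool call."""
    context = []
    for call in tool_calls:
        for note in inbox.drain():
            context.append(f"[system note] user steering: {note}")
        context.append(f"[tool] {call}")
    return context
```

The key property is that `post` never waits on the agent and `drain` never waits on the user — the note simply appears in context at the next boundary.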
The UX Taxonomy: Correction, Override, Cancellation
Having the seams is half the job. The other half is giving the user three different verbs instead of one.
Course correction is additive. The agent is doing roughly the right thing but the user wants to adjust a detail: "prefer Tailwind over inline styles," "skip the auth module, it's already done," "write tests before the refactor, not after." Course corrections land in the steering inbox and get folded into context. They should not pause execution; the agent reads them at the next natural boundary. In the UI, this calls for an always-available text input below the stream, ideally with a visible "delivered at step N" receipt so the user knows when the correction took effect.
Override is directive. The agent is going somewhere the user does not want it to go and the user needs to stop the current trajectory but preserve the state: "abandon this approach, revert to before the database migration, try a different angle." Overrides map to checkpoint operations — roll the state back to a named point, optionally rewrite the plan, then let the agent continue. They should pause execution briefly (you do not want new tool calls against a state you are about to rewind) and require an explicit confirmation if they destroy completed work.
Cancellation is terminal. The user wants the run dead, resources released, nothing preserved except the transcript. This is the original Stop button, and once the taxonomy is in place it can finally behave like a kill switch instead of a catch-all. Cancellation should be one click and irreversible; if the user might want anything saved, they should have used override.
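The three verbs map cleanly onto three different server-side operations. A hypothetical dispatch sketch — the `Run` stand-in and its method names are invented to make the semantics concrete:

```python
from enum import Enum

class Verb(Enum):
    CORRECT = "correct"    # additive: fold into context, do not pause
    OVERRIDE = "override"  # directive: pause, roll back, replan, resume
    CANCEL = "cancel"      # terminal: kill the run, keep only the transcript

class Run:
    """Minimal stand-in for an in-flight run (hypothetical interface)."""
    def __init__(self):
        self.inbox, self.log, self.alive = [], [], True
    def pause(self): self.log.append("pause")
    def resume(self): self.log.append("resume")
    def rollback_to(self, checkpoint): self.log.append(f"rollback:{checkpoint}")
    def kill(self): self.alive = False

def handle_control(verb: Verb, run: Run, payload=None):
    """Dispatch a user control action to the semantics the taxonomy demands."""
    if verb is Verb.CORRECT:
        run.inbox.append(payload)            # picked up at the next boundary
    elif verb is Verb.OVERRIDE:
        run.pause()                          # no new tool calls against doomed state
        run.rollback_to(payload["checkpoint"])
        run.resume()
    elif verb is Verb.CANCEL:
        run.kill()                           # irreversible, one click
```

Note that only OVERRIDE touches pause/resume: corrections never stall the run, and cancellation never pretends to preserve anything.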
The reason this taxonomy matters is that most "too aggressive" and "too passive" complaints about agent UIs are actually mode-selection failures. Users who want to correct are forced to override, so they learn not to interrupt at all and let the agent finish wrong. Users who want to cancel are afraid of losing work, so they wait out bad runs. Three buttons, clearly labeled, with the right semantics behind each one, eliminate both failure modes.
The Race That Arrives One Tool Call Too Late
The hardest failure mode is not architectural — it is temporal. The user sees the agent about to do something wrong, types the correction, hits enter, and watches the wrong tool call fire anyway because the message arrived between the model's decision and the tool's dispatch.
This race is not hypothetical. Issues filed against multiple agent harnesses describe the exact pattern: the user presses Escape during a tool call, the abort handler looks up activeChatRunId, finds it not yet set because it is assigned after the request returns, and the abort silently no-ops. The agent happily finishes the mutation.
Three mitigations keep this from being a bug hunt forever. First, set the run identity before the first tool call fires, not after — the window for racy aborts shrinks to microseconds. Second, make the tool-dispatch path check the steering inbox as part of its preamble, so a message that landed one millisecond ago still gets noticed before the side effect. Third, design destructive tools (file writes, HTTP mutations, database changes) to be cancellable or compensatable at the call site, so an interrupt that arrives after the call started can still prevent the commit. This last point is the one teams tend to skip and regret: the whole steering story is undermined if the user's override cannot unwind a mutation that fired three seconds earlier.
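The first two mitigations can be shown in one dispatch path. This is a sketch with invented names (`Dispatcher`, `start_run`, the `abort_current` note field), not any harness's real API:

```python
import queue

class Dispatcher:
    """Tool dispatch that checks the steering inbox in its preamble,
    so a note that landed a millisecond ago is seen before the side effect."""
    def __init__(self):
        self.inbox = queue.Queue()
        self.run_id = None

    def start_run(self):
        # Mitigation 1: assign run identity BEFORE any tool call fires,
        # so an abort handler can always find the run.
        self.run_id = "run-1"

    def dispatch(self, tool, args: dict, context: list):
        assert self.run_id is not None, "run identity must exist before dispatch"
        # Mitigation 2: drain the inbox as part of the dispatch preamble.
        while True:
            try:
                note = self.inbox.get_nowait()
            except queue.Empty:
                break
            context.append(note)
            if note.get("abort_current"):
                return None    # steering arrived in time: skip the call
        return tool(**args)
```

The third mitigation — compensatable destructive tools — lives inside the tools themselves rather than the dispatcher, which is why it is the one that gets skipped.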
For non-destructive tool calls — reads, retrievals, web fetches — the race matters less because the worst case is wasted work. For destructive ones it is the whole game. The simplest discipline is to tag tools in your manifest with a reversibility flag and require a confirmation step (or a longer steering window) for irreversible ones. It costs a small amount of latency on a few calls and buys you a steering story that does not leak at exactly the moment the user needs it most.
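The reversibility tag is a one-field addition to the tool manifest. The manifest shape and tool names below are illustrative, not a standard:

```python
TOOL_MANIFEST = {
    # Hypothetical manifest: each tool tagged with a reversibility flag.
    "read_file":   {"reversible": True},
    "web_fetch":   {"reversible": True},
    "write_file":  {"reversible": False},
    "http_delete": {"reversible": False},
}

def requires_confirmation(tool_name: str, manifest=TOOL_MANIFEST) -> bool:
    """Irreversible tools get a confirmation step before firing."""
    return not manifest[tool_name]["reversible"]

def steering_window_ms(tool_name: str, manifest=TOOL_MANIFEST) -> int:
    # The latency tax: a brief window on irreversible calls during which
    # a steering note or override can still land before the side effect.
    return 0 if manifest[tool_name]["reversible"] else 1500
```

Reads and fetches fire immediately; writes and deletes pay a short delay (or a confirmation) in exchange for a steering story that holds when it matters.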
When Not to Build This
Steering is expensive. The checkpointer costs storage and write latency on every step. The bidirectional channel complicates your deployment topology. The plan-as-object discipline constrains how you write your agent loop. If your product runs agents for thirty seconds at a stretch and users can just wait them out, you do not need any of this — the abort-and-retry flow is fine and the engineering budget is better spent elsewhere.
Steering becomes load-bearing when runs get long enough that restarts hurt, when tool calls do real work (cost money, mutate state, contact users), or when users are experts who want to collaborate with the agent rather than supervise it from above. Coding agents, research agents, and data pipeline agents all cross this line quickly. Customer-facing single-turn assistants mostly do not.
The diagnostic question is simple: watch five users on your product and count how many times they press Stop and immediately re-prompt with a small modification. If the answer is more than zero, you have a steering problem dressed up as a restart habit, and every one of those restarts is throwing away work that a checkpoint-and-inject loop could have kept.
The Shape of the Fix
The move from abort-only to steerable agents is a specific piece of plumbing: a checkpointed state store, a persistent run channel, a plan surface users can edit, and a three-verb control vocabulary in the UI. None of it is research-frontier — the primitives exist in LangGraph, the Claude Agent SDK, and every agent harness that takes long-running work seriously. The work is deciding it matters enough to invest before your users learn not to interrupt.
The agents that feel collaborative in 2026 are not the ones that got smarter about inferring intent. They are the ones that made themselves interruptible without being killable. That is not a model capability. It is an architecture choice, and it is waiting on the other side of the abort button.
- https://ably.com/blog/ai-transport-redirect-steering
- https://github.com/microsoft/vscode/issues/288920
- https://docs.langchain.com/oss/python/langgraph/interrupts
- https://www.langchain.com/blog/making-it-easier-to-build-human-in-the-loop-agents-with-interrupt
- https://platform.claude.com/docs/en/agent-sdk/hooks
- https://strandsagents.com/latest/documentation/docs/user-guide/concepts/experimental/steering/
- https://newsletter.victordibia.com/p/4-ux-design-principles-for-multi
