The Abandon Primitive: Why Your Agent Loop Needs a First-Class Way to Quit a Plan
Look at the loop primitives most agent frameworks ship: continue, return, retry, and a step budget that hard-stops the run. Notice what is missing. There is a path that says "the work succeeded," a path that says "the model wants to keep going," and a path that says "we ran out of money or patience and shot the loop in the head." There is no first-class path that says "the plan I am executing is wrong, and I want to throw it away and start a different one." The abandon primitive — an explicit, structured way for the planner to declare its current trajectory hopeless — is the missing verb in the agent loop's grammar, and its absence is responsible for a category of failures that are usually misdiagnosed as "the model needs more reasoning."
A planner three steps into a doomed branch keeps refining the same wrong plan because the loop's only exits are succeed, retry the last step, or hit the budget. None of those are "give up on the strategy and try a different one." So the agent does what the loop allows: it edits its plan in place, calls one more tool, asks for one more clarification, and burns through its step budget converging on a non-solution. When it finally hits the budget wall, the user sees a polite failure message that is not an answer to their question. The cost of those wasted steps is real — production data suggests 5–10% of token spend on agent systems goes into retries that produce nothing usable, and that figure is dominated by long doomed branches, not isolated tool errors.
The Loop's Primitives Were Borrowed From Request/Response
The shape of most agent harnesses comes from a place that doesn't have this problem. A traditional service handles a request, computes a response, and exits. Failure is local: a function throws, a caller catches, a retry is a loop around a single operation, and the unit of work is small enough that "abandon" and "fail" are the same thing. Those are the primitives that got carried into agent loops because they were the ones the engineers writing the harnesses already knew.
Agent work is shaped differently. The unit is a plan that decomposes into a sequence of tool calls and reasoning steps, and the question that gets asked at every iteration of the loop is not "did the last operation succeed?" — it is "is the trajectory I am on still likely to reach the goal?" Those are different questions and they need different control-flow answers. A tool error means the last call failed; a strategy failure means the last seven calls succeeded but they were the wrong calls.
When the only available primitives are continue, return, retry, and budget-exhaustion, the planner has nowhere to put the realization that the strategy is wrong. So the realization either doesn't happen — the model keeps refining a plan it should have thrown out — or it happens and gets squeezed into one of the available shapes, usually a retry that doesn't actually change anything because the underlying plan is still the wrong one.
What the Primitive Has to Specify
"Add an abandon path" is not a feature you can ship by sticking another branch in the model's tool list. The primitive is a contract, and the contract has to answer four questions that most teams never explicitly decide.
What state is preserved. When the planner abandons, what carries forward? The conversation history? The intermediate observations the agent gathered? The partial outputs? A doomed plan to "find the customer's order in system A then update it in system B" probably learned that the customer's account ID is wrong — that's information the next plan needs. But it might also have written half a record into a side-effecting tool, and that's information the next plan must not treat as already done.
What gets rolled back. Tool calls with side effects need a story. Either every effectful tool is idempotent enough that re-execution is safe (a strong claim that almost no real toolset meets), or the harness has to track what was committed and what to compensate, or the abandon path itself has to be defined as "preserve all side effects, only discard the planner's own reasoning state." Each of those is a different product, and the team that doesn't pick one ends up shipping the third by accident.
What gets reported to the user. Abandon is a UX primitive as much as a control-flow one. "I tried approach X, it didn't work, switching to approach Y" is a trust-building moment that almost no agent surfaces. Most surface the failure as if it were terminal, or hide the abandon and present the second plan's output as if it had been the first plan all along. The first throws away information the user could act on about what was already tried; the second is dishonest in a way that erodes trust the moment the user notices.
Whether the new plan starts in the same context or a fresh one. This is the hardest of the four. Continuing in the same context preserves what the agent learned and risks the model anchoring on the failed strategy — recent failures are sticky, and the planner that just abandoned plan A will keep proposing minor variations of plan A. Starting fresh forces a real reframe but loses the diagnostic information that motivated the abandon in the first place. The right answer is usually a structured handoff — a summary of what was tried and why it failed, in a fresh context — but most harnesses don't have a primitive for that either.
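The four decisions above can be pinned down as one explicit record the harness builds at abandon time. A hypothetical sketch, one possible encoding of the contract rather than any framework's actual API:

```python
from dataclasses import dataclass
from enum import Enum, auto


class SideEffectPolicy(Enum):
    PRESERVE = auto()           # effects stand; the next plan must not redo them
    COMPENSATE = auto()         # harness undoes recorded effects before replanning
    ASSUME_IDEMPOTENT = auto()  # re-execution declared safe (a strong claim)


@dataclass
class AbandonHandoff:
    # 1. What state is preserved: facts learned, not the raw transcript.
    learned_facts: list[str]
    # 2. What gets rolled back: effects already committed, plus the policy.
    committed_effects: list[str]
    policy: SideEffectPolicy
    # 3. What gets reported: the user-visible "tried X, switching to Y" line.
    user_message: str
    # 4. Same context or fresh: the structured summary a fresh context starts from.
    failure_summary: str
    fresh_context: bool = True


def render_handoff(h: AbandonHandoff) -> str:
    """Build the opening message the next plan's (fresh) context receives."""
    facts = "; ".join(h.learned_facts) or "none"
    return f"Previous attempt failed: {h.failure_summary}. Established facts: {facts}."
```

The structured-handoff answer to the fourth question falls out naturally here: `render_handoff` is the bridge that carries the diagnosis into a fresh context without carrying the failed plan along with it.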
The Eval That Catches Its Absence
Standard agent evals don't catch the missing abandon primitive because they are built around tasks where the obvious first plan is also the right plan. The model proposes a sequence, the harness executes it, the answer comes out, the eval scores it. Whether the loop has an abandon path never gets exercised because the loop never needs one.
The eval that catches the gap is constructed deliberately. Take a task and engineer it so the most natural-sounding first plan is wrong. The user wants the most recent invoice, but "the most recent" turns out to mean "the most recent paid invoice" and the agent has to discover that the unpaid one at the top of the list is irrelevant. Or the user wants a refund processed, but the actual blocker is upstream — the order was never charged, so there is nothing to refund, and any plan that starts with "find the charge record" is doomed before it issues a tool call.
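The refund case can be made concrete as a fixture. This is a sketch with invented tool names and data, not a real eval harness: a toolset where the natural first call can never succeed, so any agent that refuses to pivot burns its budget on it.

```python
# A deliberately doomed eval fixture: the order exists but was never charged,
# so every plan that starts from "find the charge record" dead-ends.
ORDERS = {"ord_42": {"status": "placed", "charged": False}}
CHARGES: dict[str, dict] = {}  # intentionally empty: there is nothing to refund


def find_charge(order_id: str):
    """The 'natural' first tool call. Always returns nothing in this fixture."""
    return CHARGES.get(order_id)


def get_order(order_id: str):
    """The call a pivoting agent eventually makes, exposing the real blocker."""
    return ORDERS.get(order_id)


def correct_end_state(final_answer: str) -> bool:
    """The eval scores whether the agent surfaced the upstream blocker."""
    return "never charged" in final_answer.lower()
```

An agent that retries `find_charge` with rephrased arguments scores zero no matter how many steps it spends; one that abandons the refund plan and inspects the order itself can pass in two calls.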
Score the agent on whether it ever reaches the right end state, and you'll find a gap between models that converge on a doomed plan and burn through their budget refining it, and models that pivot. The pivoting happens through different paths in different harnesses — sometimes the model emits a "let me reconsider" message that the harness is smart enough to treat as a signal, sometimes the planner gets so confused it accidentally restarts. But the fact that pivoting works through accident rather than design is exactly the point. A primitive turns an emergent behavior into a contract you can rely on.
The cost frame in this eval is clarifying. Compare the average cost of the agent's worst 10% of trajectories to the cost of its best plan plus a fresh attempt. A 12-step doomed branch frequently costs more than a 3-step failed plan plus a 4-step second plan that works. The wasted steps in the doomed branch aren't "exploration" — they are local optimization on an objective that nobody can reach.
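The comparison is simple arithmetic once trajectories carry step counts. A sketch assuming a flat per-step cost, which is a simplification (real steps vary in token count):

```python
def trajectory_cost(steps: int, cost_per_step: float = 1.0) -> float:
    """Flat per-step cost model; swap in per-step token counts in practice."""
    return steps * cost_per_step


# The comparison from the text: a 12-step doomed branch versus a 3-step
# failed first plan plus a 4-step second plan that actually works.
doomed = trajectory_cost(12)
pivot = trajectory_cost(3) + trajectory_cost(4)
assert pivot < doomed  # cheaper, and the pivot produced an answer
```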
The Confidence Signal That Drives the Decision
The hardest part is not adding the primitive — it is deciding when to fire it. "Abandon when confidence drops below threshold X" is the obvious framing, but it puts the decision in exactly the wrong place: the model that is currently invested in a plan is the worst-positioned actor to evaluate whether that plan is hopeless. The same prompt-sensitivity that makes the model commit to a plan in the first place makes it commit harder when challenged.
Better signals come from outside the planner's own self-evaluation. A no-progress detector that flags when consecutive observations are returning the same information the agent already had — not as a hard stop but as evidence that the current plan is not generating new state. A goal-gap detector that compares the agent's current internal state to the requested end state and asks whether the gap is shrinking with each step. A loop detector that flags repeated tool calls with similar arguments. These are observations the harness can make about the planner's behavior, and they are more honest than asking the planner to grade itself.
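Two of those detectors fit in a few lines each. A sketch where the thresholds are placeholders to tune per product, and where "similar arguments" is approximated by exact-match repetition, the simplest version of the idea:

```python
from collections import Counter

# A tool call is modeled as (tool_name, frozenset of argument items),
# so identical calls hash equally and can be counted.
ToolCall = tuple[str, frozenset]


def is_looping(tool_calls: list[ToolCall], window: int = 6,
               max_repeats: int = 2) -> bool:
    """Loop detector: flag repeated identical tool calls in the recent window."""
    recent = Counter(tool_calls[-window:])
    return any(count > max_repeats for count in recent.values())


def no_progress(observations: list[frozenset], window: int = 3) -> bool:
    """No-progress detector: the last `window` observations added no new facts."""
    if len(observations) <= window:
        return False
    known = frozenset().union(*observations[:-window])
    recent = frozenset().union(*observations[-window:])
    return recent <= known  # everything recent was already known
```

Both functions observe only the trace; neither asks the planner anything, which is what makes them more honest than self-grading.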
A common pattern in production systems is to treat low confidence as a propose rather than a commit: when any of those external signals fires, the harness asks the planner to articulate, in one or two sentences, whether the current strategy is still the right one. The planner's answer becomes evidence the harness can incorporate alongside its own signals. The decision to abandon is made by the harness, not the model, but with the model's reasoning included.
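The propose-rather-than-commit flow can be sketched as a harness-side decision function. The planner here is stubbed as any callable, and the keyword scan on its answer is a deliberately crude stand-in for however a real harness would interpret the reply:

```python
from typing import Callable


def should_abandon(
    external_signals: dict[str, bool],
    ask_planner: Callable[[str], str],
    min_signals: int = 2,
) -> bool:
    """Harness-owned abandon decision. The model proposes; the harness commits."""
    fired = sum(external_signals.values())
    if fired == 0:
        return False  # no external evidence; don't even ask
    # Low-cost probe: the planner states, in a sentence or two, whether the
    # current strategy still looks right. Its answer is evidence, not a verdict.
    answer = ask_planner("Is the current strategy still the right one? One sentence.")
    planner_doubts = any(w in answer.lower() for w in ("no", "switch", "different"))
    # Commit when enough signals fire, or one fires and the planner agrees.
    return fired >= min_signals or planner_doubts
```

The asymmetry is the point: a confident planner cannot veto two external signals, and a doubting planner can tip a single signal over the line.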
The threshold is product-specific. Coding agents that can verify their work cheaply (run the test suite) can tolerate longer doomed branches because the verification is the abandon signal. Customer-support agents that pay a high friction cost for every clarifying question to the user need to abandon earlier and silently. There is no universal number; the team has to pick one based on the cost of a wasted step versus the cost of a premature abandon, and they have to instrument the system well enough to tune it.
What Production Telemetry Should Track
If the abandon primitive becomes part of the loop, it has to become part of the telemetry too. The metrics most teams already track — task success rate, average steps per task, cost per task — won't tell you whether your abandon logic is doing the right thing.
The metrics that do tell you:
- Abandon rate per task type. Tasks that abandon at very different rates than peers either have a planning problem (the first plan is reliably wrong) or a definition problem (the task is harder than the eval set captured).
- Post-abandon success rate. What fraction of abandons lead to a working second plan? If it's high, the primitive is paying for itself. If it's low, the agent is just adding cost without adding solutions, and the threshold is too aggressive.
- Trajectory length distribution. If the longest 5% of trajectories are dominated by tasks that did succeed eventually, abandon is too conservative: an earlier pivot would likely have reached the same answer in fewer steps. If they are dominated by tasks that failed, abandon is too conservative in a more expensive way — the agent should have given up sooner.
- Cost-of-doomed vs. cost-of-pivot. The whole reason the primitive exists. Track it directly.
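The bookkeeping behind those metrics is small. A sketch assuming each finished task is logged as one record, with the record shape invented for illustration:

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    task_type: str
    abandoned: bool   # did the abandon primitive fire at least once?
    succeeded: bool   # did the task reach the right end state?
    steps: int


def abandon_rate(records: list[TaskRecord], task_type: str) -> float:
    """Fraction of tasks of this type in which abandon fired."""
    of_type = [r for r in records if r.task_type == task_type]
    return sum(r.abandoned for r in of_type) / len(of_type)


def post_abandon_success_rate(records: list[TaskRecord]) -> float:
    """Of the tasks that abandoned, the fraction whose second plan worked."""
    abandoned = [r for r in records if r.abandoned]
    return sum(r.succeeded for r in abandoned) / len(abandoned)
```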
Without this telemetry, the primitive is an architectural opinion that nobody can validate. With it, the primitive becomes tunable — a knob the team can adjust as the model evolves, the toolset grows, and the task distribution shifts. That tunability is the whole point. Hard-coded thresholds in agent loops are technical debt because the right threshold next quarter is not the right threshold this quarter.
The Architectural Shift
The deeper reason the abandon primitive matters is what it implies about the shape of agent loops. The continue/return/retry vocabulary treats a plan as a thing the model is doing, where each step succeeds or fails locally. The abandon primitive treats a plan as a thing the model is betting on, where the question at every iteration is "does the bet still look good given what I now know?" The first vocabulary borrows from procedural programming. The second borrows from search and planning, which is the body of work that has actually thought about when to give up on a branch.
Frameworks that ship continue, return, and retry are shipping the wrong abstraction for the work agents do. The primitive most agent loops are missing is the one that admits the current trajectory is wrong without admitting the whole task is impossible. Until that primitive exists as a first-class part of the harness — with a defined contract for state, side effects, user-facing messaging, and context handoff — every agent will keep refining its way down doomed branches, and every team will keep diagnosing the symptom as a model problem when the actual problem is the loop they put the model inside.
