
Agent Circuit Breakers: Why Step Budgets Are Fuses, Not Breakers

· 12 min read
Tian Pan
Software Engineer

Every team that ships agents to production eventually wakes up to the same kind of incident. An agent enters a state it cannot exit. It re-calls the same tool with cosmetically different arguments for six hours. It oscillates between two plans whose preconditions reject each other. It retries a transient 429 every two hundred milliseconds until morning. It generates a million-token plan it never executes. By the time anyone notices, the token bill is four figures, the downstream API is rate-limited, the customer's session has timed out twelve times, and the on-call engineer is being paged by three different alerts about the same root cause.

The first fix every team reaches for is a step-count budget. Cap the agent at twenty iterations. Cap it at fifty. Pick a number and ship. The step budget makes the incident reports stop, but it does not make the underlying problem go away — and once you understand the mechanism, you can see why a step budget is the agent equivalent of a household fuse: it blows after the damage has been done, the fuse box itself is now a maintenance burden, and the next time something melts, your reflex is to swap in a higher-rated fuse rather than ask what is actually shorting.

Agents need circuit breakers, not fuses. The distinction is older than the AI industry: in distributed-systems literature, the circuit breaker pattern wraps a protected call, monitors for failure signals, and trips proactively before downstream collapse cascades back upstream. Half-open states let the system probe whether the underlying problem has resolved. Threshold tuning is a first-class concern. None of this exists in a step-count cap. A step budget answers exactly one question — "has the agent done too many things?" — and it answers that question at the worst possible time, namely after the agent has already done all of those things.

What a Step Budget Actually Buys You

A step budget is a useful primitive, but it is the floor, not the ceiling. It guarantees that no single agent invocation can run unboundedly. That is genuinely valuable: without it, a stuck agent runs until something else stops it, which in the worst case is the provider quota at 3 a.m. or the engineer who notices the dashboard. So keep the step budget. The mistake is treating it as the answer.

The reason a step budget is insufficient is that the per-step cost is not constant. A single step that calls a reasoning model with a 200K-token context window is many hundreds of times more expensive than a step that calls a small model on a short prompt. A step budget of fifty is a different cost ceiling for a one-shot retrieval agent than for a deep-research agent. The first instinct, when the agent hits its budget before finishing a hard task, is to raise the budget. The team that doubles the cap to handle a single failed task has just doubled the blast radius of the next runaway. The fuse keeps getting bigger. The wiring stays the same.

The deeper problem is that "too many steps" is a lagging indicator. By the time you have done fifty things that don't make progress, the dollar cost is sunk, the rate limit is hit, the conversation history is gigantic, and the downstream system has logged fifty events it now needs to reconcile. A circuit breaker is supposed to trip before the damage compounds, not after.

Semantic Loop Detection: Catching the Repeat Before the Bill

The single most effective addition to a step budget is semantic loop detection. Hash each tool call along with its arguments and look for repetition in a sliding window. If the agent has called read_file("/etc/config") five times in the last seven steps, something is wrong, and it does not matter whether the step budget is fifty or five hundred — the agent is not making progress.

Implementation is straightforward and shockingly cheap. Maintain a ring buffer of the last N (tool name, normalized arguments) tuples. On each step, compute the hash of the current call and check whether it appears in the buffer above some threshold. Trip the breaker when it does. The argument normalization matters: agents often vary arguments cosmetically — adding a space, retrying with "true" instead of true, swapping the order of keys in a JSON blob — and naive byte-equality misses these. A canonical-form normalization (sorted keys, trimmed whitespace, type-coerced primitives) closes most of the dodges.

Two refinements are worth the trouble. First, make the detection result-aware: if the same call produces a different observable result each time, that is genuinely new information and the agent might be making progress. The pattern you want to catch is same-call-same-result repeated, not same-call alone. Second, distinguish between two failure modes — the agent calling one tool over and over (a stuck-tool loop) and the agent oscillating between two competing tools that each undo the other's work (an oscillation loop). The first shows up as repetition in the ring buffer; the second shows up as a short cycle of length two or three. Both are detectable with the same primitive.
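The detector described above fits in a few lines. The following is a minimal sketch, not a production implementation; the class and function names are illustrative. The hash is result-aware, so a repeated call that returns fresh information each time does not trip the breaker.

```python
import hashlib
import json
from collections import deque

def canonical_key(tool: str, args: dict, result: str) -> str:
    """Normalize a (tool, args, result) triple: sorted keys, trimmed
    strings, string-coerced primitives so "true" and True collide."""
    norm = {k: str(v).strip().lower() for k, v in sorted(args.items())}
    blob = json.dumps([tool, norm, result], sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

class SemanticLoopDetector:
    def __init__(self, window: int = 7, threshold: int = 5):
        self.window = window        # sliding window size N
        self.threshold = threshold  # repeats that trip the breaker
        self.buffer = deque(maxlen=window)

    def record(self, tool: str, args: dict, result: str) -> bool:
        """Record a call; return True when the breaker should trip."""
        key = canonical_key(tool, args, result)
        self.buffer.append(key)
        return self.buffer.count(key) >= self.threshold
```

With the default window of seven and threshold of five, a stuck-tool loop trips on the fifth identical call-and-result. An oscillation loop of length two or three shows up as two or three keys each repeating inside the same window, so a lower per-key threshold (or a check on the combined counts) catches it with the same buffer.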

When the breaker trips on a semantic loop, the default action should not be to terminate. The default should be to inject a corrective system message — "you have called this tool five times with the same arguments and the same result; pick a different approach or report failure" — and give the agent one more step to adapt. If it persists in the loop after the warning, then terminate. The two-tier escalation cuts false positives dramatically; an agent that is briefly stuck but recoverable is the common case, and a hard kill on the first detection is almost always worse than a nudge.
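The two-tier escalation might look like the following sketch. It assumes a hypothetical agent runtime exposing run_step, inject_system_message, and halt, plus a detector whose record method returns True when a loop is found; none of these names come from a real framework.

```python
# Hypothetical agent-runtime interface (run_step, inject_system_message,
# halt, done, result) and detector interface; a sketch, not a real API.
def run_with_breaker(agent, detector, max_steps):
    warned = False
    for _ in range(max_steps):
        tool, args, result = agent.run_step()
        if detector.record(tool, args, result):
            if warned:
                # second detection after the corrective nudge: terminate
                return agent.halt("loop persisted after warning")
            agent.inject_system_message(
                "You have called this tool repeatedly with the same "
                "arguments and the same result. Pick a different "
                "approach or report failure."
            )
            warned = True
        elif agent.done:
            return agent.result
    return agent.halt("step budget exhausted")
```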

Progress Signals: Forcing the Planner to Declare a Win Condition

Loop detection catches the dumb cases. The more pernicious failure mode is the agent that does not loop, exactly — it just wanders. Each step is novel, each tool call has different arguments, but the agent is not converging on a result. It is exploring branches of a search space whose terminal node has receded since step one. This is the agent that burns three thousand steps producing a "plan" instead of an answer.

The fix is a progress signal. Require the planner, every N steps, to articulate what concrete progress it has made and what the remaining work is. If it cannot, the breaker opens. The signal can be as simple as asking the agent to fill in a JSON object with fields like claims_so_far, next_subgoal, estimated_remaining_steps. The format does not matter as much as the forcing function: the agent has to commit to a measurable state, and a separate process can check whether the state is advancing.

A robust progress signal is not just "did the agent answer the question?" — that is a binary at the end of the run, useless for in-flight intervention. It is a state vector that should monotonically decrease in some direction. Estimated remaining steps should trend down. The set of unresolved subgoals should shrink. The number of distinct tools the agent thinks it still needs to call should drop. A planner that reports the same next_subgoal three checkpoints in a row is stuck, even if its tool calls look superficially varied.
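As a sketch, the monotonicity check over checkpoints could look like this; the field names (next_subgoal, estimated_remaining_steps) are the illustrative ones from the JSON object above, and patience is an assumed tuning knob.

```python
# Assumes the planner emits a checkpoint dict every N steps; the fields
# and the patience value are illustrative, not a fixed schema.
def is_stuck(checkpoints: list, patience: int = 3) -> bool:
    """Trip when the planner reports the same next_subgoal `patience`
    times in a row, or estimated_remaining_steps stops trending down."""
    if len(checkpoints) < patience:
        return False
    recent = checkpoints[-patience:]
    same_subgoal = len({c["next_subgoal"] for c in recent}) == 1
    remaining = [c["estimated_remaining_steps"] for c in recent]
    not_converging = all(a <= b for a, b in zip(remaining, remaining[1:]))
    return same_subgoal or not_converging
```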

Token-Velocity Ceilings: Pricing the Loop Per Minute, Not Per Step

Step budgets count steps. Cost dashboards count dollars per request. Neither catches the agent that is silently spending a hundred dollars a minute. What you want is a velocity metric — a derivative, not a level. Per-minute token burn, per-minute dollar burn, per-minute tool-call count. Throttle when the velocity exceeds a band that you have calibrated for the agent's normal workload.

The velocity ceiling is qualitatively different from a budget. A budget says "this run can cost at most X," and you find out at the end. A velocity ceiling says "this run cannot cost more than X per minute," and you find out within a minute. For an agent whose normal request costs ten cents, a sudden run at five dollars per minute is anomalous regardless of whether it eventually completes within the per-run budget. The breaker should trip on the rate, not the cumulative.

Tuning is a function of the workload. For a customer-facing chat agent, normal velocity is bounded by the user typing speed and the response cadence; a spike to twenty calls per minute is almost always a stuck loop. For a batch-processing agent that runs unattended, the steady state is high and the spike is a percentage above baseline rather than an absolute number. The same primitive works for both, but the threshold is workload-specific and needs an actual measurement period before you set it.
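A sliding-window velocity check is a few lines over a deque. This sketch assumes the caller records an amount per event (tokens, dollars, or tool calls all work), and the ceiling and window are placeholders you would calibrate from a measured baseline.

```python
import time
from collections import deque

class VelocityCeiling:
    """Sliding-window rate check: trip when spend over the last `window`
    seconds exceeds `ceiling`. Units are whatever the caller records."""
    def __init__(self, ceiling: float, window: float = 60.0):
        self.ceiling = ceiling
        self.window = window
        self.events = deque()  # (timestamp, amount) pairs

    def record(self, amount, now=None) -> bool:
        """Record one spend event; return True when the rate is exceeded."""
        now = time.monotonic() if now is None else now
        self.events.append((now, amount))
        # drop events that have aged out of the window
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()
        return sum(a for _, a in self.events) > self.ceiling
```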

Tool-Call Distribution Alarms: When the Mix is the Tell

A different signal worth instrumenting is the distribution of tool calls within a single run, compared against the baseline distribution across all runs. If the typical agent run calls read_file three times and web_search once, and the current run has called read_file two hundred times with no web_search, the mix is the alarm. The total step count might still be within budget. The semantic loop detector might not have fired yet because the file paths are varied. But the agent is doing something qualitatively different from what a healthy agent does, and that is detectable from the histogram alone.

This kind of distribution alarm is a chi-squared test against the prior. You don't need anything sophisticated — a sliding-window count by tool name compared against a published baseline is enough. The alarm should be a soft signal, not a hard kill: the agent might legitimately be on a path that needs three hundred file reads. But it should be a flag the on-call engineer sees in the dashboard before the run blows through its hard limits.
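A rough version of the histogram comparison, with a hand-rolled chi-squared statistic over stdlib counters. The smoothing constant and threshold here are illustrative and would need calibration against your own baseline; a real deployment might reach for scipy.stats.chisquare and a proper critical value instead.

```python
from collections import Counter

def distribution_alarm(run_calls, baseline, threshold=10.0) -> bool:
    """Compare this run's tool-call mix against baseline proportions and
    flag when the chi-squared statistic exceeds `threshold` (a soft
    signal for the dashboard, not a hard kill)."""
    observed = Counter(run_calls)
    total = sum(observed.values())
    base_total = sum(baseline.values())
    stat = 0.0
    for tool in set(observed) | set(baseline):
        # 0.5 is an arbitrary smoothing count for tools unseen in baseline
        expected = total * baseline.get(tool, 0.5) / base_total
        stat += (observed.get(tool, 0) - expected) ** 2 / expected
    return stat > threshold
```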

Halt-with-Handoff: The Failure Mode That Beats Termination

A circuit breaker that opens by killing the agent and returning an error is the simplest implementation and often the wrong one. The agent has expensive context — partial reasoning, intermediate artifacts, a half-explored plan — and discarding all of it because one heuristic tripped is wasteful and often user-hostile. The customer wanted an answer; they get a stack trace.

The better default for a tripped breaker is a handoff. Open the circuit, freeze the agent state, and route the partial context to a human reviewer (or, in some architectures, a different agent — a more capable model or a specialist tool). The handoff preserves the work, gives the customer an answer that acknowledges the difficulty rather than denying service, and produces a labeled example for the eval set. The team learns from every breaker trip; with a terminate-on-trip policy, the team just sees a counter increment.
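One way to sketch the handoff record, with illustrative names and no assumption about the queueing system; route can be anything callable, from a review-queue enqueue to a pager hook.

```python
import time
from dataclasses import dataclass, asdict, field

@dataclass
class BreakerHandoff:
    """Frozen state of a run whose circuit breaker opened."""
    run_id: str
    tripped_by: str        # which breaker opened: "loop", "velocity", ...
    reason: str
    partial_context: list  # messages and artifacts accumulated so far
    created_at: float = field(default_factory=time.time)

def open_circuit(run_id, tripped_by, reason, context, route):
    """Freeze the run and hand the partial work off instead of discarding
    it; `route` is any callable that delivers the record downstream."""
    handoff = BreakerHandoff(run_id, tripped_by, reason, context)
    route(asdict(handoff))  # the same record doubles as an eval-set example
    return handoff
```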

The handoff pattern also changes the political economics of the breaker. A team that knows tripping the breaker means a human reviewer gets paged is reluctant to set the threshold too low — every false positive is a wasted reviewer minute. A team that knows tripping the breaker means an angry user gets a generic error message is also reluctant to set it low, but for the wrong reason — they are protecting the metric, not the user. The handoff path makes the breaker honest because the cost of tripping is visible inside the team, not externalized to the user.

The Architectural Takeaway

The discipline that has to land in any production agent stack is a layered breaker pattern: a step budget as the absolute fail-safe, a semantic loop detector with two-tier escalation as the most common defense, a progress signal that forces the planner to declare a win condition, a token-velocity ceiling that catches the cost-burn case, a tool-call distribution alarm as the canary, and a halt-with-handoff path that preserves the work when any of the above trip.

The cost-control framing is the one most teams underweight. An agent's bug surface is a multiplicative product of step count and per-step cost. A bug that runs ten times longer than it should at a per-step cost ten times higher than normal is a hundred-times incident, not a ten-times incident. Circuit breakers are not just a safety primitive; they are a cost primitive. The fact that they also stop runaway customer-trust damage is a bonus; the financial case alone would justify them.

Distributed systems learned this lesson in the 2010s, after enough cascading-failure incidents made it boring. Agents are at the equivalent moment now — the failure modes are visible, the patterns are well-understood, and the only remaining question is whether your team installs the breakers before or after the production incident. A step-count budget is a fuse. It blows once, it tells you the wire was overloaded, and then you replace it with a bigger one. A circuit breaker is a control surface. It tells you which protection tripped, why, and what to change. The first is what you ship before the incident. The second is what you ship to make sure there isn't a next one.
