Agent Blast Radius: Bounding Worst-Case Impact Before Your Agent Misfires in Production

Tian Pan
Software Engineer

Nine seconds. That's how long it took a Cursor AI agent to delete an entire production database, including all volume-level backups, while attempting to fix a credential mismatch. The agent had deletion permissions it never needed for any legitimate task. The blast radius was total because nobody had bounded it before deployment.

This isn't a story about model failure. It's a story about permission scope. The model did exactly what it calculated it should do. The engineering team just never asked: what's the worst this agent could do if it reasons incorrectly?

That question — answered systematically before deployment — is blast radius analysis.

What Blast Radius Means for Agents

In infrastructure, blast radius describes how far a failure can propagate. A misconfigured load balancer can take down one service or every service behind it, depending on its permissions and dependencies. You contain blast radius by scoping access, adding circuit breakers, and designing failure domains.

Agents introduce a new dimension: reasoning-driven escalation. A misconfigured load balancer fails mechanically. A misconfigured agent fails through faulty inference — and faulty inference is harder to bound. An agent that can read from a database and write to one will sometimes use the write permission when it meant to use the read permission. An agent with shell access and filesystem access can combine those tools in ways nobody anticipated. An agent reasoning about a credential mismatch may decide that deletion resolves the ambiguity.

Traditional SRE blast radius analysis asks: what other systems fail if this component fails? Agent blast radius analysis asks: what's the worst action this agent can take, given the tools and permissions it currently holds? The first question is about mechanical propagation. The second is about permission surface and the space of reachable states an agent can reason its way into.

Completing this analysis before launch is not a security exercise. It's an engineering exercise. And the teams that do it consistently ship smaller failure surfaces than the teams that discover limits through incidents.

The Permission Surface Audit

The starting point is a complete inventory of what the agent can actually do — not what the product spec says it does, but what tools it has access to and what each tool can execute at maximum scope.

For each tool in the agent's toolkit, ask three questions:

What's the most damaging thing this tool can do if called with wrong parameters? A "search documents" tool that accepts a wildcard might return the entire document store. A "send notification" tool might have no rate limit. A "manage calendar" tool might have delete rights in addition to read rights. You don't need to assume the agent will call these tools maliciously — you need to assume it will call them incorrectly under edge conditions, because it will.

Is this tool's worst case reversible? Reading the wrong records is recoverable. Deleting them is not. Sending the wrong email is not. Publishing to an external service is not. Charging a credit card is not. Classify each tool as reversible or irreversible, because that distinction determines whether a failure is an incident or a disaster.

Does this agent actually need this tool? The most effective blast-radius reduction isn't better guardrails; it's removing permissions the agent doesn't need. An agent built to summarize support tickets doesn't need file-system write access. An agent built to draft emails doesn't need send permissions until it's in a supervised deployment phase. The OWASP Top 10 for LLM Applications names this risk Excessive Agency, and the framing is that agents routinely receive more permissions than necessary because it's easier to grant access than to scope it carefully.

Document the results as a tool permission matrix: tool name, worst-case action, reversibility, necessity. Anything marked "irreversible" and "not clearly necessary" should be removed before the agent ships.
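A minimal sketch of that matrix in Python, assuming a hypothetical `ToolEntry` record and a hypothetical toolkit for a support-ticket summarizer. The `necessary` flag reduces "not clearly necessary" to a yes/no judgment the team records during the audit. The removal rule is the one stated above: a tool that is both irreversible and not necessary gets cut.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolEntry:
    """One row of the tool permission matrix (hypothetical schema)."""
    name: str
    worst_case: str   # most damaging action at maximum scope
    reversible: bool  # can the worst case be undone?
    necessary: bool   # does any legitimate task actually need this tool?


def tools_to_remove(matrix: list[ToolEntry]) -> list[str]:
    """Return the tools that are irreversible and not clearly necessary."""
    return [t.name for t in matrix if not t.reversible and not t.necessary]


# Hypothetical matrix for an agent that summarizes support tickets.
matrix = [
    ToolEntry("search_documents", "wildcard query returns the whole store", True, True),
    ToolEntry("send_notification", "unbounded messages to customers", False, False),
    ToolEntry("write_file", "overwrites files on disk", False, False),
    ToolEntry("read_ticket", "reads the wrong ticket", True, True),
]

print(tools_to_remove(matrix))  # ['send_notification', 'write_file']
```

Keeping the matrix as data rather than prose means the removal rule can run in CI, so a newly added irreversible tool fails the check until someone justifies it.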

The Risk Classification Matrix

After the audit, the remaining tools need to be classified by the level of control they require at runtime. A common way to do this is with four tiers:

Automatic (green). Read-only queries, internal lookups, logging, notifications to the operating user. These have bounded blast radius and can execute without approval.

Async approval (yellow). Read-write operations on non-critical internal data. The agent can proceed, but the action is logged with enough context for a human to review and reverse it within hours. Approval is not required in-flight, but the audit trail is mandatory.

Real-time gate (orange). External API calls, any write to a production database, financial operations, cross-system state changes. The agent prepares the action and presents it for human confirmation before execution. Dry-run mode — where the agent explains what it would do without doing it — is the key pattern here. The human reviews the plan, not the aftermath.

Hard disable (red). Destructive operations on production systems, credential management, anything that writes to audit logs, actions that require secondary authorization under your compliance framework. Either blocked entirely at the infrastructure layer, or requiring multi-party approval before the agent can proceed.
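One way to make the tiers mechanical rather than a matter of opinion is to derive each tool's tier from properties recorded during the audit. The sketch below assumes hypothetical `ToolProfile` flags that mirror the tier descriptions above. It checks the most severe tier first, so a tool with several risky properties lands in the highest tier it qualifies for.

```python
from dataclasses import dataclass
from enum import Enum


class RiskTier(Enum):
    GREEN = "automatic"
    YELLOW = "async_approval"
    ORANGE = "realtime_gate"
    RED = "hard_disable"


@dataclass(frozen=True)
class ToolProfile:
    # Hypothetical audit flags; a real audit may track more properties.
    writes: bool = False                  # any write at all
    external: bool = False                # calls an external API or service
    production_write: bool = False        # writes to a production database
    financial: bool = False               # moves money
    cross_system: bool = False            # changes state in another system
    destructive_production: bool = False  # deletes or destroys production data
    touches_credentials: bool = False     # creates, rotates, or reads secrets
    writes_audit_log: bool = False
    needs_secondary_auth: bool = False    # compliance requires a second approver


def classify(p: ToolProfile) -> RiskTier:
    """Map audited properties to a tier, most severe tier first."""
    if (p.destructive_production or p.touches_credentials
            or p.writes_audit_log or p.needs_secondary_auth):
        return RiskTier.RED
    if p.external or p.production_write or p.financial or p.cross_system:
        return RiskTier.ORANGE
    if p.writes:
        return RiskTier.YELLOW
    return RiskTier.GREEN


examples = [
    ("lookup", ToolProfile()),  # read-only internal query
    ("tag_ticket", ToolProfile(writes=True)),  # non-critical internal write
    ("charge_card", ToolProfile(writes=True, external=True, financial=True)),
    ("drop_table", ToolProfile(writes=True, destructive_production=True)),
]
for name, profile in examples:
    print(name, classify(profile).name)
# lookup GREEN
# tag_ticket YELLOW
# charge_card ORANGE
# drop_table RED
```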

The common failure mode is treating this matrix as a policy document rather than an enforcement plan. A system prompt that says "always ask before deleting" does not prevent deletion — it creates a reasoning expectation that the model may not honor under unusual inputs. The orange and red tiers need to be enforced at the harness layer, not the model layer. That means your agent infrastructure intercepts the tool call before execution, validates it against the risk tier, and either proceeds, gates, or blocks — independent of what the model decided.
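A minimal sketch of harness-layer enforcement, assuming a hypothetical `Harness` class that sits between the model and the tool implementations. This illustrates the pattern and is not any particular framework's API. Tiers come from a table fixed at audit time, so nothing the model outputs can change them. Orange-tier calls become a dry-run plan passed to a human approval callback, and red-tier calls are refused outright. Every executed call is written to an audit log, which covers the yellow tier's review trail.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable


class RiskTier(Enum):
    GREEN = "automatic"
    YELLOW = "async_approval"
    ORANGE = "realtime_gate"
    RED = "hard_disable"


@dataclass
class ToolCall:
    tool: str
    args: dict[str, Any]


class Blocked(Exception):
    """Raised when the harness refuses to execute a tool call."""


@dataclass
class Harness:
    tiers: dict[str, RiskTier]             # decided at audit time, not by the model
    tools: dict[str, Callable[..., Any]]   # the real tool implementations
    approve: Callable[[str], bool]         # human gate; receives a dry-run plan
    audit_log: list[str] = field(default_factory=list)

    def execute(self, call: ToolCall) -> Any:
        # Tools missing from the tier table are treated as RED.
        tier = self.tiers.get(call.tool, RiskTier.RED)
        if tier is RiskTier.RED:
            self.audit_log.append(f"BLOCKED {call.tool} {call.args}")
            raise Blocked(f"{call.tool} is disabled for this agent")
        if tier is RiskTier.ORANGE:
            # Dry run: describe the action and let a human decide before it happens.
            plan = f"Would call {call.tool} with {call.args}"
            if not self.approve(plan):
                self.audit_log.append(f"REJECTED {call.tool} {call.args}")
                raise Blocked(f"human rejected {call.tool}")
        # GREEN, YELLOW, and approved ORANGE calls run and are logged for review.
        self.audit_log.append(f"{tier.name} {call.tool} {call.args}")
        return self.tools[call.tool](**call.args)


harness = Harness(
    tiers={"read_ticket": RiskTier.GREEN, "update_row": RiskTier.ORANGE},
    tools={
        "read_ticket": lambda ticket_id: f"ticket {ticket_id}",
        "update_row": lambda row_id, value: f"row {row_id} = {value}",
    },
    approve=lambda plan: False,  # the human declines every gated action here
)
print(harness.execute(ToolCall("read_ticket", {"ticket_id": 7})))  # ticket 7
for call in (ToolCall("update_row", {"row_id": 1, "value": "x"}),
             ToolCall("drop_database", {})):
    try:
        harness.execute(call)
    except Blocked as err:
        print(err)
# human rejected update_row
# drop_database is disabled for this agent
```

Treating unknown tools as red is a deliberate choice: a tool that was never audited has no known blast radius, so the harness assumes the worst case.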
