
The Pre-Launch Blast Radius Inventory: The Document Your Agent Team Forgot to Write

· 10 min read
Tian Pan
Software Engineer

The first hour of an agent incident is always the same. Someone notices the agent did something it shouldn't have — invoiced the wrong customer, deleted a calendar event for the CEO, posted a half-finished apology in a public Slack channel — and the response team starts asking questions nobody has written answers to. Which downstream system holds the audit log? Which on-call rotation owns that system? Was the call reversible, and within what window? Who owns the credential the agent used, and does that credential also let it touch other systems we haven't checked yet? The team that wrote the agent rarely owns those answers, because the answers live in the systems the agent calls, and nobody at launch wrote them down in one place.

That document is the blast radius inventory, and it is the artifact most agent teams discover the absence of during their first incident. It is not a security checklist, not a tool schema, not a runbook. It is an enumerated list of every external system the agent can touch and every fact you need on the worst day of that system's life. Teams that ship agents without one are betting that incident-response context can be reconstructed faster than the blast spreads, and that bet keeps losing as agents get more tools and the tools get more powerful.

What goes in the inventory

A blast radius inventory has one row per tool, and each row is filled in as if the tool's worst day has already happened. The columns aren't optional and they aren't aspirational — every cell needs to be real before the tool is registered, because every empty cell becomes a question someone asks at 2 a.m.

A workable starting schema:

  • Tool name and binding: the registered tool name, the agent or agents that can call it, and the function or schema the planner sees.
  • Downstream system: the actual external system the call hits. Not "the CRM API" — salesforce-prod.us-east. Specificity matters because that's the thing that gets paged.
  • Credential: the service account, OAuth scope, API key alias, or signed identity the call presents. If the credential is shared with non-agent callers, that gets noted explicitly.
  • Worst-case effect: what the call does if a prompt injection successfully hijacks it. Not the happy path — the assume-compromise path. "Sends an email from the CEO's address to a list" beats "sends notifications."
  • Reversibility class: reversible (under 60 seconds), reversible-with-effort (manual rollback within an hour), or irreversible (money moved, message delivered, key rotated, file overwritten in a system without versioning).
  • Audit trail location: where the action's record lives — the SaaS app's audit log, an internal data warehouse table, a CloudTrail bucket, the agent's own structured event stream — and the retention window.
  • Downstream on-call: which team gets paged when this tool's downstream system breaks, regardless of whether the agent caused the break.
  • Rate-limit budget: the share of the downstream's quota the agent is allowed to consume, separated from the share other internal callers consume.
  • Composition effect: the worst case when the planner combines this tool with another tool in the catalog. This is the column most teams skip and most postmortems land on.
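One way to keep the cells honest is to give the row a type. The sketch below is a minimal Python rendering of the starting schema, not a prescribed format. The field names, the `empty_cells` helper, and every sample value (including `svc-agent-billing` and `billing-prod.us-east`) are illustrative assumptions.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Reversibility(Enum):
    REVERSIBLE = "reversible"               # undone in under 60 seconds
    WITH_EFFORT = "reversible-with-effort"  # manual rollback within an hour
    IRREVERSIBLE = "irreversible"           # money moved, message delivered


@dataclass(frozen=True)
class InventoryRow:
    tool_name: str
    binding: str
    downstream_system: str
    credential: str
    worst_case_effect: str
    reversibility: Reversibility
    audit_trail: str
    downstream_on_call: str
    rate_limit_budget: str
    composition_effect: str


def empty_cells(row: InventoryRow) -> list[str]:
    """Return the names of text columns that are blank or whitespace-only."""
    return [
        f.name
        for f in fields(row)
        if isinstance(getattr(row, f.name), str) and not getattr(row, f.name).strip()
    ]


# Hypothetical row for an invoicing tool; all values are made up for illustration.
EXAMPLE = InventoryRow(
    tool_name="send_invoice",
    binding="billing-agent / send_invoice(customer_id, amount)",
    downstream_system="billing-prod.us-east",
    credential="svc-agent-billing (shared with nightly batch job)",
    worst_case_effect="Invoices every customer for an arbitrary amount",
    reversibility=Reversibility.IRREVERSIBLE,
    audit_trail="billing audit log, 90-day retention",
    downstream_on_call="billing-platform",
    rate_limit_budget="10% of API quota",
    composition_effect="",  # not yet considered, so the row is not ready
)
```

Calling `empty_cells(EXAMPLE)` names the composition column as the one cell still open, which is exactly the question someone would otherwise ask at 2 a.m.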

The composition column is the one that turns the inventory from a tool list into a blast-radius map. A read_email tool is low-risk on its own. A send_email tool is low-risk on its own. Put both in a catalog that also has a fetch_url tool and you have Simon Willison's lethal trifecta (private data, untrusted instructions, and an exfiltration channel), and the agent's planner can compose those three into an effect none of their owners signed up for. The composition column forces the team to write down which combinations have been considered, and which haven't.
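The trifecta check can be roughed out mechanically if each tool is tagged with the capabilities it contributes. The sketch below assumes a hand-maintained tag per tool; the tag names, the `CATALOG` contents, and the particular assignment of tags to tools are illustrative assumptions, and a real tool may carry more than one tag.

```python
from itertools import combinations

# The three legs of the trifecta, as capability tags (names are assumptions).
TRIFECTA = frozenset({"private_data", "untrusted_content", "external_send"})

# Hypothetical catalog: tool name -> capabilities that tool contributes.
CATALOG = {
    "read_email": {"private_data"},
    "fetch_url": {"untrusted_content"},
    "send_email": {"external_send"},
    "get_weather": set(),
}


def trifecta_sets(catalog):
    """Return the minimal tool combinations that together cover all three legs."""
    found = []
    for size in (1, 2, 3):  # three legs, so a minimal set never needs more than three tools
        for combo in combinations(sorted(catalog), size):
            covered = set().union(*(catalog[tool] for tool in combo))
            already_flagged = any(set(prev) <= set(combo) for prev in found)
            if TRIFECTA <= covered and not already_flagged:
                found.append(combo)
    return found


risky = trifecta_sets(CATALOG)
```

Each tuple in `risky` is a combination that deserves an explicit entry in the composition column. The check only knows what the tags tell it, so a tool tagged too generously or too narrowly will skew the result.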

Filling in the worst-case column honestly

The temptation is to write what the call does on the happy path. "Updates a lead's status." "Posts a summary to a project channel." That entry is useless, because the only time anyone reads the inventory is when the call did something nobody wanted.

The correct frame is to assume a prompt injection has succeeded and the planner is now hostile. Then ask: with the credential this tool holds, with the scope it's been granted, with the rate limit it's been given, what is the worst thing it can do before something else stops it? The answer is rarely the thing the tool was designed for. An update_lead_status tool with broad write scope can mass-mutate every lead in the CRM if the prompt is constructed to iterate. A post_to_channel tool that doesn't pin to a specific channel ID can post to leadership channels. A fetch_calendar tool that returns full event bodies can exfiltrate meeting notes that contain unannounced acquisition targets.

The 2025-Q4 data is unambiguous on this point: documented prompt-injection attempts against enterprise AI systems rose 340% year-over-year, with successful attacks rising 190%. The CVE-2025-53773 disclosure showed prompt injection in a pull request description achieving remote code execution through a coding agent at CVSS 9.6. The teams that survived these incidents had pre-written answers to the worst-case question. The teams that didn't had to invent the answers while the incident was live.

A useful test: if the worst-case column on a tool reads identically to the tool's description, the column hasn't been filled in. The description tells you what the tool is supposed to do. The worst-case column tells you what the tool can do when nobody is supposed to be using it.
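That test is easy to automate as a lint. The sketch below uses the standard library's `difflib.SequenceMatcher` to flag a worst-case cell that is blank or nearly identical to the tool description; the function name and the 0.9 similarity threshold are assumptions chosen for illustration, not tuned values.

```python
from difflib import SequenceMatcher


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial edits don't hide a copy."""
    return " ".join(text.lower().split())


def worst_case_is_unfilled(description: str, worst_case: str,
                           threshold: float = 0.9) -> bool:
    """True when the worst-case cell is blank or reads like the description."""
    a, b = _normalize(description), _normalize(worst_case)
    if not b:
        return True
    return SequenceMatcher(None, a, b).ratio() >= threshold


copied = worst_case_is_unfilled(
    "Updates a lead's status.",
    "updates a lead's  status.",
)
honest = worst_case_is_unfilled(
    "Updates a lead's status.",
    "Mass-mutates every lead in the CRM when the prompt is built to iterate.",
)
```

A string-similarity check only catches the lazy case. It cannot tell whether a differently worded cell is actually pessimistic enough, which is still a job for the reviewers.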

Make the inventory a merge gate, not a launch artifact

The most common failure mode is treating the inventory as a one-time launch artifact. The team writes it the week before launch, ships, and the document goes stale within a quarter. The next prompt-injection incident has the response team paging the wrong on-call while the inventory says the tool was retired six weeks ago.

The fix is structural, not procedural. The tool registry — the code-level definition of which tools the agent has — and the inventory must be the same source of truth. A new tool requires its inventory entry to merge. A scope change on an existing tool requires its inventory entry to be updated in the same PR. The merge gate is enforced in CI, the same way schema migrations or new public endpoints already are. The reviewers on that PR include the AI team, security, and the on-call team for the downstream system the tool calls. If the on-call team for the downstream system is unwilling to sign off, that's a signal that the worst-case column isn't accurate or the rate-limit budget hasn't been negotiated.

This is the shift-left posture that's emerging across enterprise agent governance: the inventory is not a binder somebody updates when they remember to. It is a CI artifact, derived from and validated against the live tool definitions, with cross-team review baked into the PR template. Microsoft's Agent Governance Toolkit, released earlier this year, takes the same position — the only inventory that stays correct is the one the build pipeline refuses to merge without.

The cultural piece that has to come with the structural piece: the inventory is read by people who don't write the agent. Security reads it during incident triage. The downstream system's on-call reads it to understand what their dependent service is being asked to do. Capacity planners for the dependent SaaS read it to forecast load. Audit reads it as the index of which automated effects need attestation. If the inventory only makes sense to the team that wrote it, the team that wrote it is the team that gets paged for everyone else's confusion.

Operational use after launch

The post-launch lifetime of the inventory is where the value compounds, but only if the team uses it as a living document. Three operational patterns matter.

Incident triage starts with the inventory. The first action when an agent does something unintended is to look up the tool that did it and read the row. The on-call name in the row is who joins the bridge. The audit-trail location is where the responder pulls forensic data. The reversibility class tells the bridge whether to focus on rollback or on damage control. None of this requires a human to remember which dashboard owns which system, because the row already says.
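As a rough illustration, the lookup can be a few lines of code behind a chat command or a runbook link. The row format, the `triage_brief` name, and the sample values below are assumptions made for the sketch.

```python
def triage_brief(row: dict) -> dict:
    """Turn one inventory row into the three facts the bridge needs first."""
    focus = "damage control" if row["reversibility"] == "irreversible" else "rollback"
    return {
        "page": row["downstream_on_call"],
        "pull_logs_from": row["audit_trail"],
        "focus": focus,
    }


# Hypothetical rows keyed by tool name.
INVENTORY = {
    "send_invoice": {
        "downstream_on_call": "billing-platform",
        "audit_trail": "billing audit log, 90-day retention",
        "reversibility": "irreversible",
    },
    "update_lead_status": {
        "downstream_on_call": "sales-systems",
        "audit_trail": "CRM field history",
        "reversibility": "reversible-with-effort",
    },
}

brief = triage_brief(INVENTORY["send_invoice"])
```

The point is less the code than the shape of the answer: who to page, where to look, and what to attempt, all read from a row that was written before the incident.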

Audit reviews use it as the index. Quarterly reviews of agent-driven actions don't try to enumerate from scratch; they walk down the inventory and confirm each row's audit-trail location still has retention, each credential still has the documented scope, and each rate-limit budget is still being respected. A last-verified date on each row, one more column on top of the starting schema above, is the source of truth for which rows have been re-validated this quarter.
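A sketch of how the quarterly walk could pick its worklist, assuming each row carries a last-verified date. The `last_verified` key, the function name, and the dates are illustrative assumptions.

```python
from datetime import date


def rows_needing_review(inventory: dict, quarter_start: date) -> list[str]:
    """Tools whose row has never been verified or was last verified before this quarter."""
    return sorted(
        tool
        for tool, row in inventory.items()
        if row.get("last_verified") is None or row["last_verified"] < quarter_start
    )


# Hypothetical inventory with only the relevant column shown.
AUDIT_VIEW = {
    "send_invoice": {"last_verified": date(2025, 10, 14)},
    "post_to_channel": {"last_verified": date(2025, 6, 2)},
    "fetch_calendar": {},  # never verified since launch
}

worklist = rows_needing_review(AUDIT_VIEW, quarter_start=date(2025, 10, 1))
```

Rows with no date at all are treated as stale on purpose, so a row added without verification shows up in the first review after it lands.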

Deprecation reviews check inventory accuracy as upstream systems evolve. When a downstream SaaS announces a breaking change to its API, the inventory tells you which tools depend on it. When a downstream team reorganizes its on-call rotation, the inventory's downstream-on-call column gets updated. When a credential is rotated to a narrower scope, the worst-case column gets re-evaluated against the new scope.

The composition column is the one that benefits most from periodic re-evaluation. Each new tool added to the catalog re-opens the question of which tools can now be combined to produce effects no individual row warned about. A team that has been adding tools quarterly without revisiting the composition column has, by the third quarter, an attack surface no row of the inventory describes.
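The growth is easy to underestimate: n tools give n(n-1)/2 pairs before triples are even considered. The sketch below lists the pairs nobody has signed off on yet, assuming the team records reviewed pairs somewhere. The `unreviewed_pairs` name and the sample data are hypothetical, and pairs are only the cheapest approximation of composition risk.

```python
from itertools import combinations


def unreviewed_pairs(catalog, reviewed):
    """Every two-tool combination in the catalog that is not in the reviewed set."""
    return [
        pair
        for pair in combinations(sorted(catalog), 2)
        if frozenset(pair) not in reviewed
    ]


# Hypothetical state: fetch_url was added after the last composition review.
CATALOG = ["read_email", "send_email", "fetch_url"]
REVIEWED = {frozenset({"read_email", "send_email"})}

todo = unreviewed_pairs(CATALOG, REVIEWED)
```

Adding one tool to a catalog of n creates n new pairs, so the review cost of each addition rises with the size of the catalog. Running a check like this in the same PR that registers the tool keeps that debt visible.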

Agents are blast radius amplifiers

The architectural realization underneath all of this: agents compose tools into effects no individual tool's owner anticipated. A tool owner can reason about their tool in isolation and conclude it's safe. They cannot reason about their tool plus four other tools the planner can call in any order, with arbitrary natural-language inputs from arbitrary upstream sources. The composition is where the blast radius lives, and the composition is the thing no individual tool owner is responsible for.

That responsibility lands on the agent team, and the inventory is how the agent team discharges it. The inventory is the only document in the organization where the worst case of every possible composition has been considered and named. The team that doesn't write it isn't avoiding the work — they're outsourcing the work to the postmortem.

If you're shipping an agent that touches more than two external systems and you don't have something that looks like this inventory, the next incident will write it for you, in the wrong order, under time pressure, with the wrong on-call paged. The cheapest version of this document is the one you write before launch. Every version after that is more expensive than the last.
