
The Agent Paged Me at 3 AM: Blast-Radius Policy for Tools That Reach Humans

12 min read
Tian Pan
Software Engineer

The first time an agent pages your on-call four times in an hour because it's looping on a malformed alert signal, leadership learns something the security team already knew: "tool access" and "ability to create human work" were the same permission, and you granted it without either a safety review or a product-ownership review. Nobody owned the question of who's allowed to interrupt a human at 3 AM, because nobody framed it as a question. It was framed as a Slack integration.

The 2026 agent stack has made this failure mode cheap to reach. Anthropic's MCP servers, OpenAI's Agents SDK, and the whole class of vendor-shipped action tools have collapsed the distance between "the model decided to do a thing" and "a human got woken up." Most teams ship those integrations the same way they ship a database client: scope a token, drop in the SDK, write a system prompt, ship. The blast radius of a database client is a row count. The blast radius of a PagerDuty client is a person's sleep.

The Permission Nobody Audited

Look at how a human-facing tool actually gets installed. An engineer wires up the PagerDuty MCP server, gives the agent a service token with incidents:create, and tests it on a staging incident. It works. Ship.

What was reviewed: the OAuth scope, the API contract, maybe a rate-limit setting on the PagerDuty side. What was not reviewed: under what conditions the agent should decide to invoke that tool, who bears the consequence when it does, how often it's allowed to, and what counts as "the agent should never do this autonomously." Those questions weren't on the integration checklist because they don't fit in a checklist. They're product-ownership questions wearing the uniform of an infra config.

Most agent frameworks today have two modes for any tool: it's available, or it isn't. There is no middle ground between "the model has full access to call this thing on its own" and "the model can't see it." That binary is fine for a SQL read replica. It is the wrong shape entirely for a tool whose downstream effect is a human being.

The asymmetry shows up clearly when you list the permission grants by their human cost. Read an S3 bucket: zero. Write a database row: low, reversible. Post in #general: hundreds of human-attention-seconds. Page the on-call: a person is now awake. File a JIRA: someone's backlog grew. Send an email to the customer: a relationship moved. The integration token treats them as equivalent. The org chart does not.
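
To make the asymmetry concrete, here's a minimal sketch of what a cost-aware tool registry could look like. The tier names and tool identifiers are invented for illustration; nothing here is a vendor's real API.

```python
from enum import IntEnum

class HumanCost(IntEnum):
    """Illustrative cost tiers; the names are invented, not any vendor's."""
    NONE = 0        # read an S3 bucket
    REVERSIBLE = 1  # write a database row
    PASSIVE = 2     # post in #general: hundreds of human-attention-seconds
    ACTIVE = 3      # file a JIRA, email a customer
    URGENT = 4      # page the on-call: a person is now awake

# The token scope treats all five as one permission;
# a policy layer should treat each tier as a separate budget.
TOOL_COST = {
    "s3.get_object":             HumanCost.NONE,
    "db.write_row":              HumanCost.REVERSIBLE,
    "slack.post_channel":        HumanCost.PASSIVE,
    "jira.create_issue":         HumanCost.ACTIVE,
    "pagerduty.create_incident": HumanCost.URGENT,
}
```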

What "Blast Radius" Means When the Blast Is People

Infrastructure people have been thinking about blast radius for a decade — the unit is "how much of my system breaks." Account isolation, region partitioning, kill-switches, percentage rollouts. The concept maps cleanly to compute and storage because the thing being affected is fungible and re-creatable.

Human-facing tools break that model. The unit isn't a container or a row, it's a person's attention. People are not fungible. A page at 3 AM costs the on-call disproportionately more than a Slack message at 2 PM, and a single false page erodes trust in the alerting system in a way that takes weeks to rebuild. There's no rollback for "we paged you and it was nothing."

So the dimensions you actually need to track for a human-reach tool are different from the infra ones:

  • Reach: how many people, and which ones (one on-call vs. a #general channel of 400)
  • Interruption tier: passive (channel post), active (DM), urgent (page)
  • Reversibility: can the side effect be undone, or has the human already context-switched
  • Cumulative pressure: this action plus the last six the agent took on the same person
  • Trust cost: false-positive rate of this class of agent action, not the overall agent

Cumulative pressure is the one teams miss. A single agent action looks defensible in isolation. Three agent actions in a row, against the same human, in the same hour, look like harassment — even if each one was individually correct. The policy layer has to track the recipient as a stateful resource, not the action as a stateless event.
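
Here's a minimal sketch of what "recipient as a stateful resource" could look like. The names and the one-hour window are assumptions, not anyone's shipping schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class RecipientState:
    """Tracks the recipient as a stateful resource, not each action as a
    stateless event. Names and the one-hour window are illustrative."""
    recipient_id: str
    # (timestamp, interruption tier) for every agent action that reached them
    recent_actions: list[tuple[datetime, str]] = field(default_factory=list)

    def record(self, tier: str) -> None:
        self.recent_actions.append((datetime.now(), tier))

    def pressure(self, window: timedelta = timedelta(hours=1)) -> int:
        """Cumulative pressure: how many times the agent has reached this
        person inside the window, regardless of which tool it used."""
        cutoff = datetime.now() - window
        return sum(1 for ts, _ in self.recent_actions if ts >= cutoff)

# A policy can now gate on the person, not just the action: if
# state.pressure() >= 3, the next call routes to a human approver even
# when it would individually pass every per-tool check.
```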

The Recursive Pager: Three Incident Patterns Worth Naming

Postmortems from the last year show three recurring failure modes. None of them are about model quality. They're about the lack of a control plane between "the model produced a tool call" and "a human was contacted."

The amplified false positive. An upstream alert fires for a flaky integration. The agent is wired to investigate, summarize, and notify. Its summary lands as a P1-shaped Slack message in the incident channel. Three downstream agents subscribed to that channel each interpret the summary as a real signal and each notify their own owners. One alert becomes seventeen, and the noise crowds out the actual problem.

The recursive page. An agent pages the on-call about a degraded tool. The on-call's response (or absence of response) is itself a metric the agent watches. After five minutes of "no acknowledgment," the agent re-pages — except this time it adds the on-call's manager. Twenty minutes later it has paged a small org chart, none of whom can fix what was originally a transient blip in a downstream API.

The wrong-channel blast. The agent's slack_post tool was scoped to "all channels the bot is in" because that was the easy default. A formatting bug causes it to post a debug trace into #all-engineering instead of the dev channel it intended. The post says "ERROR" three times in bold. Four hundred engineers see it before it's deleted. The hit to trust in the agent's outputs is lasting in a way the bug itself, quietly fixed, never would have caused.

What ties them together: in every case the agent did the thing it was technically allowed to do, and the thing was wrong because the policy didn't model human cost.

A Policy Layer That Sits at the Tool Boundary

The fix is not "make the model smarter about when to page." Models are not the right place to enforce a quota, and "the model should know better" is not a defense you can take to the post-incident review. The fix is a policy layer that sits between the agent and any tool that reaches a human, enforced by the runtime, not by the prompt.

Five things that layer needs to do (a combined sketch follows the list):

  • Per-channel rate limits, enforced at the boundary. The agent gets N pages per service per hour, M Slack DMs per recipient per day, K JIRA tickets per project per week. These are quotas on the tool, not on the model. When the quota is exhausted, the tool call returns a structured error and the agent has to route to a human-approved escalation path or wait. The agent should see the rate-limit response in-context so it can plan around it, not retry it into oblivion.
  • Dry-run mode for any tool that produces a notification. A first-class flag that lets the agent "draft" a page or a Slack post and surface it to a human approver instead of sending. Dry-run is also the right mode for the agent's own reflection step: it can inspect the rendered side effect before committing.
  • Human-in-the-loop on first-of-kind escalations. Anything the agent has never done before to this recipient, or never done at this severity, requires a human to approve the first instance. The system learns the pattern over a small number of approvals; the human is in the loop while the trust is being built, and out of it once the pattern is boring.
  • Per-tenant blast quotas. A daily cap on the total number of human-facing actions an agent can take on behalf of a tenant — independent of which tool is being used. This is the circuit breaker that catches a runaway loop even when no individual tool's per-channel limit was hit.
  • Recipient as a stateful resource. The control plane tracks "actions the agent has taken against this human in the last 24 hours" and surfaces it as input to the next decision. The policy can then say things like: no two pages to the same on-call within the same incident without a human approver, or no Slack DM to a recipient who has muted the bot in the last week.
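
Here's the combined sketch promised above: a gate the runtime calls before any human-reach tool executes. Every name in it (the class, the limits, the off-ramp strings) is an illustrative assumption, not any framework's real API, and the recipient-state tracking from the earlier sketch would slot in as one more check.

```python
import time
from collections import defaultdict, deque

class PolicyDenied(Exception):
    """Structured denial the agent sees in-context, so it can plan
    around the limit instead of retrying it into oblivion."""
    def __init__(self, reason: str, retry_after_s: int | None, off_ramp: str):
        super().__init__(reason)
        self.retry_after_s = retry_after_s
        self.off_ramp = off_ramp   # e.g. "queue_for_triage", "draft_for_approval"

class HumanReachGate:
    """Sketch of a runtime gate in front of human-reach tools."""
    def __init__(self, channel_hourly_limit: int = 2, tenant_daily_cap: int = 50):
        self.channel_hourly_limit = channel_hourly_limit
        self.tenant_daily_cap = tenant_daily_cap
        self._channel_log = defaultdict(deque)   # channel -> send timestamps
        self._tenant_log = defaultdict(deque)    # tenant  -> send timestamps
        self._approved_patterns = set()          # (recipient, severity) pairs

    def approve(self, recipient: str, severity: str) -> None:
        """Called by a human approver; after this, the pattern is boring."""
        self._approved_patterns.add((recipient, severity))

    def check(self, tenant: str, channel: str, recipient: str,
              severity: str, dry_run: bool = False) -> None:
        now = time.time()
        self._evict(self._channel_log[channel], now - 3600)
        self._evict(self._tenant_log[tenant], now - 86400)

        # Per-tenant blast quota: the circuit breaker that catches a
        # runaway loop even when no per-channel limit was hit.
        if len(self._tenant_log[tenant]) >= self.tenant_daily_cap:
            raise PolicyDenied("tenant daily human-reach cap exhausted",
                               retry_after_s=None, off_ramp="queue_for_triage")

        # Per-channel rate limit, enforced at the boundary, not in the prompt.
        if len(self._channel_log[channel]) >= self.channel_hourly_limit:
            raise PolicyDenied(f"hourly quota for {channel} exhausted",
                               retry_after_s=3600, off_ramp="draft_for_approval")

        # First-of-kind escalation: a human approves the first instance.
        if (recipient, severity) not in self._approved_patterns:
            raise PolicyDenied("first-of-kind escalation to this recipient",
                               retry_after_s=None, off_ramp="draft_for_approval")

        if dry_run:
            return  # render the side effect for inspection without committing
        self._channel_log[channel].append(now)
        self._tenant_log[tenant].append(now)

    @staticmethod
    def _evict(log: deque, cutoff: float) -> None:
        while log and log[0] < cutoff:
            log.popleft()
```

Note that PolicyDenied carries the off-ramp with it, so the denial isn't a dead end: the agent's next legal move is named in the error, not left to its imagination.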

The trick is that all of these belong in the runtime, not the prompt. Prompts are evidence, not enforcement. Anything you ask the model to "remember not to do" will be forgotten the moment the context window is interesting.

Who Approves Giving an Agent the Ability to Interrupt a Human

This is the conversation nobody wants to have, because the answer changes who pays the bill. Right now, granting an agent PagerDuty access tends to be an infra decision, made under an SRE-shaped budget, with the integration approved by whoever owns the API key.

But the consequences land on a different team. The on-call who got paged is on the product team. The customer who got the wrong-channel email is the GTM team's relationship. The engineer whose JIRA backlog grew is on a roadmap they own. None of those people were in the room when the integration was scoped.

Treat the right to interrupt a human as a product permission, not an infra permission. That means (the grant record is sketched in code after the list):

  • The owner of the human-facing channel (the on-call schedule, the Slack workspace, the customer email domain) has to sign off on which agents can reach into it, with what frequency, at what severity.
  • The approval is tier-based: passive vs. active vs. urgent, with separate budgets for each.
  • There is a documented revocation path. When the agent's judgment turns out to be worse than expected, the owner can pull the permission without needing infra to roll a token. The control plane has to support that revocation in seconds, not in a weekly deploy.
  • The audit trail is human-readable. "Agent X paged on-call Y at time Z because tool Q returned R" should be queryable by the affected human, not buried in a structured-log dashboard only the platform team knows how to read.
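
A sketch of what that grant record could carry, with invented field names; the point is what the record has to hold, not the exact schema:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class HumanReachGrant:
    """What a product permission carries that a token scope does not.
    Field names are illustrative."""
    agent_id: str
    channel: str                   # an on-call schedule, Slack workspace, email domain
    channel_owner: str             # who signed off, and therefore who can revoke
    daily_budgets: dict[str, int]  # e.g. {"passive": 20, "active": 5, "urgent": 1}
    revoked_at: datetime | None = None

    def active(self) -> bool:
        # Revocation is a data write the control plane honors in seconds,
        # not a token roll waiting on a weekly deploy.
        return self.revoked_at is None
```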

If you would not give a brand-new intern the PagerDuty escalation policy on day one — and you wouldn't — then giving an agent the same access without the same scaffolding is a strictly worse decision. The intern at least has a manager who feels the cost.

Designing the Off-Ramp

Every human-facing tool needs an off-ramp built in from day one. The off-ramp is the thing the agent does instead of contacting a human when the situation crosses a policy threshold. Common shapes (the delay window is sketched in code after the list):

  • A queue of pending actions that a human triages on their own time, instead of synchronously paging.
  • A "soft" channel — a low-attention place the agent can write to with full freedom, and a "hard" channel that requires a separate budget. The agent learns to default to soft.
  • A drafted message rendered into a UI for a human to send, instead of being sent directly.
  • A delay window. The agent's notification doesn't fire for ten minutes; if a follow-up action by the agent would have made the notification redundant, the original is suppressed.
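
The delay window is the least obvious shape, so here's a minimal sketch, with send_fn standing in for whatever actually delivers the notification; the ten-minute default is an assumption from the bullet above:

```python
import threading

class DelayedNotifier:
    """Minimal sketch of the delay-window off-ramp. send_fn stands in
    for whatever actually delivers the notification."""
    def __init__(self, send_fn, delay_s: float = 600.0):
        self.send_fn = send_fn
        self.delay_s = delay_s
        self._pending: dict[str, threading.Timer] = {}

    def notify(self, notification_id: str, payload: dict) -> None:
        """Queue the notification; it fires only if nothing suppresses it."""
        timer = threading.Timer(self.delay_s, self._fire,
                                args=(notification_id, payload))
        self._pending[notification_id] = timer
        timer.start()

    def suppress(self, notification_id: str) -> bool:
        """Cancel a pending notification a follow-up action made redundant."""
        timer = self._pending.pop(notification_id, None)
        if timer is not None:
            timer.cancel()
            return True  # the human never saw it
        return False

    def _fire(self, notification_id: str, payload: dict) -> None:
        self._pending.pop(notification_id, None)
        self.send_fn(payload)
```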

The off-ramp is what makes the rate limit recoverable. Without it, hitting the quota means the agent silently fails to communicate something the human might genuinely need to know. With it, the quota just changes the channel — from urgent to non-urgent, from synchronous to asynchronous, from "wakes you up" to "shows up in the morning queue."

The Question to Ask Before Wiring Up the Next Tool

Before installing the next MCP server, integration, or webhook that lets your agent reach a human, walk through five questions, and don't ship until you have an answer to each:

  1. Which humans can this tool contact, and who owns their attention budget?
  2. What's the worst single message the agent could send, and what's its blast radius?
  3. What's the worst hour of repeated actions the agent could produce, and at what point does the runtime intervene without the agent's cooperation?
  4. Who can revoke the permission in under sixty seconds when the agent gets it wrong?
  5. What does the agent do instead when the policy says it can't take this action?

If any of those have an answer of "we'd figure it out," the integration isn't ready. The reason isn't theoretical risk. It's that the postmortem you're going to write is already mostly drafted, and it's going to be addressed to a person who didn't sign up to be on the agent's distribution list.

The most important shift in 2026 is not that agents got more capable — it's that they got reach. Capability without a control plane on reach is the next class of incidents your product is going to ship. The good news is that the control plane is mostly mundane infrastructure: rate limits, quotas, approvals, audit trails, off-ramps. The hard part is admitting that the right to wake up a human is a permission, and putting someone in charge of granting it.
