The Reply-All That Wasn't: Agent Outbound Fan-Out Hazards
The user asked the agent to "let Karen know we're done." The agent called send_email with the recipient field set to karen-team@, the most plausible address its contact-lookup tool returned. The message — three paragraphs of internal-only project status, including a candid line about a customer's renewal risk — landed in forty inboxes. One of those inboxes belonged to the customer in question. The postmortem ran for two weeks.
There was no prompt injection. There was no model jailbreak. The tool worked exactly as specified. The contract the team wrote for send_email was "send a message to a recipient." The contract the world enforces is "broadcast to a group whose composition the sender did not audit." That gap — between what the tool is named and what the tool can actually do — is where most outbound agent incidents live.
Email is the obvious example, but the same hazard hides in every messaging tool an agent ever touches. The thirty years of muscle memory humans built for these channels did not transfer to the planner pattern-matching its way through a contact list.
The Hazards Your Tool Catalog Hides
Every outbound channel has fan-out semantics that agents do not intuit. The signatures look benign — a single recipient field, a channel ID, a phone number — and the runtime quietly resolves them into anything from one human to a thousand.
Consider what your catalog actually exposes:
- Email recipient fields that accept distribution lists indistinguishably from individuals. karen@ and karen-team@ differ by one token; the planner's confidence in either depends on which one the contact-lookup tool returned first.
- BCC defaulting to empty when the user expected the agent to inherit their personal email-client habit of BCC'ing themselves on every send. The agent has no client; the audit trail starts cold.
- Calendar invites that auto-include the chain of forwarding optionals from the original organizer's invite, so an agent scheduling a "follow-up with Sam" silently invites Sam's entire team plus three external advisors who were on the parent meeting.
- Slack channels whose names look like DMs but post to public — #alex-and-me, #oncall-private, #temp-discussion-2024-q3. The channel ID is opaque; the agent has no signal that the audience is forty people instead of two.
- SMS gateways that fan out to group lists when the destination is a short code. The number looks like a number. The route is a broadcast.
- Webhook tools whose target URL is a Zapier or n8n endpoint that fans out to a dozen subscribers nobody told the agent about. The HTTP 200 you get back proves the endpoint accepted the message, not the absence of fan-out behind it.
Each of these is, in the tool spec, a single call. Each of them is, in the world, a megaphone.
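The gap can be made concrete with a few lines of code. This is a minimal sketch, not a real API: the directory contents, names, and sizes below are invented for illustration, standing in for whatever your contact or channel lookup actually returns.

```python
# Hypothetical directory mapping a single "recipient" token to the humans it
# actually reaches. Every entry is one tool-spec recipient; the fan-out varies
# by three orders of magnitude.
DIRECTORY = {
    "karen@example.com": ["karen@example.com"],                              # individual: reaches 1
    "karen-team@example.com": [f"user{i}@example.com" for i in range(40)],   # distribution list: reaches 40
    "#alex-and-me": [f"member{i}@example.com" for i in range(40)],           # DM-looking public channel
}

def blast_radius(target: str) -> int:
    """Resolve a single recipient field into the number of humans it reaches."""
    # Unknown targets are assumed to be a single literal address.
    return len(DIRECTORY.get(target, [target]))
```

Both `karen@example.com` and `karen-team@example.com` are one string in one field; only resolution against the directory reveals that one is a megaphone.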
Why The Planner Picks The Loudest Door
The planner is doing pattern matching, not human-style audience modeling. When it encounters a contact-lookup result that returns multiple candidates, it ranks them by some combination of token similarity, recency, and frequency in its training data — none of which correlate with "is this an individual or a group."
In fact, distribution lists often score higher. They are mentioned more often in corporate corpora. They have shorter, more memorable handles (engineering@ vs. eric.henderson.iv@). They appear at the top of contact-search results because they are searched more frequently. The planner is doing exactly what it was trained to do, and the result is that the most plausible recipient is the one with the largest blast radius.
The same dynamic plays out across every channel. Slack channels with names like #general outrank specific user IDs. Calendar invites default to the entire room when the recipient is ambiguous. The agent does not know which choice has fan-out semantics because the tool spec did not tell it.
This is a variant of the confused deputy problem, where an agent uses its legitimate privileges in ways that exceed the intent of the request. The agent had permission to send the email. It had permission to use the recipient address. The escalation happened in the gap between "send a message" and "broadcast to forty people," and there is no IAM policy that catches it.
The Discipline That Has To Land
Fixing this requires treating recipient resolution as a first-class concern, not a side effect of a tool call. Four practices, in roughly increasing order of how much they will slow down your agent:
Two-step recipient resolution. Split send_email into propose_send and commit_send. The agent calls the first; a deterministic resolver classifies the target by querying your directory, returns metadata (individual vs. group, group size, audience type, last-active recipient), and only then does the commit phase fire. The resolver is not an LLM. It is a few lines of code that read your address book and refuse to lie about what they found.
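A minimal sketch of the two-step shape, under assumptions: the directory contents, field names, and the `propose_send`/`commit_send` signatures below are hypothetical, not a real framework API. The resolver is plain dictionary lookup — deterministic, no LLM in the loop.

```python
from dataclasses import dataclass

# Hypothetical directory metadata; in practice this queries your address book.
DIRECTORY = {
    "karen@example.com": {"kind": "individual", "members": 1, "external": False},
    "karen-team@example.com": {"kind": "group", "members": 40, "external": False},
}

@dataclass
class Proposal:
    proposal_id: str
    recipient: str
    kind: str        # "individual" | "group" | "unknown"
    members: int
    external: bool

_pending: dict = {}

def propose_send(recipient: str, body: str) -> Proposal:
    """Step one: classify the target deterministically. Nothing is sent."""
    meta = DIRECTORY.get(recipient, {"kind": "unknown", "members": 0, "external": True})
    p = Proposal(f"p{len(_pending)}", recipient, meta["kind"], meta["members"], meta["external"])
    _pending[p.proposal_id] = p
    return p

def commit_send(proposal_id: str) -> str:
    """Step two: only a previously proposed ID can fire; unknown IDs raise."""
    p = _pending.pop(proposal_id)  # KeyError if the agent skipped step one
    return f"sent to {p.members} recipient(s) via {p.recipient}"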
Default-deny on group recipients above a size threshold. Pick a number — five is reasonable, ten is generous — and require explicit user confirmation when the resolved recipient list exceeds it. The threshold should differ by sensitivity: external recipients trigger at one, internal at ten, public channels never auto-fire. The agent can include the proposed recipient list in the confirmation prompt; the human reviews and approves.
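The sensitivity-tiered thresholds above reduce to a small policy function. A sketch, with the tier names and numbers taken from the text (external triggers at one, internal at ten, public never auto-fires):

```python
# Default-deny thresholds by audience type. A message requires explicit human
# confirmation when the resolved recipient count meets its tier's threshold.
THRESHOLDS = {"external": 1, "internal": 10}

def requires_confirmation(audience: str, member_count: int) -> bool:
    if audience == "public":
        return True  # public channels never auto-fire
    # Unknown audience types fall back to the strictest tier.
    return member_count >= THRESHOLDS.get(audience, 1)
```

The deny path should surface the resolved recipient list in the confirmation prompt, so the human is approving the actual audience rather than the agent's summary of it.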
Outbound dry-run preview. For any message reaching N or more humans, render the message and the resolved recipient list to the user before send. This is not a "loading spinner" — it is a full preview, with the actual To/Cc/Bcc populated. The friction is the point. Studies on human-in-the-loop oversight find that organizations sustaining 10–15% escalation rates ship reliable agents; under 1% means the agent is over-trusted, and over 25% means the agent is not actually saving work.
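A sketch of the preview renderer, assuming a resolved To/Cc/Bcc is already in hand (the function name and threshold default are illustrative, not a real API):

```python
def render_preview(to, cc, bcc, subject, body, threshold=5):
    """Return a full-text preview when total reach meets the threshold,
    or None when the message is small enough to send directly."""
    reach = len(to) + len(cc) + len(bcc)
    if reach < threshold:
        return None
    return "\n".join([
        f"To ({len(to)}): {', '.join(to)}",
        f"Cc ({len(cc)}): {', '.join(cc)}",
        f"Bcc ({len(bcc)}): {', '.join(bcc)}",
        f"Subject: {subject}",
        "---",
        body,
        f"[reaches {reach} humans — confirm to send]",
    ])
```

Note the preview shows the resolved addresses, not the token the agent originally chose — the whole point is to surface the forty names hiding behind karen-team@.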
Fan-out budgets. Per session and per user, set a budget on total recipients across all outbound calls and refuse — loudly — when the agent exceeds it. A bug that would otherwise spam a thousand inboxes fails after the first hundred. The budget is a circuit breaker, not a permission system: it does not require the agent to know what it is doing, only that it stops doing too much of it.
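The budget is small enough to show in full. A sketch of the circuit-breaker shape described above — the class name and limit are assumptions, and the key property is that it refuses before the send, not after:

```python
class FanOutBudget:
    """Per-session circuit breaker on total recipients across outbound calls."""

    def __init__(self, limit: int):
        self.limit = limit
        self.spent = 0

    def charge(self, recipients: int) -> None:
        """Reserve recipient count before sending; refuse loudly past the limit."""
        if self.spent + recipients > self.limit:
            raise RuntimeError(
                f"fan-out budget exceeded: {self.spent} spent + {recipients} requested > {self.limit}"
            )
        self.spent += recipients
```

A refused charge leaves the budget untouched, so a runaway loop fails on every subsequent attempt rather than slowly draining the remainder.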
These four together cost you a day of engineering and roughly twenty milliseconds per outbound action. They do not require model changes, fine-tuning, or a new prompt-injection defense. They require taking the tool catalog seriously as a security boundary.
Audit What You Intended Versus What Actually Happened
Even with all four mitigations in place, you will ship an incident. The question is whether you can see it.
Every outbound action your agent takes should be logged with two recipient counts: intended (what the agent's reasoning said it was going to do) and actual (what the downstream system received). Divergence between the two is the signal. An agent that says "I emailed one person" while the side-effect log shows forty addresses is the bug; the audit makes that visible without waiting for the customer to call.
This requires capturing both halves of the agent loop — what the model was asked to do, and what the tool actually did with that intent. The LLM request log alone does not show the email landing in the distribution list. The tool invocation log alone does not show what the agent thought it was doing. You need both, joined on the request ID, with an alert that fires when the counts diverge by more than a configurable margin.
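The join itself is a few lines. A sketch under assumed log shapes — the field names `request_id`, `intended_recipients`, and `actual_recipients` are illustrative, standing in for whatever your two logs actually record:

```python
def divergences(intent_log, effect_log, margin=0):
    """Join the model-intent log and the tool side-effect log on request_id;
    return (request_id, intended, actual) for every divergence past the margin."""
    effects = {e["request_id"]: e["actual_recipients"] for e in effect_log}
    alerts = []
    for i in intent_log:
        actual = effects.get(i["request_id"], 0)  # missing effect counts as zero
        if abs(actual - i["intended_recipients"]) > margin:
            alerts.append((i["request_id"], i["intended_recipients"], actual))
    return alerts
```

Run against the opening incident, this surfaces the "said one, sent forty" divergence without waiting for the customer to call.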
Common gaps practitioners hit:
- Logging the tool call but not the resolved parameters — so you see send_email was called, but not whose address it actually went to.
- Logging the agent's natural-language reasoning but not parsing it into a structured intent that can be diffed against the tool call.
- Treating the divergence alert as a low-priority dashboard item instead of an incident page. Fan-out incidents are time-sensitive. The recipient list is already in inboxes; every minute of delay before recall is a minute of additional read receipts.
The teams I've seen handle this well treat divergence as a P2 alert with a five-minute SLA, the same posture they use for unauthorized data exfiltration. That is the right comparison: the data is leaving the organization, just through a friendlier-looking channel.
The Architectural Realization
Outbound messaging tools have fan-out semantics that agents do not intuit. Humans developed muscle memory over thirty years — the brief pause before clicking Reply All, the second look at the To field, the reflex of typing the recipient last so you do not accidentally send a half-finished email. None of that transfers. The agent does not have a Reply All button to hesitate over; it has a function call that returns an HTTP 200.
The team that ships an agent with the same tool affordances as a human email client, and none of the human muscle memory, is the team that will ship the incident the muscle memory was built to prevent. The mitigation is not better prompts. It is recognizing that the tool spec is a contract, the contract is incomplete, and the missing clauses are the ones humans absorbed unconsciously over decades of being burned by exactly these channels.
Treat your outbound tool catalog as a set of broadcast primitives until proven otherwise. Make the agent prove the recipient is an individual before you let the message go. Audit the gap between intent and effect, page on divergence, and budget the total reach. Do this before launch, not after the postmortem — because the postmortem is the artifact you write when forty people, including the customer, learn things you did not mean to tell them.
- https://labs.cloudsecurityalliance.org/research/csa-research-note-ai-agent-confused-deputy-prompt-injection/
- https://www.promptfoo.dev/lm-security-db/vuln/agent-confused-deputy-escalation-d1becd4d
- https://labs.reversec.com/posts/2025/08/design-patterns-to-secure-llm-agents-in-action
- https://aws.amazon.com/blogs/machine-learning/implement-human-in-the-loop-confirmation-with-amazon-bedrock-agents/
- https://galileo.ai/blog/human-in-the-loop-agent-oversight
- https://developers.cloudflare.com/agents/concepts/human-in-the-loop/
- https://learn.microsoft.com/en-us/agent-framework/agents/safety
- https://clearfeed.ai/blogs/slack-noise-isnt-just-a-slack-or-an-ai-problem
- https://www.loginradius.com/blog/engineering/auditing-and-logging-ai-agent-activity
- https://tetrate.io/learn/ai/mcp/mcp-audit-logging
- https://medium.com/@Quaxel/audit-logs-that-save-you-in-agent-incidents-8d39b95bea22
