Your Agent's Outbox Is Your Next Deliverability Incident
The first time it happens, the on-call engineer is staring at a Gmail Postmaster dashboard that has gone solid red, the support inbox is on fire because customer password resets are landing in spam, and the agent that did this is still running. It sent eighty thousand "personalized follow-ups" between 4 a.m. and 9 a.m. local time, all from the company's primary sending domain, all signed with the same DKIM key the billing system uses. By the time anyone notices, the domain reputation that took three years to build is gone, and so are the next six weeks of inbox placement on every transactional message the company depends on.
Sending email from an agent looks like a one-line tool call. send_email(to, subject, body) is the canonical demo, and every framework ships it as a starter integration. But email is not like other tools. A bad database query rolls back. A bad API call returns an error. A bad batch of email lowers the deliverability of every other email your company sends, for weeks, and there is no transaction to roll back because the messages are already in flight to recipient mailservers that are now writing your domain's reputation history.
The discipline that experienced email teams have built up over twenty years — separate sending streams, dedicated subdomains, per-stream DKIM keys, careful warm-up curves, automatic suppression on bounce-and-complaint feedback — was not built for adversarial scenarios. It was built for the reality that email reputation is a slow-moving, organization-wide shared resource. Agent teams discover this the hard way the first time their sending domain hits a Gmail penalty box, and they discover it because nobody asked deliverability whether agents should be allowed to send email at all.
Email Reputation Is Org-Wide and Hard to Rebuild
A useful first principle: email reputation belongs to the domain, not the application. When your marketing platform sends a campaign from acme.com and gets too many spam complaints, your transactional service sending password resets from acme.com pays the price. Inbox providers like Gmail and Yahoo evaluate reputation at the sending domain (and IP) level, and since 2024 they have enforced that bulk senders — anyone sending five thousand or more messages per day to consumer inboxes — keep spam complaint rates under 0.3% with a recommended target of 0.1%. Cross that threshold and messages stop arriving, not just slow down.
Whether an agent's complaint rate is a survivable signal or a domain-killing one depends entirely on how you isolated the sending. If your agent sends from the same domain as your password resets, an agent that earns 1% complaints on its eighty-thousand-message morning has just contributed eight hundred complaints to the shared reputation pool. The transactional system that has been running at 0.05% for years is now bundled with that 1% in the eyes of every receiver. The penalty applies to everything sent from the domain.
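The arithmetic is worth running once. The sketch below blends the complaint rates of streams that share a domain, weighted by volume. The eighty thousand sends at 1% and the 0.05% transactional rate come from the scenario above; the transactional volume of two hundred thousand messages is an assumption chosen purely for illustration, and real receivers do not publish the exact formula they use.

```python
def blended_complaint_rate(streams):
    """Volume-weighted complaint rate for streams sharing one sending domain.

    streams: list of (messages_sent, complaint_rate) tuples.
    """
    total_sent = sum(sent for sent, _ in streams)
    total_complaints = sum(sent * rate for sent, rate in streams)
    return total_complaints / total_sent


agent = (80_000, 0.01)              # from the scenario: 80k sends at 1% complaints
transactional = (200_000, 0.0005)   # ASSUMED volume; the 0.05% rate is from the text

shared = blended_complaint_rate([agent, transactional])   # same domain
isolated = blended_complaint_rate([transactional])        # agent on its own subdomain

print(f"shared domain:   {shared:.4%}")
print(f"isolated stream: {isolated:.4%}")
```

Under these assumed volumes the shared domain lands at roughly 0.32%, past the 0.3% line, while the isolated transactional stream stays at 0.05%.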
This is the part that humans-shipping-email naturally got right and that agent teams systematically get wrong. A marketing operations team would never send a cold-outreach campaign from the same subdomain as account verification emails — that mistake is folkloric. Agent teams, optimizing for "let's just call the SendGrid API," ship that mistake on day one because the agent's send tool points at whatever credentials happened to be on hand. The infrastructure was inherited, not designed.
The Architectural Separation Agents Need Before Their First Send
Before any agent gets a send_email permission, the sending infrastructure should already look like this:
- A dedicated subdomain per agent surface. If your support agent emails customers and your sales agent emails prospects, they should not share mail.acme.com. Use support-bot.acme.com and sales-bot.acme.com. When one of them poisons reputation, the blast radius is contained. The transactional subdomain (the one carrying password resets and receipts) should be off-limits to any agent that doesn't own that exact use case.
- Distinct DKIM key pairs per subdomain. Never share DKIM keys across services. Each subdomain gets its own selector and its own private key, with two selectors maintained per signing domain so you can rotate without downtime. This sounds like ceremony until the day you need to revoke a compromised agent's signing capability without nuking your entire mail flow. With distinct keys, revocation is a DNS change. With shared keys, it's an outage.
- Per-recipient-domain rate limits enforced at the agent boundary. Inbox providers throttle per sender, but they do it silently: your messages just start landing in spam. Enforce the limits yourself before the provider does. A sales agent that wants to email five hundred prospects at gmail.com in the same minute should be rate-limited at fifty per minute by your gateway, queued up, and trickled out. The agent doesn't get to opt out.
- Bounce-and-complaint feedback loops that suspend the agent before the provider does. Subscribe to the provider's webhooks (SendGrid, Mailgun, SES, Postmark all expose them). When the rolling complaint rate for a given agent crosses a tight internal threshold (say 0.05%, well under the 0.3% Gmail enforcement line), auto-suspend that agent's send permission. The agent finds out from a tool error, not from your CEO.
None of this is novel email engineering. It's the playbook every senior deliverability engineer has internalized. The novelty is that agent teams need to build it before the first agent ships, not after the first incident.
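The feedback-loop suspension from the list above can be sketched as a small circuit breaker. This is a minimal illustration, not any provider's API: the class names, the "sent" and "complaint" event kinds, the 24-hour window, and the min_sends floor (added so a single complaint on a tiny sample does not trip the breaker) are all assumptions. A real handler would translate each provider's webhook payload into these two event kinds.

```python
import time
from collections import deque


class AgentSendSuspended(Exception):
    """Surfaced to the agent as a tool error when sending is suspended."""


class ComplaintBreaker:
    """Hypothetical per-agent breaker driven by provider webhook events."""

    def __init__(self, threshold=0.0005, window_seconds=24 * 3600, min_sends=1000):
        self.threshold = threshold        # 0.05%, the internal line from the text
        self.window = window_seconds      # ASSUMED rolling window
        self.min_sends = min_sends        # ASSUMED floor before the rate is trusted
        self.events = deque()             # (timestamp, kind), oldest first
        self.counts = {"sent": 0, "complaint": 0}
        self.suspended = False

    def record(self, kind, now=None):
        """Record one webhook event and re-evaluate the rolling complaint rate."""
        if kind not in self.counts:
            raise ValueError(f"unknown event kind: {kind!r}")
        now = time.time() if now is None else now
        self.events.append((now, kind))
        self.counts[kind] += 1
        # Drop events that have aged out of the rolling window.
        while self.events and self.events[0][0] < now - self.window:
            _, old_kind = self.events.popleft()
            self.counts[old_kind] -= 1
        sends = self.counts["sent"]
        if sends >= self.min_sends and self.counts["complaint"] / sends > self.threshold:
            self.suspended = True         # stays set until a human resets it

    def check(self):
        """Call before every send; raises once the agent has been suspended."""
        if self.suspended:
            raise AgentSendSuspended("complaint rate over internal threshold")
```

The suspension is deliberately sticky in this sketch: complaints aging out of the window do not re-enable the agent, on the assumption that someone should look at what it sent before it sends again.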
Rate Limits Are an Agent Safety Feature, Not Just a Cost Control
Most discussions of agent rate limiting frame it as a cost or runaway-loop concern. For email-as-tool, it's a deliverability concern with a much shorter feedback loop than the CFO's inference bill.
The mental model that helps: every recipient domain has its own rate budget, and the budget refills on a timescale measured in hours. Forty messages to gmail.com in five minutes is fine. Four thousand in five minutes is a one-way trip to the spam folder for that sender, possibly that subdomain, possibly the parent domain. The receiver decides, and the decision can be sticky for weeks.
So the agent boundary needs three rate-limit dimensions, not one:
- Per agent run (a single user-initiated session). Cap how many messages a single execution can produce, full stop. If a user asks an agent to "follow up with my whole pipeline," ten thousand sends from one prompt should require explicit human confirmation, not happen silently.
- Per agent identity, rolling. A given agent surface should have an hourly and daily budget. When it's exhausted, queue or refuse. This catches the case where one orchestrator spawns many runs and each individual run looks reasonable.
- Per recipient domain. The hardest one to retrofit, because it requires bookkeeping the agent layer doesn't usually do. But it's the one that protects deliverability — the inbox provider rate-limits per domain whether you participate or not.
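The three dimensions above can be combined in one gate that every send passes through. The sketch below is one possible shape under stated assumptions: SendGate, SlidingWindowLimit, the decision strings, the per-run cap of 200, and the hourly budget of 2000 are hypothetical. Only the fifty-per-minute per-domain figure echoes the earlier example. State is held in memory, so a real gateway would need shared storage.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimit:
    """Allows at most `limit` events in any trailing `window_seconds`."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.stamps = deque()

    def has_room(self, now):
        # Evict timestamps that have left the window, then compare to the limit.
        while self.stamps and self.stamps[0] <= now - self.window:
            self.stamps.popleft()
        return len(self.stamps) < self.limit

    def consume(self, now):
        self.stamps.append(now)


class SendGate:
    """Hypothetical gateway check applied before every agent send."""

    def __init__(self, per_run_cap=200, agent_hourly=2000, per_domain_per_minute=50):
        self.per_run_cap = per_run_cap
        self.run_counts = defaultdict(int)
        self.agent_limits = defaultdict(lambda: SlidingWindowLimit(agent_hourly, 3600))
        self.domain_limits = defaultdict(
            lambda: SlidingWindowLimit(per_domain_per_minute, 60)
        )

    def decide(self, run_id, agent_id, recipient, now=None):
        """Return "send", "queue", or "needs_confirmation" for one message."""
        now = time.time() if now is None else now
        domain = recipient.rsplit("@", 1)[-1].lower()
        # Dimension 1: a single run may not exceed its cap without a human.
        if self.run_counts[run_id] >= self.per_run_cap:
            return "needs_confirmation"
        # Dimensions 2 and 3: rolling agent budget and per-recipient-domain budget.
        agent_limit = self.agent_limits[agent_id]
        domain_limit = self.domain_limits[(agent_id, domain)]
        if not agent_limit.has_room(now) or not domain_limit.has_room(now):
            return "queue"
        self.run_counts[run_id] += 1
        agent_limit.consume(now)
        domain_limit.consume(now)
        return "send"
```

The per-domain limit is keyed on the agent and the recipient domain together, on the assumption that each agent surface sends from its own subdomain. A "queue" decision leaves all counters untouched, so the message can be retried later without being double-counted.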
