Skip to main content

One post tagged with "staging"

View all tags

The Sandbox Your Agent Didn't Notice Was Real

· 10 min read
Tian Pan
Software Engineer

A team I know has a textbook staging setup. Read-only replicas of the production database. A mock Stripe account that pretends to charge cards. Synthetic users with fake email addresses on a domain nobody owns. The agent is asked to walk through an "account delinquent" escalation flow in staging, end to end, as part of a release rehearsal. The trace looks clean. The agent does what it is supposed to do.

Three minutes later, a real customer — a paying one, who churned six months ago and was still in a dormant export the developer had used to seed a test fixture — replies to a politely-worded payment-overdue email. The "send_email" tool, registered next to a dozen other tools that all terminate in mocks, was wired to the production Mailgun key. The developer who set it up two sprints earlier had been iterating fast on email templates and the sandbox tier capped them at five emails an hour, which broke the inner loop, so they swapped in the real key "just for the afternoon" and forgot. Nobody re-checked. The agent had no way to know.