Why Your Agent Works in Dev and Panics in Prod
The agent demo always works. Three customers in the table, one matching record, twelve documents in the vector index, an empty calendar with infinite open slots. The agent picks the right row, retrieves the right document, books the right meeting. Ship it.
Then production hands the same agent ten million customers with three "John Smith"s in the same city; a filter that returns four thousand rows because the agent confidently wrote status != 'closed' when it meant status = 'active'; a vector query that returns seven plausible documents the agent has never had to choose between; and a calendar where every slot is a negotiation. The capability that looked correct in dev is qualitatively different in prod: not slightly worse, not flakier, but solving a problem the dev environment never made it solve.
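The filter mistake is worth seeing in miniature, because it shows why dev never catches it. Here is a minimal sketch, assuming a hypothetical table where dev only ever contains 'active' and 'closed' rows while prod has other statuses too. The row data, the extra status values, and the helper names are all invented for illustration.

```python
# Hypothetical rows: in dev, every row is either 'active' or 'closed'.
DEV_ROWS = [
    {"id": 1, "status": "active"},
    {"id": 2, "status": "active"},
    {"id": 3, "status": "closed"},
]

# Assumed prod data: the same rows plus statuses dev never contained.
PROD_ROWS = DEV_ROWS + [
    {"id": 4, "status": "pending"},
    {"id": 5, "status": "suspended"},
    {"id": 6, "status": "archived"},
]


def not_closed(rows):
    """What the agent wrote: status != 'closed'."""
    return [r["id"] for r in rows if r["status"] != "closed"]


def active(rows):
    """What the agent meant: status = 'active'."""
    return [r["id"] for r in rows if r["status"] == "active"]


def divergence(rows):
    """IDs matched by the written filter but not by the intended one."""
    return sorted(set(not_closed(rows)) - set(active(rows)))


print(divergence(DEV_ROWS))   # [] : the two filters agree on dev data
print(divergence(PROD_ROWS))  # [4, 5, 6] : they disagree on prod data
```

On the dev rows the two predicates select exactly the same records, so no test written against that data can tell them apart. The bug only exists once the data contains a status the dev table never had.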
This is the gap that "it worked locally" hides. For deterministic code, that phrase is already a lie about edge cases. For agents, it is a stronger lie, because the agent's behavior is a function of the input distribution, and that distribution shifts from "trivial" to "ambiguous" the moment you cross the prod boundary.
