The Demo-to-Production Cliff: Why a 90%-Accurate Agent Ships at 0%
There is a specific kind of meeting that happens about six weeks after an impressive agent demo. The prototype booked the trip, refactored the module, reconciled the invoices — live, on the first try, in front of stakeholders. Everyone agreed it was ready. Then someone pulled the production numbers, and the agent that "worked" was generating a support ticket every forty completed tasks, a refund every few hundred, and a quiet trail of half-finished states nobody could explain. The project did not get killed. It got stuck. It is still stuck.
This is the demo-to-production cliff, and it is the single most reliable way for an agent project to fail. The cliff is not caused by a bad model or a sloppy team. It is caused by a measurement mistake: treating a 90% success rate as 90% of the way to shipping. It is not. A 90%-accurate agent is a triumphant demo and, for most real workflows, an unshippable product. The MIT NANDA report that made headlines in 2025 — 95% of enterprise GenAI pilots delivering no measurable P&L impact — is this cliff, counted at scale.
