When 'Can the Agent Do X?' Becomes a Ship Commitment
An engineer spends an afternoon poking at a question: can the agent reconcile a customer's invoice against their contract terms? They wire up a quick prompt and run it on five real invoices; three come back correct. The other two are wrong in ways the engineer never fully characterizes before closing the laptop and moving on. In standup the next morning they say, "yeah, invoice reconciliation basically works." A PM in the room writes it down. Two weeks later it's a line item on the Q3 roadmap. A month after that, a sales rep promises it to an enterprise account on a renewal call.
Nobody lied. Nobody made a bad decision in isolation. But the team is now contractually committed to a behavior that has no eval set, whose failure modes were never written down, and whose reliability budget was set by a director who saw a demo and interpreted it as a contract. This is one of the most common ways AI features acquire scope: not through a planning meeting, but through a capability probe that nobody ever explicitly promoted to a commitment.
The industry has a name for the downstream symptom: "POC purgatory," the state in which, by commonly cited estimates, 70 to 80 percent of AI initiatives stall between a working sandbox and a shippable product. But purgatory is the wrong metaphor, because it implies the projects are stuck. They aren't stuck; they're moving. They were committed before anyone checked whether they were ready, and now the team is trying to retrofit reliability onto a promise.
