Task Completion Goes Green While Users Quietly Suffer
Your agent dashboard says 94% task completion. Leadership is happy. The roadmap gets funded. And yet support tickets are climbing, power users have gone quiet, and the one engineer who actually watches traces keeps muttering that something is wrong. Both things are true at once. The agent is completing tasks. It is also taking twelve minutes and four thousand tokens to do a two-step job, backtracking three times, and asking the user to confirm a fact it could have inferred from the first message.
Task completion is a binary that hides a distribution. "The agent finished" tells you nothing about the path it took to finish, and the path is most of what users actually experience. A completion-rate dashboard is structurally incapable of seeing a slow, expensive, annoying agent. It will stay green right up until users churn.
This is not a measurement gap you can patch with a better prompt. It is a category error in what you chose to measure. Completion is the easiest thing to instrument and the least of what people are paying for.
