Your stop_reason Is Lying: Building the Real Stop Taxonomy Production Triage Needs
The on-call engineer pulls up a trace. The model returned, the span closed clean, the API call shows stop_reason: end_turn. By every signal the platform offers, this was a successful generation. Three minutes later a customer reports that the agent confidently wrote half a config file, declared the operation complete, and moved on. The trace had no warning sign because the warning sign isn't in the API contract — the provider's stop reason has four to seven buckets, and the question your incident demands an answer to lives in the gap between them.
Stop reasons are the field engineers reach for first during triage and the field that lies most cleanly when it does. The values are designed for a runtime that needs to decide what to do next: was this turn complete, did a tool get requested, did a budget get exceeded, did safety intervene. They are not designed for a human reconstructing why an answer went wrong, and the difference between those two purposes is where production teams burn entire afternoons.
