From a Bug to a Behavior Rate: The AI Postmortem Without a Reproducer
A user files a ticket. The agent told a paying customer their refund would be processed in seven hours when the documented SLA is seven days. Screenshot attached. You pull the trace, find the exact prompt, the exact tool calls, the exact model and seed. You replay it. The model says seven days. You replay it again. Seven days. You replay it a hundred times. It says seven days ninety-eight times and "by end of day" twice, and never once says seven hours. The screenshot is unambiguous. The replay disagrees. The postmortem due Friday now has a "Root Cause" section and no root cause to put in it.
This is the shape of most AI incidents that reach a postmortem. Not the obvious outages — those have stack traces and 500-rate graphs and recover the way every SRE has been trained to expect. The hard ones are the single bad output that left a victim, erased its own conditions on the way out, and refuses to come back when you summon it. Every postmortem template you have ever used assumes a reproducer. Agents do not give you one.
