The Postmortem Where the Root Cause Was a Prompt Nobody Owned
The incident review went smoothly right up until the questions that nobody could answer. Structured-output errors had spiked at 2:14pm, a revenue workflow had stalled for ninety minutes, and the timeline reconstructed cleanly: a system prompt had been edited three weeks earlier, and a few extra words about "conversational tone" had quietly pushed the model off its JSON contract under certain inputs. The fix was a one-line revert. The hard part came next. Someone asked who had made the change, who had reviewed it, and which team owned that prompt going forward. The room went quiet. There was no pull request. There was no reviewer. The edit had been made in a vendor dashboard at 11pm by someone who no longer remembered doing it.
That silence is the actual incident. The JSON contract breaking was a symptom. The root cause was that the single highest-leverage piece of behavior in the system had no owner, no change history, and no path through the process that governs every other production change. The model didn't fail. The model did exactly what it was told. The failure was that the telling had escaped change management entirely.
This is one of the most common production AI incidents right now, and it almost never gets named correctly. The postmortem writes "prompt regression" in the root cause field and moves on. But "prompt regression" describes only what changed in the prompt, not why the change could ship unreviewed. The real root cause is an org chart with a hole in it.
