Skip to main content

3 posts tagged with "change-management"

View all tags

The Postmortem Where the Root Cause Was a Prompt Nobody Owned

· 9 min read
Tian Pan
Software Engineer

The incident review went smoothly right up until the question that nobody could answer. Structured-output errors had spiked at 2:14pm, a revenue workflow had stalled for ninety minutes, and the timeline reconstructed cleanly: a system prompt had been edited three weeks earlier, and a few extra words about "conversational tone" had quietly pushed the model off its JSON contract under certain inputs. The fix was a one-line revert. The hard part came next. Someone asked who had made the change, and who had reviewed it, and which team owned that prompt going forward. The room went quiet. There was no pull request. There was no reviewer. The edit had been made in a vendor dashboard at 11pm by someone who no longer remembered doing it.

That silence is the actual incident. The JSON contract breaking was a symptom. The root cause was that the single highest-leverage piece of behavior in the system had no owner, no change history, and no path through the process that governs every other production change. The model didn't fail. The model did exactly what it was told. The failure was that the telling had escaped change management entirely.

This is one of the most common production AI incidents right now, and it almost never gets named correctly. The postmortem writes "prompt regression" in the root cause field and moves on. But "prompt regression" describes the code. The real root cause is an org chart with a hole in it.

The Organizational Immune System: Why Companies Kill AI Features That Actually Work

· 10 min read
Tian Pan
Software Engineer

Your AI feature works. It passes every benchmark you built. It handles edge cases your team spent weeks stress-testing. Users in the pilot loved it. Your model isn't hallucinating. Latency is under 300ms. The eval suite is green.

Then six months go by and it still isn't in production. Legal wants three more reviews. A senior VP is concerned about "scope." The team that owns the adjacent workflow says they weren't consulted. Finance says the ROI model needs rework. You're told to "socialize it more broadly."

This is the organizational immune system at work — and it kills more AI projects than bad models ever will.

Prompt Versioning and Change Management in Production AI Systems

· 9 min read
Tian Pan
Software Engineer

A team added three words to a customer service prompt to make it "more conversational." Within hours, structured-output error rates spiked and a revenue-generating pipeline stalled. Engineers spent most of a day debugging infrastructure and code before anyone thought to look at the prompt. There was no version history. There was no rollback. The three-word change had been made inline, in a config file, by a product manager who had no reason to think it was risky.

This is the canonical production prompt incident. Variations of it play out at companies of every size, and the root cause is almost always the same: prompts were treated as ephemeral configuration instead of software.