The Frozen Prompt: When Your Team Is Afraid to Edit a System Prompt That Works
Every mature AI product eventually grows a system prompt that nobody on the current team fully understands. It started as forty tokens of plain English, and twenty months later it is a 4,000-token wall of conditional clauses, refusal templates, formatting rules, persona reinforcements, edge-case warnings, and one peculiar sentence about Tuesdays that nobody can explain. Each line was added in response to a specific failure: a customer complaint, a Slack ping from legal, a regression caught by an eval, a one-off bug that surfaced during an investor demo. The engineer who wrote line 37 has rotated to another team. The engineer who wrote line 112 was a contractor whose Notion doc was archived. The eval suite covers maybe a third of the behaviors the prompt is asserting, and nobody is sure which third.
So the prompt becomes load-bearing in the worst possible way: it works, the team knows it works, and the team has stopped touching it. Engineers who should be iterating on the prompt route their changes around it instead — adding a post-processing filter here, a few-shot wrapper there, a parallel "v2 prompt" feature-flagged off in case anyone ever finds the courage to A/B test the replacement. The prompt has stopped being software and has become a relic. And once that happens, the prompt is no longer the lever you use to improve the product. It's the constraint shaping it.
