The Prompt as Documentation: When the System Prompt Becomes the Only Artifact Anyone Trusts
A product manager pings you in Slack asking what happens when a customer asks the assistant to cancel their subscription. You start typing the answer from memory, then second-guess yourself, then open the system prompt and read it for thirty seconds. You paste back a summary. They thank you and move on. Three hours later, support asks the same question. By Thursday, the head of partnerships pastes a screenshot of the prompt into a deal review.
This is the prompt-as-documentation anti-pattern, and the first time you notice it happening, it feels great. The artifact you spent six weeks tuning is now the canonical source of truth for what the product does. PMs are reading it. Support is reading it. Sales is reading it. Somewhere a designer is reading it. Your work is load-bearing in a way the old service-layer code never was, and you can prove it by counting how many unrelated people can quote the file from memory.
The good feeling lasts about a week. Then a customer asks for a refund using a phrase your few-shot examples never anticipated, the model improvises, the support engineer reads the prompt, tells the customer the bot will do X, the bot does Y, the customer escalates, and you spend Friday explaining to three people that "what the prompt says" and "what the model does" are not the same artifact. They are correlated. They are not equivalent. The distinction has been the source of every gnarly debugging session for the last quarter, and somehow nobody in the room is shocked.
This is worth thinking about carefully, because the cause is real. The prompt genuinely is the most current artifact you have. Killing the practice of reading it is the wrong fix. The right fix is to build the artifacts that should be more current than the prompt — and to teach the org what the prompt actually represents.
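One shape "more current than the prompt" can take is a behavioral spec: a small set of canonical questions run against the live assistant on every deploy, so what support reads is observed behavior rather than instructions. The sketch below is a hypothetical harness, not a prescribed one; `classify` stands in for however you invoke your bot and label its response, and the case names are invented:

```python
# Behavioral spec: each case records what the assistant actually does
# today for a canonical question. Re-run on every deploy; the report,
# not the prompt, is what non-engineers read.
CASES = [
    ("cancel my subscription", "cancellation_flow"),
    ("I want my money back", "refund_flow"),
]

def behavioral_report(classify):
    """classify: callable mapping a user message to an observed behavior label."""
    return [
        {"message": msg, "expected": want, "observed": got, "ok": got == want}
        for msg, want in CASES
        for got in [classify(msg)]
    ]
```

The point of the structure is that the "expected" column is a claim someone can falsify daily, which is exactly what the PRD and the wiki page are not.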
Why the prompt wins by default
The PRD is six months old. It was written before the eval set existed, before the migration to the smaller model, before the legal team asked you to add a refusal mode for tax advice. The PRD describes a feature; the prompt describes the feature plus seventeen patches plus four follow-up incidents plus the two paragraphs the founder added at midnight after seeing a competitor demo. By any reasonable metric of "what does this product actually do today," the prompt wins.
The wiki page is worse. The wiki page describes the product as it was pitched, not as it shipped. Half of it references model behaviors that disappeared two model versions ago. Nobody updates it because nobody reads it; nobody reads it because nobody updates it.
The release notes are a marketing artifact. They describe the surface change ("we improved tone in cancellation flows") without describing the mechanism ("we added three sentences to the system prompt and tightened one few-shot example, and the side effect was that the bot now refuses to engage with sarcasm"). A PM trying to predict edge-case behavior cannot reason from release notes. The prompt, in contrast, contains the actual instructions. A motivated reader can simulate the bot in their head, badly, and that is better than nothing.
So the prompt becomes the spec by attrition. Not because anyone decided it should be — because it is the only document that is forced to stay current. Every other artifact decays. The prompt cannot, because it is the runtime.
What the prompt is not
The prompt is the input to a probabilistic system. The model does not execute the prompt the way a compiler executes source code. It samples a distribution conditioned on the prompt. The same prompt produces different outputs depending on the user's phrasing, the conversation history, the time of day if any retrieved context is time-varying, the model version if your provider rolled an update without telling you, and the temperature you forgot you set to 0.7 in staging and 0.2 in prod.
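The variance is easy to demonstrate without any model at all. The toy sampler below draws a "next action" from a fixed distribution at the two temperatures mentioned above; the logits and action indices are invented for illustration, but the mechanism is the same one your provider runs:

```python
import math
import random

def sample(logits, temperature, rng):
    """Sample one index from logits at a given temperature (softmax sampling)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = rng.random()
    cum = 0.0
    for i, e in enumerate(exps):
        cum += e / total
        if r < cum:
            return i
    return len(logits) - 1

# Toy distribution the prompt induces: the model slightly prefers
# action 0 ("full refund") over action 1 ("store credit").
logits = [2.0, 1.5, -1.0]

rng = random.Random(0)
for temp in (0.2, 0.7):
    counts = [0, 0, 0]
    for _ in range(1000):
        counts[sample(logits, temp, rng)] += 1
    print(f"T={temp}: {counts}")
```

At 0.2 the preferred action dominates; at 0.7 the runner-up fires hundreds of times per thousand calls. Same prompt, same logits, meaningfully different product.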
This is not a small caveat. It is the entire reason the prompt-as-documentation anti-pattern is dangerous.
A non-engineer reading the prompt assumes the bot does what the prompt says. The prompt says "if the user asks for a refund, ask them which order, then check the policy, then offer either a full refund or a store credit." So when support asks "does the bot offer store credit?" the support engineer reads that line, says yes, and tells a customer who has been waiting two hours that they will be offered store credit. The customer is offered a full refund instead, because the model decided the customer's tone was too frustrated to risk the store-credit branch. The prompt did not lie. The prompt is just not what runs.
Conditional logic in a prompt reads like deterministic English. "If X, then Y." A reader trained on code reads this as a branch. The model treats it as a soft preference under heavy noise from the rest of the prompt and the conversation. Few-shot examples read like an enumeration of supported behaviors. They are not — they are demonstrations the model interpolates from. The reader infers a contract. The prompt is not a contract. It is closer to a strongly worded preference letter that the model reads while making up its own mind.
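One way to see the gap is to write both readings down. The first function below is the contract a reader infers from "if X, then Y"; the second is a deliberately crude caricature of the runtime, where the instruction is just one term in a score that the rest of the context also pushes on. Every name and number here is invented:

```python
def what_the_reader_infers(message: str) -> str:
    # The branch a reader trained on code extracts from the prompt.
    if "refund" in message.lower():
        return "offer_store_credit"
    return "continue_conversation"

def what_actually_runs(message: str, context_pressure: float) -> str:
    # Caricature: the instruction nudges a score; tone, history, the
    # retrieval block, and silent model updates all push back on it.
    score = 1.0 if "refund" in message.lower() else 0.0
    score -= context_pressure
    return "offer_store_credit" if score > 0.5 else "offer_full_refund"
```

The support engineer in the story above read the first function. The frustrated customer's transcript raised `context_pressure` past the threshold, and the second one ran.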
The model also ignores parts of the prompt depending on length, position, recency, and what tokens happen to land in the attention pattern that turn. Production teams have shipped prompts where entire paragraphs were structurally dead — they fell outside the model's effective attention window, or they were contradicted by a more recent instruction, or they were swamped by a verbose retrieval block that consumed the context budget. The prompt looked like it did one thing. The model did another. Nobody noticed for weeks because the few cases where it mattered were rare enough to chalk up to user error.
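A lightweight way to catch structurally dead paragraphs before they ship is to model the context budget explicitly. The sketch below uses a deliberately naive truncation model (most recent content survives first) and a word count standing in for a real tokenizer; the section names and numbers are invented, and real providers truncate and attend differently. Its job is to turn "that paragraph never made it into the effective window" into a checkable claim instead of a post-incident discovery:

```python
def audit_context_budget(sections, budget_tokens, count=lambda s: len(s.split())):
    """Report which prompt sections survive a crude truncation model.

    sections: ordered (name, text) pairs, earliest-in-context first.
    Assumes only the most recent budget_tokens tokens are effective --
    a simplification. `count` is a word count; swap in a tokenizer.
    """
    used = 0
    report = {}
    for name, text in reversed(sections):   # most recent content survives first
        cost = count(text)
        report[name] = "live" if used + cost <= budget_tokens else "dead"
        used += cost
    return report

report = audit_context_budget(
    [
        ("system_rules", "refuse to give tax advice " * 40),   # 200 words
        ("retrieval", "verbose retrieved chunk " * 300),       # 900 words
        ("user_turn", "can I still get a refund?"),
    ],
    budget_tokens=500,
)
print(report)   # system_rules comes out "dead" under this model
```

Even this crude check fails loudly when a retrieval block quietly grows past the budget, which is more than the prompt's apparent text will ever tell you.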
