Your AI Explainer Doc Is a Runtime Dependency, Not Marketing Copy
A team I worked with last quarter shipped an AI assistant with a tidy stack of supporting documents: an in-product tooltip warning that the AI may produce inaccurate results, a help-center article titled "How does the assistant work," an internal support runbook for handling escalations, and a public model card listing the underlying model, the tools the assistant could call, and the data domains it covered. The launch went well. Six months later the prompt had been edited fourteen times, the model had been swapped from one tier to another with subtly different refusal behavior, two new tools had been added, one tool had been deprecated but not removed from the prompt, and the language settings had been opened from English-only to nine locales.
Every single one of those documents was wrong. Not catastrophically wrong — the kind of wrong where a sentence is half-true, a capability is described in language the model no longer matches, a refusal pattern is documented that the new model never triggers, a tool name appears in the help article that the assistant won't actually call. The kind of wrong that produces a slow drip of confused support tickets, a few customer trust regressions when the AI does something the docs say it won't, and — because the company sells into a regulated vertical — a small but real compliance gap that nobody on the AI team had thought to track.
The team's reaction told the story. The AI engineers were surprised: they'd been changing the prompt and the tools because that was their job; the help-center copy was somebody else's repo. The support team was annoyed: they'd written runbooks based on the launch spec and nobody had told them the spec had changed. The legal team was concerned: the model card filed with the compliance review board no longer described the live system. The PM owned the feature but didn't own the docs about the feature, so when the question of "who's responsible for this" got asked, four different people pointed at four different other people.
The architectural mistake under all of this is treating the explainer doc as marketing copy when it is in fact a runtime dependency of the feature's trust contract. The docs are not separate from the system — the customer reads them, forms expectations, and the system must meet those expectations or the trust breaks. When the prompt drifts faster than the docs, the trust contract is broken the moment the drift exceeds the doc's tolerance, and nobody on the team is watching for it because the docs have been mentally filed under "comms" rather than "production surface."
The Drift Catalog: What Actually Goes Stale
The conventional drift conversation in AI engineering is about the model: prompts drift, retrieval indexes drift, eval sets go stale. The doc-drift problem has a different shape and a longer list. Here are the surfaces I've seen go out of sync most often, and how each one turns into a measurable customer-facing problem.
The in-product disclaimer ("AI may produce inaccurate results") looks like the lowest-stakes surface, but it's the one most often quoted back at the team during a complaint. When you change the underlying model and the new model is more confident in its hallucinations, the boilerplate disclaimer no longer matches the failure mode the user just experienced — and "you said it might be inaccurate" stops being an adequate response.
The help-center article ("How does the assistant work") describes capabilities in marketing prose. It says things like "the assistant can help you draft emails, summarize documents, and find information across your workspace." That sentence was true when written. Three prompt edits later, the assistant has been told to refuse certain summarization requests on data-loss-prevention grounds, the cross-workspace search has been narrowed to the user's own workspace because of an audit finding, and the email drafting has been augmented with a tone-policy that wasn't in the original article. Every one of those edits was correct. The article is now wrong in three places.
The support runbook ("If a customer asks why the AI said X") is the highest-leverage drift surface and the one teams least often track. Support agents pattern hundreds of replies a week off this runbook. When the runbook says "the assistant cannot access financial data" and the prompt has since been expanded to include a financial-summary tool, every support reply quoting the runbook is contradicting the live system.
The public model card or transparency note is the surface with the most legal weight per word. It names the model, the data domains, the human-oversight protocol, the refusal categories, and the auditable claims about training and deployment. With EU AI Act transparency obligations taking effect August 2026, a model card that describes a system the company no longer runs becomes a document a regulator can hold against you in writing — not because the system is doing anything wrong, but because the system the regulator approved is not the system in production.
The CODEOWNERS implication is the meta-surface. Most teams have CODEOWNERS for the prompt repo (AI team), CODEOWNERS for the help center (content team), CODEOWNERS for the model card (legal/compliance), and CODEOWNERS for the support runbook (support ops). A change that should fan out across all four touches one of them. The other three drift in silence.
Why the Drift Goes Unnoticed Longer Than You'd Think
There's a structural reason this drift compounds. In a code-only system, when a function signature changes, the call sites break and the build catches it. In an AI system, when the prompt changes and the help-center article doesn't, nothing breaks. The build still passes. The eval still runs. The customer still uses the feature. The drift is invisible to every automated check the team has built — because the team has built checks for prompt-vs-eval drift, not prompt-vs-doc drift.
The first signal usually arrives as a one-off support ticket: "the article said the assistant could do X but it refused." The ticket gets closed with a polite explanation that the capability has changed. Nobody updates the article, because nobody on the support agent's team owns the article. The article continues to claim X. Two weeks later another ticket on the same topic. After a quarter, the support team has a small but persistent volume of tickets driven by the doc-drift, and they've quietly absorbed it as a cost of doing business.
The second signal — and the one that usually triggers the org to take the problem seriously — is louder. A journalist quotes the model card or transparency note in a piece about the company. The PR team realizes the quoted claim no longer matches the live system. Now the doc-drift has a press cycle attached. Or, in regulated contexts, a compliance audit asks the team to demonstrate that the system in production matches the system in the model card. The auditor finds the gap before the team does, and the conversation gets very expensive very fast.
The lag between drift starting and drift being noticed is, in my experience, between three and nine months. The prompt repo can predict it: if you can see the diff history of the prompt, the tool manifest, and the model selection, you can compute the drift surface and you'll know — to within a paragraph — what your help-center article and model card no longer correctly describe.
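If that sounds abstract, the check is small. A minimal sketch, assuming the prompt, tool manifest, and model selection live under paths like prompts/, tools/, and model.yaml (illustrative names, not a convention) and that the help article records a last-reviewed date:

```python
# Sketch: list every commit that touched the prompt surface since the docs
# were last reviewed. The paths and the review date are illustrative assumptions.
import subprocess

DOC_LAST_REVIEWED = "2025-09-01"                         # last-reviewed date on the help article
PROMPT_SURFACE = ["prompts/", "tools/", "model.yaml"]    # hypothetical repo paths

log = subprocess.run(
    ["git", "log", "--oneline", f"--since={DOC_LAST_REVIEWED}", "--", *PROMPT_SURFACE],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

print(f"{len(log)} prompt-surface changes since the docs were last reviewed:")
for line in log:
    print("  ", line)
```

Every commit in that list is a candidate for a paragraph of the help article or model card that no longer tells the truth.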
Treating the Explainer as Code
The fix is not to write better docs. The fix is to treat the explainer surface as a runtime artifact that shares a source of truth with the prompt and tools, and to derive the human-readable copy from that source.
In practical terms, this looks like a prompt manifest that lists the model, the system prompt sections, the tool names with their human-readable descriptions, the refusal categories with their human-readable explanations, the data domains the assistant operates over, and the human-oversight escalation paths. The manifest is the source of truth. The help-center article, the support runbook, and the model card are templates that interpolate from the manifest. When the manifest changes, the templates regenerate, and a CI step fails the change if the regenerated docs aren't reviewed in the same PR.
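To make "interpolate from the manifest" concrete, here is a minimal sketch; the manifest shape and field names are assumptions for illustration, not a standard:

```python
# Sketch: render one help-center sentence from the tool registry instead of
# hand-writing it. The manifest shape and field names are illustrative.
manifest = {
    "tools": [
        {"name": "draft_email", "doc": "draft emails in your chosen tone"},
        {"name": "summarize_doc", "doc": "summarize documents you have access to"},
        {"name": "workspace_search", "doc": "find information in your own workspace"},
    ],
}

capabilities = ", ".join(t["doc"] for t in manifest["tools"][:-1])
sentence = f"The assistant can {capabilities}, and {manifest['tools'][-1]['doc']}."
print(sentence)
# -> The assistant can draft emails in your chosen tone, summarize documents you
#    have access to, and find information in your own workspace.
```

The string formatting is not the point. The point is that the sentence in the help article and the tool list in the prompt now change in the same commit, under the same review.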
This is the same problem that docs generated from OpenAPI specs solved for HTTP APIs in the previous decade: stop hand-writing the description of an interface that's defined elsewhere. The novelty for AI features is that the interface is a prompt-and-tool composition rather than a method signature, and the customer-facing description includes things like refusal behavior and confidence calibration that don't have an analogue in REST.
A practical sketch of what to put under version control alongside the prompt: the model identifier and tier, the named system-prompt sections with semantic identifiers (so "the section that defines tone" is addressable separately from "the section that defines safety policy"), the tool registry with names and one-paragraph human-readable descriptions of what each tool does and the data it accesses, the refusal-category registry with the user-facing rationale, the data-domain registry, the human-oversight protocol (who can override what, on what timeline), and the locale registry. This is the manifest. It probably already exists implicitly in the team's prompt repo — it just isn't pulled out as a structured artifact.
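A sketch of that manifest as a structured artifact; the field names are illustrative, and most teams already have the raw material scattered across the prompt repo:

```python
# Sketch of the prompt manifest as a versioned, structured artifact.
# Field names are illustrative; the point is that every registry the docs
# describe is addressable data, not prose.
from dataclasses import dataclass, field

@dataclass
class Tool:
    name: str
    description: str             # one-paragraph, human-readable
    data_accessed: list[str]

@dataclass
class RefusalCategory:
    id: str
    user_facing_rationale: str

@dataclass
class PromptManifest:
    model: str                            # e.g. "vendor-model-tier-2" (hypothetical)
    prompt_sections: dict[str, str]       # semantic id -> file path, e.g. "tone", "safety_policy"
    tools: list[Tool]
    refusal_categories: list[RefusalCategory]
    data_domains: list[str]
    oversight: dict[str, str]             # who can override what, on what timeline
    locales: list[str] = field(default_factory=lambda: ["en"])
```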
The help-center article is a template that consumes the tool registry and refusal-category registry. The model card is a template that consumes the model identifier, the data-domain registry, and the human-oversight protocol. The support runbook is a template that consumes the refusal-category registry and a curated set of tool-failure narratives. When the prompt manifest changes, the CI pipeline regenerates each template, runs a diff, and blocks the merge until a content owner approves the new copy.
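A sketch of that gate, assuming a JSON manifest, a templates/ directory, and plain str.format interpolation; all three are illustrative choices, and a real pipeline would use whatever templating engine the content team already works in:

```python
# Sketch of the CI gate: regenerate each customer-facing doc from the prompt
# manifest and fail if the committed copy differs. File paths, the JSON manifest
# format, and str.format templating are illustrative assumptions.
import json
import sys
from pathlib import Path

MANIFEST = Path("prompt_manifest.json")             # hypothetical location
DOCS = {                                            # template -> committed output
    Path("templates/help-center-article.md"): Path("docs/help-center-article.md"),
    Path("templates/support-runbook.md"): Path("docs/support-runbook.md"),
    Path("templates/model-card.md"): Path("docs/model-card.md"),
}

def render(template: Path, manifest: dict) -> str:
    # Interpolate manifest fields into the template; enough to show that the
    # doc is derived rather than hand-written.
    return template.read_text().format(**manifest)

def main() -> int:
    manifest = json.loads(MANIFEST.read_text())
    stale = [out for tpl, out in DOCS.items()
             if out.read_text() != render(tpl, manifest)]
    if stale:
        print("Docs no longer match the prompt manifest; regenerate and re-review:")
        for doc in stale:
            print("  ", doc)
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```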
The Org Move That Has to Land
The technical fix doesn't take long to build. The harder change is organizational. The AI team owns the prompt manifest. The content team owns the help-center voice. The compliance team owns the model-card claims. The support team owns the runbook style. None of them currently own the consistency between these documents and the live system.
The org pattern that works: name a single owner for the explainer surface — usually the AI feature's PM — and put that surface in the AI team's CODEOWNERS section so that every PR touching the prompt manifest requires review from a content reviewer. Add a CI job that flags when the prompt or tool registry changes without a corresponding doc update. Schedule a quarterly audit where the live system's manifest is diffed against the model card and any drift is treated as a compliance ticket, not a content ticket. Set a "doc-of-record" policy for the support team's AI-question handling so that the agent answering a customer question about the assistant is reading the current spec, not a copy of the launch-day spec from the team wiki.
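The "prompt changed, docs didn't" flag is the cheapest of these to build. A sketch, with illustrative paths and a hypothetical base branch:

```python
# Sketch of the CI flag: compare the files touched in the PR against the prompt
# surface and the doc surface. Paths and the base-branch name are illustrative.
import subprocess
import sys

PROMPT_SURFACE = ("prompts/", "tools/", "prompt_manifest.json")   # hypothetical
DOC_SURFACE = ("docs/",)                                           # hypothetical

def changed_files(base: str = "origin/main") -> list[str]:
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

def main() -> int:
    changed = changed_files()
    prompt_changed = any(f.startswith(PROMPT_SURFACE) for f in changed)
    docs_changed = any(f.startswith(DOC_SURFACE) for f in changed)
    if prompt_changed and not docs_changed:
        print("Prompt surface changed without a doc update; route to the explainer owner.")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

The check is deliberately dumb: it cannot tell you whether the docs needed to change, only that nobody looked. That is still a better default than silence.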
The shape of this is identical to the shape of every "treat X as code" movement that came before it — infrastructure as code, policy as code, security as code. Each one started as a content team writing prose about a system the engineering team controlled, drifted predictably, and was eventually fixed by giving the prose a programmatic lineage to the system itself. The explainer-as-code pattern is the same play, applied to a surface most AI teams haven't yet recognized as a runtime dependency.
What to Do This Week
If you have a shipped AI feature, three things are worth doing before the next prompt change ships.
First, take an inventory: list every customer-facing or compliance-facing document that describes the feature's capabilities, refusal behavior, model, tools, or data scope. The list is usually longer than the team expects. Tooltips, help articles, runbooks, model cards, transparency notes, sales-enablement decks, security questionnaires answered for enterprise prospects, the answer the website's own AI chatbot gives when a visitor asks about the assistant — all of these are part of the trust contract.
Second, diff each document against the current prompt manifest. For each document, mark the claims that are still correct, the claims that are out of date, and the claims that are technically correct but newly misleading because of a context change. The third category is usually the largest and the most damaging.
Third, decide what becomes the source of truth. The realistic answer is the prompt repo plus a small structured manifest beside it. Everything else gets templated. The teams that are six months ahead on this are already doing it; the teams that aren't are paying for the gap in support volume, in audit findings, and in the slow erosion of customer trust that builds quietly until a single bad incident makes it visible all at once.
The explainer doc is not marketing copy. It is the customer's mental model of your AI feature, made durable in writing. When the feature changes and the doc doesn't, the customer's mental model is now wrong, and the next time the feature behaves differently from the model the customer holds, you've spent some of the trust you were trying to bank with the doc in the first place. The teams that win the trust race in 2026 will be the teams whose docs decay on a schedule no faster than the prompt repo can predict — which means the docs and the prompt repo have to be the same thing, with different rendering.
