The Demo You Recorded in March Was the Last Time It Worked
A sales engineer at a Series B AI company recorded a five-minute walkthrough on a Tuesday in March. The agent picked the right tool on the first try, framed the answer in the buyer's vocabulary, and refused a gnarly edge case with a politeness that landed as "thoughtful, not hedging." That recording went into the asset library. Over the next seven weeks it closed five deals.
By the time the sixth prospect watched it on an onboarding call in late May, the model had received a provider point-release that re-tuned its refusal phrasing, the prompt had been edited twice to fix an unrelated regression, the tool catalog had grown by three entries (one of which the model now preferred), and the RAG corpus had been re-indexed against a new chunker. The demo was no longer a recording of the product. It was a recording of a product that no longer existed.
The buyer noticed. The customer-success rep noticed. The AI team did not, because the contract between the asset library and the deployment was implicit, and nobody on the engineering side had been told they were the ones holding it.
The four pins that move underneath a demo recording
A demo video is a behavioral promise frozen at one instant. Behind the cursor are four moving parts, any one of which can falsify the promise without anyone editing the file.
The model snapshot. Anthropic's documentation distinguishes a snapshot identifier like claude-opus-4-5-20251101 (reproducible, version-pinned, with a published retirement date) from rolling aliases that always resolve to whatever is latest in the family. OpenAI publishes dated snapshots; Google uses generation markers. The rolling alias is convenient for development and catastrophic for any artifact that is supposed to mean the same thing six weeks later. The lifecycle treadmill has only sped up: snapshots that used to live 18–24 months now live 6–12, and providers regularly retire IDs with two months of notice. A demo recorded against a rolling alias is wallpapered onto whichever weights happen to be behind that string on the day a prospect watches the video.
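The alias-versus-snapshot distinction can be checked mechanically before a recording session. The sketch below is a heuristic, not a provider API: it assumes that a pinned identifier ends in a date stamp (YYYYMMDD, as in the Anthropic example above, or YYYY-MM-DD), and it will misclassify providers that version their models some other way. The function names are hypothetical.

```python
import re

# Heuristic, not a provider API: a trailing date stamp (YYYYMMDD or YYYY-MM-DD)
# is treated as evidence that the identifier names a fixed snapshot.
_DATE_SUFFIX = re.compile(r"-(\d{8}|\d{4}-\d{2}-\d{2})$")


def is_pinned_snapshot(model_id: str) -> bool:
    """Return True if model_id ends in a date stamp, False for alias-looking IDs."""
    return _DATE_SUFFIX.search(model_id) is not None


def check_recording_model(model_id: str) -> str:
    """Return a one-line verdict suitable for a pre-recording checklist."""
    if is_pinned_snapshot(model_id):
        return f"ok: {model_id} looks like a dated snapshot"
    return (
        f"warning: {model_id} looks like a rolling alias; "
        "pin a snapshot before recording"
    )
```

Running `check_recording_model` on the deployment's configured model ID just before recording is enough to catch the most common case, a demo captured against an alias that nobody realized was rolling.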
The prompt. This is the component nobody pins because it lives in a config file or a feature flag, not a versioned artifact. In real LLM applications the model weights are the least frequently changed thing; the prompt changes weekly, sometimes daily, often by people who don't know a sales demo depends on the previous wording. A one-line edit to fix a refusal regression rewrites the social register of every recorded interaction.
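One lightweight way to turn the prompt into something pinnable is to fingerprint it at recording time and compare later. This is a minimal sketch under stated assumptions: the prompt is available as a single string, and whitespace-only edits should not count as drift. The function names and the normalization rule are illustrative choices, not an established convention.

```python
import hashlib


def prompt_fingerprint(prompt_text: str) -> str:
    """SHA-256 of the prompt after normalizing line endings and trailing whitespace."""
    lines = prompt_text.replace("\r\n", "\n").split("\n")
    normalized = "\n".join(line.rstrip() for line in lines).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def prompt_matches_recording(recorded_fingerprint: str, live_prompt: str) -> bool:
    """True if the live prompt still hashes to the value stored with the recording."""
    return prompt_fingerprint(live_prompt) == recorded_fingerprint
```

A hash says nothing about how much behavior changed, only that the wording did. That is the point: the one-line edit is exactly the change that otherwise goes unnoticed.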
The tool catalog. The agent in the demo chose tool A out of five candidates. By the time the prospect watches it, there are eight candidates, the descriptions of three have been clarified, and the model's tool selection has drifted accordingly. The demo isn't wrong; the menu has changed.
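The menu change described here is easy to make visible if the catalog at recording time was saved. The sketch below assumes a catalog can be reduced to a mapping from tool name to description text; real tool schemas carry more (parameters, types), and the names `CatalogDiff` and `diff_tool_catalogs` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogDiff:
    added: list[str]     # tools present now but not at recording time
    removed: list[str]   # tools present at recording time but gone now
    reworded: list[str]  # tools whose description text changed

    @property
    def changed(self) -> bool:
        return bool(self.added or self.removed or self.reworded)


def diff_tool_catalogs(recorded: dict[str, str], live: dict[str, str]) -> CatalogDiff:
    """Compare two {tool_name: description} catalogs."""
    added = sorted(live.keys() - recorded.keys())
    removed = sorted(recorded.keys() - live.keys())
    reworded = sorted(
        name for name in recorded.keys() & live.keys()
        if recorded[name] != live[name]
    )
    return CatalogDiff(added=added, removed=removed, reworded=reworded)
```

For the scenario in the paragraph above, the diff would report three added tools and three reworded descriptions, which is enough to flag the recording for review even though nothing in the video file changed.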
The retrieval substrate. If your demo shows a question being answered from your documentation, that answer is a function of which chunks were retrieved, which is a function of the chunker, the embedding model, and the corpus snapshot. Any of those changing — especially the embedding model — can rearrange which passages the model sees, which rearranges what it says. The recorded answer becomes a behavioral lottery whose ticket got reshuffled.
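The three inputs named above (chunker, embedding model, corpus snapshot) can be recorded as a small pin and compared field by field. This is a sketch, assuming each input can be summarized as a string such as a version tag or an index build ID; the class name, field names, and example values are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RetrievalPin:
    chunker: str          # e.g. chunker name plus its version
    embedding_model: str  # identifier of the embedding model
    corpus_snapshot: str  # e.g. an index build ID or a corpus content hash


def moved_inputs(recorded: RetrievalPin, live: RetrievalPin) -> list[str]:
    """Names of the retrieval inputs that differ between recording time and now."""
    return [
        f.name for f in fields(RetrievalPin)
        if getattr(recorded, f.name) != getattr(live, f.name)
    ]
```

Reporting which input moved, rather than a single changed/unchanged flag, matters here because the consequences differ: a corpus refresh may leave the demo answer intact, while an embedding-model swap is more likely to rearrange the retrieved passages.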
Each of these four pins drifts on its own clock. The probability that none of them has moved seven weeks after a recording is, in practice, zero.
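A back-of-the-envelope model shows why "in practice, zero" is not an exaggeration. It rests on an illustrative simplification, not a measurement: each pin i changes independently at a constant average rate of λ_i changes per week, so the chance that a pin is untouched after t weeks is e^(−λ_i t).

```latex
P(\text{no pin moved in } t \text{ weeks})
  = \prod_{i=1}^{4} e^{-\lambda_i t}
  = e^{-t \sum_{i=1}^{4} \lambda_i}
\qquad\text{and, counting the prompt alone,}\qquad
P \le e^{-\lambda_{\text{prompt}}\, t} = e^{-1 \cdot 7} \approx 9 \times 10^{-4}
```

Taking only the prompt, at the "changes weekly" rate described above (λ = 1 per week), the chance of seven untouched weeks is already below 0.1 percent. The other three pins can only push the figure lower.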
The ownership gap that makes this a recurring incident, not a one-time mistake
The reason this keeps happening is not that any individual team is careless. It is that the demo recording sits at a seam between two functions that do not see each other's change calendars.
The sales team owns the asset library. They produce recordings on the cadence of the deal cycle. They version their assets by "what closed deals" — a recording stays in rotation as long as its conversion rate holds. The AI team owns the deployment. They version by what the model is doing today. The contract between the two is implicit: sales assumes the engineering substrate is stable enough that a video from March still describes a product in June; engineering assumes the videos are conversational, not promissory.
Neither assumption survives contact with the model lifecycle. The demo says "this is what the product does." Seven weeks of unannounced drift later, the deployment says "no it doesn't." The buyer reconciles the two by deciding the product is buggy.
A useful diagnostic: ask both teams which calendar a demo recording belongs on. If sales says "the deal calendar" and engineering says "what calendar?", that is the unsigned contract you are looking for.
A demo recording manifest
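One way such a manifest might look is a small record stored next to the video file, holding one value per pin, plus a check that compares it against the live deployment. This is a minimal sketch, not a prescribed format: the field names, the choice of JSON, and the idea of storing hashes rather than raw prompt or catalog text are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DemoManifest:
    recording: str            # file name of the video in the asset library
    recorded_on: str          # ISO date of the recording session
    model_snapshot: str       # dated snapshot ID, not a rolling alias
    prompt_sha256: str        # hash of the prompt text at recording time
    tool_catalog_sha256: str  # hash of the serialized tool catalog
    retrieval_pin: str        # e.g. "chunker|embedding model|corpus snapshot"


# The four fields that correspond to the four pins.
PIN_FIELDS = ("model_snapshot", "prompt_sha256", "tool_catalog_sha256", "retrieval_pin")


def to_json(manifest: DemoManifest) -> str:
    """Serialize the manifest so it can be stored beside the recording."""
    return json.dumps(asdict(manifest), indent=2, sort_keys=True)


def from_json(text: str) -> DemoManifest:
    """Rebuild a manifest from its JSON form."""
    return DemoManifest(**json.loads(text))


def stale_pins(manifest: DemoManifest, live: dict[str, str]) -> list[str]:
    """Pins whose live value is missing or differs from the recorded value."""
    return [name for name in PIN_FIELDS if live.get(name) != getattr(manifest, name)]
```

Storing hashes keeps the manifest safe to share with the sales team, since it reveals that the prompt changed without exposing the prompt itself. A non-empty result from `stale_pins` does not mean the demo is wrong, only that someone who owns the deployment should rewatch it before the next prospect does.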
