The Prompt Version Your Team Treated as Independent of the Model Version
Your incident timeline says "no deploys in the last 72 hours." Your prompt registry agrees: prompt v37 has been frozen for three weeks. Your eval harness ran clean on Tuesday night. But on Wednesday morning, the structured-output failure rate on one of your agents tripled, the retry budget on another doubled, and a third started cheerfully ignoring an instruction it had been honoring for a month. Nothing changed. Except something did change, and it changed in the place neither side of the org was watching: the model.
The prompt registry knows about prompt versions. The model gateway knows about model versions. Almost nobody, in practice, tracks the pair. And prompt v37 isn't a free-standing artifact — it is, whether your tooling admits it or not, a contract negotiated against one specific model. When the platform team rolls the claude-sonnet-latest alias forward by a single point release, the contract on the other side has been quietly amended, and your incident timeline reads "no deploys" because the deploy happened on someone else's infrastructure under a name that didn't move.
The pairing nobody is writing down
A prompt has a version. A model has a version. Your registry stores the first. Your gateway resolves the second. What is missing from almost every system I have looked at is the joint version — the assertion that "we tested prompt v37 against model snapshot 2026-04-15 and that pair shipped on 2026-04-22." Without it, you have two independently versioned things composed at runtime, and the composition is the thing that produces output, not either side alone.
The reason this happens is organizational, not technical. The prompt lives in the application team's repo, gated by their PR review and their eval suite. The model selection lives in the platform team's gateway config, gated by a different review process and, often, a different eval suite (if any). Both sides version what they own. Neither side owns the cross-product. So when one side moves, the other side doesn't know it has been moved against.
The alias is the load-bearing detail. Most production systems do not call a pinned model snapshot like claude-sonnet-4-6-20250929. They call claude-sonnet-latest, or a gateway-side mapping that resolves to "whatever we picked this quarter," or an environment variable that the platform team owns. Each of these abstractions is doing what abstractions are supposed to do — hiding a name that changes — and the cost is that the change is invisible from the application side. The prompt team has no diff to look at on the day the model moves.
Why "no deploys" is wrong
When the platform team upgrades the alias from a 4.6 snapshot to a 4.7 snapshot, the application team sees a model that follows instructions more literally, weights newer training data more heavily, and may produce subtly different formatting on outputs that were never explicitly constrained. None of that shows up in a deploy log because, from the application team's perspective, no deploy happened. They didn't push code. They didn't bump a prompt version. Their CI did not run. And yet a foreign artifact has been substituted into the composition that produces their behavior.
This is not a hypothetical. The alias-upgrade incident is one of the most common production failure modes in LLM systems built since 2024, and it has a consistent shape: a stable endpoint name resolves to a new snapshot, the new snapshot follows the prompt's instructions more strictly than the old one did, instructions that were paper over old quirks now actively constrain new behavior, and the system regresses against an eval suite that was tuned to the previous quirk profile. The harness is not broken. The model is, by most measures, better. The pair, however, is untested.
The cleanest articulation of the failure mode I have seen is this: every prompt is an implicit contract with the model it was tuned against, and "switching the model" is a renegotiation of that contract whether or not anyone files it as such. Calling the move a model upgrade obscures what it actually is from the prompt's point of view, which is a counterparty change.
Prompts are not portable assets
There is a widespread assumption that prompts are portable — that with a thin abstraction layer over the provider SDK, you can swap models the way you swap a database driver. This is a category error. A database driver hides differences in transport; the schema underneath is the contract. A model abstraction layer hides differences in transport; the prompt is the contract. The contract does not become portable just because the wire format does.
- https://customgpt.ai/how-to-avoid-llm-vendor-lock-in/
- https://vivekhaldar.com/articles/portability-of-llm-prompts/
- https://arxiv.org/pdf/2601.12034
- https://www.digitalocean.com/community/tutorials/model-silent-versioning-problem
- https://dev.to/bailorgana/debugging-a-400-error-how-a-silent-api-gateway-update-broke-my-llm-agent-1ho2
- https://www.restate.dev/blog/dealing-with-versioning-in-long-running-agents
- https://deepchecks.com/llm-production-challenges-prompt-update-incidents/
- https://arxiv.org/pdf/2311.11123
- https://claudelog.com/faqs/what-is-model-alias/
- https://platform.claude.com/docs/en/about-claude/models/overview
- https://futureagi.com/blog/dynamic-prompts/
- https://www.braintrust.dev/articles/best-prompt-versioning-tools-2025
