The Fallback Model Whose System Prompt Was Tuned for Someone Else
Your reliability dashboard says 99.95%. Your support inbox says something else. Twice a week, for ten or twenty minutes at a time, a thin sliver of users gets a version of your product that talks like a different company. The refusals read funny. A structured field that always rendered as a tidy two-column card now shows up as a paragraph with bullet points smashed inside it. Tone shifts from "calm expert" to "eager assistant." Nobody opens a ticket — they just close the tab and try again later.
Your provider went down. The failover worked. Latency stayed under SLO. The error budget did not move. And the experience your users got during that window was not the one you ship.
The mental model most teams carry into multi-provider architecture is that the system prompt is portable — a contract negotiated with the abstract idea of "a capable model," readable by anyone who speaks the LLM dialect. That model is wrong. A system prompt is a tuned artifact. It is tuned against a specific model's preferences, refusal grammar, formatting habits, and instruction-following biases. When the failover engages, you are not handing the same contract to a comparable counterparty. You are handing a contract written in your primary's idiom to a model that reads a different idiom and signs it anyway.
