2 posts tagged with "failover"

The Fallback Model Whose System Prompt Was Tuned for Someone Else

June 3, 2026 · 10 min read

Software Engineer

Your reliability dashboard says 99.95%. Your support inbox says something else. Twice a week, for ten or twenty minutes at a time, a thin sliver of users gets a version of your product that talks like a different company. The refusals read funny. A structured field that always rendered as a tidy two-column card now shows up as a paragraph with bullet points smashed inside it. Tone shifts from "calm expert" to "eager assistant." Nobody opens a ticket — they just close the tab and try again later.

Your provider went down. The failover worked. Latency stayed under SLO. The error budget did not move. And the experience your users got during that window was not the one you ship.

The mental model most teams carry into multi-provider architecture is that the system prompt is portable — a contract negotiated with the abstract idea of "a capable model," readable by anyone who speaks the LLM dialect. That model is wrong. A system prompt is a tuned artifact. It is tuned against a specific model's preferences, refusal grammar, formatting habits, and instruction-following biases. When the failover engages, you are not handing the same contract to a comparable counterparty. You are handing a contract written in your primary's idiom to a model that reads a different idiom and signs it anyway.
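One way to make the "tuned artifact" idea concrete is to key the system prompt by the model that will actually read it, so the failover path never reuses the primary's prompt. The sketch below is a minimal illustration under stated assumptions: the model names, the prompt text, the `Provider` wrapper, and the `ProviderDown` exception are all hypothetical and stand in for whatever client code you really use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical model identifiers and prompts, for illustration only.
# Each prompt is assumed to have been tuned against the model it is keyed to.
SYSTEM_PROMPTS: Dict[str, str] = {
    "primary-model": "You are a calm expert. Reply using the two-column card schema.",
    "fallback-model": "You are a calm expert. Output JSON with keys 'left' and 'right' only.",
}


class ProviderDown(Exception):
    """Raised by a provider call when that provider is unavailable (hypothetical)."""


@dataclass
class Provider:
    model: str
    call: Callable[[str, str], str]  # (system_prompt, user_message) -> reply


def complete(providers: List[Provider], user_message: str) -> str:
    """Try providers in order, pairing each one with the prompt tuned for it."""
    for provider in providers:
        # A KeyError here means a model was configured without a tuned prompt,
        # which is a failure you would rather see at deploy time than mid-outage.
        prompt = SYSTEM_PROMPTS[provider.model]
        try:
            return provider.call(prompt, user_message)
        except ProviderDown:
            continue  # fall through to the next provider and its own prompt
    raise ProviderDown("all providers failed")
```

The design choice worth noting is that the lookup happens inside the loop: the prompt follows the model, not the request.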

The Fallback Model You Never Load-Tested

May 17, 2026 · 8 min read

Tian Pan

Software Engineer

Every resilient LLM design has a line in the config that names a secondary model. It is there because someone, during a design review, asked the right question — "what happens when the primary is down?" — and someone else answered it with a fallback: key. Everyone nodded. The architecture diagram got a second box with a dotted arrow. The compliance doc got a sentence about graceful degradation.

And then nobody touched it again.

The fallback model is the most confidently asserted, least exercised component in most production AI systems. It is named, documented, and diagrammed, yet the day it first carries traffic is also the day it first meets a real request. You did not build a safety net. You built a second model with an unknown breaking strain, and you will discover that strain at the worst possible moment.
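One common way to keep a fallback exercised is to send it a small, steady fraction of real traffic, so an outage is never its first encounter with production requests. The sketch below assumes that approach; it is not something the post prescribes. The function names, the `canary_fraction` parameter, and its 1% default are illustrative assumptions.

```python
import random
from typing import Callable, Optional


def pick_route(canary_fraction: float, rng: Optional[random.Random] = None) -> str:
    """Return "fallback" for roughly canary_fraction of calls, else "primary"."""
    if not 0.0 <= canary_fraction <= 1.0:
        raise ValueError("canary_fraction must be between 0 and 1")
    source = rng if rng is not None else random
    # random() returns a float in [0.0, 1.0), so 0.0 never routes to the
    # fallback and 1.0 always does.
    return "fallback" if source.random() < canary_fraction else "primary"


def handle(
    request: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    canary_fraction: float = 0.01,  # assumed default, tune to your traffic
    rng: Optional[random.Random] = None,
) -> str:
    """Serve the request from whichever model the canary split selects."""
    route = pick_route(canary_fraction, rng)
    model = fallback if route == "fallback" else primary
    return model(request)
```

Even a small fraction gives you latency, refusal, and formatting data on the secondary before the day it has to carry everything.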
