The Provider Failover That Swapped Your Safety Policy Mid-Conversation
A user is twelve turns into a careful conversation with your assistant about prescribing patterns for a controlled substance. The model has been measured, asking clarifying questions, citing guidance, declining to extrapolate beyond the literature. On turn thirteen, the user asks a follow-up that should land the same way the prior twelve did. Instead, they get a flat refusal: "I can't help with that." The conversation is over. They write to support furious — they were not asking anything different, the assistant was just helping them, what changed.
Your logs explain what changed. Halfway through turn thirteen, your primary provider returned a 503 in the middle of the stream. Your gateway, doing exactly what it was configured to do, failed over to the secondary provider for the remainder of the request. The secondary provider's refusal threshold for that class of query is calibrated more conservatively than the primary's. The user did not ask anything different — they asked the same question to a different model under the same brand, and the new model said no.
