The Logprobs Field Your Provider Removed That Broke Your Confidence Router Silently
The most expensive line in the postmortem was the one nobody wrote: a 200 OK with a missing field. The router that was supposed to escalate hard questions to the stronger model had been escalating zero percent of traffic for six weeks. The cost dashboard was celebrating. The quality dashboard was sliding, but only on the hard-question slice the standing eval set underweighted. Everything looked like a win until a customer complained about a specific kind of question the system used to handle correctly.
The cause was a response shape change one tier up the contract stack. The provider's mid-tier plan had dropped per-token logprobs as part of what the release notes called a "tier-specific feature parity adjustment." The client still received valid JSON. The HTTP status was still 200. The model identifier in the response matched the model identifier in the request. The only thing that changed was that the field the router consumed to make its escalation decision was no longer there, and the defensive default added during an incident a year earlier had quietly become the production default for every request.
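The failure mode is easier to see in code. What follows is a minimal sketch, not the actual router: the response shape (a flat `logprobs` list of floats), the 0.80 threshold, and every function name are assumptions made for illustration. The first function treats a missing field as full confidence, which is one way a defensive default could produce zero escalations. The second keeps "no signal" distinct from "confident" so that the absence of the field is visible to monitoring.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical threshold: escalate when mean token probability falls below it.
ESCALATION_THRESHOLD = 0.80


def mean_token_probability(logprobs: Sequence[float]) -> float:
    """Geometric mean of per-token probabilities, computed from logprobs."""
    return math.exp(sum(logprobs) / len(logprobs))


def extract_logprobs(response: dict) -> Optional[list]:
    """Return the logprobs list, or None if the field is absent or empty.

    Assumes a simplified response shape with a top-level "logprobs" list;
    real provider payloads are nested differently.
    """
    value = response.get("logprobs")
    if not value:
        return None
    return list(value)


def should_escalate_silent(response: dict) -> bool:
    """The silent version: a missing field defaults to full confidence."""
    logprobs = extract_logprobs(response)
    confidence = mean_token_probability(logprobs) if logprobs else 1.0
    return confidence < ESCALATION_THRESHOLD


@dataclass
class RoutingDecision:
    escalate: bool
    signal_present: bool  # False means the decision was made without logprobs


def route_explicit(response: dict) -> RoutingDecision:
    """The explicit version: a missing signal is its own outcome."""
    logprobs = extract_logprobs(response)
    if logprobs is None:
        # Fail toward the stronger model and record that the signal was absent,
        # so a counter or alert can fire on the rate of signal_present=False.
        return RoutingDecision(escalate=True, signal_present=False)
    confidence = mean_token_probability(logprobs)
    return RoutingDecision(
        escalate=confidence < ESCALATION_THRESHOLD,
        signal_present=True,
    )
```

Failing toward the stronger model is a design choice here, not the only option. It trades cost for safety, and the cost spike is itself a signal someone will notice. Whichever direction the fallback takes, the rate of `signal_present=False` is the number worth alerting on, because it moves the moment the field disappears rather than six weeks later.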
