
2 posts tagged with "api-contracts"


The Logprobs Field Your Provider Removed That Broke Your Confidence Router Silently

· 12 min read
Tian Pan
Software Engineer

The most expensive line in the postmortem was the one nobody wrote: a 200 OK with a missing field. The router that was supposed to escalate hard questions to the stronger model had been escalating zero percent of traffic for six weeks. The cost dashboard was celebrating. The quality dashboard was sliding, but only on the hard-question slice the standing eval set underweighted. Everything looked like a win until a customer complained about a specific kind of question the system used to handle correctly.

The cause was a response shape change one tier up the contract stack. The provider's mid-tier plan had dropped per-token logprobs as part of what the release notes called a "tier-specific feature parity adjustment." The client still received valid JSON. The HTTP status was still 200. The model identifier in the response matched the model identifier in the request. The only thing that changed was that the field the router consumed to make its escalation decision was no longer there, and the defensive default added during an incident a year earlier had quietly become the production default for every request.
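One way to keep a missing field from turning into a silent default is to make "no logprobs" its own routing outcome instead of folding it into "confident." The sketch below is illustrative only: the response shape (a dict with a `logprobs` list of per-token log probabilities), the `route` function, the `RoutingDecision` type, and the 0.8 threshold are all assumptions, not the provider's API or the router described above.

```python
import math
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class RoutingDecision:
    escalate: bool
    reason: str                      # "confident", "low_confidence", or "missing_logprobs"
    mean_token_prob: Optional[float]


def route(response: Mapping[str, Any], threshold: float = 0.8) -> RoutingDecision:
    """Decide whether to escalate to the stronger model.

    Assumes a hypothetical response shape: {"logprobs": [float, ...]}.
    A missing or empty field is treated as a distinct outcome and escalates,
    so a contract change shows up as a spike in "missing_logprobs" rather
    than as a drop in escalations.
    """
    logprobs = response.get("logprobs")
    if not logprobs:
        return RoutingDecision(escalate=True, reason="missing_logprobs", mean_token_prob=None)

    # Convert each log probability back to a probability and average them.
    mean_prob = sum(math.exp(lp) for lp in logprobs) / len(logprobs)
    if mean_prob < threshold:
        return RoutingDecision(escalate=True, reason="low_confidence", mean_token_prob=mean_prob)
    return RoutingDecision(escalate=False, reason="confident", mean_token_prob=mean_prob)
```

Counting decisions by `reason` gives a metric that moves the day the field disappears. Whether a missing field should escalate or raise an error is a design choice; the point of the sketch is only that it should not look like a confident answer.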

The Tool Version Bump Your Agent Quietly Adapted To

· 10 min read
Tian Pan
Software Engineer

A downstream search service ships v2.3.2 on a Tuesday afternoon. The release notes mention a renamed status field, a new nullable confidence value, and a reordered array in the result envelope. Nothing in the CHANGELOG is marked breaking. The provider's own client libraries absorb the change in a point release. Your team's HTTP integrations would have logged a deserialization error inside an hour. Your agent — the one routing customer questions through that search tool — does not. It keeps answering. The questions still resolve. The dashboards stay green.

Six weeks later, someone notices that "out of stock" replies have crept up from two percent of queries to eleven. The root cause is the v2.3.2 bump. The status string was renamed from in_stock to available, and the agent, being a flexible reasoner over text rather than a schema-strict client, interpreted the absence of the old token as "not available," then turned that finding into helpful, confident, wrong customer messages. The contract regression was absorbed on the consumer side, where no test suite was watching.

This is the failure mode that conventional API hygiene was never designed to catch. Strict clients break loudly. Agents break quietly. And the longer you treat your agent like a normal HTTP consumer, the longer this class of bug hides inside metrics that look fine.
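A minimal sketch of putting a strict client in front of the agent, so that tool output is validated before the model ever reads it. Everything here is assumed for illustration: the `status` key, the `out_of_stock` value, and the `validate_search_result` helper are hypothetical, and only `in_stock` and `available` come from the incident described above.

```python
from typing import Any, Mapping

# Status values the agent's prompt and downstream logic were written against.
# "out_of_stock" is a hypothetical value added for illustration.
KNOWN_STATUSES = frozenset({"in_stock", "out_of_stock"})


class ToolContractError(ValueError):
    """Raised when a tool result no longer matches the expected contract."""


def validate_search_result(item: Mapping[str, Any]) -> Mapping[str, Any]:
    """Return the item unchanged if it matches the contract, else raise.

    Failing here is loud: the error is logged and alertable, and the agent
    never gets the chance to reinterpret an unfamiliar value.
    """
    if "status" not in item:
        raise ToolContractError("search result is missing the 'status' field")
    status = item["status"]
    if status not in KNOWN_STATUSES:
        raise ToolContractError(f"unknown status value: {status!r}")
    return item
```

With this check in place, a result carrying the renamed value `available` raises `ToolContractError` on the first call after the upgrade instead of being paraphrased into a customer message.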