The max_tokens Default Your Provider Raised That Doubled Your Tail Response Length
Your incident timeline shows no deploys. Your code did not change. Your traffic mix did not change. Your prompts did not change. And yet your p99 output length doubled inside a week, your downstream rendering layer started clipping responses, and your output-token bill rose 38% on traffic that wasn't asking for longer answers. The change was real, the regression was measurable, and nothing in your version control system records it — because the value that moved was one your code never sent.
The provider raised an implicit default. The release notes filed it under "improved long-form behavior." The parameter in question was max_tokens, which your application has been omitting since day one because the documented default was generous and your outputs rarely came close. The default moved from 4096 to 8192 to accommodate longer reasoning in the provider's newer models. Your application got the new default whether you wanted it or not, because the absence of a parameter is itself a configuration choice — and the provider owns the right to change the value behind it.
This is the failure mode where a "no-op" release on the provider's side propagates through your system as a behavior change, a cost change, and a UX change all at once, and your team's only diagnostic signal is the bill arriving at the end of the month.
