The Tokenizer Upgrade That Invalidated Every Prompt Cache Prefix
The release notes were two lines long. "Improved multilingual tokenization. No breaking changes to model outputs." Nine words. Your evals confirmed it: same prompts, same completions, same scores. Your platform team signed off on the upgrade Friday afternoon. By Tuesday morning your cache hit rate had collapsed from 80% to 4%, your daily inference bill had quadrupled, and the on-call engineer who paged you at 6am could not find a single line of your code that had changed.
Nothing in your code had changed. The provider had shipped a new tokenizer that split one Unicode glyph one byte differently than the old one. Every cached prefix in your system was now fingerprinted against a token sequence that no longer existed. The model behaved identically — that was true. The cache layer, which the release notes did not mention, paid the bill in full.
