Skip to main content

One post tagged with "tokenizers"

View all tags

The Tokenizer Upgrade That Invalidated Every Prompt Cache Prefix

· 9 min read
Tian Pan
Software Engineer

The release notes were two lines long. "Improved multilingual tokenization. No breaking changes to model outputs." Nine words. Your evals confirmed it: same prompts, same completions, same scores. Your platform team signed off on the upgrade Friday afternoon. By Tuesday morning your cache hit rate had collapsed from 80% to 4%, your daily inference bill had quadrupled, and the on-call engineer who paged you at 6am could not find a single line of your code that had changed.

Nothing in your code had changed. The provider had shipped a new tokenizer that split one Unicode glyph one byte differently than the old one. Every cached prefix in your system was now fingerprinted against a token sequence that no longer existed. The model behaved identically — that was true. The cache layer, which the release notes did not mention, paid the bill in full.