The Token Count Your Client Estimated And Your Provider Invoiced
Your application counted tokens locally with a tokenizer library matching what you believed the provider used. The SDK reported "estimated 4,200 tokens" before each call. Your budget logic admitted the request. Then the provider's invoice came back at 6,800 tokens for the same payload. Multiply that 60% gap by a few million calls a month and the line item your finance team cannot reconcile against your own logs starts to look like an architectural mistake rather than a rounding error.
The mistake is not that the local tokenizer was wrong. The mistake is treating the local tokenizer as a contract instead of a guess. Tokenization is something the provider does inside their serving stack — your library is a model of that process, not the process itself, and the two drift in ways that are small per call and structural across the population of calls you actually make.
