The Streaming Abort Your Provider Billed Anyway: A 14% Gap Hiding in Your Invoice
Your finance team filed a dispute and lost. The line item is "output tokens" and it exceeds your sum-of-delivered-tokens metric by fourteen percent. The provider's support engineer closed the ticket as "expected behavior under streaming cancellation," with a link to a documentation page that says "cancellation stops billing at the last delivered token." Both sentences are true, and the gap between them is the line of code you have not written.
The contract you read says one thing. The inference scheduler does another. The mismatch is not a bug, not a billing error, and not malice — it is a layered system in which the cancellation signal travels through three boundaries (browser, edge, GPU) and the billing meter sits at the third boundary while your "stop generating" button sits at the first. Closing the gap is an engineering project with a finance owner.
