Skip to main content

The Reserved Capacity Contract That Priced Out Your Overflow When the Provider Redefined the Bucket

· 10 min read
Tian Pan
Software Engineer

A platform team signed a multi-quarter reserved-throughput contract. Fixed per-token rate on committed capacity, a higher overage rate above the ceiling. Finance modeled the burn against six months of historical traffic that rarely crested the limit. The contract said "overflow" meant bytes-per-minute above the committed ceiling, and on that definition the deal was sound.

Six weeks later the bill was up 2.4× with no change to traffic shape, no change to routing config, no change to product surface. The provider had quietly revised the metering definition mid-quarter. "Overflow" now also counted any request the auto-router sent to a model tier above the one the reservation was anchored to — so a single Sonnet selection on a complex prompt landed in the overage bucket even when aggregate throughput sat comfortably inside the committed envelope. Thirty percent of traffic that used to invoice at the reserved rate now invoiced at the overage rate. Finance chased the spike through dashboards for three weeks before someone read the mid-quarter pricing addendum and found the redefinition in a footnote.

The contract had not been broken. The unit it was denominated in had been redenominated.

A Reserved-Capacity Deal Is a Derivative on a Metering Definition

When you sign a reserved-throughput contract — Azure Foundry PTUs, Bedrock Provisioned Throughput, an enterprise OpenAI commitment, a custom Anthropic deal — you are not buying a fixed dollar amount of compute. You are buying a price per unit, against a unit the provider defines. The unit looks stable because the words in the contract do not change. "Tokens per minute" reads the same in March as it does in September.

But the unit is operationally defined by the metering pipeline the provider operates. That pipeline decides which requests land in which bucket. Which model tier counts as the reserved tier. Whether a long-context request gets multiplied by a token-weighting factor. Whether a request routed through a "priority" path counts at 1.75× the standard rate, or whether spillover to a standard deployment falls back to per-token billing at the standard rate. Azure documents the priority tier explicitly at a 75% premium to standard, and the Flex tier at a 50% discount; the same word "token" maps to three different effective prices depending on which routing path the request takes.

The metering pipeline is the provider's. The definitions inside it are the provider's. The right to revise those definitions, in most enterprise terms, also belongs to the provider — limited only by a notification window (often thirty days) and the customer's right to terminate. A reserved-capacity contract whose metering definition is referenced rather than pinned is a derivative on a number the counterparty controls. That is not a contract. That is a position.

The Drift Happens Where Two Teams Stop Talking

The failure mode is not malicious repricing. It is silent unit drift between the team that models the bill and the team that runs the traffic.

Finance modeled the burn rate against historical token throughput because that is what historical invoices showed. Engineering ran the auto-router because the router selected cheaper models when it could and more capable ones when it had to, which was the right product decision. Neither team owned the mapping between "what the router decided" and "which invoice bucket the request landed in." That mapping lived in the provider's metering layer, where a routing-distribution shift — say, a prompt-engineering change that started triggering complex prompts more often, or a new feature that asks for a JSON-mode response that tips the router to Sonnet — instantly redistributes traffic across billing buckets without redistributing it across the dashboards either team watches.

Three weeks of investigation get spent looking for a code change that does not exist. The shape of the bill changed. The shape of the traffic did not. The only thing that moved was the function the provider applies between the two.

Pin the Metering Definition in the Contract, Not in a Referenced Policy

The first defense is contractual. Most enterprise AI contracts today reference the provider's pricing page or operations guide rather than embedding the metering definition in the contract body. That referenced page is a unilateral document the provider can revise on its own cadence. A thirty-day notification window does not protect a multi-quarter commitment — it just means the team finds out about the redenomination thirty days before it takes effect, when the only available remedies are paying it or terminating early and forfeiting the discount the commitment was supposed to deliver.

The protection is to name the metering definition as a substantive term of the contract. Specifically:

  • Define "overflow" as a numeric threshold against a specific unit, with the unit's definition embedded in the contract, not referenced.
  • Name which model tiers are included in the reserved bucket and which are not, by stable identifiers, not by referenced policy.
  • Specify that changes to the metering definition require a contract amendment, not a published addendum.
  • Include a price-protection clause that holds the metering definition constant for the duration of the committed term, even if the provider revises its public policy.

This is harder to negotiate than rate. Providers will resist because the metering definition is exactly the lever they want to retain. The deals that close on this point either come from large customers or from second-tier providers competing for the customer's business. The signal that a provider will not give on metering is the signal that the deal carries unmodeled tail risk.

Loading…
References:Let's stay in touch and Follow me for more thoughts and updates