Who Owns the Idle Cost of an AI Feature
The pay-per-token mental model has trained a generation of engineers to think AI cost is a function of usage. No requests, no bill. It is a comforting model, and for the API call itself, it is roughly true. But it describes only one layer of a production AI feature, and not the layer that quietly drains the budget.
Provisioned throughput, reserved GPU capacity, warm vector indexes, and standby fine-tuned endpoints all bill on a clock, not a counter. They charge for the right to serve traffic, whether or not traffic arrives. The feature nobody touches on a Saturday still has a meter running. The internal tool used by twelve people during business hours bills for all 168 hours of the week. The launch you provisioned for in March still holds its reservation in May, long after the spike flattened.
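A rough sketch of the arithmetic behind the internal-tool example may help. It assumes "business hours" means 8 hours a day, 5 days a week, and uses a made-up rate of $10 per hour for a clock-billed resource. Both numbers and both function names are illustrative assumptions, not figures from any provider.

```python
HOURS_PER_WEEK = 7 * 24  # 168, the figure used in the text


def weekly_clock_bill(hourly_rate: float) -> float:
    """Cost of a resource billed for every hour of the week, used or not."""
    return hourly_rate * HOURS_PER_WEEK


def idle_cost(hourly_rate: float, active_hours: float) -> float:
    """Portion of the weekly bill that covers hours with no traffic."""
    if not 0 <= active_hours <= HOURS_PER_WEEK:
        raise ValueError("active_hours must be between 0 and 168")
    return hourly_rate * (HOURS_PER_WEEK - active_hours)


if __name__ == "__main__":
    rate = 10.0      # hypothetical $/hour for a provisioned endpoint
    active = 8 * 5   # assumed business hours: 40 per week
    total = weekly_clock_bill(rate)
    idle = idle_cost(rate, active)
    # prints: weekly bill: $1680.00, idle: $1280.00 (76%)
    print(f"weekly bill: ${total:.2f}, idle: ${idle:.2f} ({idle / total:.0%})")
```

Under these assumptions, roughly three quarters of the weekly bill pays for hours in which nobody used the feature. The share depends only on active hours, so it is the same at any hourly rate.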
This is idle cost, and the reason it grows unchecked is not technical. It is organizational: no single role can see it, and no single role owns it.
