Token Budgets Are the New Internal IAM
The first time your AI bill clears seven figures in a month, the budget meeting changes shape. Until then, the question is "can we afford this." After that, the question is "who gets how much" — and most engineering orgs discover, in real time, that they have no policy framework for answering it. The team that shipped the loudest demo holds the highest quota by accident. Finance pushes for flat per-headcount caps that starve the team doing the highest-leverage work. Security gets cut out of the conversation entirely until somebody notices that the eval team has been pulling production traffic through their personal token allowance for six months.
The reason this conversation always feels like a cloud-cost argument is that it almost is one — but not quite. With cloud, the unit of waste is a forgotten EC2 instance and the worst case is a 3x bill. With token quotas, the unit of waste is a runaway agent loop, and the unit of access is a user-facing capability: whoever holds the budget can ship the feature. That second property is what makes token allocation rhyme with capability-based security instead of with cloud FinOps. The quota is not just a spending cap. It is the right to make a class of inferences happen.
