
Inference Billing as a P&L Line Item Nobody Owns

· 9 min read
Tian Pan
Software Engineer

Somewhere in your company, four people each believe a fifth person owns the inference bill. Engineering treats it as a cloud line item. The AI team treats it as the price of building. Finance treats it as a variable margin input that someone in engineering must already be managing. Product treats it as overhead that engineering absorbs. The bill keeps growing, and the only thing everyone agrees on is that it isn't theirs.

This is not a budgeting problem. It is an ownership vacuum, and it surfaces the first time the line item gets large enough for a CFO to ask about it on a board call. By then, the answers people improvise — "we'll optimize," "we'll cache more," "we'll switch models" — describe interventions without naming an owner. The conversation that should have happened a year earlier was not about how to lower the bill. It was about whose P&L the bill belonged to in the first place.

The shift is structural. Inference moved from 15% of enterprise AI spend in 2024 to roughly 85% in 2026, and the average enterprise AI budget grew from $1.2M to around $7M over the same window. A line item that was once rounding error is now the kind of number a board notices, and the org chart written before that shift has no row for it.

The Org Gap That Cloud Spend Already Solved Once

Cloud spend went through this exact problem a decade ago. EC2 bills arrived monthly, infrastructure teams paid them, and product owners shipped features whose unit economics nobody could compute. The discipline that emerged — FinOps — answered the structural question first and the optimization question second. Capacity planning belongs to SRE. Cost-per-customer belongs to product. Negotiating committed-spend discounts belongs to procurement. Each function owns a different cut of the same bill, and the joint forum exists explicitly so the cuts don't contradict.

Inference spend has neither half of that arrangement. There is no SRE-equivalent function that owns provisioned-throughput capacity against latency SLOs. There is no FinOps-equivalent function that owns committed-spend negotiations with model vendors. The team running evals owns model quality. The team writing prompts owns latency. Neither of them owns the conversation about whether the feature is generating enough value to justify the tokens it burns.

So the bill grows under "engineering tools" or "platform" or "shared services" until a single feature gets large enough to puncture the bucket. When it does, the response is a panic optimization sprint led by whichever engineer was unlucky enough to be in the standup that morning. That sprint produces a 30% cut, everyone exhales, and the ownership question is deferred again. The next time the bucket bursts, the optimization opportunities are smaller and the urgency is higher.

Cost Per Outcome Is the Metric the Org Chart Doesn't Compute

The right metric for inference is cost per validated business outcome — cost per resolved support ticket, cost per qualified lead, cost per accurate summary the user actually used. Engineering can tell you cost per token. Product can tell you outcomes per session. Finance can tell you revenue per customer. The intersection — cost per outcome — requires a join across three systems that no one function maintains.

The result is that the answer to "is this feature underwater?" lives in a spreadsheet someone builds quarterly, by hand, after the question has already become urgent. The join is tractable. The reason it doesn't exist is that the dashboard would have to be owned by someone, and the moment it has an owner, the underwater features have an accuser. Nobody volunteers for that role until it is forced on them, usually by a CFO who has stopped accepting "we're still calibrating" as an answer.
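To make "tractable" concrete, here is a minimal sketch of the three-way join, assuming each function can export a small table keyed by feature name. Every field name, feature name, and figure below is hypothetical and chosen only for illustration; a real version would read from whatever billing, analytics, and finance systems the company actually runs.

```python
from collections import defaultdict

# Hypothetical exports from three systems; all names and numbers are illustrative.
token_costs = [  # engineering: inference spend in USD, tagged by feature
    {"feature": "support_bot", "usd": 1200.0},
    {"feature": "support_bot", "usd": 800.0},
    {"feature": "summarizer", "usd": 500.0},
]
outcomes = [  # product: validated outcomes per feature
    {"feature": "support_bot", "validated_outcomes": 4000},
    {"feature": "summarizer", "validated_outcomes": 250},
]
value_per_outcome = {  # finance: assumed USD value of one validated outcome
    "support_bot": 1.50,
    "summarizer": 1.00,
}


def cost_per_outcome_report(token_costs, outcomes, value_per_outcome):
    """Join spend, outcomes, and value by feature name."""
    spend = defaultdict(float)
    for row in token_costs:
        spend[row["feature"]] += row["usd"]

    count = defaultdict(int)
    for row in outcomes:
        count[row["feature"]] += row["validated_outcomes"]

    report = {}
    for feature, usd in spend.items():
        n = count.get(feature, 0)
        cpo = usd / n if n else None
        value = value_per_outcome.get(feature)
        # None means "cannot tell yet": no outcomes recorded or no agreed value.
        underwater = None if cpo is None or value is None else cpo > value
        report[feature] = {
            "spend_usd": usd,
            "outcomes": n,
            "cost_per_outcome": cpo,
            "underwater": underwater,
        }
    return report


report = cost_per_outcome_report(token_costs, outcomes, value_per_outcome)
for feature, row in report.items():
    print(feature, row)
```

The arithmetic is trivial. The hard part is that each of the three inputs has to be published by a different function, under a definition the other two accept.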

A working version of the metric needs four ingredients no single team has:

  • Token telemetry tagged at the feature level, not at the model or API-key level
  • An attributable definition of "outcome" that the product team has signed off on
  • A revenue or value model that finance is willing to defend to leadership
  • A cost-allocation rule that pushes the inference spend back to the product owner's P&L rather than absorbing it into engineering overhead

The first ingredient is technical. The other three are political. That is why FinOps tools alone don't solve the problem — they ship the dashboard, but the dashboard answers a question the org has not yet agreed to ask.
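The technical ingredient can be sketched in a few lines. The example below assumes the calling code attaches a feature tag to every inference request and that per-token prices are known. The `UsageEvent` record, the `model-a` name, and the prices are all hypothetical, not any vendor's actual schema or rate card.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical USD prices per million tokens; real vendor prices differ.
PRICE_PER_MTOK = {"model-a": {"input": 3.0, "output": 15.0}}


@dataclass(frozen=True)
class UsageEvent:
    feature: str       # tag set by the calling code, one per product feature
    api_key_id: str    # what most billing exports group by
    model: str
    input_tokens: int
    output_tokens: int


def event_cost_usd(event: UsageEvent) -> float:
    """Price a single request from its token counts."""
    price = PRICE_PER_MTOK[event.model]
    total = (event.input_tokens * price["input"]
             + event.output_tokens * price["output"])
    return total / 1_000_000


def spend_by(events, attr: str) -> dict:
    """Sum spend grouped by any attribute of the event."""
    totals = defaultdict(float)
    for event in events:
        totals[getattr(event, attr)] += event_cost_usd(event)
    return dict(totals)


events = [
    UsageEvent("summarizer", "key-shared", "model-a", 1_000_000, 100_000),
    UsageEvent("chatbot", "key-shared", "model-a", 2_000_000, 400_000),
]

by_key = spend_by(events, "api_key_id")    # one opaque total for the shared key
by_feature = spend_by(events, "feature")   # spend split per feature
print(by_key)
print(by_feature)
```

Grouped by API key, both features collapse into a single number that nobody has to answer for. Grouped by feature, each number has an obvious owner, which is exactly why the tagging decision tends to be contested.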

Chargeback Is a Political Decision Disguised as a Technical One

The cleanest implementation of inference ownership is per-feature chargeback. The product team that owns the AI summarization feature pays for the tokens its feature consumes. The team that owns the chatbot pays for its tokens. Inference spend shows up on the product owner's P&L the same way headcount and ad spend do, and the kill-or-keep decision happens against a number the product owner cannot externalize.
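As a rough sketch of what such a chargeback rule could look like, the example below splits a monthly invoice across product owners in proportion to each feature's metered spend. The proportional rule, the `UNOWNED` holding bucket, and all names and amounts are assumptions made for illustration. The allocation rule itself is one of the things the organization has to negotiate.

```python
def chargeback(invoice_usd, metered_by_feature, owner_of):
    """Allocate an invoice to owners in proportion to metered feature spend.

    Features with no registered owner land in an explicit "UNOWNED" bucket
    rather than being silently absorbed into overhead.
    """
    metered_total = sum(metered_by_feature.values())
    if metered_total <= 0:
        raise ValueError("no metered spend to allocate against")

    charges = {}
    for feature, usd in metered_by_feature.items():
        owner = owner_of.get(feature, "UNOWNED")
        share = usd / metered_total
        charges[owner] = charges.get(owner, 0.0) + invoice_usd * share
    return charges


# Hypothetical month: the invoice exceeds metered spend (for example, a
# committed-spend true-up), so each owner carries a proportional share of it.
metered = {"summarizer": 2000.0, "chatbot": 6000.0, "legacy_experiment": 2000.0}
owners = {"summarizer": "docs_product", "chatbot": "support_product"}

charges = chargeback(12000.0, metered, owners)
print(charges)
```

Keeping the unowned bucket visible is a deliberate choice in this sketch: it turns untagged spend into a line somebody has to claim or shut down.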
