3 posts tagged with "cost-attribution"

Your Inference Chargeback Is Quietly Taxing Eval Discipline

12 min read
Tian Pan
Software Engineer

The FinOps team rolled out chargeback for AI a year ago. The dashboard is gorgeous. Every feature team can see, to the cent, what their inference bill was last month, and the platform PM has slides showing line-of-business attribution at the SKU level. The org has more AI features than it had a year ago. It also has worse AI quality. Nobody has connected the two facts yet, but they are the same fact.

Here is the failure mode in one sentence: chargeback prices the inference token and silently fails to price the eval token, so every PM on the org chart faces an incentive structure that rewards model upgrades and punishes evaluation discipline. Twelve months later, eval coverage is shrinking while the bill is growing — the precise opposite of what the FinOps initiative thought it was incentivizing. This is not a bug in the dashboard. It is the chargeback model functioning exactly as designed, in a domain where the design assumptions from cloud-cost FinOps no longer hold.
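The incentive arithmetic can be made concrete with a minimal sketch. All rates and token counts below are illustrative, not from the post; the point is only that a flat per-token chargeback makes eval spend look like pure cost:

```python
# Hypothetical sketch of a per-token chargeback. The rate and the
# token volumes are made-up illustrative numbers.

PRICE_PER_1K_TOKENS = 0.01  # flat illustrative rate, dollars


def monthly_chargeback(prod_tokens: int, eval_tokens: int) -> float:
    """Chargeback bills every token the team emitted -- production and
    eval alike -- at the same rate. Eval spend appears as pure cost,
    with no line item for the quality it buys."""
    return (prod_tokens + eval_tokens) * PRICE_PER_1K_TOKENS / 1000


# Team A runs a disciplined eval suite; Team B skips evals entirely.
team_a = monthly_chargeback(prod_tokens=50_000_000, eval_tokens=10_000_000)
team_b = monthly_chargeback(prod_tokens=50_000_000, eval_tokens=0)

# Identical production traffic, but Team B's dashboard number is about
# 17% lower. The dashboard rewards the behavior the org wants less of.
```

Under this (deliberately crude) model, cutting evals is the single cheapest lever a PM has to make the chargeback line go down, which is the whole argument in four lines of arithmetic.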

The Chargeback Ledger for Compound AI Systems

10 min read
Tian Pan
Software Engineer

The first time the CFO asks "what does the assistant cost us per month," the engineering team produces a number. The second time, a different team produces a different number. The third time, finance produces a third number, and somebody opens a spreadsheet that re-derives the bill from spans because nobody trusts any of the previous answers. This is the moment a compound AI system stops being an architecture problem and becomes an accounting problem.

The shape of the failure is structural. A single user request to "summarize my last quarter's customer feedback" triggers an agent owned by team A, which calls a retrieval tool maintained by team B, which calls a model hosted by provider X, which streams results back through a re-ranking tool from team C, which calls a different model from provider Y. One click; five owners; two invoices that arrive a month apart. Standard FinOps primitives — cost centers, allocation tags, account-level rollups — were designed to slice infrastructure that already had stable owners. They do not compose cleanly across an internal call graph that crosses team boundaries on every request.
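The "one click, five owners" structure can be sketched as a per-request span rollup. The span shape, team names, and per-span costs here are hypothetical; the sketch only shows why attribution has to happen at the request level rather than the account level:

```python
# Hypothetical sketch: rolling one request's cost up by owning team.
# Span names, owners, and dollar costs are illustrative.
from collections import defaultdict

spans = [  # one user click; five owners; per-span cost in dollars
    {"span": "agent",       "owner": "team-a",     "cost": 0.002},
    {"span": "retrieval",   "owner": "team-b",     "cost": 0.001},
    {"span": "llm-call",    "owner": "provider-x", "cost": 0.030},
    {"span": "re-rank",     "owner": "team-c",     "cost": 0.001},
    {"span": "llm-call-2",  "owner": "provider-y", "cost": 0.012},
]


def cost_by_owner(spans):
    """Attribute each span's cost to its owner. A cost-center rollup
    needs this per request; an account-level invoice cannot recover it."""
    totals = defaultdict(float)
    for s in spans:
        totals[s["owner"]] += s["cost"]
    return dict(totals)


request_cost = sum(s["cost"] for s in spans)  # total for the whole click
```

Note that two of the five "owners" are external providers whose invoices arrive a month later, which is exactly where the three competing spreadsheets come from.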

The 2026 State of FinOps report puts 98% of FinOps teams on the hook for AI spend, and the same survey lists real-time visibility into AI costs as the top tooling gap. That gap is not "we cannot see the bill." The gap is "we cannot see who caused what slice of the bill, fast enough that anyone changes their behavior before the bill arrives."

Cost Per Feature, Not Cost Per Token: The Allocation Gap in AI Budgets

10 min read
Tian Pan
Software Engineer

Your finance team can tell you, to the dollar, what you spent on Anthropic and OpenAI last month. Your product team can tell you which features users touched the most. Nobody in the building can tell you whether Draft-Email is profitable, whether Summarize-Thread should stay in the free tier, or whether the new Rewrite-Tone feature is eating Draft-Email's lunch on a per-user basis. You have two dashboards that claim to track the same dollars and neither answers the question that actually drives product decisions.

This is the allocation gap. You measure token spend per endpoint because that is what the provider API gives you. But /chat serves twelve features that happen to share a prompt template, and "per endpoint" collapses all twelve into one line item. Pricing tiers, feature gating, deprecation calls, and the "do we ship this?" conversation all float on gut feel until someone does the plumbing to route token costs back to the features that incurred them.
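The collapse is easy to see when the same request log is aggregated both ways. Endpoint, feature names, and token counts below are illustrative; the only requirement is that a feature tag was attached at call time:

```python
# Hypothetical sketch: one token log, two aggregations.
# Endpoint, feature names, and token counts are illustrative.
from collections import defaultdict

calls = [  # request-level log with a feature tag attached at call time
    {"endpoint": "/chat", "feature": "draft-email",      "tokens": 1200},
    {"endpoint": "/chat", "feature": "summarize-thread", "tokens": 300},
    {"endpoint": "/chat", "feature": "draft-email",      "tokens": 900},
    {"endpoint": "/chat", "feature": "rewrite-tone",     "tokens": 2500},
]


def tokens_by(key, rows):
    """Group token counts by an arbitrary request attribute."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[key]] += r["tokens"]
    return dict(totals)


by_endpoint = tokens_by("endpoint", calls)  # one line item: {"/chat": 4900}
by_feature = tokens_by("feature", calls)    # the split product actually needs
```

The provider API only ever hands you the first aggregation; the second one exists only if the feature tag was in the request before it left the building.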

The plumbing is not glamorous. It is request-level tagging, trace-to-telemetry joins, and a disciplined refusal to ship an AI feature without its own cost label. Teams that treat this as infrastructure investment end up with per-feature margin reports segmented by user cohort. Teams that defer it to next quarter end up making pricing decisions from vibes for eighteen months and discovering, after the fact, that a single customer segment was responsible for half the inference bill at negative margins.
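The trace-to-telemetry join itself is a few lines once the tags exist. Everything below is a hypothetical sketch — the table shapes, cohorts, and the crude per-request revenue attribution are assumptions made for illustration:

```python
# Hypothetical sketch: joining traces (request -> feature, cohort)
# with telemetry (request -> inference cost) to get per-feature
# margin by cohort. All values are illustrative.

traces = {  # request_id -> (feature, user cohort), emitted by the app
    "r1": ("draft-email", "free"),
    "r2": ("draft-email", "pro"),
    "r3": ("summarize-thread", "free"),
}
costs = {"r1": 0.04, "r2": 0.03, "r3": 0.01}  # request_id -> dollars
revenue_per_request = {"free": 0.0, "pro": 0.05}  # crude attribution


def margin_report(traces, costs):
    """Join on request_id; accumulate revenue minus cost per
    (feature, cohort) pair."""
    report = {}
    for rid, (feature, cohort) in traces.items():
        key = (feature, cohort)
        delta = revenue_per_request[cohort] - costs[rid]
        report[key] = report.get(key, 0.0) + delta
    return report
```

Even this toy join surfaces the shape of the real finding: the free cohort runs every feature at negative margin, which is invisible in both the provider bill and the feature-usage dashboard on their own.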