Cost Per Feature, Not Cost Per Token: The Allocation Gap in AI Budgets
Your finance team can tell you, to the dollar, what you spent on Anthropic and OpenAI last month. Your product team can tell you which features users touched the most. Nobody in the building can tell you whether Draft-Email is profitable, whether Summarize-Thread should stay in the free tier, or whether the new Rewrite-Tone feature is eating Draft-Email's lunch on a per-user basis. You have two dashboards that claim to track the same dollars and neither answers the question that actually drives product decisions.
This is the allocation gap. You measure token spend per endpoint because that is what the provider API gives you. But /chat serves twelve features that happen to share a prompt template, and "per endpoint" collapses all twelve into one line item. Pricing tiers, feature gating, deprecation calls, and the "do we ship this?" conversation all float on gut feel until someone does the plumbing to route token costs back to the features that incurred them.
The plumbing is not glamorous. It is request-level tagging, trace-to-telemetry joins, and a disciplined refusal to ship an AI feature without its own cost label. Teams that treat this as infrastructure investment end up with per-feature margin reports segmented by user cohort. Teams that defer it to next quarter end up making pricing decisions from vibes for eighteen months and discovering, after the fact, that a single customer segment was responsible for half the inference bill at negative margins.
