
The Metered AI Pricing Death Spiral: Why Per-Token Billing Punishes Your Best Features

Tian Pan · Software Engineer · 8 min read

Token costs dropped 280x in two years. Enterprise AI bills went up 320%. If that sounds like a paradox, you haven't looked closely at how per-token billing interacts with the features that actually make AI products valuable.

The most useful AI workflows — deep research, multi-step reasoning, iterative refinement, agentic tool use — are precisely the ones that consume the most tokens. Under pure usage-based pricing, your best features are your worst margin killers. This isn't a temporary scaling problem. It's a structural misalignment between how AI creates value and how it gets billed.

The Inverse Value Problem

Traditional SaaS has near-zero marginal cost per user. A power user on Notion or Figma costs roughly the same to serve as a casual one. AI products break this assumption completely.

Traditional SaaS runs at 70–80% gross margins. AI products sit at 20–60%, and the gap widens with every feature that makes the product more useful.

Consider what happens when you build an AI coding agent. A simple autocomplete suggestion costs a fraction of a cent. But the features that differentiate your product — multi-file refactoring, deep codebase analysis, iterative debugging loops — consume 10x to 50x more tokens per interaction. An agentic workflow that resends full conversation history with every API call sees costs compound as context grows: a 200K-token conversation costs 10x a 20K-token one. A single debugging session can balloon from $0.50 to $30 through 47 retry iterations.
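The compounding effect of resending full history is easy to see in arithmetic. The sketch below uses illustrative numbers (the per-token price and per-turn context growth are assumptions, not any provider's actual rates), but the shape of the curve is the point: total input tokens grow quadratically with turn count.

```python
# Sketch: input cost of an agentic loop that resends the full
# conversation history on every turn. The price and turn size are
# illustrative assumptions, not a real provider's rates.
INPUT_PRICE_PER_1K = 0.003   # assumed $ per 1K input tokens
TOKENS_PER_TURN = 2_000      # assumed new tokens added each turn

def session_cost(turns: int) -> float:
    """Total input cost when each turn re-sends all prior context."""
    total_tokens = 0
    context = 0
    for _ in range(turns):
        context += TOKENS_PER_TURN   # context grows every turn
        total_tokens += context      # ...and is re-sent in full
    return total_tokens / 1000 * INPUT_PRICE_PER_1K

print(f"10 turns:  ${session_cost(10):.2f}")
print(f"100 turns: ${session_cost(100):.2f}")
```

A 10x longer session costs roughly 90x more, because the re-sent context is the sum of an arithmetic series — this is the mechanism behind the debugging session that balloons from cents to tens of dollars.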

This creates a perverse dynamic: the more valuable the interaction, the more it costs you to provide it. Your power users — the ones most likely to convert, retain, and evangelize — are simultaneously your most expensive to serve.

Anthropic acknowledged losing "tens of thousands of dollars per month" on heavy Claude Code users under its original $200/month plan. GitHub Copilot lost money per user at launch. OpenAI burned $8 billion on compute in 2025 and projects $14 billion in cumulative losses by end of 2026. These aren't early-stage inefficiencies. They're the predictable consequence of pricing models that don't account for the variance in what users actually do.

The 50x User Problem

Usage-based pricing assumes a relatively narrow distribution of consumption. In practice, AI products see extreme power-law distributions. Your top 5% of users may consume 50x the tokens of your median user. Under flat-rate pricing, these users destroy your unit economics. Under pure usage-based pricing, they face bill shock and churn.

Cursor learned this the hard way. In June 2025, it replaced fixed "fast request" allotments with usage-based credit pools. Some developers exhausted their monthly allocation in a single day. The backlash was severe enough that the CEO issued a public apology and the company processed refunds. The core issue wasn't the price level — it was the unpredictability. Developers who had been happily paying $20/month suddenly faced $350 in weekly overages.

The lesson isn't that usage-based pricing is wrong. It's that raw token consumption is the wrong unit of measurement. A token doesn't capture whether the user got value. A developer who burns 100K tokens on a failed debugging loop and a developer who burns 100K tokens shipping a feature are charged identically, but their experiences — and their likelihood of renewing — couldn't be more different.

Why Traditional Cost Controls Don't Work

The obvious response to cost pressure is to add guardrails: rate limits, context windows, model downgrades for routine tasks. These help at the infrastructure level, but they create a product problem. Every guardrail that reduces token consumption also reduces the product's ability to deliver its core value proposition.

Rate limits force users to ration their interactions. Context window caps break multi-step workflows mid-execution. Aggressive model routing — sending simple queries to cheaper models — works until a "simple" query turns out to need deeper reasoning and the user gets a bad answer. The optimization that saves you money is the same optimization that makes your product worse.

This is the death spiral: costs force constraints, constraints degrade the experience, degraded experience reduces willingness to pay, which forces more cost constraints. The companies stuck in this loop are the ones trying to solve a pricing problem with infrastructure.

Enterprise environments compound the issue. With 84% of companies reporting more than 6% gross margin erosion from AI costs and only 15% able to forecast AI spending within ±10% accuracy, the problem isn't just per-user economics — it's organizational visibility.

Disconnected teams spin up redundant AI capabilities. Premium models get used for routine classification tasks. Abandoned experiments keep consuming resources. The "fragmentation tax" accumulates silently until margins are structurally damaged.

The Pricing Models That Actually Work

The industry is converging on hybrid architectures that separate base access from compute-intensive usage. Nearly half of the top 50 AI startups by valuation now employ two to three pricing models simultaneously. The patterns that work share a common principle: price by the value delivered, not the compute consumed.

Outcome-based pricing ties cost to successful results. Intercom's Fin chatbot charges $0.99 per resolution — defined as the customer confirming satisfaction or leaving without escalation. The vendor only gets paid when the AI actually solves the problem. This flips the incentive structure: the provider is motivated to resolve issues in fewer tokens, not more. The downside is measurement complexity. You need robust infrastructure to define, track, and validate "outcomes," and revenue becomes volatile if model performance fluctuates.
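The billing logic itself is simple; the hard part is the event definition. A minimal sketch, assuming a resolution is defined the way described above (the event names and the `bill_conversation` helper are hypothetical, not Intercom's API):

```python
# Sketch of outcome-based billing: charge only when a conversation
# ends in a defined "resolution" event. Event names are hypothetical.
PRICE_PER_RESOLUTION = 0.99

def bill_conversation(events: list[str]) -> float:
    """A conversation counts as resolved if the customer confirmed
    satisfaction, or ended it without escalating to a human."""
    resolved = "confirmed" in events or (
        "ended" in events and "escalated" not in events
    )
    return PRICE_PER_RESOLUTION if resolved else 0.0
```

Note that everything interesting lives in the `resolved` predicate — which is exactly why this model demands robust measurement infrastructure before it demands a billing system.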

Task-tier pricing assigns fixed costs to categories of work rather than metering raw consumption. Instead of charging per token for a coding agent, you charge per task — "fix this crash," "refactor this module," "write these tests" — regardless of how many tokens the agent burns internally. This gives users cost predictability while letting the provider optimize compute behind the scenes.
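In code, task-tier pricing decouples what the user pays from what the provider spends. A minimal sketch with hypothetical categories, prices, and a assumed blended token rate:

```python
# Sketch of task-tier pricing: a fixed price per task category,
# independent of tokens burned. All numbers are hypothetical.
TASK_PRICES = {
    "fix_crash": 2.00,
    "refactor_module": 5.00,
    "write_tests": 3.00,
}
BLENDED_RATE_PER_1K = 0.003  # assumed provider-side $ per 1K tokens

def bill_task(category: str, tokens_burned: int) -> tuple[float, float]:
    """Return (user_price, provider_margin). The user pays the fixed
    tier price; token burn moves the margin, not the bill."""
    price = TASK_PRICES[category]
    provider_cost = tokens_burned / 1000 * BLENDED_RATE_PER_1K
    return price, price - provider_cost
```

The user sees the same price whether the agent fixed the crash in 10K tokens or 500K; the provider's incentive to optimize compute is internalized rather than passed through.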

Base-plus-burst models combine a flat subscription covering normal usage with usage-based billing that kicks in only at high-consumption thresholds. This works because it handles the 50x user problem explicitly: casual and moderate users get predictable pricing, while power users pay proportionally for disproportionate consumption. The key is setting the burst threshold high enough that most users never hit it.
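The base-plus-burst bill is a piecewise function: flat up to the included allotment, metered beyond it. A sketch with illustrative numbers:

```python
# Sketch of a base-plus-burst bill: a flat subscription covers usage
# up to an included allotment; metering applies only past it.
# All numbers are illustrative assumptions.
BASE_FEE = 20.00             # flat monthly subscription
INCLUDED_TOKENS = 5_000_000  # allotment covered by the base fee
OVERAGE_PER_1K = 0.004       # metered rate beyond the allotment

def monthly_bill(tokens_used: int) -> float:
    overage_tokens = max(0, tokens_used - INCLUDED_TOKENS)
    return BASE_FEE + overage_tokens / 1000 * OVERAGE_PER_1K
```

Every user below the threshold pays exactly `BASE_FEE`, which is what makes the price predictable for the majority; only consumption past the allotment is metered at all.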

Credit pools with model-aware weighting allocate a monthly budget where different models and task types consume credits at different rates. A Claude Haiku query might cost 1 credit while a Claude Opus query costs 15. This preserves user choice while making the cost tradeoffs transparent. The failure mode — Cursor's June 2025 debacle — happens when credit depletion rates surprise users. Transparency about consumption rates isn't optional.
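A model-aware credit pool is a small amount of code; the design constraint is surfacing depletion *before* it surprises the user. A sketch with hypothetical credit weights:

```python
# Sketch of a model-aware credit pool. Credit weights are
# hypothetical; the design point is refusing to overdraw silently
# so depletion can be surfaced to the user up front.
CREDIT_WEIGHTS = {"haiku": 1, "sonnet": 5, "opus": 15}

class CreditPool:
    def __init__(self, monthly_credits: int):
        self.remaining = monthly_credits

    def charge(self, model: str) -> int:
        """Deduct the model's credit weight; raise before overdrawing
        so the caller can show the user what's left and why."""
        cost = CREDIT_WEIGHTS[model]
        if cost > self.remaining:
            raise RuntimeError(
                f"{model} needs {cost} credits, {self.remaining} left"
            )
        self.remaining -= cost
        return self.remaining
```

Exposing `remaining` and the per-model weights in the UI is the transparency lesson from the Cursor episode: the weighted depletion rate must be visible before the charge, not discovered after it.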

Building the Telemetry to Price by Outcome

Switching from per-token to value-based pricing requires telemetry infrastructure that most teams don't have. You can't price by outcome if you can't measure outcomes. You can't implement task-tier pricing if you can't classify tasks. You can't set burst thresholds if you don't know your consumption distribution.

The minimum viable pricing infrastructure needs three components:

  • Per-user cost attribution. Track compute costs at the individual user and session level from day one. Not aggregate API spend — per-interaction cost broken down by model, token type (input vs. output vs. cached), and workflow stage. Without this, you're pricing blind.

  • Value event tracking. Define and instrument the moments where your AI delivers measurable value: a bug fixed, a question answered, a task completed, a document generated. These become the units you can eventually price against. The tracking has to be real-time, not a monthly batch job, because pricing decisions need to happen at interaction time.

  • Consumption distribution analysis. Map the full distribution of usage across your user base. Identify the natural breakpoints where usage tiers should fall. The goal is to find the threshold where fewer than 10–15% of users would experience overage charges, which keeps the pricing predictable for the majority while capturing fair value from the heaviest consumers.
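The third component — finding the breakpoint where only 10–15% of users would see overage charges — reduces to a percentile calculation over per-user consumption. A sketch on synthetic power-law data (the Pareto parameters are assumptions standing in for real telemetry):

```python
# Sketch: pick a burst threshold from a per-user token distribution
# so that only ~15% of users would ever see overage charges.
# The synthetic Pareto data stands in for real telemetry.
import random
import statistics

random.seed(42)
# Synthetic power-law-ish monthly token totals for 1,000 users.
usage = [int(10_000 * random.paretovariate(1.2)) for _ in range(1_000)]

# statistics.quantiles with n=20 yields 5%-wide cut points;
# index 16 is the 85th percentile, leaving ~15% of users above it.
threshold = statistics.quantiles(usage, n=20)[16]

heavy_share = sum(1 for u in usage if u > threshold) / len(usage)
print(f"burst threshold: {threshold:,.0f} tokens")
print(f"share of users over threshold: {heavy_share:.1%}")
```

Run against real attribution data instead of a synthetic draw, this is also the analysis that reveals the 50x user problem directly: compare the 95th-percentile user's consumption to the median.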

The teams that build this telemetry early have a structural advantage. They can iterate on pricing models with real data instead of guessing. They can identify which features are margin-positive and which are subsidized. And they can detect the early signs of the death spiral — rising per-user costs without corresponding increases in user-perceived value — before it becomes a crisis.

The Strategic Bet

The AI pricing landscape is still early enough that the winning models haven't been decided. But the direction is clear: pure per-token billing is a transitional artifact of an era when AI products were thin wrappers around API calls. As products move toward agentic workflows, multi-model architectures, and deeply integrated AI features, the pricing has to evolve with them.

The companies that will build durable AI businesses are the ones that solve the alignment problem — not the AI alignment problem, but the pricing alignment problem. The price a user pays should scale with the value they receive, not with the compute their workflows happen to consume. Every pricing model that fails this test eventually hits the death spiral. The only question is how many tokens you burn before you figure that out.
