Pricing Your AI Product: Escaping the Compute Cost Trap
There is a company charging £50 per month per user. Their AI feature consumes £30 in API fees. That leaves £20 to cover hosting, support, and profit — before accounting for a single refund or churned seat. They built a product users love, grew to thousands of subscribers, and unknowingly constructed a business where more customers means more losses.
This is not a cautionary tale about a bad idea. It is a cautionary tale about a pricing architecture imported from a world where the marginal cost of serving the next user was effectively zero. That world no longer fully applies when your product calls a language model.
Traditional SaaS gross margins run 70–90%. AI-forward companies are reporting 50–60% — and the gap is mostly explained by one line item: inference. When tokens are 20–40% of your cost of goods sold, the standard SaaS playbook inverts.
Why Flat-Rate Pricing Breaks Under Token Pressure
The economics of traditional software are beautiful in their simplicity. Once you pay for the servers, each additional user costs nearly nothing. Pricing in that world is a question of willingness to pay and competitive positioning — cost is a rounding error.
LLM-powered features do not have this property. Every query triggers a real API call. A user who asks 400 questions a month costs twice as much to serve as a user who asks 200 — and that ratio does not compress as you scale. It compounds.
Consider a product priced at $20 per month. A heavy user's 2M input tokens ($1 per million tokens) cost $2.00, and their 1.6M output tokens ($5 per million tokens) cost $8.00 — $10.00 in all, half of the monthly revenue, before a single server is provisioned or support ticket answered.
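The arithmetic above can be sketched as a simple per-user cost model. The per-million rates and token volumes here are illustrative assumptions for the sketch, not any provider's actual price list:

```python
# Illustrative per-user unit economics for an LLM-backed product.
# Rates and volumes are assumed for the example, not real price lists.

INPUT_RATE = 1.00 / 1_000_000   # $ per input token ($1 per million)
OUTPUT_RATE = 5.00 / 1_000_000  # $ per output token ($5 per million)

def inference_cost(input_tokens: int, output_tokens: int) -> float:
    """Raw API spend for one user in one month."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

def gross_margin(price: float, input_tokens: int, output_tokens: int) -> float:
    """Fraction of the subscription left after inference."""
    return (price - inference_cost(input_tokens, output_tokens)) / price

# A typical user vs. a heavy user on a $20/month plan.
print(gross_margin(20.00, 200_000, 160_000))      # typical: ~0.95
print(gross_margin(20.00, 2_000_000, 1_600_000))  # heavy:  ~0.50
```

The point the model makes concrete: margin is not a property of the product, it is a property of each user's usage distribution.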
The dangerous part is that these users often look like your best customers. They engage the most, use every feature, and generate the testimonials that drive growth. They are also, quietly, the users with the worst unit economics.
OpenAI discovered this directly with ChatGPT Pro. Even at $200 per month — the highest consumer AI subscription price on the market — the plan was losing money on a segment of users hitting 20,000+ queries monthly. The price that looked premium was still insufficient when usage was unconstrained.
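To see why even $200 can be insufficient, a back-of-envelope calculation helps. The per-query cost below is a made-up assumption for the sketch, not a figure OpenAI has published:

```python
# Back-of-envelope: at what usage does a flat subscription break even?
# cost_per_query is an illustrative assumption, not a measured figure.

def break_even_queries(monthly_price: float, cost_per_query: float) -> int:
    """Queries per month at which inference spend equals revenue,
    rounded to the nearest whole query."""
    return round(monthly_price / cost_per_query)

# If a long-context query averages ~$0.01, a $200 plan breaks even
# at 20,000 queries -- exactly the usage band described above.
print(break_even_queries(200.00, 0.01))  # 20000
```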
The Four Pricing Architectures, and When Each Fails
Teams navigating AI pricing tend to converge on one of four patterns, each with a specific failure mode.
Bundled flat fee — AI features are included in existing tiers at no explicit charge. Simplest to ship, fastest to adopt. The failure mode is invisible: if usage spikes, margin compresses silently. You won't see it in acquisition metrics. You'll see it in the quarterly finance review when gross margin is ten points lower than forecast.
Tiered usage caps — each tier includes an allowance (1M tokens per month, 50 chat messages, 2,000 code completions) and power users hit a ceiling or upgrade. GitHub Copilot Free offers exactly this: 2,000 completions and 50 chat messages, then stops. This is the most widely deployed approach because it segments users economically — the 5% who drive 80% of token spend are also likely the 5% most willing to pay for more. The failure mode is churn at the ceiling. If your cap is set too aggressively, you frustrate the power users who advocate for you most loudly.
Metered overage — a base subscription covers a token budget, and users pay a per-unit rate beyond that threshold. Example: a $20 base covers 1M tokens, then $2 per 100K tokens thereafter. The failure mode is surprise billing. Users who don't monitor their usage receive unexpectedly large invoices and churn or dispute them. This architecture requires robust usage dashboards and proactive alerts to work.
Outcome-based pricing — charge for resolved tickets, completed documents, closed deals, or other downstream outcomes rather than for token consumption. The failure mode is attribution complexity: you need a clear, defensible definition of what counts as an outcome, and customers will find edge cases.
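Of the four, metered overage is the most mechanical to implement. A minimal sketch of the invoice logic, assuming a $20 base, a 1M-token allowance, and $2 per 100K overage (illustrative numbers, not any vendor's schedule):

```python
import math

# Metered-overage invoice: base fee covers an allowance, then per-unit billing.
# The $20 / 1M-token / $2-per-100K schedule is an illustrative assumption.

BASE_FEE = 20.00
INCLUDED_TOKENS = 1_000_000
OVERAGE_UNIT = 100_000   # tokens per billing unit beyond the allowance
OVERAGE_RATE = 2.00      # $ per unit

def monthly_invoice(tokens_used: int) -> float:
    overage = max(0, tokens_used - INCLUDED_TOKENS)
    units = math.ceil(overage / OVERAGE_UNIT)  # partial units bill as whole ones
    return BASE_FEE + units * OVERAGE_RATE

print(monthly_invoice(800_000))    # under the allowance: 20.0
print(monthly_invoice(1_550_000))  # 550K over -> 6 units -> 32.0
```

Note the ceiling on partial units: whether you round overage up or down is itself a pricing decision, and one worth stating explicitly in your terms.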
None of these architectures is universally correct. Fifty-six percent of AI SaaS companies now use hybrid models that blend subscription predictability with some form of usage signal.
Margin-Defense Patterns That Actually Work
The goal is not to punish power users — they are your best advocates. The goal is to avoid cross-subsidizing them with revenue from users who barely touch the AI feature.
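One concrete defense is simply instrumenting the problem: flag accounts whose inference spend crosses a cost-share threshold before finance discovers the erosion in aggregate. A minimal sketch, where the 40% threshold and the account data are arbitrary assumptions:

```python
# Flag users whose inference cost consumes too much of their subscription,
# so caps or pricing changes can respond before margin erodes silently.
# The 40% threshold is an arbitrary assumption for the sketch.

def flag_margin_risks(users: dict[str, tuple[float, float]],
                      max_cost_share: float = 0.40) -> list[str]:
    """users maps user_id -> (monthly_price, monthly_inference_cost)."""
    return [uid for uid, (price, cost) in users.items()
            if cost / price > max_cost_share]

accounts = {
    "light": (50.00, 4.00),   # 8% of revenue spent on inference
    "heavy": (50.00, 30.00),  # 60% -- the cross-subsidized power user
}
print(flag_margin_risks(accounts))  # ['heavy']
```

The flagged list is a candidate set for a higher tier or a usage cap, not a churn list — these are, per the point above, often your loudest advocates.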
Sources
- https://www.drivetrain.ai/post/unit-economics-of-ai-saas-companies-cfo-guide-for-managing-token-based-costs-and-margins
- https://www.getmonetizely.com/articles/how-to-price-ai-services-in-2025-models-examples-and-strategy-for-saas-leaders
- https://stripe.com/blog/a-framework-for-pricing-ai-products
- https://epoch.ai/data-insights/llm-inference-price-trends/
- https://www.bvp.com/atlas/the-ai-pricing-and-monetization-playbook
- https://www.chargebee.com/blog/how-intercom-built-its-outcome-based-pricing-model-for-ai/
- https://paid.ai/blog/ai-monetization/usage-based-pricing-for-saas-what-it-is-and-how-ai-agents-are-breaking-it
- https://www.reforge.com/blog/how-to-price-your-ai-product
