Pricing AI Features: The Unit Economics Framework Engineering Teams Always Skip
Cursor reportedly hit $150 million in annual revenue doing it. Nearly every dollar customers paid went straight to LLM API providers, with nothing left for engineering, support, or infrastructure overhead. This wasn't a scaling problem; it was a unit economics problem that stayed invisible until it was catastrophic.
Most engineering teams building AI features make the same mistake: they treat inference cost as a minor line item, ship a flat-rate subscription, and assume the economics will work out later. They don't. Variable inference costs don't behave like any other COGS in software, and the pricing architectures that work for traditional SaaS will bleed you dry the moment your heaviest users find your most expensive feature.
This is the framework for getting it right before production, not after the margin crisis.
Why Variable Inference Cost Breaks SaaS Assumptions
Traditional SaaS pricing rests on a simple premise: your marginal cost per additional user is close to zero. Hosting is cheap, bandwidth is cheap, database reads are cheap. You price on value, not cost, and as volume grows, your gross margins expand.
AI inference inverts this. Every API call has a real, variable cost that scales directly with user behavior. A single chat feature can easily run to $10,000 per month in inference costs alone, before you add infrastructure overhead, fallback models, observability tooling, and retry logic.
The cost multiplier from pilot to production consistently surprises teams. A feature that looks cheap in a controlled pilot can cost $3–5 per call in production once you account for error retries, output validation loops, context padding from conversation history, and the observability stack needed to debug it. Teams that price against pilot benchmarks discover this the hard way.
The situation gets more extreme with agentic workflows. A simple single-call inference might cost a fraction of a cent, while an agentic task that loops through tool calls and self-correction can run $0.50–$20. If you priced a $20/month subscription assuming single-call costs, one agentic power user can consume your entire monthly revenue in a few hours.
Building the Per-Workflow Cost Model
The antidote is cost modeling at the workflow level, not the API level. Before shipping any AI feature, you need a cost sheet that answers three questions: what does one activation cost, what does the 90th-percentile activation cost, and what happens when a user runs it 500 times a day?
Start with the four cost axes for every workflow:
Model selection is where the biggest lever lives. Modern LLMs span a 100x price range. A classification task that determines customer intent doesn't need the same model as a complex multi-step reasoning task. Routing simple operations to a budget model (Claude Haiku, GPT-4o mini) and reserving premium models for tasks that genuinely require them can cut average inference cost by 60–80% with negligible quality impact.
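As a sketch, the routing layer can start as a simple lookup from task type to model tier. The model names and per-token prices below are illustrative placeholders, not current list prices, and a real router would also consider context length and quality thresholds:

```python
# Cost-based model routing sketch. Model names and prices are assumptions
# for illustration, not any provider's actual catalog.
PRICES_PER_M_INPUT = {
    "budget-mini": 0.15,    # a Haiku / 4o-mini-class model (assumed price)
    "premium-large": 3.00,  # a frontier model (assumed price)
}

# Well-bounded tasks that rarely need a premium model.
SIMPLE_TASKS = {"classification", "extraction", "routing", "short_summary"}

def pick_model(task_type: str) -> str:
    """Route cheap, well-bounded tasks to the budget tier."""
    return "budget-mini" if task_type in SIMPLE_TASKS else "premium-large"

def call_cost(task_type: str, input_tokens: int) -> float:
    """Input-token cost only, for illustration."""
    model = pick_model(task_type)
    return input_tokens / 1_000_000 * PRICES_PER_M_INPUT[model]
```

With these placeholder prices, routing a classification call to the budget tier is 20x cheaper than sending it to the premium model, which is where the 60–80% average savings comes from when most traffic is simple.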
Token management is the second lever. Input tokens are cheaper than output tokens—typically by a factor of 4–5x. Every token you can eliminate from your prompts without sacrificing quality is a direct cost reduction. Common culprits: bloated system prompts with redundant instructions, unnecessary conversation history padding, and RAG retrievals that pull in far more context than the model actually uses.
Prompt caching is underused and high-return. When your system prompt and injected documents remain constant across many calls, cached tokens cost 10–15% of standard input prices. Teams running large document analysis pipelines have cut LLM costs by 50–90% through caching alone, simply by structuring prompts so the static content appears before the dynamic query.
Batching offers a flat 50% discount from both major API providers for non-real-time workloads. Document processing, data enrichment, background summarization—any task that doesn't need an immediate synchronous response can go through the batch API and immediately halve its cost.
The output of your cost model should be: median cost per workflow activation, 90th percentile cost, and a daily cost cap per user that, if exceeded, signals an anomaly worth investigating.
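Those three outputs can be computed directly from logged token usage per activation. A minimal sketch, assuming placeholder per-million-token prices and a log of (input, output) token counts:

```python
# Minimal per-workflow cost sheet. Prices are illustrative placeholders;
# plug in your provider's actual rates.
import statistics

INPUT_PRICE_PER_M = 3.0    # $/1M input tokens (assumed)
OUTPUT_PRICE_PER_M = 15.0  # $/1M output tokens (assumed)

def activation_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one workflow activation."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

def cost_sheet(samples: list[tuple[int, int]]) -> dict[str, float]:
    """samples: logged (input_tokens, output_tokens) per activation."""
    costs = sorted(activation_cost(i, o) for i, o in samples)
    p90 = costs[min(len(costs) - 1, int(0.9 * len(costs)))]
    return {
        "median": statistics.median(costs),
        "p90": p90,
        # What one 500-activation/day user costs at p90 intensity.
        "daily_cap_500x": 500 * p90,
    }
```

The `daily_cap_500x` figure answers the third question in the framework: if that number exceeds what the user pays in a month, the workflow cannot ship behind a flat rate without caps.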
The Heavy-User Subsidy Problem
Here's the math that breaks flat-rate AI subscriptions:
Assume you offer an AI writing assistant at $20/month. Your user base splits roughly into three groups:
- Light users (80% of customers): 5–10 queries per day, $1–2/month in actual inference costs
- Regular users (18%): 50 queries per day, $15–20/month in inference costs
- Power users (2%): 300–500 queries per day, $100–200/month in inference costs
In a typical AI SaaS usage distribution, the top 20% of users consume 80% of your compute. The top 1–2% can account for 40–50% of total inference costs while paying the same $20/month as everyone else.
Light users don't cross-subsidize power users in traditional SaaS because the marginal cost is negligible. In AI, they subsidize them dollar for dollar. At 1,000 customers: 800 light users generate ~$800–1,600 in monthly inference costs, 180 regular users ~$2,700–3,600, and 20 power users ~$2,000–4,000, for a total of roughly $5,500–9,200. Revenue: $20,000/month. Take the low end, apply a 2x infrastructure multiplier, and COGS lands at ~$11,000, for a gross margin of ~45%. Acceptable, but only if you've modeled it.
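The subsidy arithmetic is worth writing down explicitly. This sketch uses the low ends of the per-segment cost ranges above (all figures illustrative):

```python
# Low-end subsidy arithmetic for 1,000 customers on a $20/month flat plan.
PRICE = 20.0            # $/month subscription
INFRA_MULTIPLIER = 2.0  # true COGS ~ 2x raw inference cost

segments = [
    # (users, inference $/user/month) -- low ends of the ranges above
    (800, 1.0),    # light
    (180, 15.0),   # regular
    (20, 100.0),   # power
]

revenue = sum(users for users, _ in segments) * PRICE      # $20,000
inference = sum(users * cost for users, cost in segments)  # $5,500
cogs = inference * INFRA_MULTIPLIER                        # $11,000
margin = (revenue - cogs) / revenue                        # 0.45
```

Rerun the same arithmetic with power users at 5% of the base instead of 2% and the margin collapse described below falls straight out of the numbers.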
Now consider what happens when your product gains traction and the power user ratio shifts from 2% to 5%. Same subscription price, same feature set—but COGS as a percentage of revenue jumps dramatically. Many teams discover this shift only after margins turn negative.
The fix is identifying power users early and designing your pricing to either capture their value or gate their usage. Track cost-per-user weekly. Flag any account exceeding 2x average inference cost for their tier. If your top 10 users are consuming 50x the median, you have a subsidy problem that will only grow.
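The weekly check described above is a few lines once per-user spend is aggregated, for example from your LLM gateway logs. A minimal sketch (the 2x factor matches the heuristic above; everything else is an assumption):

```python
# Weekly cost-anomaly check over aggregated per-user inference spend.
from statistics import mean

def flag_subsidy_risks(user_costs: dict[str, float],
                       factor: float = 2.0) -> list[str]:
    """Return user IDs whose weekly inference cost exceeds factor x the mean."""
    if not user_costs:
        return []
    threshold = factor * mean(user_costs.values())
    return sorted(u for u, c in user_costs.items() if c > threshold)
```

Flagged accounts are candidates for a tier upsell conversation, not an automatic cutoff; the point is to see the subsidy forming before it compounds.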
Consumption Cap Design: Soft, Medium, Hard
Unlimited AI features are a liability, not a differentiator—unless you've explicitly modeled and priced the cost of unlimited.
The standard pattern for sustainable AI features uses a three-tier threshold system:
Soft cap (80% of budget): Send the user a notification. No service degradation. This creates transparency without friction and surfaces heavy users who might convert to a higher tier voluntarily.
Medium cap (95% of budget): Begin throttling: route to a cheaper model tier, reduce the request rate, or return slightly slower responses. The user can still work, but the economic exposure is controlled. Done transparently, most users accept this gracefully.
Hard cap (100% of budget): Halt new requests. For consumer products, this typically means a paywall or upsell prompt. For enterprise, it triggers a human review before resumption.
Critically, these caps should be enforced at multiple granularity levels simultaneously:
- Monthly budget per subscription tier
- Daily per-feature limit (prevents a single workflow from consuming a month of quota in one day)
- Hourly per-user rate limit (catches runaway scripts and automation loops)
- Per-call output token cap (prevents single requests from generating unbounded responses)
The per-call cap on output tokens is frequently overlooked. An agent set loose without an output limit can generate 50,000 tokens in a single response when given an open-ended task. At a premium output rate on the order of $0.06 per 1K tokens, that's a $3 call, and a user running 100 such calls per day turns your pricing model into fiction.
Enforcement should happen in a centralized gateway that intercepts all inference calls before they hit the provider API, not scattered across individual feature implementations. If each feature enforces its own limits independently, a user can simply use multiple features simultaneously to blow past any individual cap.
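A minimal sketch of such a gateway, combining the three-tier thresholds with per-granularity budgets. Budgets, thresholds, and the in-memory store are illustrative; a production version needs persistent counters and time-window resets, which are omitted here:

```python
# Centralized enforcement gateway sketch: one chokepoint that evaluates
# every budget level before a request reaches the provider API.
from dataclasses import dataclass

SOFT, MEDIUM, HARD = 0.80, 0.95, 1.00  # fractions of budget consumed

@dataclass
class Budgets:
    monthly_usd: float
    daily_usd: float
    hourly_usd: float

@dataclass
class Usage:
    monthly: float = 0.0
    daily: float = 0.0
    hourly: float = 0.0

class InferenceGateway:
    def __init__(self, budgets: Budgets):
        self.budgets = budgets
        self.usage: dict[str, Usage] = {}

    def record(self, user_id: str, cost_usd: float) -> None:
        """Accumulate the realized cost of a completed call."""
        u = self.usage.setdefault(user_id, Usage())
        u.monthly += cost_usd
        u.daily += cost_usd
        u.hourly += cost_usd

    def check(self, user_id: str) -> str:
        """Action for the next call: 'ok', 'notify', 'throttle', or 'block'."""
        u = self.usage.setdefault(user_id, Usage())
        worst = max(
            u.monthly / self.budgets.monthly_usd,
            u.daily / self.budgets.daily_usd,
            u.hourly / self.budgets.hourly_usd,
        )
        if worst >= HARD:
            return "block"     # hard cap: paywall, upsell, or human review
        if worst >= MEDIUM:
            return "throttle"  # medium cap: cheaper model, slower responses
        if worst >= SOFT:
            return "notify"    # soft cap: inform the user, no degradation
        return "ok"
```

Taking the maximum across all windows is what closes the multi-feature loophole: whichever budget is closest to exhaustion governs the next call, regardless of which feature issues it.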
Monetization Architectures That Hold at Scale
Five pricing architectures are emerging as sustainable for AI features. They're not mutually exclusive—the strongest implementations combine elements from multiple approaches.
Hybrid base + usage: A subscription floor that covers light and median use, with overage pricing that kicks in beyond a defined quota. This is the most straightforward model to communicate and implement. The key is setting the quota boundary where median users never hit it (reducing friction) while power users reliably do (creating monetization opportunities).
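The billing logic for this model is deliberately simple, which is part of its appeal. A sketch with placeholder figures (the $20 floor, 500-unit quota, and $0.05 overage rate are all assumptions):

```python
# Hybrid base + usage billing sketch. All figures are illustrative.
def monthly_invoice(units_used: int, base_fee: float = 20.0,
                    included_units: int = 500,
                    overage_per_unit: float = 0.05) -> float:
    """Subscription floor plus metered overage beyond the included quota."""
    overage = max(0, units_used - included_units)
    return base_fee + overage * overage_per_unit
```

A median user at 300 units pays the flat $20 and never sees a meter; a power user at 1,500 units pays $70, which is the monetization opportunity the quota boundary is designed to create.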
Credit systems: Abstract the token cost behind a credit unit. A credit typically represents about $0.01 of inference COGS and is sold at a 50–100% markup. Different features cost different credits per activation based on their actual inference cost. The advantage is that credit prices are more psychologically digestible than token prices, and you can adjust credit costs for individual features without changing the subscription price. The disadvantage is opacity, which can frustrate technical users who want to understand what they're paying for.
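The mapping from feature to credits is just the cost model resurfaced in a friendlier unit. A sketch, where the feature names, per-activation costs, and 75% markup are illustrative assumptions:

```python
# Credit-system sketch: price each feature in credits proportional to its
# measured inference COGS, then sell credits at a markup. All figures are
# illustrative assumptions.
import math

CREDIT_COGS = 0.01   # $ of inference cost one credit represents
MARKUP = 0.75        # sell credits at 75% over COGS (within the 50-100% band)
CREDIT_PRICE = CREDIT_COGS * (1 + MARKUP)  # $0.0175 per credit

FEATURE_COGS = {          # measured $ per activation (hypothetical features)
    "quick_answer": 0.004,
    "doc_summary": 0.06,
    "agent_run": 1.20,
}

def credits_for(feature: str) -> int:
    """Round a feature's cost up to a whole number of credits."""
    ratio = round(FEATURE_COGS[feature] / CREDIT_COGS, 6)  # dodge float noise
    return max(1, math.ceil(ratio))
```

When a feature's inference cost changes, you re-derive its credit price from the new measurement and the subscription price never moves, which is exactly the flexibility the model exists to provide.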
Tiered access with model gating: Lower tiers get access to budget models only; higher tiers unlock premium models. This is elegant because the cost difference between tiers maps closely to the actual cost difference between model tiers, making margin management straightforward. It also creates clear upgrade motivation—if a user wants better quality, they pay more, and you're not cross-subsidizing the quality improvement.
Outcome-based pricing: Charge per successful completion (per document processed, per commit generated, per support ticket resolved) rather than per inference call. This decouples your pricing from token cost entirely, which is ideal when you can measure outcomes reliably. The margin risk shifts from "how much does this call cost?" to "can we deliver outcomes efficiently enough?" Replit's agent pricing moves in this direction, charging by effort: simple runs cost little, while complex runs can land at $3–5.
Time-based soft limits with purchase options: A daily token budget resets on a rolling basis, with one-click overage purchases when users hit the limit. This works well for consumer products where impulse purchases are viable and users have variable daily usage patterns.
The Six Mistakes That Sink AI Feature Economics
Pricing it after shipping it. Once users have adopted a feature at a given price point, changing the economics is brutal. Engineer pricing alongside the feature, not after.
COGS fiction. Inference cost is only part of your true COGS. Monitoring, observability tooling, fallback model infrastructure, retry logic, and the support load from AI-related bugs all contribute. True production COGS is typically 1.5–2.5x raw inference cost.
Single-model dependency. If you've hard-coded a single provider and they raise prices or change their model lineup, your margins change overnight. Multi-model routing from the start lets you shift load to cheaper alternatives without a product change.
Unlimited tiers without circuit breakers. Even "unlimited" tiers need a soft limit that triggers human review. A single user running a query loop can generate $10,000 in inference costs in a day.
Ignoring the agentic cost explosion. Agentic features cost 5–30x more per task than single-call inference. If you price a subscription for chat-style interaction, then ship an autonomous agent that loops, calls tools, and self-corrects, you've effectively multiplied your per-user COGS by 10–30x for the users who adopt the agent.
Margin-blind benchmarking. Copying a competitor's pricing without knowing their cost structure is dangerous. They might have preferred API contracts, older pricing locked in, or simply be operating at a loss to capture market share. Your floor price is COGS divided by your target gross margin. Below that number, you're paying customers to use your product.
What to Build Before You Launch
Before any AI feature ships to production, three things need to exist:
A cost model spreadsheet: median activation cost, 90th percentile activation cost, projected monthly cost at 1,000 MAU at three usage intensity levels (light, median, heavy). If the heavy-user scenario looks unsustainable, solve it in the design phase, not in the incident postmortem.
A per-user cost dashboard: visibility into inference spend broken down by user and by feature, updated at least daily. This is the early warning system for both the subsidy problem and for runaway usage patterns.
A tiered enforcement gateway: centralized cap enforcement with soft, medium, and hard thresholds, before any code hits production. Retrofitting this onto an existing feature is painful and often requires breaking changes.
Teams that skip these steps discover the same pattern: the product gains traction, heavy users self-select in, margins compress, and the team faces an impossible choice between degrading a feature users love or losing money on every active user. Neither is a good outcome. The unit economics framework isn't glamorous, but it's the difference between a feature that scales and one that gets quietly sunset once finance notices the COGS trend.
