Your AI Pricing Page Is a Leveraged Bet on Token Economics
When the team published the AI tier at "$X per seat for unlimited AI," nobody on the pricing call thought of it as a derivative position. It looked like a SaaS pricing page — a number, a tier, a CTA. But every dollar of revenue from that page is now exposed to a token-cost curve set by a vendor whose roadmap does not care about your gross margin. You did not write a pricing page. You wrote a naked short on token volatility, and the strike is whatever your vendor charges next quarter.
The math arrives quickly. A handful of power users discover the workflow and start running it on the longest context they can fit. A competitor's UX change re-trains the median user to send queries that are 40% longer. The frontier model your feature is locked to gets a price-per-million bump because the older tier you were on is being deprecated. Any one of these is a margin event you cannot reverse from the pricing page in a single quarter — and they tend to arrive together.
This is not a hypothetical. GitHub Copilot has spent 2026 unwinding the consequences of "$10/month for an in-editor assistant" after the same product became an agentic loop running multi-step coding sessions, and is now moving every plan to usage-based billing because the flat-rate model could not survive what the product became. Cursor went through its version of the same fire in June 2025: the unlimited Pro tier was converted into "$20 worth of usage," power users burned through the new ceiling within a handful of Claude prompts, and the company spent the following weekend issuing refunds and an apology. Both teams ship excellent products. Neither team was outrun by competition. They were outrun by their own pricing pages.
The Per-Seat Page Was Designed for a Different Cost Curve
Per-seat SaaS pricing works because the marginal cost of a seat is approximately zero. A second user on the same Postgres-backed CRUD application runs the same query the first user ran, against the same database, on hardware that is already paid for. The unit economics survive 10x usage variance per user because there is no usage-shaped variable cost to absorb.
AI features break that assumption at the root. A second user on the same chat surface runs a fresh inference, on a fresh context window, against a model whose per-token price is set by a vendor you do not own. The "second seat is free" intuition silently inverts: the second seat is, in expectation, exactly as expensive as the first, and a P90 power-user seat is routinely 10x more expensive than a P50 seat. The result is the gross-margin compression every AI-first analyst has been writing about for two years — AI-native SaaS lands in the 50–60% range while traditional SaaS sits at 80–90% — but expressed as a structural property of the pricing surface, not a cost-control problem.
What looks like "we have a few power users" is actually "our unit economics are a function of the usage distribution we cannot observe from the pricing page alone." The flat unlimited tier hides the distribution; the distribution is what determines whether you make money.
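To make the distribution argument concrete, here is a minimal sketch of blended margin on a flat tier. Every number — seat price, vendor rate, cohort shares and token volumes — is a hypothetical illustration, not a benchmark:

```python
# Sketch: blended gross margin on a flat "unlimited" AI tier under an
# assumed usage distribution. All numbers are hypothetical.

SEAT_PRICE = 30.0          # $/seat/month, flat tier
COST_PER_M_TOKENS = 10.0   # $ per million tokens (vendor price)

# Assumed monthly usage cohorts: (share of seats, millions of tokens/seat)
cohorts = [
    (0.50, 0.2),   # light users
    (0.40, 1.0),   # median users
    (0.10, 10.0),  # power users: ~10x the median seat
]

revenue = SEAT_PRICE  # every seat pays the same, regardless of cohort
cost = sum(share * m_tok * COST_PER_M_TOKENS for share, m_tok in cohorts)
margin = (revenue - cost) / revenue

print(f"expected COGS per seat: ${cost:.2f}")   # -> $15.00
print(f"blended gross margin:   {margin:.0%}")  # -> 50%
```

In this toy distribution, the 10% of power seats account for two-thirds of the COGS while paying the same $30 as everyone else — and the pricing page reveals none of it.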
The Three Shocks That Compress the Margin Overnight
Three things move the cost curve, and they arrive on different timelines than your pricing page can react to.
Vendor price changes. When the model your feature is built on gets repriced — either directly upward, or because the cheap older tier sunsets and you are forced onto a more expensive successor — the marginal cost of every existing customer jumps. The decision is not yours, and the announcement-to-invoice gap is measured in weeks. Your pricing page, by contrast, is contractual to existing customers and takes a sales motion to move on enterprise tiers.
Workload drift in the user base. A UX change a competitor ships can re-train your users about what an AI query looks like. Longer prompts, multi-turn chains where users used to send one shot, agent loops that fan out into tool calls — these are exogenous shocks to your token consumption. The user has not become more valuable; they have become more expensive at the same value.
Model-mix drift inside your own stack. The PM who wants better answers swaps a small model for a frontier one. The engineer who wants better recall doubles the retrieved context. The new feature that adds tool use turns one inference into eight. Each of these is a defensible product decision; collectively they are a token-cost ramp that nobody owns on a P&L line.
The team that watches one of these will be ambushed by the other two. The pricing-page-as-derivative framing is what makes this legible: you are short volatility on token costs, and there are at least three independent volatility sources.
The Discipline That Has to Land Before the Pricing Page Goes Live
Treating the AI pricing page as a financial position changes what has to be in place before it ships. Four artifacts, in particular, are non-negotiable.
A per-customer cost model that finance can read. The per-inference cost has to be tagged at the request layer — by customer, by feature, by model, by context length — and rolled up into a dashboard that the CFO and the head of engineering can read on the same screen. The CloudZero framing of "AI attribution" is precise: when a model call lands in a shared compute pool with no tagging, the COGS is mathematically real but operationally invisible. You cannot price what you cannot attribute. A workable rule of thumb: if your finance team cannot point at a customer ID and a dollar amount within 24 hours of close, the pricing page is flying blind.
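As a sketch of what request-layer tagging can look like — the price table, field names, and in-memory ledger below are all stand-ins for a real metering pipeline that would ship these rows to a warehouse:

```python
# Sketch: per-request cost attribution with a hypothetical in-memory ledger.
# Model names and per-token prices are assumptions, not real vendor rates.
from collections import defaultdict

PRICE_PER_M = {"small-model": 0.5, "frontier-model": 10.0}  # $/1M tokens

ledger = []

def record_inference(customer_id, feature, model, prompt_tokens, completion_tokens):
    """Tag one model call by customer, feature, and model; return its cost."""
    tokens = prompt_tokens + completion_tokens
    cost = tokens / 1_000_000 * PRICE_PER_M[model]
    ledger.append({"customer": customer_id, "feature": feature,
                   "model": model, "tokens": tokens, "cost": cost})
    return cost

def cost_by_customer():
    """The rollup finance can read: customer ID -> dollars."""
    totals = defaultdict(float)
    for row in ledger:
        totals[row["customer"]] += row["cost"]
    return dict(totals)

record_inference("acme", "chat", "frontier-model", 8_000, 1_000)
record_inference("acme", "summarize", "small-model", 2_000, 500)
record_inference("globex", "chat", "frontier-model", 100_000, 4_000)
print(cost_by_customer())
```

The rollup is the artifact the 24-hour rule of thumb demands: a customer ID next to a dollar amount, derived from tags that were attached when the request was made, not reconstructed afterward.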
A usage-cap or fair-use clause designed in from day one. Retrofitting a cap into an existing "unlimited" plan is the single most expensive maneuver in AI pricing — Cursor's mid-year conversion produced refunds, a public apology, and a lasting hit to user trust. The cap does not have to be visible to the casual user; it has to exist in the terms of service and in the rate-limiter from the first day the page goes live. Industry references for what a fair-use ceiling looks like in token terms — for example, a few million tokens per user per month with a defined overage rate — already exist; nothing forces you to lead with the number, but the mechanism has to be there.
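A minimal version of the mechanism, assuming a hypothetical 5M-token monthly ceiling and an assumed overage rate — the numbers are placeholders; the point is that the meter exists from day one even if the pricing page never leads with it:

```python
# Sketch: a fair-use ceiling enforced at the request layer.
# Cap and overage rate are hypothetical placeholders.

MONTHLY_CAP_TOKENS = 5_000_000    # assumed fair-use ceiling per user
OVERAGE_PER_M = 8.0               # assumed $/1M tokens past the cap

class FairUseMeter:
    def __init__(self):
        self.used = {}  # user_id -> tokens consumed this billing period

    def charge(self, user_id, tokens):
        """Record usage; return overage dollars incurred by this request."""
        before = self.used.get(user_id, 0)
        after = before + tokens
        self.used[user_id] = after
        # Only the portion of this request past the cap is billable.
        over = (max(0, after - MONTHLY_CAP_TOKENS)
                - max(0, before - MONTHLY_CAP_TOKENS))
        return over / 1_000_000 * OVERAGE_PER_M

meter = FairUseMeter()
print(meter.charge("u1", 4_500_000))  # under the cap -> 0.0
print(meter.charge("u1", 1_000_000))  # 500k tokens over -> 4.0
```

Whether the return value becomes an invoice line, a soft throttle, or a sales conversation is a product decision; what matters is that the plumbing exists before the first "unlimited" customer signs.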
A pricing-tier gradient that maps to actual cost cohorts. A flat "unlimited" promise compresses three cost cohorts — light, median, power — into one ARPU. A two-tier pricing page does no better if both tiers are flat. The page should reflect the cost structure underneath: a base subscription for predictability, plus a metered component (credits, tokens, or outcomes) that captures the upside from heavy users. The hybrid model — base seats plus a usage layer — is now the modal pricing structure in AI-augmented SaaS for exactly this reason: it is the simplest pricing page that does not lie about the cost curve underneath.
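The hybrid shape reduces to a few lines of billing arithmetic. The base fee, included allowance, and credit price below are hypothetical:

```python
# Sketch: hybrid billing — base seats for predictability, a metered
# layer for heavy usage. All prices are hypothetical.

BASE_PER_SEAT = 20.0        # $/seat/month
INCLUDED_CREDITS = 500      # credits bundled into each seat's base fee
PRICE_PER_CREDIT = 0.04     # $ per credit past the pooled allowance

def monthly_bill(seats, credits_used):
    included = seats * INCLUDED_CREDITS
    overage = max(0, credits_used - included)
    return seats * BASE_PER_SEAT + overage * PRICE_PER_CREDIT

print(monthly_bill(10, 3_000))    # light team: under the allowance
print(monthly_bill(10, 20_000))   # heavy team: the metered layer kicks in
```

The light team sees a flat, predictable bill; the heavy team's bill scales with the cost it actually generates — which is exactly the cost cohort the flat page was silently subsidizing.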
A quarterly margin-vs-vendor-cost review. The model bill is not a COGS detail to be reconciled by accounts payable. It is a strategic line item whose unit cost moves on the vendor's quarterly cadence, and the team that ships AI features should be reviewing it on the same cadence. Two questions belong on that review: what fraction of the revenue from the AI tier was paid back to the model vendor, and what would it have been if the median customer's token usage had grown 30%? If the answer to the second question is "we lose money," the pricing page is structurally fragile and the next vendor announcement is the trigger.
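Once attribution exists, the two review questions are a few lines of arithmetic. The quarterly figures below are hypothetical, and the stress test crudely assumes vendor spend scales linearly with usage while revenue stays flat (because the pricing page is fixed for existing customers):

```python
# Sketch: the quarterly review's two questions as arithmetic.
# Inputs are hypothetical quarterly figures.

ai_tier_revenue = 900_000.0     # quarterly revenue from the AI tier
vendor_bill = 380_000.0         # quarterly model-vendor spend

# Q1: what fraction of AI-tier revenue went back to the model vendor?
payback = vendor_bill / ai_tier_revenue
print(f"fraction paid back to vendor: {payback:.0%}")

# Q2: what if the median customer's token usage had grown 30%?
# Crude linear-scaling assumption; revenue held flat.
stressed_bill = vendor_bill * 1.30
stressed_margin = (ai_tier_revenue - stressed_bill) / ai_tier_revenue
print(f"gross margin at +30% usage:  {stressed_margin:.0%}")
```

If the second number goes negative, the pricing page is structurally fragile — and the next vendor announcement is the trigger, not the cause.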
Why Outcome-Based and Hybrid Pricing Are Spreading
The market has been visibly shifting away from flat per-seat AI pricing toward hybrid and outcome-based models — Zendesk billing AI agents per resolved ticket, Salesforce charging per Agentforce conversation, the broader move to credits — and the standard explanation is "AI agents replace seats, so seats are obsolete as a billing unit." That is true but downstream. The deeper reason is the volatility argument: seats are a flat-rate exposure to a usage-shaped cost curve, and flat-rate exposures to volatile underlyings are exactly the positions that blow up first.
Outcome-based pricing — per resolution, per conversation, per generated artifact — has the property that the revenue scales with whatever you are paying the model for. Each unit of revenue is naturally hedged against token-cost variance because the unit of revenue is a unit of work that consumed tokens. Hybrid pricing achieves a similar effect more crudely: a base subscription for predictability and operational simplicity, a usage layer that absorbs the variance. Both are improvements over per-seat unlimited not because the new shape is more elegant but because they convert a leveraged short into something closer to a matched book.
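A toy comparison makes the hedge visible. All prices are hypothetical, and the per-outcome margin is constant by construction because each billed unit carries its own token cost:

```python
# Sketch: flat vs per-outcome pricing under a usage shock.
# All prices are hypothetical illustrations.

COST_PER_OUTCOME = 0.30     # $ of tokens consumed per resolved ticket
PRICE_PER_OUTCOME = 1.50    # $ billed per resolved ticket
FLAT_FEE = 1_500.0          # $/month flat "unlimited" plan

def margins(outcomes_per_month):
    """Gross margin under each pricing shape at a given usage level."""
    cost = outcomes_per_month * COST_PER_OUTCOME
    flat = (FLAT_FEE - cost) / FLAT_FEE
    outcome = (PRICE_PER_OUTCOME - COST_PER_OUTCOME) / PRICE_PER_OUTCOME
    return flat, outcome

for n in (1_000, 4_000, 8_000):   # usage shock: 1x -> 8x
    flat, outcome = margins(n)
    print(f"{n:>5} outcomes: flat {flat:>5.0%}, per-outcome {outcome:.0%}")
```

Under the flat plan the margin swings from healthy to deeply negative as usage grows 8x; under per-outcome pricing it does not move, because revenue and token cost rise together — the matched book, in miniature.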
This is not an argument that every AI feature should be priced per outcome. It is an argument that whatever pricing shape you ship, you should be able to articulate which token-cost shocks it absorbs and which ones it transmits straight through to the gross margin. "Unlimited" answers that question with "all of them, into us."
Pricing as an Engineering Concern, Not Just a GTM Concern
The framing that finally makes this tractable is: the AI pricing page is an engineering artifact. It is a contract between a usage distribution your code emits, a cost curve your vendor controls, and a revenue line that goes to the board. Every team that has had to do an emergency repricing in the last 18 months — Copilot, Cursor, several quieter examples — discovered that the pricing page could not be repaired from the marketing side alone. It required engineering instrumentation (per-customer attribution, rate limiters, tier enforcement), product surgery (introducing caps, gating frontier models, redesigning the agentic loop to respect a budget), and finance modeling (re-running the unit economics on the new vendor price sheet) — all moving in lockstep, under the time pressure of a customer base that was already being billed.
The team that prices AI features like classical SaaS is short volatility on a market it cannot hedge. The team that treats the pricing page as a position to manage — instrumented, capped, gradient-priced, reviewed quarterly — is running a business. The board reads the same number either way; only one of the two readings is durable through the next vendor pricing email.
- https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/
- https://cursor.com/blog/june-2025-pricing
- https://techcrunch.com/2025/07/07/cursor-apologizes-for-unclear-pricing-changes-that-upset-users/
- https://www.thesaascfo.com/your-ai-feature-is-quietly-destroying-your-gross-margin/
- https://www.bvp.com/atlas/the-ai-pricing-and-monetization-playbook
- https://www.drivetrain.ai/post/unit-economics-of-ai-saas-companies-cfo-guide-for-managing-token-based-costs-and-margins
- https://www.cloudzero.com/blog/ai-cost-management/
- https://www.saastr.com/cursor-our-users-love-per-seat-pricing-its-just-the-cost-side-makes-it-harder/
- https://www.getmonetizely.com/articles/the-hidden-cogs-of-ai-why-your-pricing-model-might-be-doomed
- https://www.hirefraction.com/blog/ai-is-killing-saas-margins-outcome-based-pricing-is-how-you-get-them-back
