10 posts tagged with "pricing"

The AI Wallet: Why Token Budgets Belong in the UI, Not the Engineering Dashboard

· 10 min read
Tian Pan
Software Engineer

Pull up the per-user cost dashboard for any AI product on a flat subscription. The shape is always the same: a long, flat tail of users who barely move the needle, and a thin spike at the top where five percent of accounts burn eighty percent of the inference budget. The spike is hidden from users on both ends. The power users don't know they're being subsidized; they assume the price is the price. The casual users don't know they could ask for more; they assume the limit is the limit.

The dashboard stays engineering-internal because product is afraid that exposing it will scare users. It does the opposite. The team that hides cost ends up shipping silent throttling, hidden model downgrades, and answer truncation that the user reads as "this product is broken." The team that exposes cost — as a deliberate UI surface, not an admin page — turns the same cost ceiling from a churn driver into a monetization lever.

This is the AI wallet. Not a billing page. A product primitive.

Cost-Per-Conversation as a Product Contract: When Pricing Drives Architecture

· 10 min read
Tian Pan
Software Engineer

The cleanest way to find out your AI feature's pricing model is wrong is to look at which engineer is currently rewriting the truncation logic at midnight. They aren't shipping a capability — they're patching a unit-economics leak that the PRD never named, and the patch is necessarily user-hostile because the product spec told them the budget was infinite. On a flat-fee SaaS plan, every conversation that runs longer than the median pulls margin out of the company in real time. The only real question is whether the product team admits it before finance does.

Traditional SaaS economics rest on near-zero marginal cost per user: once the software is built, serving the next customer barely moves the infrastructure line. AI features break that assumption. Every turn in a conversation consumes inference compute that scales with prompt size, output length, tool-call fan-out, and retrieval volume — and conversations don't have a natural stopping point. A heavy user can consume 50× the median in a billing period without leaving the happy path of the product. Under flat pricing, that user is funded by the rest of the user base, and the company finds out only when COGS reporting catches up a quarter later.

This is why pricing on AI features is not a finance problem to be handled after launch. It is an architecture input that decides what the product is allowed to do, and refusing to make it visible in the spec just means it gets resolved later, in worse ways, by people without product authority.

The Session Boundary Problem: Where a Conversation Ends for Billing, Eval, and Memory

· 11 min read
Tian Pan
Software Engineer

Three teams are looking at the same event stream, each with a column called session_id, and each with a different definition of what a session is. Billing inherited a 30-minute idle window from the auth library. Eval inherited "everything until the user says 'bye' or stops typing for 10 minutes" from a chatbot framework. Memory uses a thread ID that the UI generates whenever the user clicks "New chat", which most users never do. Three columns, three semantics, one rolled-up dashboard, and three seemingly unrelated bugs that share a root cause.

This is the session boundary problem. It looks like an instrumentation nit, but it is actually a product question wearing infrastructure clothes: where does a conversation end? The honest answer is that there is no single answer — a session for billing is not the same object as a session for eval is not the same object as a session for memory — and a team that picks one default and lets the other two inherit it is shipping a billing dispute, an eval bias, and a memory leak with the same root cause.
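
To make the mismatch concrete, here is a minimal sketch that splits one event stream into sessions with two different idle windows. The `split_sessions` helper and the sample timestamps are hypothetical, and the sketch models only the idle-timeout part of each definition (it ignores the "bye" rule and the UI-generated thread ID).

```python
from datetime import datetime, timedelta

def split_sessions(timestamps, idle_gap):
    """Group event timestamps into sessions, starting a new session
    whenever the gap since the previous event exceeds idle_gap."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= idle_gap:
            sessions[-1].append(ts)   # still inside the idle window
        else:
            sessions.append([ts])     # gap too long: open a new session
    return sessions

# Hypothetical event stream: one user's messages, in minutes after 09:00.
start = datetime(2026, 1, 1, 9, 0)
events = [start + timedelta(minutes=m) for m in (0, 5, 20, 45, 120)]

# Billing's 30-minute window versus eval's 10-minute window.
billing_sessions = split_sessions(events, timedelta(minutes=30))
eval_sessions = split_sessions(events, timedelta(minutes=10))

print(len(billing_sessions), len(eval_sessions))
```

On this sample, the same five events count as two sessions for billing and four for eval, so any metric expressed "per session" differs between the two teams before either has a bug.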

The AI Efficiency Paradox: When Your Best Feature Kills Your Revenue

· 9 min read
Tian Pan
Software Engineer

In early 2026, Atlassian reported something that hadn't happened in the company's history: a decline in enterprise seat counts. For a company whose entire growth model rests on expansion revenue — selling more seats as customer organizations grow — this was a structural alarm, not a blip. The proximate cause wasn't churn or product failure. It was that Atlassian's own AI features had made teams so much more productive that fewer seats were needed to do the same amount of work.

This is the AI efficiency paradox: build a feature that genuinely saves users time, and you may be training them to need less of your product. The more useful your AI, the faster your pricing model breaks.

AI Output Volatility Is a Business Risk You're Probably Underpricing

· 9 min read
Tian Pan
Software Engineer

When companies talk about AI risk, the conversation usually gravitates toward the obvious failures: hallucinated facts, biased outputs, legal liability from generated content. What gets far less attention is a quieter structural problem: you've made commercial commitments — pricing tiers, SLAs, customer-facing accuracy claims — on top of a system whose outputs are inherently probabilistic. Every time the model generates a response, it's sampling from a distribution. The contract doesn't mention distributions.

This is a business risk that most teams discover late, when a customer complains that the same document review workflow gave completely different results on Monday and Friday. Or when a regulator asks for reproducibility guarantees that the system architecturally cannot provide.

Token Economics for AI-Powered API Products: Pricing What You Cannot Predict

· 10 min read
Tian Pan
Software Engineer

A team ships a customer-facing AI assistant. They price it at $49/month per seat, targeting 70% gross margins based on a spreadsheet that assumed "average 500 tokens per query." Three months later, finance flags that their heaviest users are consuming 15,000 tokens per session. The pricing model collapses not because the feature failed, but because the product team priced something they didn't yet understand.

This isn't a failure of forecasting. It's a structural problem: the cost basis of an LLM-powered product is fundamentally unlike anything traditional SaaS pricing was designed to handle. Every API call has unpredictable and material token cost. The inputs vary wildly by user, task, and time of day. The outputs compound in ways that only show up weeks later on your cloud bill. And once you layer in agentic patterns — tool calls, multi-turn reasoning, subagent orchestration — a single user interaction can cost $0.02 or $20 depending on what the model decides to do.

Your AI Pricing Page Is a Leveraged Bet on Token Economics

· 9 min read
Tian Pan
Software Engineer

When the team published the AI tier at "$X per seat for unlimited AI," nobody on the pricing call thought of it as a derivative position. It looked like a SaaS pricing page — a number, a tier, a CTA. But every dollar of revenue from that page is now exposed to a token-cost curve set by a vendor whose roadmap does not care about your gross margin. You did not write a pricing page. You wrote a naked short on token volatility, and the strike is whatever your vendor charges next quarter.

The math arrives quickly. A handful of power users discover the workflow and start running it on the longest context they can fit. A competitor's UX change re-trains the median user to send queries that are 40% longer. The frontier model your feature is locked to gets a price-per-million bump because the older tier you were on is being deprecated. Any one of these is a margin event you cannot reverse from the pricing page in a single quarter — and they tend to arrive together.

Pricing AI Features: The Unit Economics Framework Engineering Teams Always Skip

· 11 min read
Tian Pan
Software Engineer

Cursor hit $1 billion in revenue in 2025 and lost $150 million doing it. Every dollar customers paid went straight to LLM API providers, with nothing left for engineering, support, or infrastructure overhead. This wasn't a scaling problem; it was a unit economics problem that was invisible until it was catastrophic.

Most engineering teams building AI features make the same mistake: they treat inference cost as a minor line item, ship a flat-rate subscription, and assume the economics will work out later. They don't. Variable inference costs don't behave like any other COGS in software, and the pricing architectures that work for traditional SaaS will bleed you dry the moment your heaviest users find your most expensive feature.

Pricing Your AI Product: Escaping the Compute Cost Trap

· 10 min read
Tian Pan
Software Engineer

There is a company charging £50 per month per user. Their AI feature consumes £30 in API fees. That leaves £20 to cover hosting, support, and profit — before accounting for a single refund or churned seat. They built a product users love, grew to thousands of subscribers, and unknowingly constructed a business where more customers means more losses.

This is not a cautionary tale about a bad idea. It is a cautionary tale about a pricing architecture imported from a world where the marginal cost of serving the next user was effectively zero. That world no longer fully applies when your product calls a language model.

Traditional SaaS gross margins run 70–90%. AI-forward companies are reporting 50–60% — and the gap is mostly explained by one line item: inference. When tokens are 20–40% of your cost of goods sold, the standard SaaS playbook inverts.
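
The arithmetic behind that example can be sketched in a few lines. The `gross_margin` helper is an illustrative name, and the calculation counts inference as the only cost of goods sold, which is a simplifying assumption; hosting and support would push the real figure lower.

```python
def gross_margin(price, cogs):
    """Gross margin as a fraction of price."""
    return (price - cogs) / price

# Figures from the example above: £50 per user per month, £30 in API fees.
price = 50.0
inference_cost = 30.0

# Assumption: inference is the only COGS line counted here.
margin = gross_margin(price, inference_cost)
remaining = price - inference_cost

print(f"margin: {margin:.0%}, left per user: £{remaining:.0f}")
```

With these figures the margin is 40% before any other cost is counted, already below the 70–90% range cited for traditional SaaS.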

The Metered AI Pricing Death Spiral: Why Per-Token Billing Punishes Your Best Features

· 8 min read
Tian Pan
Software Engineer

Token costs dropped 280x in two years. Enterprise AI bills went up 320%. If that sounds like a paradox, you haven't looked closely at how per-token billing interacts with the features that actually make AI products valuable.

The most useful AI workflows — deep research, multi-step reasoning, iterative refinement, agentic tool use — are precisely the ones that consume the most tokens. Under pure usage-based pricing, your best features are your worst margin killers. This isn't a temporary scaling problem. It's a structural misalignment between how AI creates value and how it gets billed.