The Tip Jar Problem: When 5% of Your Users Burn 80% of Your Inference Budget
A single developer ran up more than $35,000 in compute under a $200 monthly plan. That is a 175x subsidy on one user — paid for by the casual majority who would have been just as happy on a $19 tier. This is the load-bearing math behind every "Why is our AI margin negative this quarter?" Slack thread. The problem is not that one user; it is that the long tail of one users follows a power law, and a power law plus flat-rate billing plus a real per-unit cost is a structural margin compressor that no amount of growth will fix.
The reflex when this lands on a finance review is to clamp down: hard token caps, "fair-use" language buried in the TOS, weekly throttles, a quietly degraded model for free tier. These all work in the sense that they cut the bleed. They also alienate the exact users whose evangelism you depend on, because the people who hit your caps are the ones who actually figured out how to extract value from your product. The standard fix is a backwards-compatible apology to the wrong cohort.
