The Token Budget That Ran Out Mid-Conversation: Why Free-Tier Users Think Your Model Got Dumber
A product manager I know spent two weeks triaging a churn spike on her company's AI writing assistant. Free-tier session length had collapsed by 30%, the support inbox filled up with variations of "your model used to be smart, now it's lazy," and the team's first instinct was to blame a model upgrade that had shipped the same week. The model had not changed. What had changed was that finance had quietly tightened the per-user token budget mid-quarter, and the app had been silently truncating system prompts, dropping tool calls, and shortening responses for any user who crossed the new threshold. From the user's seat, the AI had degraded. From the dashboard, nothing was wrong. Both were true, and that is the failure mode.
This pattern is everywhere now. ChatGPT's free tier drops to a smaller model when the limit is hit, with no in-product label other than "responses may be shorter for a while." Anthropic's free tier behaves similarly. Build a feature on top of either, layer on your own per-user budget for cost control, and you have stacked two invisible cliffs in series — the platform's and yours — and the user, who only sees one chat box, has no way to tell which one they just walked off.
