Free-Tier Traffic Is Your Real Eval Set
The team optimizing the model against paid-cohort traces is grading itself on the easy distribution. Paying users have a workflow. They self-selected into the product because something about it justified pulling out a credit card, which means by the time they're in the eval set, they've already learned which prompts work, which features deliver, and which corners not to wander into. Free-tier users do none of that. They're anonymous, exploratory, often adversarial, often non-native English speakers stress-testing a product in their second language, and they exercise the long tail of failure modes the eval set was never built to cover.
This is the asymmetry that quietly eats the conversion funnel of every freemium AI product. The team grades the model against a curated sample drawn disproportionately from paid traces. The free-tier weird traces — the ones with no template, the ones where someone is genuinely trying to figure out what the product does — never get labeled, never get a regression test, and never inform the next prompt edit. The model gets better against the paid distribution and slowly worse against the distribution that decides whether free users ever upgrade.
