
Why A/B Tests Fail for AI Features (And What to Use Instead)

April 15, 2026 · 9 min read

Software Engineer

Your AI feature shipped. The A/B test ran for two weeks. The treatment group looks better — 4% lift in engagement, p-value under 0.05. You ship it to everyone.

Six weeks later, the gains have evaporated. Engagement is back where it started, or lower. Your experiment said one thing; reality said another.

This is not a corner case. It is the default outcome when you apply standard two-sample A/B testing to AI-powered features without accounting for the ways these features break the assumptions baked into that methodology. The failure modes are structural, not statistical — you can run your experiment perfectly by the textbook and still get a wrong answer.

About Tian Pan