4 posts tagged with "metrics"

Accept Rate Is a Vanity Metric: Your Copilot ROI Hides in the 90 Seconds After the Keystroke

11 min read
Tian Pan
Software Engineer

The dashboard says your engineers accepted 45% of AI suggestions last quarter. Leadership reads that as "45% of a developer's time saved" and signs the renewal. The engineers, meanwhile, are quietly rewriting half of what they accepted, debugging the other half, and wondering why their sprints still feel the same length. Both sides are looking at the same number. Only one of them is looking at the right number.
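
One way to see past accept rate is to measure what happens to an accepted suggestion afterward. A minimal sketch, assuming you log each acceptance and can snapshot the same region of the file a fixed window later; the event schema here is hypothetical, not from any vendor's telemetry:

```python
from dataclasses import dataclass

@dataclass
class Acceptance:
    suggestion: str   # text the developer accepted
    after_90s: str    # the same region of the file, 90 seconds later

def survival_rate(events: list[Acceptance]) -> float:
    """Fraction of accepted suggestions still intact 90s after the keystroke.

    Accept rate counts the keystroke; this counts what survived it.
    """
    if not events:
        return 0.0
    intact = sum(1 for e in events if e.suggestion.strip() in e.after_90s)
    return intact / len(events)
```

A 45% accept rate paired with a 50% survival rate tells a very different ROI story than the dashboard does.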

The most quoted study of 2025 should have ended the vendor-dashboard era on its own. METR measured experienced open-source maintainers working on real issues in their own repos, with and without AI. The developers predicted AI would speed them up by 24%. After the experiment they still believed AI had sped them up by 20%. The stopwatch said they were 19% slower. A thirty-nine-point gap between the story and the data — and the story is what went into the quarterly review.
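
The gap itself is simple arithmetic, but it is worth writing down because it is the one number that never appears on a dashboard. A sketch using the study's headline figures, with speedups as signed percentages:

```python
# Positive = faster with AI, negative = slower. Figures as reported by METR.
predicted_speedup = 24    # what developers forecast before the tasks
believed_speedup  = 20    # what they reported after the tasks
measured_speedup  = -19   # what the stopwatch showed

perception_gap = believed_speedup - measured_speedup
print(f"perception gap: {perception_gap} points")  # 39 points
```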

Why Hallucination Rate Is the Wrong Primary Metric for Production LLM Systems

8 min read
Tian Pan
Software Engineer

Your LLM's hallucination rate is 3%. Your users hate it anyway. This isn't a contradiction — it's a symptom of measuring the wrong thing.

Hallucination rate has become the default headline metric for LLM quality because it's easy to explain to stakeholders and straightforward to compute on a benchmark. But in production, it correlates poorly with what users actually care about: did the task get done, was the result trustworthy enough to act on, and did the system save them time?
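
If the question is whether the task got done, instrument the task, not the tokens. A minimal sketch of outcome-level aggregation over a hypothetical session log; the field names are illustrative, standing in for whatever your own telemetry captures:

```python
from statistics import mean

# Hypothetical session records: did the task finish, did the user trust the
# output enough to act on it, and what was the net time impact in minutes.
sessions = [
    {"task_completed": True,  "acted_on_output": True,  "minutes_saved": 6},
    {"task_completed": True,  "acted_on_output": False, "minutes_saved": 0},
    {"task_completed": False, "acted_on_output": False, "minutes_saved": -4},
]

completion_rate = mean(s["task_completed"] for s in sessions)
trust_rate = mean(s["acted_on_output"] for s in sessions if s["task_completed"])
net_minutes = sum(s["minutes_saved"] for s in sessions)

print(f"completed: {completion_rate:.0%}, acted on: {trust_rate:.0%}, "
      f"net minutes saved: {net_minutes}")
```

A system can hold hallucination rate at 3% and still score badly on all three of these, which is exactly the pattern unhappy users are reporting.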

AI Product Metrics That Don't Lie: Behavioral Signals Over Thumbs-Up Scores

9 min read
Tian Pan
Software Engineer

Your AI feature has a 4.2/5 satisfaction score. Users click thumbs-up 68% of the time. The A/B test shows task completion rate is up 12%. Your team ships it. Six weeks later, users have quietly routed around it for anything they actually care about.

This is metric theater. You optimized for signals that look like success but aren't. The feedback you collected came from the 8% of users who bother rating anything — skewed toward the delighted and the furious, silent on the vast middle who found the feature unreliable just often enough to stop trusting it.

Building AI features requires a different measurement philosophy than traditional software. The signals you instrument from day one determine whether you learn fast enough to improve or spend six months chasing a satisfaction score that doesn't move.
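
Behavioral signals are the ones users emit without being asked. A sketch of the kind of instrumentation this argues for, over a hypothetical event stream; the data shape is illustrative:

```python
from collections import defaultdict

# Hypothetical events: (user_id, week_number) for each week a user chose
# the AI feature for a task they could have done another way.
usage = [("u1", 1), ("u1", 2), ("u2", 1), ("u2", 2), ("u2", 3), ("u3", 1)]

weeks_by_user = defaultdict(set)
for user, week in usage:
    weeks_by_user[user].add(week)

# Repeat use is the behavioral counterpart of a thumbs-up: users who keep
# choosing the feature are rating it with their workflow, not a widget.
retained = sum(1 for weeks in weeks_by_user.values() if {1, 2, 3} <= weeks)
print(f"still using by week 3: {retained}/{len(weeks_by_user)} adopters")
```

Unlike a rating prompt, this signal covers every user, including the silent middle that quietly routes around the feature.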

Measuring Real AI Coding Productivity: The Metrics That Survive the 90-Day Lag

9 min read
Tian Pan
Software Engineer

Most teams adopting AI coding tools hit the same wall. Month one looks like a success story: PR throughput is up, sprint velocity is climbing, and the engineering manager is putting together a slide deck to share with leadership. By month three, something has quietly gone wrong. Incidents creep up. Senior engineers are spending more time in review. A simple bug fix now requires understanding code nobody on the team actually wrote. The productivity gains have evaporated — but the measurement system never caught it.

The problem is that the metrics most teams reach for first — lines generated, PRs merged, story points burned — are the wrong unit of measurement for AI-assisted development. They measure the cost of producing code, not the cost of owning it. And AI has made production nearly free while leaving ownership costs untouched.
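
Ownership cost shows up in signals that lag the merge: reverts, follow-up fixes, hours senior engineers sink into review. A minimal sketch over hypothetical PR records; the schema is illustrative, and in practice this data comes from your git host and incident tracker:

```python
from statistics import mean

# Hypothetical post-merge history per PR.
prs = [
    {"ai_assisted": True,  "reverted": False, "followup_fixes": 2, "review_hours": 3.5},
    {"ai_assisted": True,  "reverted": True,  "followup_fixes": 1, "review_hours": 5.0},
    {"ai_assisted": False, "reverted": False, "followup_fixes": 0, "review_hours": 1.5},
]

def ownership_cost(subset):
    """Cost of owning the code, not producing it: what lands after the merge."""
    return {
        "revert_rate": mean(p["reverted"] for p in subset),
        "fixes_per_pr": mean(p["followup_fixes"] for p in subset),
        "review_hours": mean(p["review_hours"] for p in subset),
    }

print("AI-assisted:", ownership_cost([p for p in prs if p["ai_assisted"]]))
print("baseline:   ", ownership_cost([p for p in prs if not p["ai_assisted"]]))
```

Comparing the two cohorts over a 90-day window is what catches the month-three decay that throughput metrics miss.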