I just read the METR study that dropped in mid-2025 and I’m honestly floored by the numbers. Before using AI coding tools, developers predicted they’d be 24% faster. The actual result? 19% slower. And here’s the kicker: even after experiencing that slowdown, they still believed they had been 20% faster.
That’s a 39-percentage-point gap between what developers believed afterward (20% faster) and what was measured (19% slower). How do we make tool decisions when the people using the tools can’t accurately assess their own productivity?
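For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes "faster" and "slower" can be treated as signed percentage points on a single scale, which is the same simplification the 39-point figure relies on. The variable and function names are mine, not the study's.

```python
# Figures as quoted above, written as signed changes in speed
# (positive = faster, negative = slower). Treating them as points on
# one scale is a simplifying assumption, not the study's own method.
predicted_before = 24   # developers' forecast before using the tools
measured = -19          # observed result
believed_after = 20     # developers' self-assessment afterward


def gap_points(perceived: int, actual: int) -> int:
    """Gap between a perceived and an actual change, in percentage points."""
    return perceived - actual


perception_gap = gap_points(believed_after, measured)   # belief after the fact vs. measured
forecast_gap = gap_points(predicted_before, measured)   # up-front forecast vs. measured

print(perception_gap)  # 39
print(forecast_gap)    # 43
```

Note that the gap against the up-front forecast is even wider than the gap against the after-the-fact belief.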
Why This Matters for Product Leaders
I’m coming at this from the product side, not engineering, but this has massive implications for how we evaluate and justify AI tool spend:
1. Self-reported productivity is unreliable
Developers feel faster because the code itself gets written more quickly. But they discount the extra time spent debugging, reviewing, and validating AI-generated code. The METR study found this pattern across 16 experienced developers working on real issues in their own repositories, not synthetic benchmarks.
2. CFOs are demanding ROI proof, and we don’t have it
Only 29% of executives can confidently measure AI ROI. Meanwhile, 56% of CEOs report zero measurable ROI from AI investments in the past 12 months. That’s not sustainable when we’re asking for budget increases for these tools.
3. The measurement gap creates strategic risk
When developers love a tool but it doesn’t improve team-level metrics (DORA, cycle time, throughput), what do we do? Some teams report 40% individual efficiency gains but see no improvement in delivery metrics. That disconnect matters.
The Context Problem Nobody’s Talking About
The METR researchers believe the slowdown happens because experienced developers hold a lot of project context that AI assistants don’t have. So developers spend time bending the AI’s outputs to fit their own intent and problem-solving approach, then debugging those outputs extensively.
It’s like having a very fast junior developer who doesn’t understand the broader architecture or product goals. The code gets written faster, but the total cycle time increases.
What Should We Actually Measure?
Here’s where I’m struggling: traditional developer productivity metrics (DORA, cycle time, deployment frequency) aren’t showing the gains that developers swear they’re experiencing. But developer happiness and tool adoption are off the charts.
Should we:
- Trust the subjective experience and invest more in tools developers love?
- Trust the objective metrics and question whether these tools actually help?
- Develop new frameworks that capture something neither self-reports nor DORA metrics are measuring?
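One way to make the third option concrete is to track the perception gap itself as a metric. The sketch below is only an illustration under stated assumptions: `TaskRecord`, its fields, and the sample numbers are hypothetical, and it assumes you can log measured hours per task alongside each developer's own speedup estimate. It also assumes AI and non-AI tasks are comparable, which in practice needs something like the random task assignment METR used.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskRecord:
    """Hypothetical per-task log entry; field names are illustrative."""
    used_ai: bool
    hours: float             # measured time from start to merge
    felt_speedup_pct: float  # developer's own estimate (0 for non-AI tasks)


def measured_speedup_pct(records: list[TaskRecord]) -> float:
    """Percent change in mean task time; positive means AI tasks were faster."""
    ai_hours = mean(r.hours for r in records if r.used_ai)
    base_hours = mean(r.hours for r in records if not r.used_ai)
    return (base_hours - ai_hours) / base_hours * 100


def perception_gap_points(records: list[TaskRecord]) -> float:
    """Mean self-reported speedup minus measured speedup, in percentage points."""
    felt = mean(r.felt_speedup_pct for r in records if r.used_ai)
    return felt - measured_speedup_pct(records)


# Made-up sample data, purely to show the calculation.
records = [
    TaskRecord(used_ai=False, hours=4.0, felt_speedup_pct=0.0),
    TaskRecord(used_ai=False, hours=6.0, felt_speedup_pct=0.0),
    TaskRecord(used_ai=True, hours=5.0, felt_speedup_pct=20.0),
    TaskRecord(used_ai=True, hours=7.0, felt_speedup_pct=30.0),
]

measured = measured_speedup_pct(records)
gap = perception_gap_points(records)
print(f"measured: {measured:.1f}%, perception gap: {gap:.1f} points")
```

In this made-up sample, AI tasks average 6 hours against a 5-hour baseline, so the measured change is a slowdown even though developers report feeling 25% faster on average. A large, persistent gap would be a signal to weight self-reports less in tool decisions.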
I’m genuinely curious how other product and engineering leaders are navigating this. Are you measuring AI tool impact? What metrics are you using? How do you reconcile the perception gap with business outcomes?
Sources: