I just read the METR research study and I’m experiencing serious cognitive dissonance.
Here’s the headline that stopped me cold: Experienced developers using AI coding tools were 19% slower at completing tasks. But here’s the kicker—those same developers believed they were 20% faster.
That’s a 39-point perception gap between what we think is happening and what’s actually happening.
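The gap arithmetic is simple percentage-point subtraction, using the figures reported in the study. A minimal sketch (variable names are mine, not METR's):

```python
# Figures from the METR study as reported above:
# developers were measured 19% slower, but believed
# they were 20% faster.

measured_change = -19   # actual speed change, in percentage points
perceived_change = +20  # self-reported speed change, in percentage points

# The perception gap is the distance between belief and reality.
perception_gap = perceived_change - measured_change
print(perception_gap)  # 39
```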
My Own Experience
I’ll be honest: I feel more productive with AI tools. That dopamine hit when Copilot autocompletes exactly what I was thinking? The satisfaction of having a back-and-forth with Claude to solve a gnarly bug? It feels like I’m crushing it.
But the METR data suggests I might be deluding myself. And I’m not alone—the study participants predicted a 24% speedup before using AI, and even after experiencing the actual slowdown, they still estimated a 20% improvement.
Why the Disconnect?
The METR researchers have a theory that resonates uncomfortably:
“Developers overestimate speed-up because it’s so much fun to use AI. We sit and work on these long bugs, and then eventually AI will solve the bug. But we don’t focus on all the time we actually spent—we just focus on how it was more enjoyable.”
The enjoyment is masking the reality. We’re measuring how good it feels instead of how fast we actually shipped.
The Measurement Problem
This raises bigger questions for me:
- Are we measuring the wrong things? Individual velocity (PRs merged, lines written) vs. team outcomes (features shipped, customer value)?
- Does the enjoyment premium matter? If devs are happier and more engaged (better retention, less burnout), does that offset the slowdown?
- Is this an AI problem or an org problem? Maybe the 19% individual slowdown reveals 19% of waste in our existing processes that AI makes visible?
- What about 2026 tools? METR's newer data shows only a 4% slowdown (vs. 19% in early 2025). Are we just early in the adoption curve?
The Context That Matters
To be fair, the study focused on experienced developers working on complex open-source issues. That’s different from:
- Junior devs learning (where AI might genuinely accelerate)
- Routine CRUD work (where autocomplete probably helps)
- Exploratory prototyping (where conversation might unlock ideas)
But it’s similar to the complex, ambiguous work most of us do day-to-day.
What I’m Struggling With
If I can’t trust my own perception of productivity, what can I trust?
Do I ignore the data and keep using tools that feel good? Do I time-track everything to objective-truth my way out of delusion? Do I focus on team-level outcomes and stop worrying about individual velocity?
How does your team actually measure AI productivity impact? And more importantly—do you trust those measurements?
Asking because I genuinely don’t know if we’re all collectively fooling ourselves or if we’re just measuring the wrong things.
Sources: METR Research Study, MIT Tech Review, GetDX Analysis