Okay, I need to share something that’s been bothering me for weeks.
I’ve been using Cursor with Claude for the past 6 months, and I swear I’ve been coding faster. The autocomplete is magical, the refactoring suggestions are spot-on, and I can scaffold entire components in minutes instead of hours. I felt like a 10x developer.
Then I read the METR study on AI’s impact on developer productivity, and my stomach dropped.
The Paradox That Broke My Brain
Developers using AI were 19% slower on average. But they believed they were 20% faster.
Let me say that again: We’re moving slower while feeling faster.
The study recruited 16 experienced developers from massive open-source repos (averaging 22k+ stars and 1M+ lines of code) working on real issues. When AI was allowed on a task (each issue was randomly assigned to allow or disallow it), they could use any tools they wanted: Cursor Pro with Claude 3.5/3.7 Sonnet, GitHub Copilot, whatever. Frontier models, not toy examples.
And the results were brutal:
- 19% slower task completion when AI was allowed
- Developers predicted they’d be 24% faster before starting
- After finishing (slower), they still believed AI sped them up by ~20%
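To make the gap concrete, here is a rough back-of-the-envelope sketch in Python. The three percentages are the ones quoted above; the 60-minute baseline task is purely hypothetical, and reading "X% faster" as "X% less time" is my own simplifying assumption, not necessarily how the study defines it.

```python
# Figures quoted from the METR study above.
ACTUAL_SLOWDOWN = 0.19    # tasks took 19% longer when AI was allowed
PREDICTED_SPEEDUP = 0.24  # developers' forecast before starting
PERCEIVED_SPEEDUP = 0.20  # developers' belief after finishing

# Hypothetical baseline (assumption): a task that takes 60 minutes without AI.
BASELINE_MINUTES = 60.0


def minutes_if_slower(baseline: float, slowdown: float) -> float:
    """Time taken when a task runs `slowdown` (e.g. 0.19) longer than baseline."""
    return baseline * (1 + slowdown)


def minutes_if_faster(baseline: float, speedup: float) -> float:
    """Time taken, assuming 'X% faster' means 'X% less time' (a simplification)."""
    return baseline * (1 - speedup)


actual = minutes_if_slower(BASELINE_MINUTES, ACTUAL_SLOWDOWN)
predicted = minutes_if_faster(BASELINE_MINUTES, PREDICTED_SPEEDUP)
perceived = minutes_if_faster(BASELINE_MINUTES, PERCEIVED_SPEEDUP)

# How far the felt duration sits from the measured one.
perception_gap = actual - perceived

print(f"predicted before starting: {predicted:.1f} min")
print(f"believed after finishing:  {perceived:.1f} min")
print(f"actually measured:         {actual:.1f} min")
print(f"perception gap:            {perception_gap:.1f} min")
```

Under those assumptions, an hour-long task feels like it took well under an hour but actually takes well over one, and the gap between the two is a sizable fraction of the task itself.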
Why This Happens: The Dopamine Trap
I think I finally understand what’s going on. AI gives you instant feedback. You type a prompt, code drops in immediately. That loop feels like progress—the same reward you get from closing a ticket or fixing a failing test.
But dopamine rewards activity in the editor, not working code in production.
As the Cerbos analysis put it: “AI can get you 70% of the way, but the last 30% is the hard part. The assistant scaffolds a feature, but production readiness means edge cases, architecture fixes, tests, and cleanup. For seniors, the last 30% is often slower than writing it clean from the start.”
That hit me hard because it’s exactly what I’ve been experiencing. I scaffold a new React component in 2 minutes with AI. Then I spend 30 minutes fixing type errors, handling edge cases, adding proper error boundaries, and writing tests. I could have written it cleanly from scratch in 20 minutes.
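Spelling out that arithmetic as a tiny Python sketch (the minute counts are just my own rough estimates from the paragraph above, not measured data):

```python
# Rough personal estimates from the paragraph above (not measurements).
SCAFFOLD_MINUTES = 2       # AI generates the component
CLEANUP_MINUTES = 30       # type errors, edge cases, error boundaries, tests
FROM_SCRATCH_MINUTES = 20  # writing it cleanly by hand

# Total time on the AI-assisted path.
ai_total = SCAFFOLD_MINUTES + CLEANUP_MINUTES

# Extra time compared with writing it by hand, in minutes and as a percentage.
extra_minutes = ai_total - FROM_SCRATCH_MINUTES
slowdown_pct = 100 * extra_minutes / FROM_SCRATCH_MINUTES

# Share of the AI-assisted path that is the "feels fast" scaffolding step.
scaffold_share_pct = 100 * SCAFFOLD_MINUTES / ai_total

print(f"AI-assisted total: {ai_total} min")
print(f"From scratch:      {FROM_SCRATCH_MINUTES} min")
print(f"Slowdown:          {slowdown_pct:.0f}%")
print(f"Scaffolding share: {scaffold_share_pct:.2f}% of the AI-assisted time")
```

The part that feels fast is a small slice of the total; almost all of the time goes to the cleanup that follows it.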
But Wait—GitHub Says the Opposite?
Here’s where it gets confusing. GitHub’s research shows 55% faster task completion with Copilot, and a 78% task completion rate with Copilot vs 70% without it.
So which is it? Are we faster or slower?
I think the difference is what we’re measuring:
- GitHub measured autocomplete acceptance and “tasks completed” (often simpler scenarios)
- METR measured real-world open-source issues (complex, production-grade work)
- GitHub’s yardstick was “time to write code”
- METR’s yardstick was “time to working, production-ready solution”
And here’s the kicker: in the InfoWorld survey, only 16.3% of developers said AI made them more productive to a great extent. The largest group, 41.4%, said it had little or no effect.
My Questions for This Community
I’m genuinely conflicted. On one hand, I love the feeling of flow I get with AI assistance. On the other hand, the data suggests I might be fooling myself.
1. Are we measuring the wrong things?
Should we care about “time to write code” or “time to customer value”? What about quality metrics like bugs, security, maintainability?
2. Is the slowdown a learning curve or permanent overhead?
Maybe we just haven’t learned how to use AI effectively yet? Or is this review/cleanup tax inherent to AI-generated code?
3. How do you balance speed with quality?
Are there scenarios where AI clearly helps (boilerplate, migrations) vs. clearly hurts (complex business logic, architecture)?
4. What does this mean for learning and mastery?
If I feel productive but I’m actually slower, am I building the right skills? Am I becoming dependent on a tool that’s making me worse?
I’d love to hear from engineering leaders, CTOs, and anyone who’s thought deeply about this. Because right now, I’m questioning everything I thought I knew about AI productivity.
Sources: METR Study, Cerbos Analysis, GitHub Copilot Research, InfoWorld Survey