Six months ago, I rolled out AI coding assistants (primarily Cursor with Claude) across our 40-person engineering team at a Fortune 500 financial services company. The team was excited—finally, a tool to help us move faster through our massive modernization backlog.
The response was overwhelmingly positive. In our quarterly surveys, 85% of developers reported feeling “significantly more productive.” Stand-ups were full of stories about refactoring entire modules in an afternoon or knocking out bug fixes in minutes instead of hours. Morale was genuinely up.
But here’s where it gets uncomfortable: I decided to measure the actual impact.
We tracked cycle time (from first commit to production), code review duration, bug rates, and PR merge times for three months before AI adoption and six months after. We controlled for project complexity and team changes.
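For anyone who wants to replicate the measurement, the core calculation is simple. Below is a minimal sketch in Python, assuming you can export merged PRs with a first-commit timestamp and a production-deploy timestamp; the field names and sample values are illustrative placeholders, not our actual data or any real API schema.

```python
from datetime import datetime
from statistics import median

# Hypothetical export of merged PRs. The field names ("first_commit_at",
# "deployed_at", "period") are placeholders, not a real API schema.
prs = [
    {"first_commit_at": "2024-01-03T09:15:00", "deployed_at": "2024-01-05T16:40:00", "period": "before"},
    {"first_commit_at": "2024-08-12T10:02:00", "deployed_at": "2024-08-16T11:30:00", "period": "after"},
    # ...one record per merged PR
]

def cycle_time_hours(pr):
    """Cycle time as defined above: first commit to production, in hours."""
    start = datetime.fromisoformat(pr["first_commit_at"])
    end = datetime.fromisoformat(pr["deployed_at"])
    return (end - start).total_seconds() / 3600

for period in ("before", "after"):
    times = [cycle_time_hours(pr) for pr in prs if pr["period"] == period]
    print(f"{period}: median cycle time = {median(times):.1f}h across {len(times)} PRs")
```

Medians rather than means, since a handful of long-running PRs would otherwise dominate the comparison.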
The data showed we were 19% slower.
Not a little slower. Not within margin of error. Nearly 20% slower from commit to shipped feature. And this isn’t an isolated finding—it matches exactly what the METR study found in 2025 with experienced open-source developers.
The Perception-Reality Gap
Here’s what really bothers me: Even after I shared this data with the team, most developers still insist they’re faster. The disconnect is profound:
- Team perception: “I’m 50-100% faster at writing code”
- Measured reality: 19% slower to ship features
- Team explanation: “The metrics must be wrong” or “We’re working on harder problems now”
But we controlled for complexity. The metrics aren’t wrong. Yet the team’s lived experience is that they feel dramatically more productive.
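For readers who want to poke at the methodology: one standard way to control for complexity is to compare before/after within matched complexity tiers instead of in aggregate, so a shift toward harder work can’t masquerade as a slowdown (or hide one). Here’s a minimal sketch of that idea; the tier labels and numbers are assumptions for illustration, not our production analysis.

```python
from collections import defaultdict
from statistics import median

# Hypothetical PR records tagged with a complexity tier (e.g., story points
# bucketed into "low"/"mid"/"high"). All values are illustrative.
prs = [
    {"cycle_hours": 18.0, "tier": "low", "period": "before"},
    {"cycle_hours": 26.5, "tier": "low", "period": "after"},
    {"cycle_hours": 70.0, "tier": "high", "period": "before"},
    {"cycle_hours": 81.0, "tier": "high", "period": "after"},
    # ...one record per merged PR
]

groups = defaultdict(list)
for pr in prs:
    groups[(pr["tier"], pr["period"])].append(pr["cycle_hours"])

# Compare before/after inside each tier, never across tiers.
for tier in ("low", "mid", "high"):
    before = groups.get((tier, "before"))
    after = groups.get((tier, "after"))
    if before and after:
        change = median(after) / median(before) - 1
        print(f"{tier}: {change:+.0%} change in median cycle time")
```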
What I Think Is Happening
Looking at the detailed data, four patterns stand out:
- First-draft speed is real: Developers ARE faster at generating initial code
- Debugging time exploded: Time spent fixing bugs in AI-generated code is up significantly
- Review became a bottleneck: Reviewers are spending 40% more time per PR
- More iterations to “done”: The average PR now carries 2.3x more commits before merge (both figures are derived as in the sketch after this list)
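For transparency on where those last two numbers come from, here is a minimal sketch of the per-PR aggregation. The field names are hypothetical: review_minutes would be derived from review-session timestamps, and commit_count from something like `git rev-list --count` on the PR branch. The sample values are illustrative, not our real data.

```python
from statistics import mean

# Hypothetical per-PR records; sample values are illustrative only.
prs = [
    {"review_minutes": 35, "commit_count": 4, "period": "before"},
    {"review_minutes": 52, "commit_count": 9, "period": "after"},
    # ...one record per merged PR
]

def period_mean(metric, period):
    """Average a metric over all PRs in a given period."""
    return mean(pr[metric] for pr in prs if pr["period"] == period)

for metric in ("review_minutes", "commit_count"):
    before = period_mean(metric, "before")
    after = period_mean(metric, "after")
    print(f"{metric}: {before:.1f} -> {after:.1f} ({after / before:.2f}x)")
```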
So yes, developers feel faster because they’re typing less. But they’re spending more time debugging, more time in review, and running through more iterations to reach shippable quality.
The immediate feedback loop of AI code generation creates what researchers call “illusory productivity”—activity feels like progress even when it’s slowing down delivery.
The Questions I’m Wrestling With
1. Are we measuring the wrong things?
Maybe cycle time isn’t the right metric. What if the real value is in happier developers, better retention, and more creative problem-solving? Should I care that we’re 19% slower if 85% of the team feels dramatically more productive?
2. Is this a learning curve issue?
Are we just bad at using AI tools right now? Will we get better with practice? The METR study was run with experienced developers, so skill level alone doesn’t seem to explain the gap.
3. How do we evaluate tools when users can’t accurately self-assess?
This is the Dunning-Kruger problem amplified. If developers genuinely can’t tell when a tool is helping vs. hurting, how do we make good purchasing and adoption decisions?
4. Do we trust the data or the developers?
I trust my team. They’re not lying about their experience. But the data is also clear. How do I reconcile this?
5. What’s the role of AI literacy in overconfidence?
Recent research suggests that higher AI literacy brings more overconfidence. Are we creating “hollow senior engineers” who can prompt well but lack deep problem-solving skills?
The Uncomfortable Truth
The hardest part of this isn’t the data—it’s the conversation. When I tried to discuss this with the team, I faced defensiveness, pushback, and genuine confusion. Developers feel I’m attacking their competence or their tools. But I’m not—I’m trying to understand reality so we can make informed decisions.
I’m not suggesting we ban AI tools. I use them myself. But I’m deeply concerned about making organizational decisions based on perception when reality tells a different story.
For other engineering leaders: Are you measuring AI’s actual impact? What are you finding? How are you having this conversation with your teams?
For the whole community: Is the perception gap a problem we need to solve, or is developer happiness valuable enough that we should optimize for perception even when it diverges from reality?
I’m genuinely seeking advice here. We’ve invested significantly in these tools, and I need to figure out whether we’re on the right path or if we need to fundamentally rethink our approach.
Cross-referencing research: “The Productivity Paradox of AI Coding Assistants” and MIT Technology Review’s coverage of AI coding.