Three months ago, I rolled out AI coding assistants across my team of 40+ engineers. The feedback was immediate and overwhelmingly positive: “This is a game-changer,” “I’m shipping so much faster,” “Can’t imagine going back.”
But here’s what’s keeping me up at night: our sprint velocity hasn’t budged. Not even a little.
The Numbers Don’t Add Up
The research is clear: developers save approximately 3.6 hours per week using AI coding tools. That's roughly 187 hours per year, per developer (3.6 hours × 52 weeks). Across my 40 engineers, that should translate to about 7,480 hours of recovered productivity annually. And adoption isn't the problem: according to Faros.ai, over 75% of engineers now use these tools.
So where did those 7,480 hours go?
What I Found When I Looked Closer
I spent the last two weeks diving into our engineering metrics. Here’s what the data showed:
Pull Requests: ↑ 98% more PRs opened (consistent with Index.dev research)
Review Time: ↑ 91% increase in time spent on code reviews
Quality Incidents: ↑ 12% increase in bugs caught in QA
Deployment Frequency: → Completely flat
Sprint Velocity: → Also flat
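For anyone who wants to sanity-check deltas like these against their own dashboards, the arithmetic is just percentage change over before/after counts. The numbers below are illustrative, not our actual data; they're chosen only to reproduce the directional deltas above:

```python
def pct_change(before, after):
    """Rounded percentage change between two sprint-level counts."""
    return round((after - before) / before * 100)

# Hypothetical before/after counts per sprint (not our real data),
# picked to match the deltas reported above.
metrics = {
    "prs_opened":   (120, 238),  # up ~98%
    "review_hours": (200, 382),  # up ~91%
    "qa_bugs":      (50, 56),    # up 12%
    "deploys":      (30, 30),    # flat
}

for name, (before, after) in metrics.items():
    print(f"{name}: {pct_change(before, after):+d}%")
```

The point of writing it down: every "up" metric is an input or cost metric (PRs, review hours, bugs), while the flat ones (deploys, velocity) are the output metrics.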
We didn’t get slower. But we definitely didn’t get faster.
The Bottleneck Just Moved
Here’s what’s actually happening on my team:
Junior Engineer Story: Last week, one of our junior devs used an AI assistant to implement an OAuth2 authentication flow. The AI generated clean, working code in 20 minutes—something that would’ve taken her 3-4 hours before. Great, right?
Except she then spent 2 hours debugging a subtle security issue in the AI-generated code because she didn’t fully understand OAuth2 flows yet. The AI had used a deprecated grant type that passed our automated tests but would’ve failed a security audit.
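To make the incident concrete: the failure mode was the implicit grant (`response_type=token`), which returns tokens in the redirect URL and is deprecated by current OAuth 2.0 security guidance (RFC 9700) in favor of the authorization code grant with PKCE (RFC 7636). This is a simplified sketch, not our actual integration; the client ID, URLs, and endpoint names are placeholders:

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode

# Placeholder identity-provider endpoint, not a real IdP.
AUTHORIZE_URL = "https://idp.example.com/oauth2/authorize"

# What the AI-generated code effectively did: the implicit grant.
# It passes happy-path tests, but exposes tokens in the redirect URL
# and is deprecated by the OAuth 2.0 Security BCP.
deprecated_params = {
    "response_type": "token",  # implicit grant -- deprecated
    "client_id": "my-client",
    "redirect_uri": "https://app.example.com/callback",
}

# The fix: authorization code grant with PKCE (RFC 7636).
# code_verifier is a random secret; code_challenge is its S256 digest,
# base64url-encoded without padding.
code_verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
code_challenge = base64.urlsafe_b64encode(
    hashlib.sha256(code_verifier.encode()).digest()
).rstrip(b"=").decode()

fixed_params = {
    "response_type": "code",  # authorization code grant
    "client_id": "my-client",
    "redirect_uri": "https://app.example.com/callback",
    "code_challenge": code_challenge,
    "code_challenge_method": "S256",
}

print(f"{AUTHORIZE_URL}?{urlencode(fixed_params)}")
```

Both versions "work" in a demo, which is exactly why our automated tests passed and the audit would not have.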
Net result: 20 minutes of generation plus 2 hours of debugging is roughly 2 hours 20 minutes against a 3-4 hour baseline. Still faster than before—but closer to 1.5x than the 12x the 20-minute generation would suggest.
Senior Engineer Story: My tech leads are now reviewing 3x more PRs, but with less context about each one. They’re spending their “saved” time being code reviewers instead of architects. One told me: “I feel like I’m debugging code I didn’t write, by engineers who don’t fully understand what they’re shipping.”
The “Almost Right” Problem
Here’s what I think is happening: AI coding assistants are incredible at generating code that’s 85-90% correct. That last 10-15%—understanding edge cases, aligning with our architectural patterns, considering security implications—still requires deep human judgment.
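A contrived illustration of what "85-90% correct" looks like in practice (this is a toy example, not code from our codebase): a helper that passes every happy-path test and silently drops data on the one edge case nobody wrote a test for.

```python
def chunk_almost_right(items, size):
    # Looks correct, and passes any test where len(items) is a
    # multiple of size -- but silently drops the trailing partial chunk.
    return [items[i:i + size] for i in range(0, len(items) - size + 1, size)]

def chunk_correct(items, size):
    # Iterating to len(items) keeps the final partial chunk.
    return [items[i:i + size] for i in range(0, len(items), size)]

print(chunk_almost_right(list(range(10)), 3))  # drops the 10th item
print(chunk_correct(list(range(10)), 3))       # keeps it
```

Spotting the difference requires reading the range bounds carefully, which is precisely the slow, high-attention review work that eats the "saved" hours.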
And paradoxically, reviewing “almost right” code is cognitively harder than reviewing obviously wrong code or writing from scratch. ShiftMag reports that 93% of developers use AI, but productivity gains are stuck at around 10%.
The Measurement Problem
Maybe I'm measuring the wrong layer. Individual task time? Team throughput? Business outcomes? Each tells a different story.
My team feels more productive. Morale is high. Nobody wants to give up their AI tools. But our delivery cadence to customers hasn’t changed.
Is this an adjustment period while we learn to work with AI effectively? Are there systemic changes I need to make to our development process to actually capture these gains? Or is 3.6 hours per week the real gain, and we need to adjust our expectations?
Questions for the Community
I’m especially curious to hear from other engineering leaders:
- Are you seeing similar patterns? More output but similar delivery?
- How are you measuring productivity? Have you changed your metrics since AI adoption?
- What systemic changes worked? Did you have to redesign your code review process, testing strategy, or deployment pipeline?
- The junior engineer paradox: How do you balance AI acceleration with learning and skill development?
I’m not suggesting AI tools aren’t valuable—my team would revolt if I took them away. But I need to understand this productivity paradox better. The 3.6 hours are going somewhere. I just need to figure out where.
What are you seeing in your organizations?