98% More Pull Requests But 91% Longer Review Times: The AI Productivity Paradox Hits Code Review. Are We Just Moving the Bottleneck?
Nine months ago, I championed our company-wide adoption of AI coding assistants. The business case was compelling: 40% faster feature delivery, shorter time-to-market, a stronger competitive position. We're a 120-person engineering org at a mid-stage SaaS company, and every tech leader I knew was racing to deploy AI coding tools.
The initial results looked amazing. Developers reported feeling 20% faster. PR velocity jumped 98% in the first quarter. We were shipping features at a pace we'd never seen before. I presented these numbers to the board, and they were thrilled.
But nine months in, I’m looking at data that tells a very different story—and I’m questioning whether we’ve actually gained productivity or just moved the bottleneck.
The Numbers That Keep Me Up at Night
Here's what our engineering metrics show after nine months of AI adoption:
- Pull requests merged: +98% (from ~250/week to ~495/week)
- PR review time: +91% average (from 18 hours to 34 hours)
- Time to first review: +127% (reviewers now prioritize human-written code)
- Lines of code per PR: +65% (AI PRs are significantly larger)
- Bug rate in production: +9%
- Senior engineer time spent on code review: 22-25 hours/week (up from 12-15)
The math doesn’t work. We’re generating code 98% faster, but our human review capacity hasn’t scaled. We still have the same number of senior engineers, and they still have the same 40 hours in a week.
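A back-of-envelope calculation makes the capacity gap concrete. The PR volume and senior-engineer figures come from the metrics above; the effort-per-review figure is an illustrative assumption, not something we measure directly:

```python
# Back-of-envelope: weekly review demand vs. fixed review capacity.
# PR counts and headcount are from the post; hours_per_pr is assumed.

SENIOR_ENGINEERS = 12        # from the post
REVIEW_HOURS_EACH = 24       # midpoint of the reported 22-25 h/week
capacity = SENIOR_ENGINEERS * REVIEW_HOURS_EACH  # reviewer-hours/week

prs_per_week = 495           # merged PRs/week after AI adoption
hours_per_pr = 0.75          # assumed review effort (AI PRs run 65% larger)

demand = prs_per_week * hours_per_pr
print(f"capacity:  {capacity} reviewer-hours/week")
print(f"demand:    {demand:.0f} reviewer-hours/week")
print(f"shortfall: {demand - capacity:.0f} reviewer-hours/week")
```

Under these assumptions the org is short dozens of reviewer-hours every week, and the shortfall grows with every additional PR while capacity stays flat.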
The Real Bottleneck: Human Review at Scale
Here’s the uncomfortable truth: AI shifted the bottleneck from code generation to code review, and human review capacity grows linearly while AI output grows exponentially.
Our senior engineers are drowning. Three of them have come to me asking to step back from senior roles because they’re spending 22-25 hours per week just reviewing code. That leaves them 10-15 hours for actual engineering work, architecture, and mentoring.
The research backs this up. Teams with high AI adoption are seeing 91% longer review times and 9% higher bug rates. AI-generated PRs wait 4.6x longer for review because reviewers have learned they contain more logic errors and fail at higher rates.
Why AI Code Takes Longer to Review
After analyzing hundreds of AI-assisted PRs, here are the patterns I’m seeing:
- Larger PRs: AI makes it easy to generate code, so developers submit 200-300 line PRs that would have been 80-100 lines before
- Copy-paste patterns: AI code shows 48% more copy-paste duplication, which reviewers have to catch
- Comprehension debt: Code that works but the team doesn’t fully understand why—this is terrifying for maintainability
- Edge case blindness: AI handles the happy path beautifully but misses edge cases that humans would catch
The result? Senior engineers can’t just scan AI code—they have to deeply audit it. And that takes time.
The Organizational Debt Nobody’s Talking About
Beyond the numbers, there’s a human cost:
- Senior engineer burnout: Four of my twelve senior engineers are showing burnout symptoms
- Review queue crisis: Our review queue is 67% longer, creating a backlog that demoralizes the team
- Junior engineer confidence: Junior developers shipping AI-heavy code have 28% lower confidence scores—they know they didn’t fully build it
- Team cohesion: Teams working on AI-heavy codebases report lower satisfaction and more friction
One of my best senior engineers told me last week: “I used to spend 60% of my time building and 40% reviewing. Now it’s flipped, and I’m basically a full-time code auditor. I didn’t sign up for this.”
So What Do We Do?
I’ve implemented a few things that are helping:
- Two-track development: 60% human-first, 40% AI-heavy. Teams opt into the track based on what they’re building
- Tiered review standards: PRs with >50% AI code get mandatory senior engineer review, architectural review for >70%
- AI Literacy Training: We train engineers on how to prompt effectively and review AI code critically
- Mandatory refactoring sprints: Every 6 weeks, we dedicate 20% of sprint capacity to refactoring AI-generated code
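The tiered review standards above can be expressed as a small routing rule. The thresholds match what we use; the function name and the idea that an AI-authorship ratio is available from tooling metadata are assumptions for the sketch:

```python
def required_reviews(ai_ratio: float) -> list[str]:
    """Map a PR's AI-generated-code ratio to its mandatory review steps.

    Thresholds follow our tiered review standards. How ai_ratio is
    measured (e.g. assistant telemetry or commit metadata) is left
    to tooling and is an assumption here.
    """
    steps = ["peer review"]  # every PR gets at least a standard peer review
    if ai_ratio > 0.50:
        steps.append("senior engineer review")
    if ai_ratio > 0.70:
        steps.append("architectural review")
    return steps

print(required_reviews(0.35))  # ['peer review']
print(required_reviews(0.60))  # ['peer review', 'senior engineer review']
print(required_reviews(0.80))  # adds 'architectural review'
```

Encoding the rule in CI (rather than in a policy doc) is what actually made it stick for us: the merge gate, not the reviewer, decides which track a PR is on.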
But I’m not sure these are long-term solutions. The fundamental problem remains: human review doesn’t scale with AI output.
The Questions I’m Wrestling With
- Is there an inflection point where velocity gains become a liability? We're at 35% AI-generated code. Research suggests 40%+ is when risks spike significantly.
- Can we scale code review with AI? Some teams are experimenting with AI-assisted code review, but that feels like fighting fire with fire.
- Are we measuring the right things? Maybe "PRs merged per week" is the wrong metric when quality suffers.
- Should we slow down? The controversial question: Is it better to ship 30% fewer features that the team fully understands than 60% more that only AI comprehends?
What I’ve Learned
After nine months, here’s my uncomfortable conclusion: We haven’t increased productivity—we’ve shifted work from code generation to code auditing. The bottleneck moved, but it didn’t go away.
AI coding assistants are incredibly powerful. But unless we solve the human review bottleneck, we’re just creating a backlog at a different stage of the pipeline.
I’m curious what others are seeing. Have you experienced the code review bottleneck? How are you scaling human review capacity? And at what point do we admit that maybe, just maybe, shipping faster isn’t the same as building better?