Last month, I pulled our team’s DORA metrics and something jumped out that I can’t stop thinking about. My team of 40+ engineers at a Fortune 500 financial services company has been using AI coding assistants heavily since Q3 2025. The results? We’re merging 98% more pull requests than this time last year. Sounds amazing, right?
Here’s the catch: our PR review time has increased by 91%.
The Numbers That Don’t Add Up
According to the latest research, developers are saving about 3.6 hours per week using AI coding tools. That matches what I’m seeing. Our engineers are flying through implementation tasks. But here’s what the productivity dashboards miss: those saved hours aren’t translating to faster delivery.
When I dug deeper, I found that our senior engineers—the ones doing most of the code reviews—are now spending 60-70% of their time reviewing code instead of the 30-40% from a year ago. The bottleneck didn’t disappear. It migrated.
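To make the compounding concrete, here's a back-of-the-envelope calculation. It assumes the 91% figure is per-PR review time, which may not match exactly how a given dashboard measures it:

```python
# Rough illustration of how the two growth numbers from this post compound.
# These inputs are the quoted percentages, not raw measured data.
pr_volume_growth = 1.98      # 98% more PRs merged year over year
per_pr_review_growth = 1.91  # 91% longer review time per PR

# If both hold, total review hours scale with the product:
total_review_load = pr_volume_growth * per_pr_review_growth
print(f"{total_review_load:.2f}x")  # roughly 3.78x the review hours of a year ago
```

A nearly 4x growth in total review hours is consistent with senior engineers' review share jumping from 30-40% of their time to 60-70%.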
The Trust Problem
This isn’t just about volume. A recent survey found that 96% of developers don’t fully trust the functional accuracy of AI-generated code. I see this playing out in our reviews. When someone knows a junior dev wrote the code with heavy AI assistance, the review is more thorough, more cautious, and frankly, more time-consuming.
And they should be cautious. The data shows AI-generated code introduces 15-18% more security vulnerabilities. In financial services, we can’t ship code just because it compiles and passes basic tests.
Amdahl’s Law Comes Home to Roost
For those not familiar, Amdahl's Law says that the overall speedup from accelerating one part of a system is capped by the time spent in the parts you didn't touch. We've dramatically accelerated code generation, but our review process, along with our testing, deployment, and quality assurance processes, is still designed for human-speed development.
It’s like we gave everyone on the team a Ferrari but kept the same brake system from our old sedans. Sure, we can accelerate faster, but we’re still stopping at the same rate.
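The intuition can be put in numbers. Amdahl's Law gives overall speedup as 1 / ((1 - p) + p / s), where p is the fraction of the delivery cycle you accelerated and s is how much faster that fraction got. A quick sketch, where the 40% coding share is a made-up illustration rather than our measured split:

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the work is made s times faster."""
    return 1.0 / ((1.0 - p) + p / s)

# Hypothetical split: coding is 40% of cycle time; review, testing,
# and deployment are the other 60% and are unchanged.
print(f"{amdahl_speedup(0.4, 10):.2f}x")   # 10x faster coding -> ~1.56x overall
print(f"{amdahl_speedup(0.4, 1e9):.2f}x")  # near-infinite speedup caps at ~1.67x
```

Under that split, even a near-infinite code-generation speedup delivers well under 2x end to end, because the un-accelerated 60% dominates.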
What We’re Trying
I don’t have all the answers, but here’s what we’re experimenting with:
- Dedicated Review Capacity: Instead of assuming everyone can just “review more,” we’re rotating engineers through dedicated review weeks
- Pre-Review Quality Gates: Automated security scans, linting, and test coverage requirements before human review
- AI-Specific Review Training: Teaching reviewers what to look for in AI-generated code (patterns, edge cases, security issues)
- Smaller PRs: Pushing back on the “AI wrote 1000 lines” mega-PRs
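As one concrete shape for those pre-review gates, here's a minimal sketch of the kind of mechanical check that can run before a human ever opens the PR. The thresholds (80% coverage, 400-line limit) and the input keys are illustrative placeholders, not our production values:

```python
def pre_review_failures(pr: dict) -> list[str]:
    """Return the gates a PR fails; an empty list means ready for human review.

    Expects keys: coverage (0-1), security_findings (int), lines_changed (int).
    Thresholds are illustrative placeholders, not production values.
    """
    failures = []
    if pr.get("coverage", 0.0) < 0.80:
        failures.append("test coverage below 80%")
    if pr.get("security_findings", 0) > 0:
        failures.append("unresolved security scanner findings")
    if pr.get("lines_changed", 0) > 400:
        failures.append("diff larger than 400 lines; split it up")
    return failures

# Example: an AI-assisted mega-PR trips two gates at once.
print(pre_review_failures(
    {"coverage": 0.85, "security_findings": 2, "lines_changed": 1000}
))
```

Wiring something like this into CI means reviewers only spend attention on PRs that have already cleared the mechanical bar.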
Early results are mixed. We've cut review time by about 15%, but that's nowhere near enough to keep pace with a 98% increase in PR volume.
Questions for the Community
I’m curious what others are seeing:
- Are you experiencing similar review bottlenecks with AI adoption?
- What metrics are you tracking beyond “time to write code”?
- How are you handling the trust issue with AI-generated code?
- Have you found effective ways to scale your review capacity?
We’re all learning this together. The tools are evolving faster than our processes, and I suspect a lot of teams are running into this without realizing it yet.
What’s your experience been?