We just hit a milestone that should have felt like a celebration: our team merged 98% more pull requests this quarter than in the same quarter last year.
But here’s what nobody’s talking about: over the same period, PR review time increased by 91%.
The Data Nobody Wants to See
Our metrics dashboard tells a story that’s becoming increasingly common across the industry:
- PRs merged: Up 98% YoY
- Average PR size: 437 lines (up from 189 last year)
- Time to first review: 18.3 hours (up from 9.2 hours)
- Time to merge after approval: 4.1 hours (up from 1.8 hours)
- Main branch success rate: 70.8% (down from 82% two years ago)
At first, leadership celebrated the velocity gains. “AI is working!” they said. Until we started looking at the human side of the equation.
What’s Actually Happening
Our senior engineers are drowning. One of my tech leads told me last week: “I feel like I’m on an assembly line, but instead of making cars, I’m rubber-stamping code I barely understand.”
The problem is structural:
- AI generates code faster than humans can review it thoughtfully. Junior and mid-level engineers are prompting Claude or GitHub Copilot for entire features. They get back 600 lines in minutes. They skim it, it looks reasonable, they open a PR.
- Beyond 400 lines, you’re not getting a review—you’re getting a checkbox. Research shows reviewers can’t maintain quality past that threshold. But our average PR is now 437 lines.
- AI-generated code has 1.7× more issues than human-written code. Studies consistently show this, but we’re not adjusting our review processes to account for it.
- Recovery takes longer. When AI-generated code breaks production (which happens in 29.2% of merges now), debugging is harder because the original author often didn’t write it—they just approved it.
The Real Cost
Average time our senior engineers spend on PR reviews: 28.4 hours per week (up from 14.7 hours last year).
That’s 71% of a 40-hour week. They’re barely writing code anymore.
And here’s the kicker: despite all this review effort, our main branch success rate is at a 5-year low.
So… What Do We Do?
I don’t have all the answers, but here’s what we’re experimenting with:
- Smaller PRs by policy: Max 200 lines. If AI generated more, break it up.
- Required test coverage: Can’t merge without tests, period.
- AI-specific review checklist: Different standards for AI-generated vs human-written code.
- Dedicated review time blocks: Seniors get 2-hour “deep review” blocks with no meetings.
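For the “max 200 lines” policy, here’s a minimal sketch of the kind of CI gate we’re trying, assuming a checkout where the base branch has been fetched. The `MAX_LINES` value, the `origin/main` base ref, and the function names are illustrative, not our actual pipeline:

```python
import subprocess

MAX_LINES = 200  # the "smaller PRs by policy" budget from above

def count_numstat(numstat: str) -> int:
    """Sum added + removed lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        added, removed, _path = line.split("\t", 2)
        if added != "-":  # binary files report "-" for both counts
            total += int(added) + int(removed)
    return total

def pr_within_budget(base_ref: str = "origin/main") -> bool:
    """True if the current branch's diff against base fits the budget."""
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base_ref}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_numstat(out) <= MAX_LINES
```

Wiring `pr_within_budget()` into a required status check makes the rule non-negotiable rather than a guideline reviewers have to enforce by hand.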
But I’m curious: Are other engineering leaders seeing this pattern?
Is human approval becoming the new bottleneck in AI-assisted development? And if so, what are you doing about it?
Because right now, I’m watching my best engineers slowly turn into full-time reviewers. And I’m not sure that’s sustainable.