98% More PRs Merged, But Review Time Up 91%: Is Human Approval the New AI Bottleneck?

We just hit a milestone that should have felt like a celebration: our team merged 98% more pull requests this quarter compared to last year.

But here’s what nobody’s talking about: PR review time increased by 91%.

The Data Nobody Wants to See

Our metrics dashboard tells a story that’s becoming increasingly common across the industry:

  • PRs merged: Up 98% YoY
  • Average PR size: 437 lines (up from 189 last year)
  • Time to first review: 18.3 hours (up from 9.2 hours)
  • Time to merge after approval: 4.1 hours (up from 1.8 hours)
  • Main branch success rate: 70.8% (down from 82% two years ago)

At first, leadership celebrated the velocity gains. “AI is working!” they said. Until we started looking at the human side of the equation.

What’s Actually Happening

Our senior engineers are drowning. One of my tech leads told me last week: “I feel like I’m on an assembly line, but instead of making cars, I’m rubber-stamping code I barely understand.”

The problem is structural:

  1. AI generates code faster than humans can review it thoughtfully. Junior and mid-level engineers are prompting Claude or GitHub Copilot for entire features. They get back 600 lines in minutes. They skim it, it looks reasonable, they open a PR.

  2. Beyond 400 lines, you’re not getting a review—you’re getting a checkbox. Code review research (SmartBear’s widely cited study of peer review at Cisco, for one) finds that defect detection drops sharply past that threshold. But our average PR is now 437 lines.

  3. AI-generated code has 1.7× more issues than human-written code. Studies consistently show this, but we’re not adjusting our review processes to account for it.

  4. Recovery takes longer. When AI-generated code breaks production (which happens in 29.2% of merges now), debugging is harder because the engineer who opened the PR often didn’t write the code—they just approved it.

The Real Cost

Average time our senior engineers spend on PR reviews: 28.4 hours per week (up from 14.7 hours last year).

That’s 71% of their working week. They’re barely writing code anymore.

And here’s the kicker: despite all this review effort, our main branch success rate is at a 5-year low.

So… What Do We Do?

I don’t have all the answers, but here’s what we’re experimenting with:

  1. Smaller PRs by policy: Max 200 lines. If AI generated more, break it up.
  2. Required test coverage: Can’t merge without tests, period.
  3. AI-specific review checklist: Different standards for AI-generated vs human-written code.
  4. Dedicated review time blocks: Seniors get 2-hour “deep review” blocks with no meetings.
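For what it’s worth, the first two policies are easy to automate as a pre-merge gate. Here’s a minimal sketch in Python; the function name, the `git diff --numstat` parsing, and the crude “touches a test file” heuristic are illustrative assumptions, not any specific tool’s behavior:

```python
# Minimal sketch of a pre-merge gate for the size-cap and test policies.
# Assumes the CI job feeds it the output of `git diff --numstat`;
# the "test file" heuristic (any path containing "test") is deliberately crude.

MAX_LINES = 200  # policy: max 200 changed lines per PR

def check_pr(numstat: str, max_lines: int = MAX_LINES) -> list[str]:
    """Return policy violations for a `git diff --numstat` dump (empty = OK)."""
    total = 0
    touched_tests = False
    for line in numstat.strip().splitlines():
        added, deleted, path = line.split(maxsplit=2)
        if added != "-":  # binary files report "-" for line counts
            total += int(added) + int(deleted)
        if "test" in path.lower():
            touched_tests = True
    violations = []
    if total > max_lines:
        violations.append(f"PR changes {total} lines (max {max_lines}): split it up.")
    if not touched_tests:
        violations.append("No test files touched: tests are required to merge.")
    return violations
```

In practice you’d wire something like this into a required status check, so the merge button stays red until both policies pass.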

But I’m curious: Are other engineering leaders seeing this pattern?

Is human approval becoming the new bottleneck in AI-assisted development? And if so, what are you doing about it?

Because right now, I’m watching my best engineers slowly turn into full-time reviewers. And I’m not sure that’s sustainable.



This hits hard. We’re seeing the exact same pattern, and honestly, it’s making me question whether we’re solving the right problem with AI tools.

Your stat about 71% of senior engineer time going to reviews? That tracks with what I’m hearing from our engineering team. Our senior dev literally said last week: “I’ve become a quality gatekeeper, not a builder.”

The Quality Debt We’re Not Measuring

What worries me most is the comprehension debt we’re accumulating. When I review design system components now, I see code that works but nobody can explain why it works that way.

Three months ago, one of our engineers asked Claude to generate a new form validation component. 450 lines of TypeScript. Looked great in the PR. Merged.

Last week, we needed to modify it to handle a new edge case. Nobody on the team understood it well enough to touch it. We ended up asking Claude to modify it again. Now we have a component that’s been through two AI iterations, and zero humans who can confidently maintain it.

That’s not sustainable.

Your 200-Line Rule is Interesting

I love your max 200 lines policy. We’re actually doing something similar for design system PRs, but we go further:

“AI-generated code must include a design doc.”

Not just tests—a human-written explanation of:

  • What problem this solves
  • Why this approach was chosen (over alternatives)
  • What edge cases it handles
  • How it integrates with existing systems

This forces the engineer to understand what the AI generated before submitting it. It’s slowing us down, but the alternative is architectural decisions being made by autocomplete.
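A rule like that is also checkable mechanically before a human ever looks at the PR. As a sketch (the section names come from the list above; the substring matching and the function name are hypothetical, not a real bot’s API):

```python
# Hypothetical lint that a PR description includes the required
# design-doc sections. Matching is naive on purpose: the goal is a
# fast, obvious failure, not semantic understanding of the doc.

REQUIRED_SECTIONS = [
    "What problem this solves",
    "Why this approach was chosen",
    "What edge cases it handles",
    "How it integrates with existing systems",
]

def missing_sections(pr_body: str) -> list[str]:
    """Return the required design-doc sections absent from the PR body."""
    body = pr_body.lower()
    return [s for s in REQUIRED_SECTIONS if s.lower() not in body]
```

A bot comment listing the missing sections is usually enough; the point is that the human explanation exists, not that it is graded.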

The Review Process Needs to Evolve

Your AI-specific review checklist is smart. We’ve added these checks for AI-generated PRs:

  • ✅ Does the author understand this code well enough to debug it at 2am?
  • ✅ Are there abstractions we’ll regret in 6 months?
  • ✅ Is this solving the right problem, or just a problem?

That last one is key. AI is really good at solving problems you didn’t ask it to solve.


Curious: are you seeing the same “comprehension debt” issue? Or is your team better at understanding AI-generated code than ours?

Luis, this data is invaluable. I’m sharing it with my leadership team today because it quantifies something we’ve all been feeling but couldn’t articulate.

The Org Design Problem

What strikes me about your situation is that you’re treating this as a process problem. But I think it’s actually an org design problem.

Your senior engineers spending 71% of their time on reviews isn’t just inefficient—it’s a signal that your team structure doesn’t match your new workflow reality.

Here’s what I’m seeing across the industry:

The Senior-Only Model is Breaking

AI tools are collapsing the junior → mid → senior progression. We’re hiring seniors to review AI-generated code from mid-levels. But seniors aren’t growing because they’re not building anymore.

At my company, we just ran our H1 performance reviews. Three of our top engineers expressed frustration: “I feel like I’ve stopped learning.” They’re reviewing code, not architecting systems.

That’s a retention risk that most leadership teams aren’t measuring.

What We’re Trying Instead

We’re experimenting with a new role: “AI Workflow Architect.”

Not a manager. Not a senior engineer in the traditional sense. Someone whose primary job is:

  1. Design prompting patterns for common feature types
  2. Create review playbooks specific to AI-generated code
  3. Build automation around the review process itself
  4. Train the team on effective AI collaboration

We promoted one of our Staff Engineers into this role 6 weeks ago. Early results:

  • PR review time down 34%
  • Main branch success rate up from 71% to 78%
  • Senior engineers reporting more time for architecture work

The Measurement Gap

Your point about main branch success rate hitting a 5-year low despite increased review time? That’s the key insight.

We’re measuring the wrong things. We’re optimizing for:

  • Lines of code written
  • PRs merged
  • Cycle time

But we’re not measuring:

  • Comprehension debt (h/t @maya_builds’s point above)
  • Debugging time for AI-generated code
  • Senior engineer satisfaction and retention risk
  • Knowledge transfer effectiveness

The Real Question

Are we scaling engineering or just scaling code generation?

Because if it’s the latter, we’re going to wake up in 18 months with a codebase nobody understands and a team of burned-out seniors who’ve forgotten how to build.

Your experiments are smart, Luis. But I’d encourage you to think bigger about team structure, not just process improvements.

What if the bottleneck isn’t the review process—it’s that we’re trying to fit AI-assisted development into an org chart designed for humans writing code line by line?