Our Team Generates 98% More PRs with AI, But Review Time Increased 91%—We're Drowning

I need help. Our team is drowning, and I’m not sure how to fix it.

Three months ago, we celebrated: developers using AI tools, PRs flowing, velocity charts trending up. Leadership was thrilled.

Today, our senior engineers are burned out, our review queue is backed up 8 days, and we just shipped a security bug that never should’ve made it to production.

The Numbers That Tell the Story

Here’s our Q1 data compared to Q4 2025 (before widespread AI adoption):

What Went Up:

  • Pull requests created: +98% (from 42/week to 83/week)
  • Individual tickets completed: +21%
  • Lines of code written: +115%

What Also Went Up (The Bad Kind):

  • PR review time: +91% (average time from PR open to merge: 2.1 days → 4.0 days)
  • Review comments per PR: +67% (reviewers finding more issues)
  • Post-deployment defects: +28%
  • Senior engineer burnout indicators: +40%

What Stayed Flat:

  • Deployment frequency (we’re not shipping faster)
  • Features delivered to customers (code volume is up, but many features are blocked in review)

The Problem: Human Review Can’t Scale With AI Output

Here’s what’s happening:

Junior and mid-level developers are using AI to write code 2-3x faster. Great!

But that code needs review. And reviewers are human. With human constraints.

The math doesn’t work:

  • 5 junior devs using AI → 15 PRs/week (up from 6 PRs/week)
  • 2 senior engineers reviewing → same 40 hours/week capacity
  • Result: Review becomes the bottleneck

And it’s not just volume—it’s complexity.

Reviewing AI-generated code is different from reviewing human code. You have to verify:

  • Does it actually solve the problem?
  • Does it fit our architecture?
  • Does it handle edge cases?
  • Is it maintainable?
  • Did the author understand what they committed?

With human-written code, you can assume the developer understands their own code. With AI-assisted code, that assumption breaks down.

The Security Bug That Shouldn’t Have Happened

Two weeks ago, we shipped an API endpoint with a critical security flaw. Here’s what happened:

  1. Junior dev used Copilot to build authentication middleware
  2. AI generated syntactically correct code that passed tests
  3. Code passed automated security scanning (basic checks)
  4. PR sat in review queue for 6 days due to backlog
  5. Senior engineer, overwhelmed by review volume, did a quick review
  6. Missed that the implementation bypassed our standard auth flow
  7. Shipped to production
  8. Customer security audit caught it

We were lucky it was caught in audit, not exploited.

Root cause: Not the AI. Not the junior dev. The process that couldn’t handle the increased volume of code requiring careful review.

What We’ve Tried (Mixed Results)

Attempt 1: “Review Faster”

  • Asked senior engineers to spend more time on reviews
  • Result: Burnout increased, quality didn’t improve

Attempt 2: Automated Review Tools

  • Invested in better static analysis, security scanning
  • Result: Caught some issues, but can’t verify architectural fit or business logic correctness

Attempt 3: Smaller PRs

  • Encouraged devs to break work into smaller chunks
  • Result: More PRs, same total review time, now even more context switching

Attempt 4: Pair Programming with AI

  • Required a human partner when using AI for complex features
  • Result: Slowed down individual development, but fewer review issues
  • This one shows promise

Attempt 5: “Review Rotation”

  • Distributed review responsibility across more engineers
  • Result: Context fragmentation, inconsistent standards

What I Think Might Work (Seeking Feedback)

1. Tiered Review Process

  • Low-risk code (tests, docs, internal tools): Light review, automated checks
  • Medium-risk (non-critical features): Standard review process
  • High-risk (auth, payments, core business logic): Deep review, pair programming, no AI or very supervised AI

2. “AI-Generated” Tagging

  • Require devs to mark which parts used AI assistance
  • Reviewers know to scrutinize those sections more carefully
  • Over time, learn which use cases work well vs. poorly

3. Review Capacity as a Team Constraint

  • Stop measuring individual developer velocity
  • Measure team throughput (idea → production)
  • Make review capacity visible and limit WIP based on it

4. Senior Engineers as “Architectural Reviewers”

  • Don’t review every line of code
  • Instead, review architectural decisions, critical paths, security boundaries
  • Trust automated tools + mid-level reviewers for implementation details

5. “AI-Assisted Review”

  • Use AI to help reviewers, not just developers
  • Automated summaries, change impact analysis, test coverage gaps
  • Human makes final judgment

The Questions I’m Wrestling With

  1. How are other teams handling high-volume PR backlogs? What actually works?

  2. Should we limit AI usage until our review process can handle the output? Or improve the process first?

  3. How do you review AI-generated code differently from human code?

  4. What’s the right balance between speed (individual devs writing code) and safety (thorough review)?

  5. Is this a process problem or a tooling problem? Can technology solve what technology created?

I don’t have answers. But I know our current approach—“just review more, faster”—isn’t working.

And I’m worried other engineering leaders are facing this same crisis but not talking about it publicly.

We need to solve the review bottleneck, or the AI productivity gains are meaningless.

Luis, your security bug story gave me chills because we had nearly the exact same incident.

This Is an Enterprise-Scale Problem

At 50 engineers, we’re seeing the same bottleneck. I can only imagine it gets worse as you scale.

Your tiered review process idea is exactly right. Here’s how we’re implementing it:

Risk-Based Review Framework:

Critical Path (20% of code, 60% of review time):

  • Authentication, authorization, payment processing
  • Data access layer, security boundaries
  • NO AI or heavily supervised AI only
  • Required: 2 senior engineers + security review
  • Architecture discussion before implementation

Standard Features (60% of code, 30% of review time):

  • Business logic, UI components, API endpoints
  • AI allowed with oversight
  • 1 senior or 2 mid-level reviewers
  • Automated security + architectural checks

Low Risk (20% of code, 10% of review time):

  • Tests, documentation, internal tooling, scripts
  • AI encouraged
  • Automated checks + junior reviewer OK
  • Fast-track merge

The key: Explicitly allocate review capacity based on risk, not first-come, first-served.

Your #3—making review capacity visible—is critical. We track it on our sprint board:

  • Total review hours available this sprint: 80
  • Review hours committed: 65
  • Capacity remaining: 15 hours

When we hit 80 hours committed, we stop accepting new PRs into the sprint. Full stop.
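The gate itself is simple enough to sketch in a few lines. The per-tier hour estimates below are illustrative assumptions; the 80-hour budget is the number from our board:

```python
# Sketch of the review-capacity gate described above: stop admitting PRs
# into the sprint once estimated review hours would exceed the budget.
# Per-tier hour estimates are illustrative assumptions.

REVIEW_BUDGET_HOURS = 80
HOURS_PER_TIER = {"low": 0.5, "standard": 2.0, "critical": 6.0}

def can_accept(committed_hours: float, tier: str) -> bool:
    """True if the sprint still has room for this PR's estimated review cost."""
    return committed_hours + HOURS_PER_TIER[tier] <= REVIEW_BUDGET_HOURS

print(can_accept(65.0, "critical"))  # 65 + 6 = 71 <= 80 → True
print(can_accept(78.5, "standard"))  # 78.5 + 2 = 80.5 > 80 → False
```

The point isn't the arithmetic — it's that the "no" is mechanical rather than a judgment call someone has to fight for in a planning meeting.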

This forced uncomfortable conversations about priorities. But it prevented exactly the overwhelmed-reviewer burnout you’re describing.

AI-Assisted Review: Our Experiment

We’re piloting using Claude Code to help reviewers:

  • Summarize PR changes
  • Identify test coverage gaps
  • Flag potential security issues
  • Check adherence to team conventions
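Some of these checks don't even need a model. As one example, the coverage-gap check can start as a naming heuristic; the `tests/test_<name>.py` convention below is an assumption about repo layout:

```python
# Sketch of a cheap coverage-gap check: flag source files changed in a
# PR with no corresponding test change. The tests/test_<name>.py naming
# convention is an assumption about the repo layout.
from pathlib import Path

def untested_changes(changed: list[str]) -> list[str]:
    """Source files in the diff whose matching test file wasn't touched."""
    tests = {Path(p).name for p in changed if p.startswith("tests/")}
    return [p for p in changed
            if p.startswith("src/") and p.endswith(".py")
            and f"test_{Path(p).stem}.py" not in tests]

changed = ["src/auth/middleware.py", "src/billing/client.py",
           "tests/test_client.py"]
print(untested_changes(changed))  # → ['src/auth/middleware.py']
```

A flag like this doesn't prove the tests are good — it just moves "did you test this?" out of the reviewer's head and into CI.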

Early results: Reviewers spend less time on “did you test this?” and more time on “is this the right approach?”

But critical: Human still makes final call. AI surfaces issues; human judges importance.

Luis, this is why I keep talking about psychological safety. The review bottleneck isn’t just technical—it’s organizational.

The Hidden Cost: Senior Engineer Burnout

Your “+40% burnout indicators” stat buried in the data? That’s the real crisis.

Senior engineers are:

  • Spending more time reviewing, less time building
  • Feeling like “quality police” instead of architects
  • Watching junior devs move fast with AI while they’re stuck in review hell
  • Getting blamed when bad code ships (“why didn’t you catch it?”)

This is unsustainable. And it’s creating resentment.

What we changed: Dedicated review capacity as an explicit role rotation

  • 20% of senior engineer time is “review rotation” each sprint
  • When you’re on rotation, review IS your primary job
  • When you’re not on rotation, you’re explicitly protected from review interrupts
  • This prevents the “everyone reviews everything all the time” burnout

Michelle’s capacity tracking is brilliant. We do similar:

Review Budget Visualization:

  • Review hours available: Shown on sprint board
  • PR queue status: Green (<3 days avg) / Yellow (3-5 days) / Red (>5 days)
  • When we hit Red, we STOP creating new work and clear the queue

Controversial but necessary: Sometimes saying “no, we can’t take that on this sprint” is better than saying yes and shipping with inadequate review.

The question: Are we optimizing for individual developer speed or team quality?

I’m reading this as a non-engineer, and it’s making me think about design reviews…

Is This Volume vs. Value?

Luis, you’re generating 98% more PRs but not shipping faster.

Genuine question: Is the problem that we’re creating too much, not that we’re reviewing too slowly?

In design, I learned: more mockups don’t mean better design. Sometimes they mean unfocused exploration.

What if AI is helping developers generate code volume that isn’t actually needed?

Like, maybe some of those PRs shouldn’t exist? Maybe AI makes it so easy to “just build this idea” that we skip the step of asking “should we build this?”

Michelle’s risk-based review framework makes sense. But I wonder if there’s a step before that:

Value-based feature gate: Should this code exist at all?

Not criticizing—genuinely curious if the review bottleneck is pointing to an upstream prioritization problem.

From design perspective: When review becomes the bottleneck, sometimes it means we need better upfront planning, not faster review.

Does that apply to engineering too? :thinking:

Maya’s onto something. This isn’t just a review problem—it’s a decision-making problem.

The Review Bottleneck Reveals Deeper Issues

Luis, respectfully: 98% more PRs is a symptom, not a feature.

Before AI: Engineering velocity was constrained by implementation speed. That forced product to be thoughtful about priorities.

After AI: Implementation is fast, so we say “yes” to more things. Then everything backs up in review.

The review queue is telling you: Your team is building more than you can thoughtfully validate and ship.

This mirrors what I see in product:

  • Faster implementation → more features attempted
  • Insufficient discovery → unclear if features solve real problems
  • Everything arrives at QA/review simultaneously → bottleneck

The review bottleneck is your canary in the coal mine. It’s showing you that AI hasn’t actually made your team faster—it’s just moved the constraint.

Before: Implementation was the constraint
Now: Thoughtful review and validation are the constraint

Michelle’s capacity tracking is smart. But I’d add: Track not just review capacity, but decision-making capacity.

Questions before writing code:

  • Have we validated this solves a customer problem?
  • Is this in our top 5 priorities this quarter?
  • What’s the opportunity cost?

If AI makes it trivial to build anything, then deciding what to build becomes the most important skill.

And right now, it sounds like that decision-making is broken.