PR Review Time Up 91%—How Do You Review 100+ AI-Generated PRs Per Week?

I need to talk about something that’s quietly burning out my engineering team: code review has become unsustainable in the AI era.

The Data Is Alarming

According to Faros AI’s productivity research, teams are generating 98% more pull requests while review time has increased 91%. That’s not just a number on a dashboard—that’s the lived reality for my senior engineers who are drowning in review requests.

At our EdTech startup:

  • 6 months ago: ~40 PRs per week for our team of 25 engineers
  • Today: 110+ PRs per week with the same team size
  • Average review time per PR: Up from 45 minutes to 1 hour 20 minutes
  • Senior engineer time spent reviewing: 40-50% of their week

Our most experienced engineers are spending half their time reviewing code instead of building, mentoring, or thinking strategically. And the quality of reviews is suffering because everyone’s exhausted.

Why It’s Getting Worse

AI coding assistants help developers write code faster. That’s the promise, and it’s real. But they also:

  1. Generate more code per feature. AI tends to be verbose, creating more files and more lines to review.
  2. Require deeper scrutiny. We can’t trust AI-generated code the same way we trust code from a senior engineer we’ve worked with for years. Every assumption needs validation.
  3. Make subtle mistakes. AI doesn’t make obvious typos. It makes architectural mistakes that look plausible but have hidden risks.
  4. Create review fatigue. When you’re reviewing your 15th AI-generated API endpoint of the week, your attention starts to slip.

The Process That’s Breaking

Our traditional code review process:

  • Every PR requires at least 2 approvals
  • Senior engineers review architectural changes
  • Security-sensitive code gets specialized review
  • All comments must be addressed before merge

This worked when we had 40 PRs per week. At 110+ PRs per week, it’s a bottleneck that’s slowing everything down and creating friction between teams.

Product is frustrated that features take longer despite “faster coding.” Engineering is frustrated by the overwhelming review burden. And I’m concerned about what we’re missing because reviewers are overwhelmed.

What We’ve Tried (With Mixed Results)

AI-assisted code review: We’re testing tools that use AI to pre-review code and flag potential issues. It helps, but we still need human judgment for architectural decisions and context-specific concerns. And honestly? Trusting AI to review AI-generated code feels like a hall of mirrors.

Tiered review process: Critical paths get deep review, routine changes get lighter review. The challenge is deciding what’s “critical” and ensuring junior developers understand the distinction.

Protected review time: Blocked calendar time for reviews so they’re not squeezed between meetings. Works on paper, doesn’t always work in practice when urgent PRs pile up.

Smaller PRs: We’re pushing for smaller, more focused PRs. But AI makes it so easy to generate a “complete” feature that developers resist breaking it up.

The Question That Keeps Me Up

How do you maintain quality code review at AI-accelerated PR volume without burning out your team?

Are you:

  • Using AI review tools effectively? Which ones actually work?
  • Changing your review standards or approval requirements?
  • Investing in automation to reduce what humans need to review?
  • Accepting that some things will slip through and investing in observability instead?
  • Finding ways to make reviewing more sustainable for senior engineers?

Because right now, we’re heading toward a crisis where either our best engineers spend all their time reviewing or we compromise on quality. Neither option is acceptable.

I’d love to hear what’s actually working for other teams facing this challenge.

Keisha, this is hitting us from a product perspective too. Slow code review is now the #1 blocker for feature delivery—not design, not product decisions, but engineering review capacity.

We’re in this frustrating situation where:

  • Features are “code complete” days before they actually ship
  • Product managers are asking “why is this taking so long?” when devs say “waiting for review”
  • Customers are asking about features that are literally sitting in PR queues

The tension between speed and quality is real. But here’s the question I’m learning to ask: which PRs actually need deep human review, and which can be validated differently?

Some changes genuinely need experienced eyes—architectural decisions, security-sensitive code, complex business logic. But do we really need 2 senior approvals for a copy change, a CSS tweak, or a config update?

What if we:

  • Auto-approve low-risk changes that pass all automated checks (with clear criteria for “low-risk”)
  • Single approval for medium-risk changes (most feature work)
  • Deep review for high-risk changes (security, payments, data access, architectural)

The challenge is getting alignment on what constitutes each risk tier. But it feels like we’re applying the same rigorous process to everything when we should be focusing review attention where it matters most.

Also: product leaders need to understand this bottleneck. If review capacity is 40 PRs per week and we’re generating 110, something has to give. Either we invest in review capacity (more seniors, better tooling) or we acknowledge that delivery will slow down despite faster coding.

David’s tiered review approach is exactly what we implemented, and it’s working. Here’s our specific framework:

Risk-Based Review Tiers

Tier 1 - Auto-merge (15% of PRs):

  • Documentation updates
  • Configuration changes (within guardrails)
  • Test-only changes
  • CSS/styling tweaks
  • Requirements: All automated checks pass, no security flags, <50 lines changed

Tier 2 - Single approval (70% of PRs):

  • Feature implementation following established patterns
  • Bug fixes
  • Refactoring within existing architecture
  • Requirements: Automated checks + 1 senior approval

Tier 3 - Deep review (15% of PRs):

  • New API endpoints or database schemas
  • Security-sensitive code (auth, payments, PII)
  • Architectural changes
  • Third-party integrations
  • Requirements: 2 senior approvals + security review for sensitive areas

Implementation Details

Automated categorization: PR template asks author to select tier + justify. GitHub Actions enforces requirements based on tier.
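
For the curious, the enforcement step is nothing exotic. Here’s a simplified sketch of the kind of script our workflow runs (label names, thresholds, and paths are illustrative, not our exact setup):

```python
# enforce_review_tier.py -- run as a GitHub Actions step after the normal checks.
# Label names, thresholds, and the sensitive-path list are illustrative.
import os
import sys
import requests

API = "https://api.github.com"
REPO = os.environ["GITHUB_REPOSITORY"]   # "org/repo", provided by Actions
PR = os.environ["PR_NUMBER"]             # passed in from the workflow
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
           "Accept": "application/vnd.github+json"}

REQUIRED_APPROVALS = {"tier-1": 0, "tier-2": 1, "tier-3": 2}
TIER1_MAX_LINES = 50
SENSITIVE_PATHS = ("auth/", "payments/", "migrations/")

def get(path):
    resp = requests.get(f"{API}/repos/{REPO}{path}", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

pr = get(f"/pulls/{PR}")
labels = {label["name"] for label in pr["labels"]}
tier = next((t for t in REQUIRED_APPROVALS if t in labels), None)
if tier is None:
    sys.exit("No tier label on this PR; pick one in the PR template.")

# Tier 1 guardrails: small diff, nothing touching sensitive paths.
if tier == "tier-1":
    if pr["additions"] + pr["deletions"] > TIER1_MAX_LINES:
        sys.exit("Diff too large for tier-1; re-label as tier-2 or tier-3.")
    files = get(f"/pulls/{PR}/files")
    if any(seg in f["filename"] for f in files for seg in SENSITIVE_PATHS):
        sys.exit("Sensitive paths changed; tier-1 auto-merge not allowed.")

# Tiers 2 and 3: count distinct approvers.
reviews = get(f"/pulls/{PR}/reviews")
approvers = {r["user"]["login"] for r in reviews if r["state"] == "APPROVED"}
if len(approvers) < REQUIRED_APPROVALS[tier]:
    sys.exit(f"{tier} needs {REQUIRED_APPROVALS[tier]} approvals, has {len(approvers)}.")

print(f"{tier} requirements satisfied.")
```

It only runs after the normal CI checks, so the automation never loosens the existing safety net.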

Smart reviewer assignment: Tool automatically assigns reviewers based on expertise, current load, and tier urgency. Distributes review work more evenly.

Review SLAs by tier:

  • Tier 1: Auto (if checks pass)
  • Tier 2: 4 hours for first review
  • Tier 3: 24 hours (scheduled, not squeezed between tasks)

AI-assisted pre-review for Tier 2: We use CodeRabbit to flag common issues (unused vars, potential bugs, style violations) before human review. Reduces review time ~30%.

Results After 3 Months

  • Senior engineer review time down from 45% to 28% of week
  • Time to merge (median) down 40%
  • Post-merge incident rate unchanged (quality maintained)
  • Developer satisfaction with review process up significantly

The key was getting team buy-in that not all PRs are equal. Some changes genuinely need deep thought. Others just need automated validation.

Your “hall of mirrors” comment about AI reviewing AI code resonated—we use AI for syntax and pattern checking, but human review remains essential for business logic and architectural decisions.

Luis’s framework is excellent. We’ve taken a similar but slightly different approach that adds one more layer: strategic investment in code review automation as platform infrastructure.

What We Built

Created a Code Review Platform team (2 engineers) whose entire job is making review faster and better. They’ve built:

1. AI-Powered Review Assistant (with guardrails):

  • Pre-reviews every PR for common issues: security antipatterns, performance problems, accessibility violations
  • Leaves comments as “suggestions” not blocks—humans make final call
  • Learns from accepted vs rejected suggestions to reduce noise
  • Catches ~60% of issues that used to require human time
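
The “suggestions, not blocks” part mostly comes down to which review event the bot submits. Stripped to its core, it looks something like this (find_issues() stands in for whatever model or analyzers you wire up; this is a sketch, not our exact code):

```python
# post_suggestions.py -- leave AI findings as a non-blocking review comment.
# find_issues() is a placeholder for whatever model or analyzers you plug in.
import os
import requests

API = "https://api.github.com"
REPO = os.environ["GITHUB_REPOSITORY"]
PR = os.environ["PR_NUMBER"]
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
           "Accept": "application/vnd.github+json"}

def find_issues(diff_text):
    """Placeholder: run your model or static analyzers over the diff."""
    return []  # e.g. [{"path": "api/users.py", "note": "possible N+1 query"}]

# Fetch the raw diff (the .diff media type returns plain text instead of JSON).
diff = requests.get(f"{API}/repos/{REPO}/pulls/{PR}",
                    headers={**HEADERS, "Accept": "application/vnd.github.diff"}).text

issues = find_issues(diff)
if issues:
    body = "Automated pre-review (suggestions only; humans make the call):\n\n" + \
           "\n".join(f"- `{i['path']}`: {i['note']}" for i in issues)
    # event=COMMENT posts feedback without approving or blocking the PR.
    resp = requests.post(f"{API}/repos/{REPO}/pulls/{PR}/reviews",
                         headers=HEADERS,
                         json={"body": body, "event": "COMMENT"})
    resp.raise_for_status()
```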

2. Context-Aware Reviewer Routing:

  • Analyzes PR content and automatically assigns best reviewer based on:
    • Code ownership (who touches this code most)
    • Domain expertise (who knows this system)
    • Current review load (don’t overload same people)
    • Historical review quality (match complexity to experience)
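
The routing is less magical than it sounds: it’s mostly a weighted score over signals we already track. A stripped-down sketch, with illustrative weights and example data rather than our production values:

```python
# Pick a reviewer by scoring candidates on ownership, expertise, load, and track record.
# Weights and inputs are illustrative; in practice they come from git history and our review DB.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    ownership: float         # share of recent commits to the touched files (0..1)
    domain_expertise: float  # familiarity with the affected system (0..1)
    open_reviews: int        # PRs currently waiting on this person
    review_quality: float    # historical signal, e.g. issues caught per review (0..1)

WEIGHTS = {"ownership": 0.35, "expertise": 0.30, "quality": 0.20, "load": 0.15}
MAX_QUEUE = 10  # beyond this, the load score bottoms out

def score(c: Candidate) -> float:
    load_score = max(0.0, 1.0 - c.open_reviews / MAX_QUEUE)  # fewer open reviews = better
    return (WEIGHTS["ownership"] * c.ownership
            + WEIGHTS["expertise"] * c.domain_expertise
            + WEIGHTS["quality"] * c.review_quality
            + WEIGHTS["load"] * load_score)

def assign(candidates: list[Candidate]) -> Candidate:
    return max(candidates, key=score)

if __name__ == "__main__":
    pool = [Candidate("sarah", 0.6, 0.8, 9, 0.9),
            Candidate("luis", 0.5, 0.8, 1, 0.8)]
    print(assign(pool).name)  # luis: comparable expertise, much shorter queue
```

The load term is there specifically to keep review requests from piling up on the same two people.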

3. Review Efficiency Dashboard:

  • Shows each reviewer’s queue, average review time, approval patterns
  • Identifies bottlenecks: “Sarah has 15 PRs waiting, Luis has 2”
  • Gamifies positive behaviors (thorough but fast reviews get recognition)
  • Flags concerning patterns (rubber stamping, overly nitpicky reviews)
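
Most of the dashboard is plain aggregation over review events, and even the “concerning patterns” flags are simple heuristics. Roughly (field names and thresholds here are made up for illustration):

```python
# Flag queue imbalances and possible rubber-stamp reviews from review events.
# Event fields and thresholds are illustrative.
from collections import Counter
from datetime import timedelta

RUBBER_STAMP_CUTOFF = timedelta(minutes=2)   # approving a large PR this fast is suspicious
LARGE_PR_LINES = 300
QUEUE_WARNING = 10

def queue_depths(open_prs):
    """open_prs: iterable of dicts with a 'requested_reviewer' field."""
    return Counter(pr["requested_reviewer"] for pr in open_prs)

def bottlenecks(open_prs):
    return {name: n for name, n in queue_depths(open_prs).items() if n >= QUEUE_WARNING}

def rubber_stamps(reviews):
    """reviews: dicts with 'reviewer', 'opened_at', 'submitted_at', 'pr_lines', 'comment_count'."""
    return [r for r in reviews
            if r["submitted_at"] - r["opened_at"] < RUBBER_STAMP_CUTOFF
            and r["pr_lines"] > LARGE_PR_LINES
            and r["comment_count"] == 0]
```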

4. Automated Merge for Trusted Patterns:

  • Machine learning model trained on 10,000+ reviewed PRs
  • Auto-approves PRs that match previously-approved patterns
  • Still runs all automated checks—automation is about human time, not safety
  • Handles ~20% of PRs that would otherwise have been routine human approvals
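
The “trusted patterns” model is deliberately conservative: it can skip the human, never the checks, and anything below a high confidence threshold falls back to normal review. A sketch of its shape (the features, placeholder classifier, and threshold are illustrative):

```python
# Decide whether a PR matches previously-approved routine patterns closely enough to auto-approve.
# extract_features() and routine_probability() are stand-ins for the real featurizer and model.
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.97  # below this, fall back to normal human review

@dataclass
class PullRequest:
    files: list[str]
    lines_changed: int
    touches_sensitive_paths: bool
    ci_green: bool

def extract_features(pr: PullRequest) -> list[float]:
    """Stand-in for real featurization (file types, diff shape, templates matched, etc.)."""
    return [float(pr.lines_changed), float(len(pr.files)),
            float(pr.touches_sensitive_paths)]

def routine_probability(features: list[float]) -> float:
    """Stand-in for a trained classifier; this placeholder always defers to humans."""
    return 0.0  # replace with model.predict_proba(...) from your trained model

def auto_approve(pr: PullRequest) -> bool:
    # Hard gates first: automation saves human time, it never relaxes safety.
    if not pr.ci_green or pr.touches_sensitive_paths:
        return False
    return routine_probability(extract_features(pr)) >= CONFIDENCE_THRESHOLD
```

If the model is unsure, the PR simply gets normal human review, so the failure mode is “no time saved,” not “bad code merged.”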

Cultural Shifts We Had to Make

Code review is enablement, not gatekeeping. Shifted language from “approving” to “supporting.” Reviewers help ship good code fast; they don’t block bad code slowly.

Trust but verify. Junior developers can ship routine changes with light review. But we track and learn—if a developer’s code frequently has post-merge issues, they graduate back to deeper review.

Async-first review. Protected focus time for review (no meetings during peak review hours). Reviews don’t need to be instant, but they need to be timely.

Celebrate great reviews, not just great code. Recognition for reviewers who provide thoughtful, educational feedback. Review quality matters as much as code quality.

Results

  • Review time per PR down 35% despite 2x PR volume
  • Senior engineers went from 45% review time to 30%
  • Quality metrics unchanged (defect rate, incident count)
  • Most importantly: team morale improved—review felt sustainable again

The investment was significant (2 FTEs, tooling budget), but the ROI was undeniable. Review was becoming the constraint that negated all AI productivity gains. Now it’s flowing again.

This whole thread is making me think about the design review parallel in a new way :thinking:

We’re facing a similar explosion: AI tools help me generate design variations quickly, so product teams are requesting more design reviews. “Can you try 3 different approaches?” used to mean days of work. Now it means hours. So they ask for it constantly.

But here’s what I’m realizing from your engineering frameworks: I’ve been treating all design reviews the same when they’re not.

Some design work needs deep human judgment:

  • Information architecture decisions
  • Accessibility patterns for complex interactions
  • Brand expression and visual coherence
  • UX flows for novel features

Other design work needs validation but not deep critique:

  • Component usage following design system
  • Copy tweaks
  • Color/spacing adjustments within guidelines
  • Icon selections from approved library

I think I need Luis’s tiered approach for design:

Tier 1 - Design system compliance: Automated checks (are you using approved components, tokens, patterns?)

Tier 2 - Quick review: Visual QA to ensure it looks right and follows guidelines

Tier 3 - Deep critique: Strategic design decisions that require design thinking, user research, accessibility expertise

Right now I’m giving Tier 3 attention to Tier 1 changes, and it’s exhausting.
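
For Tier 1, I imagine it looking a lot like the engineers’ automated checks: a script that flags anything in the styling layer that isn’t drawn from approved tokens or components. A toy sketch of the idea (token values, file patterns, and paths are all hypothetical):

```python
# Toy design-system compliance check: flag raw hex colors and off-scale spacing in style files.
# Token values, file globs, and the source root are hypothetical.
import re
from pathlib import Path

APPROVED_COLORS = {"#1a1a2e", "#f5f5f5", "#0066ff"}          # from the design tokens export
APPROVED_SPACING = {"4px", "8px", "12px", "16px", "24px"}    # the spacing scale
STYLE_GLOBS = ("**/*.css", "**/*.scss")

HEX_RE = re.compile(r"#[0-9a-fA-F]{6}\b")
PX_RE = re.compile(r"\b\d+px\b")

def check(root: str) -> list[str]:
    problems = []
    for glob in STYLE_GLOBS:
        for path in Path(root).glob(glob):
            text = path.read_text()
            for color in HEX_RE.findall(text):
                if color.lower() not in APPROVED_COLORS:
                    problems.append(f"{path}: color {color} is not a design token")
            for px in PX_RE.findall(text):
                if px not in APPROVED_SPACING:
                    problems.append(f"{path}: spacing {px} is off the spacing scale")
    return problems

if __name__ == "__main__":
    for problem in check("src"):
        print(problem)
```

It’s basically lint for the design system; anything it can’t vouch for just moves up a tier.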

The challenge: Designers are taught to care about craft and details. It feels wrong to “rubber stamp” anything. But maybe what I’m learning from engineering is that spending equal attention on everything means critical things don’t get the attention they deserve.

Also love Michelle’s cultural point about “enablement not gatekeeping.” Design review should help teams ship great experiences fast, not slow them down with perfectionism on things that don’t matter to users.