I need help. Our team is drowning, and I’m not sure how to fix it.
Three months ago, we celebrated: developers using AI tools, PRs flowing, velocity charts trending up. Leadership was thrilled.
Today, our senior engineers are burned out, our review queue is backed up 8 days, and we just shipped a security bug that never should’ve made it to production.
The Numbers That Tell the Story
Here’s our Q1 data compared to Q4 2025 (before widespread AI adoption):
What Went Up:
- Pull requests created: +98% (from 42/week to 83/week)
- Individual tickets completed: +21%
- Lines of code written: +115%
What Also Went Up (The Bad Kind):
- PR review time: +91% (average time from PR open to merge: 2.1 days → 4.0 days)
- Review comments per PR: +67% (reviewers finding more issues)
- Post-deployment defects: +28%
- Senior engineer burnout indicators: +40%
What Stayed Flat:
- Deployment frequency (we’re not shipping faster)
- Features delivered to customers (volume up, but many blocked in review)
The Problem: Human Review Can’t Scale With AI Output
Here’s what’s happening:
Junior and mid-level developers are using AI to write code 2-3x faster. Great!
But that code needs review. And reviewers are human. With human constraints.
The math doesn’t work:
- 5 junior devs using AI → 15 PRs/week (up from 6 PRs/week)
- 2 senior engineers reviewing → same 40 hours/week capacity
- Result: Review becomes the bottleneck
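To make that arithmetic concrete, here's a minimal back-of-envelope model. The review hours per week and hours per careful review are illustrative assumptions, not numbers from our data:

```python
# Back-of-envelope model of the review bottleneck.
# reviewer_hours_per_week and hours_per_review are assumptions
# for illustration, not measured figures.

def review_backlog_growth(prs_per_week: float,
                          reviewer_hours_per_week: float,
                          hours_per_review: float) -> float:
    """Return how many PRs per week arrive faster than reviewers can clear."""
    reviewable = reviewer_hours_per_week / hours_per_review
    return prs_per_week - reviewable

# 15 PRs/week from the juniors; assume our 2 seniors can spend
# ~10 h/week each on reviews, at ~1.5 h per careful review.
growth = review_backlog_growth(prs_per_week=15,
                               reviewer_hours_per_week=20,
                               hours_per_review=1.5)
print(f"Backlog grows by about {growth:.1f} PRs/week")
```

Any positive result means the queue grows without bound; the exact numbers matter less than the sign.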
And it’s not just volume—it’s complexity.
Reviewing AI-generated code is different from reviewing human code. You have to verify:
- Does it actually solve the problem?
- Does it fit our architecture?
- Does it handle edge cases?
- Is it maintainable?
- Did the author understand what they committed?
With human-written code, you can assume the developer understands their own code. With AI-assisted code, that assumption breaks down.
The Security Bug That Shouldn’t Have Happened
Two weeks ago, we shipped an API endpoint with a critical security flaw. Here’s what happened:
- Junior dev used Copilot to build authentication middleware
- AI generated syntactically correct code that passed tests
- Code passed automated security scanning (basic checks)
- PR sat in review queue for 6 days due to backlog
- Senior engineer, overwhelmed by review volume, did a quick review
- Missed that the implementation bypassed our standard auth flow
- Shipped to production
- Customer security audit caught it
We were lucky it was caught in audit, not exploited.
Root cause: Not the AI. Not the junior dev. The process that couldn’t handle the increased volume of code requiring careful review.
What We’ve Tried (Mixed Results)
Attempt 1: “Review Faster”
- Asked senior engineers to spend more time on reviews
- Result: Burnout increased, quality didn’t improve
Attempt 2: Automated Review Tools
- Invested in better static analysis, security scanning
- Result: Caught some issues, but can’t verify architectural fit or business logic correctness
Attempt 3: Smaller PRs
- Encouraged devs to break work into smaller chunks
- Result: More PRs, same total review time, now even more context switching
Attempt 4: Pair Programming with AI
- Required a human partner when using AI for complex features
- Result: Slowed down individual development, but fewer review issues
- This one shows promise
Attempt 5: “Review Rotation”
- Distributed review responsibility across more engineers
- Result: Context fragmentation, inconsistent standards
What I Think Might Work (Seeking Feedback)
1. Tiered Review Process
- Low-risk code (tests, docs, internal tools): Light review, automated checks
- Medium-risk (non-critical features): Standard review process
- High-risk (auth, payments, core business logic): Deep review, pair programming, no AI or very supervised AI
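One way to make the tiers mechanical rather than judgment-based: classify each PR by the paths it touches. A rough sketch, with hypothetical path patterns (a real setup would mirror something like a CODEOWNERS file):

```python
import fnmatch

# Hypothetical path patterns for illustration; adapt to your repo layout.
TIERS = [
    ("high", ["src/auth/*", "src/payments/*", "src/core/*"]),
    ("low",  ["tests/*", "docs/*", "tools/internal/*"]),
]

def review_tier(changed_paths: list[str]) -> str:
    """Return the strictest tier any changed file falls into."""
    def tier_of(path: str) -> str:
        for tier, patterns in TIERS:
            if any(fnmatch.fnmatch(path, pat) for pat in patterns):
                return tier
        return "medium"  # anything unclassified gets the standard process

    tiers = {tier_of(p) for p in changed_paths}
    if "high" in tiers:
        return "high"       # one high-risk file escalates the whole PR
    if tiers == {"low"}:
        return "low"        # light review only if *every* file is low-risk
    return "medium"
```

The key property: a single high-risk file escalates the entire PR, so nobody sneaks an auth change in alongside a docs fix.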
2. “AI-Generated” Tagging
- Require devs to mark which parts used AI assistance
- Reviewers know to scrutinize those sections more carefully
- Over time, learn which use cases work well vs. poorly
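The tagging could be as simple as a commit-message trailer. Here's a sketch that parses a hypothetical `AI-Assisted:` trailer (our own convention, not an existing standard) so tooling can flag those sections for reviewers:

```python
# Parse a hypothetical "AI-Assisted:" trailer from a commit message.
# This convention is an assumption, not an established standard.

def ai_assisted_sections(commit_message: str) -> list[str]:
    """Return the values of any 'AI-Assisted:' trailer lines."""
    sections = []
    for line in commit_message.splitlines():
        if line.lower().startswith("ai-assisted:"):
            sections.append(line.split(":", 1)[1].strip())
    return sections

msg = "Add auth middleware\n\nAI-Assisted: token validation helper\n"
print(ai_assisted_sections(msg))
```

A CI check could then require the trailer on every commit and surface the tagged sections in the PR description, which also gives you the longitudinal data for learning which use cases work.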
3. Treat Review Capacity as a Team Constraint
- Stop measuring individual developer velocity
- Measure team throughput (idea → production)
- Make review capacity visible and limit WIP based on it
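The WIP limit itself can be a one-line gate. A sketch, assuming a per-reviewer WIP number we'd have to tune empirically:

```python
def can_open_pr(open_prs_in_review: int,
                reviewers: int,
                wip_per_reviewer: int = 3) -> bool:
    """Hypothetical WIP gate: block new PRs once review capacity is saturated.

    wip_per_reviewer is an assumed tuning knob, not a measured value.
    """
    return open_prs_in_review < reviewers * wip_per_reviewer
```

"Block" here might just mean the bot labels the PR as queued rather than requesting review, which makes the bottleneck visible instead of silently growing the backlog.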
4. Senior Engineers as “Architectural Reviewers”
- Don’t review every line of code
- Instead, review architectural decisions, critical paths, security boundaries
- Trust automated tools + mid-level reviewers for implementation details
5. “AI-Assisted Review”
- Use AI to help reviewers, not just developers
- Automated summaries, change impact analysis, test coverage gaps
- Human makes final judgment
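The "change impact analysis" part doesn't even need AI to start. A minimal sketch that parses a unified diff and surfaces the facts a reviewer wants up front (a real tool would layer coverage data and an AI-generated narrative on top):

```python
# Minimal change-impact summary from a unified diff. This only counts
# lines and files; it is a starting point, not a full review assistant.

def summarize_diff(diff_text: str) -> dict:
    files, added, removed = set(), 0, 0
    for line in diff_text.splitlines():
        if line.startswith("+++ b/"):
            files.add(line[6:])                     # new-side file path
        elif line.startswith("+") and not line.startswith("+++"):
            added += 1                              # added line
        elif line.startswith("-") and not line.startswith("---"):
            removed += 1                            # removed line
    return {
        "files": sorted(files),
        "lines_added": added,
        "lines_removed": removed,
        "touches_tests": any(f.startswith("tests/") for f in files),
    }
```

Even `touches_tests: False` on a 400-line feature diff is a useful signal to a reviewer before they read a single line.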
The Questions I’m Wrestling With
- How are other teams handling high-volume PR backlogs? What actually works?
- Should we limit AI usage until our review process can handle the output? Or improve the process first?
- How do you review AI-generated code differently from human code?
- What's the right balance between speed (individual devs writing code) and safety (thorough review)?
- Is this a process problem or a tooling problem? Can technology solve what technology created?
I don’t have answers. But I know our current approach—“just review more, faster”—isn’t working.
And I’m worried other engineering leaders are facing this same crisis but not talking about it publicly.
We need to solve the review bottleneck, or the AI productivity gains are meaningless.