I need to talk about the crisis nobody’s discussing openly: AI coding tools are burning out our senior engineers.
## The Review Avalanche
Six months ago, we adopted GitHub Copilot across our EdTech startup (80 engineers, growing fast). The productivity gains were immediate—our junior engineers especially became significantly more prolific. But we created a new problem we didn’t anticipate.
Our PR metrics before/after AI adoption:
| Metric | Before AI | After AI | Change |
|---|---|---|---|
| PRs per week | 45 | 127 | +182% |
| Average review time | 4 hours | 9 hours | +125% |
| Senior engineer review load | 6 PRs/week | 18 PRs/week | +200% |
| Senior engineer time spent reviewing | 30% | 60% | +100% |
Let me be blunt: my senior engineers are spending 60% of their time reviewing code instead of architecting systems, mentoring, or solving hard technical problems.
## The Human Cost (This Is What Keeps Me Up at Night)
Three weeks ago, one of my best senior engineers—let’s call her Sarah—came to my office and said: “Keisha, I’m drowning. I barely write code anymore. I’m just a code review machine.”
Sarah’s not alone. In our last engagement survey:
- 73% of senior engineers report review burden as #1 frustration
- 2 seniors have started looking externally (I found out through a backchannel)
- Junior engineers are frustrated too—waiting 48+ hours for review feedback
This isn’t sustainable. We’re at risk of losing our most experienced people because AI made everyone else faster.
## The Irony: AI Helps Writing But Not Reviewing (Or Does It?)
Here’s the paradox: AI tools are excellent at helping engineers write code. GitHub Copilot suggests completions, writes boilerplate, even generates tests.
But reviewing AI-generated code is arguably HARDER than reviewing human-written code:
- AI can generate syntactically correct code that’s subtly wrong
- Pattern matching is harder—AI doesn’t always follow team conventions
- Reviewing 200-line PRs takes longer than reviewing 50-line PRs, even if quality is similar
- Junior engineers using AI may not understand the code well enough to explain it
One of my seniors put it perfectly: “I don’t just review the code—I review whether the engineer understands what they wrote.”
## The Solutions We’re Implementing
After three months of experimentation, here’s our multi-pronged approach:
### 1. AI-Assisted Code Review (Fight Fire with Fire)
We’re piloting AI review tools to pre-screen PRs:
- CodeRabbit: Automated review comments on patterns, potential bugs, style issues
- GitHub Copilot Workspace: Helps reviewers understand code context faster
- Custom Linters: Enhanced to catch AI-common patterns we’ve identified
Early results: AI review tools catch 40% of the issues seniors would have flagged, freeing them to focus on architecture and logic.
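To make the "custom linters" point concrete: most of our checks are small AST rules, not anything exotic. Here's a minimal sketch of one such rule in Python; the specific pattern (silently swallowed exceptions) is just an illustrative example of the kind of thing we flag before a human ever looks at the PR, not our actual rule set:

```python
import ast

def find_swallowed_exceptions(source: str) -> list[int]:
    """Return line numbers of `except` handlers that silently swallow errors.

    Overly broad, empty exception handlers are one pattern that shows up
    often in generated code; flagging them automatically saves a senior
    from writing the same review comment over and over.
    """
    tree = ast.parse(source)
    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ExceptHandler):
            # Handler body is a single `pass`: the error vanishes silently.
            if len(node.body) == 1 and isinstance(node.body[0], ast.Pass):
                findings.append(node.lineno)
    return findings

sample = """
try:
    sync_roster()
except Exception:
    pass
"""
print(find_swallowed_exceptions(sample))  # → [4]
```

A rule like this runs in CI and posts a comment, so the PR arrives at a human reviewer with the mechanical issues already annotated.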
### 2. Tiered Review Process (Not All PRs Are Equal)
This was culturally hard but necessary. We created explicit review tiers:
**Tier 0 - Auto-merge:**
- Criteria: Tests pass + < 50 lines + documentation/config only + automated security scan clean
- Reviewer: Automated tooling only
- Time to merge: < 10 minutes
- Volume: ~20% of PRs
**Tier 1 - Peer Review:**
- Criteria: Feature work within established patterns + < 200 lines
- Reviewer: Another engineer in same domain (can be mid-level)
- Time to merge: < 4 hours
- Volume: ~50% of PRs
**Tier 2 - Senior Review:**
- Criteria: New patterns + performance implications + security-sensitive code
- Reviewer: Senior engineer or tech lead
- Time to merge: < 12 hours
- Volume: ~25% of PRs
**Tier 3 - Architecture Review:**
- Criteria: Cross-service changes + data model changes + API contracts
- Reviewer: Staff engineer + relevant domain lead
- Time to merge: 1-2 days
- Volume: ~5% of PRs
The key cultural shift: Appropriate review for risk level, not one-size-fits-all.
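In tooling terms, the routing logic is deliberately boring. Here's a sketch of how a PR might be bucketed; the field names and thresholds are illustrative stand-ins for whatever metadata your CI already produces, and the highest-risk criteria win:

```python
from dataclasses import dataclass

@dataclass
class PullRequest:
    lines_changed: int
    tests_pass: bool
    docs_or_config_only: bool
    security_scan_clean: bool
    touches_security: bool = False
    new_pattern: bool = False
    perf_sensitive: bool = False
    cross_service: bool = False
    data_model_change: bool = False
    api_contract_change: bool = False

def review_tier(pr: PullRequest) -> int:
    """Route a PR to a review tier; highest-risk criteria are checked first."""
    # Tier 3: architecture review for cross-cutting changes.
    if pr.cross_service or pr.data_model_change or pr.api_contract_change:
        return 3
    # Tier 2: senior review for new patterns, perf, or security-sensitive code.
    if pr.new_pattern or pr.perf_sensitive or pr.touches_security:
        return 2
    # Tier 0: auto-merge only for small, safe, docs/config-only changes.
    if (pr.tests_pass and pr.lines_changed < 50
            and pr.docs_or_config_only and pr.security_scan_clean):
        return 0
    # Tier 1: everything else goes to a peer in the same domain.
    return 1

print(review_tier(PullRequest(30, True, True, True)))    # → 0
print(review_tier(PullRequest(150, True, False, True)))  # → 1
```

Checking risk criteria before size criteria matters: a 20-line change to an API contract should never slip into auto-merge just because it's small.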
### 3. “Review Office Hours” (Batching Over Interrupts)
Senior engineers were being interrupt-driven all day—every new PR was a context switch.
We implemented structured review time:
- Morning Review Block: 9-11am, dedicated review time
- Afternoon Review Block: 2-3:30pm, dedicated review time
- Outside these windows: Only urgent/blocking reviews
This reduced context switching and gave seniors protected time for deep work.
### 4. Review Capacity Planning (Treat It Like Any Other Resource)
We now forecast review capacity in sprint planning:
- Estimate review hours needed based on planned work complexity
- Allocate senior review time as a constrained resource
- If review capacity is fully allocated, delay low-priority work
Sounds obvious, but we weren’t doing this before. PRs were “infinite demand” on senior time.
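The forecast itself is back-of-envelope arithmetic, which is part of why skipping it was so painless for so long. A sketch, with made-up per-tier review costs and a made-up senior-hours budget:

```python
# Rough senior-hours per PR review, by tier (illustrative numbers only).
# Tiers 0 and 1 cost no senior time by design.
REVIEW_HOURS = {0: 0.0, 1: 0.0, 2: 1.5, 3: 4.0}

def senior_hours_needed(planned_prs: dict[int, int]) -> float:
    """Estimate senior review hours for a sprint's planned PR mix."""
    return sum(REVIEW_HOURS[tier] * count for tier, count in planned_prs.items())

# Example sprint: 25 tier-0, 60 tier-1, 30 tier-2, 6 tier-3 PRs.
demand = senior_hours_needed({0: 25, 1: 60, 2: 30, 3: 6})
capacity = 8 * 10  # e.g. 8 seniors, each budgeting 10 review hours/week
print(demand, capacity, demand <= capacity)
```

When `demand` exceeds `capacity`, something in the sprint gets cut or deferred, exactly as we would for any other over-allocated resource.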
## The Results (3 Months In)
**Quantitative:**
- Average PR review time: Down from 9 hours to 5 hours
- Senior engineer review load: Down from 18 to 11 PRs/week (still high, but manageable)
- Time to merge (P50): 6 hours (down from 24 hours)
**Qualitative:**
- Senior engineer satisfaction: Significantly improved
- Junior engineers feel more trusted (peer review empowers them)
- “Review quality” hasn’t decreased (measured by bug escape rate)
**Sarah’s Update:**
She’s still here, and in our 1:1 last week she said: “I feel like an architect again, not a reviewer.”
## The Challenges We’re Still Facing
Being honest about what’s not working:
- Perception of “junior distrust”: Some junior engineers feel Tier 0/1 reviews mean they’re “not trusted.” We’re working on the communication; it’s about efficiency, not trust.
- Gaming the system: Engineers trying to keep PRs under the size limits to hit Tier 0/1, even when it means splitting work artificially. We’re learning to detect this.
- Edge cases: Some PRs don’t fit neatly into tiers. We need human judgment, which requires review lead training.
- Tool fatigue: Adding AI review tools means another tool to learn, another notification stream. We’re being selective.
## The Bigger Pattern: AI Exposes Organizational Bottlenecks
This is part of a bigger theme I’m seeing: AI tools don’t just make individuals faster—they stress-test your entire organizational design.
Code review was always a bottleneck; we just didn’t notice because it was manageable. AI turned “manageable constraint” into “crisis.”
The same pattern plays out everywhere:
- Testing infrastructure (Luis wrote about this recently—excellent thread!)
- Deployment pipelines
- QA capacity
- Product requirement clarity
If your organizational processes were designed for 50 PRs/week, they’ll break at 150 PRs/week—no matter how good the code is.
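The nonlinearity is what catches people off guard. A single-server queueing approximation (M/M/1, so purely illustrative; all numbers are made up) shows why roughly tripling arrivals can more than quadruple time-in-queue once you approach capacity:

```python
def mm1_wait_weeks(arrival_rate: float, service_rate: float) -> float:
    """Mean time in an M/M/1 system (waiting + service), in weeks.

    arrival_rate: PRs arriving per week; service_rate: PRs reviewable per week.
    """
    assert arrival_rate < service_rate, "system is overloaded"
    return 1.0 / (service_rate - arrival_rate)

capacity = 150.0  # PRs/week the review process can absorb (illustrative)
for load in (45.0, 127.0):
    hours = mm1_wait_weeks(load, capacity) * 168  # 168 wall-clock hours/week
    print(f"{load:.0f} PRs/week -> ~{hours:.1f}h average time in review queue")
# Load grows ~2.8x, but average time-in-queue grows ~4.6x.
```

Real review queues are messier than M/M/1, but the shape of the curve is the point: delay stays flat at low utilization and explodes as you near capacity, which is why a process that felt fine at 50 PRs/week falls over at 150.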
## Questions for the Community
- How are other scaling teams handling review capacity? Especially if you’re 100+ engineers or growing quickly.
- AI review tools: What’s working? We’re seeing value from CodeRabbit, but curious about other experiences.
- Cultural resistance: How did you overcome the “every PR needs a thorough senior review” mindset?
- Metrics: What do you track to measure review effectiveness? We’re tracking time and volume, but what about quality?
My time at Google and Slack taught me a lesson in empathetic leadership that applies here: we can’t just tell people to “review faster.” We need systemic changes that respect everyone’s time and cognitive load.
AI is a gift, but only if we redesign our processes to handle the volume it creates. Otherwise, it’s just a fancy way to burn people out.
What’s your code review bottleneck story? How are you adapting?