Six months ago, we rolled out Cursor and GitHub Copilot to my 40-person engineering team. The productivity metrics looked incredible at first.
Junior and mid-level engineers were shipping features faster. PR volume went up 98% per developer. Sprint velocity increased by 25%. Leadership was thrilled.
Then the seniors started burning out.
The problem nobody anticipated:
Our most experienced engineers—the ones who understand our distributed systems architecture, the ones who can spot subtle concurrency bugs, the ones who mentor everyone else—are now spending 70% of their time on code review instead of 40%.
It’s not just the volume. It’s the cognitive load.
AI-generated code has a specific signature:
- Looks correct at first glance (usually is for simple cases)
- “Almost right, but not quite” for complex scenarios
- Subtle bugs that require deep context to spot
- Security vulnerabilities that aren’t obvious (23.7% more according to research)
- Uses patterns that work individually but don’t compose well
One senior told me: “Reviewing AI code takes longer than reviewing human code because I have to check everything. Humans make consistent mistakes. AI makes random ones.”
The math doesn’t work:
Before AI:
- Junior writes 10 PRs/sprint
- Senior reviews 8-10 PRs/day across multiple juniors
- Sustainable pace
After AI:
- Junior writes 20 PRs/sprint (productivity!)
- Senior needs to review 15-20 PRs/day
- Each AI-assisted PR takes 25% longer to review thoroughly
- Seniors are overwhelmed
Result: Review queue becomes the bottleneck. Features sit waiting for review longer than they took to write.
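The arithmetic above can be sketched as a back-of-the-envelope model. The PR counts and the 25% per-PR slowdown come from our numbers; the 30-minute baseline review time and the 6-hour daily review budget are assumptions I'm plugging in for illustration:

```python
# Back-of-the-envelope review-load model using the numbers above.
# Assumed (not measured): 30 min baseline review time per PR, and a
# 6-hour sustainable review budget per senior per day.

BASELINE_MIN_PER_PR = 30
REVIEW_BUDGET_MIN = 6 * 60  # minutes of review a senior can sustain daily


def review_load(prs_per_day: float, min_per_pr: float) -> float:
    """Total review minutes a senior needs per day."""
    return prs_per_day * min_per_pr


# Before AI: 8-10 PRs/day (midpoint 9) at baseline review time.
before = review_load(9, BASELINE_MIN_PER_PR)

# After AI: 15-20 PRs/day (midpoint 17.5), each taking 25% longer.
after = review_load(17.5, BASELINE_MIN_PER_PR * 1.25)

print(f"Before AI: {before:.0f} min/day ({before / REVIEW_BUDGET_MIN:.0%} of budget)")
print(f"After AI:  {after:.0f} min/day ({after / REVIEW_BUDGET_MIN:.0%} of budget)")
```

Under those assumptions the post-AI load lands well above 100% of a senior's daily review budget, which is exactly the queue buildup we're seeing.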
The quality concern:
AI-generated code shows 1.7x more issues and 23.7% more security vulnerabilities, per the same research I cited above. Our seniors are catching some of this in review, but they’re overloaded. Things are slipping through.
I added a metric Michelle suggested: “Issues found in production that passed code review.” Up 31% since AI adoption.
What we tried:
- Pair programming requirements - Juniors pair with seniors when using AI for complex features. Reduces review burden but slows down junior “productivity.”
- Unaided coding days - Every other sprint, juniors build something without AI tools. Goal: build mental models so they can self-review better.
- Review automation - Added more automated code quality tools. Helps with syntax and simple bugs, doesn’t catch architectural issues.
- Reduced PR size limits - Smaller changes are easier to review. But more PRs means more context switching for seniors.
The tension:
Juniors feel productive. They’re shipping more code. But they can’t debug when things break, and they don’t understand why seniors keep sending PRs back.
Seniors feel like quality gatekeepers instead of engineers. One told me: “I’m spending more time fixing other people’s AI mistakes than building things myself.”
The organizational question:
Do we slow down junior productivity to preserve senior sanity and code quality?
Or do we accept this is the new normal and invest heavily in review tooling and more senior hiring?
Short-term metrics vs long-term health:
Sprint velocity is up (yay for quarterly reviews). Senior retention risk is up (problem for next quarter). Technical debt is accumulating silently (problem for next year).
Luis’s framework of unaided learning sprints helps long-term, but tanks short-term velocity. How do I defend a 15% velocity drop to leadership when they’re focused on this quarter’s roadmap?
Michelle’s two-tier hiring approach makes sense, but I need MORE seniors to handle review load, and they’re expensive and hard to find.
Anyone else drowning in AI-generated code review? What worked for you?