AI Speeds Up Coding 30% But Review Time Exploded 91%—Senior Engineers Are Now the Bottleneck

So our team adopted AI coding assistants about 6 months ago :robot: Everyone was pumped—management promised we’d ship 30% faster, engineers were excited to spend less time on boilerplate. And you know what? Individually, it feels true. I can spin up a component in half the time, AI handles the tedious parts, and I get to focus on the creative stuff.

But here’s what nobody warned us about: our code review time exploded by 91%.

Not “got a little slower.” Not “needs some optimization.” Ninety-one percent.

I first noticed it when our most senior engineer—let’s call her Sarah—started declining meetings. When I asked why, she said: “I’m drowning in PR reviews. Can’t make progress on my own work.” Turns out she wasn’t exaggerating. We pulled the data:

:bar_chart: The numbers from our team:

  • PRs per developer: +98% (almost double!)
  • Average PR size: +154% (massive!)
  • Time to review each PR: +47% (so much more to read)
  • Bugs that made it to staging: +9% (quality took a hit)
  • Senior engineer time spent on reviews: +91% (the bottleneck)

This isn’t just us. Faros AI research across 10,000+ developers found the same pattern. Anthropic just launched Claude Code Review specifically because “code output per engineer is up 200% this year and reviews were the bottleneck.”

The Paradox Nobody Talks About

Here’s the mind-bending part: everyone believes they’re faster, but we’re not shipping any faster as a team.

There’s even research showing this perception gap. In a METR study, developers using AI were actually 19% slower to complete tasks, yet before starting they predicted AI would make them about 24% faster. Even after finishing (slower), they still believed it had helped.

We’re experiencing something similar. Devs on my team feel productive because they’re churning out code. But our cycle time—idea to production—hasn’t budged. In some sprints, it’s gotten worse.

Why Senior Engineers Became the Bottleneck

The old workflow looked like this:

  1. Junior writes code (learns patterns, makes mistakes)
  2. Senior reviews (catches issues, teaches better approaches)
  3. Code ships

The new workflow:

  1. Junior (or mid-level) prompts AI to write code
  2. AI generates 2-3x more code in the same time
  3. Senior engineer tries to review all of it
  4. Senior drowns, becomes bottleneck
  5. Mentorship time evaporates

Amazon just mandated that seniors must sign off on all AI-assisted code after multiple AI-related outages. That’s not a solution—that’s admitting the problem.

Senior engineers didn’t sign up to be AI code validators. They’re supposed to architect systems, mentor juniors, and tackle the hardest problems. Instead, they’re spending their days verifying that an AI didn’t introduce subtle bugs into a 500-line PR.

The Questions Keeping Me Up at Night

:thinking: Is this just growing pains? Will we adapt and find equilibrium? Or is this a fundamental mismatch between AI coding speed and human review capacity?

:thinking: Should we invest in AI code review tools? (They exist now—Anthropic, CodeRabbit, others) Or is that just adding more AI to fix the problems AI created?

:thinking: Are we measuring the wrong things? We track PRs merged and code velocity. Should we care about cycle time and customer value instead?

:thinking: How do we preserve mentorship? If juniors aren’t writing the initial code, how do they learn? If seniors aren’t teaching during reviews, when does knowledge transfer happen?

My Startup’s Failed Experiment

When I ran my startup (before it failed—that’s a story for another post :sweat_smile:), we tried AI coding to move faster. We were a 3-person team, and AI felt like having a 4th developer. We generated so much code!

But then we’d spend days debugging issues we didn’t fully understand because we hadn’t written the code ourselves. We’d ship features faster but break old ones. We optimized for velocity and got fragility.

That experience taught me: Code that ships isn’t the same as code that works, and neither is the same as code you understand.

So What Do We Do?

I don’t have answers—just a lot of questions and a pile of data that doesn’t align with the AI coding hype.

From a design systems perspective, this feels like a classic “local optimization, global pessimization” problem. We optimized the “write code” step and created chaos everywhere else.

:thought_balloon: Has your team experienced this? Are senior engineers drowning in reviews? Have you found solutions that actually work? Or am I missing something obvious here?

Would love to hear especially from engineering leaders who’ve navigated this—and from seniors who are living it right now.



Maya, this hits way too close to home. I’m seeing the exact same pattern across my 40+ person engineering team at our financial services company.

The Data from Our Team

We pulled metrics last month because our delivery velocity wasn’t matching the “AI productivity gains” we kept hearing about:

  • Code merged per sprint: +35% :chart_increasing:
  • Cycle time (issue → production): +27% :chart_decreasing:
  • Senior engineer time in PR reviews: +19% (matches the research you cited)
  • PRs waiting for review > 2 days: +43%

The math is brutal. We’re generating more code, but it’s not translating to faster feature delivery. It’s actually slowing us down.

This Is Amdahl’s Law in Action

For anyone not familiar: Amdahl’s Law says the overall speedup you get from improving one step is capped by how much of the total time that step actually takes. Speed up a step that isn’t the bottleneck and you mostly just shift the bottleneck somewhere else.

We optimized code generation (30-40% faster with AI). But we didn’t optimize review, testing, QA, or deployment, so those are now the bottlenecks.

It’s like helping shoppers fill their carts 3x faster in a grocery store while still staffing only 2 cashiers. The trip isn’t any shorter; you’ve just moved the line from the aisles to the checkout.
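If you want to see the arithmetic, here’s a toy Amdahl’s Law calculation. The stage fractions and speedups are made-up illustrations (not our real data), so treat it as a sketch you can rerun with your own numbers:

```python
# Toy Amdahl's Law model of a delivery pipeline (illustrative numbers only).
# overall_speedup = 1 / ((1 - p) + p / s), where p is the fraction of total
# cycle time spent in the improved step and s is that step's speedup.

def overall_speedup(p: float, s: float) -> float:
    """Whole-pipeline speedup when a step taking fraction p gets s times faster."""
    return 1 / ((1 - p) + p / s)

# Say writing code is ~30% of cycle time and AI makes that step ~1.4x faster.
print(overall_speedup(p=0.30, s=1.4))  # ~1.09 -> only ~9% faster end to end

# And that's before review grows. If review (say 25% of cycle time) takes 50%
# longer because there's more code to read, the "gain" can vanish entirely.
coding, review, other = 0.30, 0.25, 0.45
new_total = coding / 1.4 + review * 1.5 + other
print(1 / new_total)  # ~0.96 -> the team ends up slightly slower overall
```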

The Talent Pipeline Crisis

Your point about mentorship is what worries me most for the long term. The traditional path looked like:

Old model:

  • Junior writes code → makes mistakes → senior reviews → junior learns from feedback
  • After 2-3 years, junior becomes mid-level
  • After 5-7 years, mid-level becomes senior

New model:

  • Junior prompts AI → AI writes code → senior reviews AI output
  • Junior never writes the “wrong” code that teaches them why the right way is right
  • How do they become senior engineers?

I’ve had multiple seniors on my team express frustration. They say things like:

  • “I’m spending my day validating AI code instead of designing systems”
  • “I can’t tell if the junior understands the code they’re submitting”
  • “Reviews used to be teaching moments. Now they’re just validation gates”

What We’re Trying (Mixed Results So Far)

1. “AI Review First Pass”
We’re experimenting with CodeRabbit to do initial code review before human review. Theory: Let AI catch basic issues, humans focus on architecture/design.

Reality: It helps, but seniors still need to verify the AI reviewer didn’t miss subtle issues. We’re now reviewing the code and auditing the AI review.

2. Amazon’s Approach (Senior Signoff Required)
After the Amazon outages from AI code, they mandated senior approval for AI-assisted code.

My take: That’s not a solution. That’s admitting AI code needs more scrutiny, which means… it’s creating more work, not less.

3. Limiting AI Use for Complex Features
We’re experimenting with: “AI for boilerplate and utilities, humans for core business logic.”

Too early to tell if it helps, but at least it reduces the volume of AI-generated code that needs review.

Questions for the Community

Maya asked: “Is this just growing pains?” I think the answer depends on what changes:

  1. If review capacity doesn’t scale with code generation capacity, this bottleneck is permanent
  2. If AI review tools get good enough to replace human first-pass review, maybe we adapt
  3. If we realize we’re optimizing the wrong metric (code volume vs customer value), maybe we step back from AI coding entirely for some use cases

What are other engineering leaders seeing? Anyone found a process that actually works at scale?

And for the seniors out there: How are you handling this? I’ve got retention concerns if we’re burning out our most valuable engineers on validation work.

This thread captures the executive dilemma I’m facing right now. We invested heavily in AI coding tools last year with the promise of 30-40% productivity gains. Now I’m looking at our Q1 numbers, and the ROI story isn’t adding up.

The Numbers Don’t Lie (But They’re Confusing)

Our reality at this mid-stage SaaS company:

  • 25% of our code is now AI-assisted (up from 0% a year ago)
  • Velocity gains: ~10% (not the 30-40% we were promised)
  • Code review backlog: +62% (our biggest bottleneck)
  • Bugs reaching production: +12% (quality regression)
  • Senior engineer satisfaction: -18% (retention risk)

This mirrors Google’s numbers. They reported that 25% of their code is AI-generated but they’re only seeing ~10% velocity gains. The math fundamentally doesn’t add up.

Either:

  1. Our measurement is wrong (we’re measuring the wrong outputs)
  2. Our process is wrong (downstream bottlenecks absorb the gains)
  3. The AI productivity story is wrong (it’s not actually 30-40%)

I suspect it’s all three.

The Strategic Question: Build vs Buy for AI Review

Luis mentioned trying CodeRabbit for AI code review. We’re evaluating multiple options:

Option 1: Anthropic’s Claude Code Review (launched March 9, 2026)

  • Promises to handle “thousands of lines of code” review
  • Lets humans focus on “architectural decisions and business logic”
  • Cost: ~$50/engineer/month (ballpark)

Option 2: Build internal AI review infrastructure

  • Requires dedicated team, compute resources, ongoing maintenance
  • Cost: 2-3 FTE + infrastructure = ~$500K/year minimum

Option 3: Change process instead of adding more AI

  • Limit AI use to specific use cases
  • Invest in senior review capacity (hire more seniors or upskill mids)
  • Cost: Hard to quantify, but potentially lower long-term
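
One quick sanity check on Option 1 vs Option 2, using nothing but the ballpark figures above (both numbers are rough estimates, so this is directional at best):

```python
# Break-even headcount for building in-house AI review vs. buying a subscription,
# using the rough estimates above (~$50/engineer/month buy, ~$500K/year build).
subscription_per_eng = 50 * 12      # ~$600 per engineer per year
build_and_run_cost = 500_000        # 2-3 FTE plus infrastructure, per year

break_even = build_and_run_cost / subscription_per_eng
print(round(break_even))            # ~833 engineers before "build" is cost-competitive
```

On cost alone, that makes Option 2 hard to justify for anything smaller than a very large org.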

The irony: We’re now buying AI tools to fix the problems created by AI tools.

The Hidden Cost Nobody Talks About: Technical Debt

Maya, your startup story resonates. We’re seeing similar patterns at scale.

AI-generated code tends to be:

  • Correct enough to pass tests (so it ships)
  • Not optimized for maintainability (so it accrues debt)
  • Harder to understand later (so refactoring is expensive)

Last quarter, we spent 40% more engineering time on “unplanned work” (bugs, incidents, rework). Some of that is AI-generated code that passed review but had subtle issues we didn’t catch.

Research shows AI code creates 1.7x more issues compared to human-written code. That technical debt compounds.

The real cost calculation:

  • AI tool subscription: $50/engineer/month
  • Senior review overhead: +19% time = ~$25K/senior/year
  • Quality issues and rework: ~$40K/engineer/year (conservative estimate)
  • Total cost: ~$65K/engineer/year

Versus the productivity gains: 10% velocity increase = ~$15K/engineer/year value.

We’re underwater on ROI right now.
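
If you want to poke at the model (or plug in your own numbers), it’s just a few lines of arithmetic. Every figure here is our rough estimate, not a benchmark:

```python
# Back-of-the-envelope ROI per engineer per year, using the estimates above.
tool_cost = 50 * 12              # ~$600/year in subscriptions
review_overhead = 25_000         # extra senior review time
rework_cost = 40_000             # quality issues and rework
total_cost = tool_cost + review_overhead + rework_cost   # ~$65,600

velocity_value = 15_000          # value of the ~10% velocity gain we actually saw

net = velocity_value - total_cost
print(f"net value per engineer/year: {net:+,}")   # -50,600 -> we lose ~$50K per engineer
```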

Treating This As a Systems Problem

As CTO, I can’t blame developers for using tools we gave them. This is a systems and process problem:

  1. Misaligned incentives: We reward code shipped, not code quality or maintainability
  2. Missing guardrails: We didn’t establish “when to use AI” vs “when not to”
  3. Bottleneck blindness: We optimized one step without looking at the whole system
  4. Measurement gaps: We measure lines of code and PRs, not customer outcomes

The fix isn’t “ban AI tools” or “use more AI tools.” It’s rethinking the entire development workflow.

What We’re Doing Differently

We’re running a 3-month experiment:

Team A (Control): Continue current AI usage patterns
Team B (Guided AI): AI only for boilerplate, tests, documentation
Team C (AI + AI Review): AI coding + AI review (Anthropic’s tool)

Measuring:

  • Cycle time (idea → working in production)
  • Bug escape rate
  • Senior engineer time allocation
  • Developer satisfaction
  • Customer-facing metrics (feature adoption, reliability)

Early indicators (week 6):

  • Team B is slower on feature output but faster on cycle time (less rework)
  • Team C has mixed results (AI review catches some issues, misses others)
  • Team A has highest code output, lowest delivery velocity

The Real Question

Maya asked: “Should we invest in AI code review tools? Or is that just adding more AI to fix problems AI created?”

I don’t know yet. But here’s my framework:

Invest in AI review IF:

  • It demonstrably reduces senior review time by >30%
  • It catches bugs AI coding introduces
  • It doesn’t create a new “review the reviewer” problem

Don’t invest IF:

  • It’s just shifting work around without reducing it
  • It creates new failure modes we have to monitor
  • It’s solving a symptom instead of the root cause

My current hypothesis: We need AI code review as a transitional tool, but the long-term fix is better process design and clearer guidelines on when AI coding adds value vs when it creates overhead.

Anyone else running similar experiments? I’d love to compare notes on what’s actually working.

The perception gap Maya highlighted is the most concerning part of this whole conversation. Developers feel faster but data shows they’re slower. That disconnect has serious implications for trust, autonomy, and psychological safety on engineering teams.

The Perception-Reality Gap Is Dangerous

The METR study findings are striking:

  • Developers predicted AI would make them 24% faster
  • Reality: They were 19% slower
  • After finishing: They still believed AI had helped

This isn’t just “developers are wrong about productivity.” This is a fundamental mismatch between how work feels and what work accomplishes.

When your engineers believe they’re crushing it but your metrics show the opposite, you have a trust problem waiting to happen.

The People Impact Nobody’s Measuring

Michelle’s tracking senior engineer satisfaction (-18%). That number scares me more than the velocity metrics.

Senior engineers are:

  1. Your hardest roles to fill (3 openings for every qualified candidate in 2026)
  2. Your most expensive to lose (18+ months to replace their institutional knowledge)
  3. Your force multipliers (they make everyone around them better)

If we burn them out on AI code validation, we lose more than review capacity. We lose mentorship, architectural vision, and the ability to handle complex problems.

I’ve had three conversations this quarter with senior engineers on my team. All three said some variation of:

“I didn’t become a senior engineer to spend my days checking if AI made mistakes. I want to solve hard problems.”

That’s a retention risk that doesn’t show up in velocity metrics.

The Equity Question We’re Not Asking

Here’s something bothering me: Who bears the review burden disproportionately?

In my experience across Google, Slack, and now here:

  • Women and underrepresented engineers often do more “glue work” (mentoring, reviews, documentation)
  • Senior roles already have invisible labor that doesn’t count toward promotion
  • AI coding increases review volume, which increases invisible labor

If AI coding shifts more burden onto seniors, and seniors from underrepresented groups already carry more invisible load, we’re compounding an existing equity problem.

I don’t have data on this yet, but I’m looking. Has anyone tracked whether review burden increases are distributed evenly across their senior engineers?

The Mentorship Crisis Is Real (And Worse Than We Think)

Luis nailed the talent pipeline problem. But it’s worse than “juniors don’t learn from writing code.”

The old mentorship model:

  • Junior writes code with mistakes
  • Senior reviews, explains why it’s problematic and how to think differently
  • Junior internalizes the mental models
  • Over time, junior develops senior-level judgment

The new reality:

  • Junior prompts AI, gets “correct enough” code
  • Senior reviews AI output (not junior’s thought process)
  • No mistakes to learn from (AI doesn’t make junior mistakes)
  • No insight into how to think (just validation that output is acceptable)

Result: We’re creating a generation of engineers who can prompt AI but can’t architect systems, debug complex issues, or make design trade-offs.

That’s terrifying from a talent development perspective.

What We’re Trying: “AI Pairing” Not “AI Solo Coding”

I’m experimenting with reframing how we use AI:

Old model (AI solo):

  • Engineer prompts AI → AI generates code → Submit for review

New model (AI pairing):

  • Engineer writes initial approach/pseudocode
  • AI helps with implementation details
  • Engineer reviews AI suggestions before accepting
  • Engineer explains approach in PR description

Theory: If the engineer thinks through the problem first, they maintain ownership of the solution. AI becomes a syntax helper, not a thought replacement.

Early results (8 weeks in, 2 teams):

  • PRs are higher quality (fewer revisions needed)
  • Review time is down 15-20% vs “AI solo” baseline
  • Engineers report better understanding of their own code
  • But: Initial development is slightly slower (maybe 10%)

Trade-off: Slightly slower writing, much faster reviewing, better learning.

I’ll take that trade.

The Cultural Shift We Need

Maya said this feels like “local optimization, global pessimization.” Exactly right.

We need to shift from:

  • Optimizing: Lines of code written
  • To optimizing: Features working in production that customers value

And from:

  • Measuring: Individual velocity
  • To measuring: Team throughput and learning

That requires cultural change, not just tooling changes.

Questions I’m Wrestling With

  1. Trust and autonomy: If engineers feel productive but aren’t, how do you have that conversation without damaging trust?

  2. Performance reviews: If someone uses AI heavily and ships lots of code (but team velocity doesn’t improve), how do you evaluate their performance?

  3. Hiring junior engineers: If AI can do “junior-level” work, should we still hire juniors? (My answer: absolutely yes, but I’m hearing the opposite from some leaders)

  4. Defining “senior engineer” in the AI era: If the path from junior → senior is disrupted, what does senior even mean? Just people who can review AI code?

This thread is one of the most important conversations I’ve seen about AI coding. We’re all trying to figure this out in real time, and the stakes are high—not just for velocity, but for people’s careers, team culture, and what it even means to be a software engineer.

Would especially love to hear from other VPs/Directors on how you’re handling the people and culture side of this shift.

Coming at this from the product side, and I have to say: we’re measuring all the wrong things.

Engineering keeps telling me they’re shipping faster. Our metrics show more PRs merged, more code deployed. But here’s what I’m seeing from the product and customer side:

The Disconnect Between Shipping and Value

What engineering reports:

  • Sprint velocity: +30%
  • Story points completed: +35%
  • Features shipped: +25%

What product sees:

  • Time from idea to working feature: Unchanged (or slower)
  • Customer-reported bugs: +15%
  • Feature adoption: Down slightly
  • Time spent on bug fixes and rework: +40%

We’re confusing output with outcomes.

Engineering is producing more code. But are we delivering more customer value? The data says no.

The DORA Metrics Trap

This conversation reminds me of the broader discussion about DORA metrics: Teams hit “Elite” on deployment frequency and lead time, but product delivery is still slow.

You can have perfect engineering metrics and still fail on business outcomes.

The AI coding situation is the same pattern:

  • Local metric: Code writing speed :white_check_mark: (+30%)
  • Global metric: Customer value delivered :cross_mark: (flat or declining)

We’re optimizing for the wrong success criteria.

What I Care About (And What I Think You Should Too)

As VP Product, here’s what actually matters:

  1. Time to working feature in production (not just deployed, but working and adopted)
  2. Customer-reported quality issues (bugs that impact users)
  3. Feature success rate (what % of shipped features get adopted?)
  4. Engineering team capacity for new work (vs firefighting and rework)

By these measures, AI coding isn’t helping. It might be hurting.

The Customer Impact Nobody’s Tracking

Here’s a story from last quarter that captures this:

We shipped a major feature 2 weeks “ahead of schedule” (engineering’s timeline). Team celebrated. AI coding helped us move fast.

But:

  • Week 1 post-launch: 3 critical bugs that broke core workflows
  • Week 2-3: Emergency hotfixes and customer apologies
  • Week 4: Feature adoption 40% below forecast because trust was damaged

Net result: We shipped fast and delivered slow. The 2-week “gain” became a 6-week setback.

When I dug into the code, the feature was mostly AI-generated. It passed all tests. It cleared code review. But it had subtle edge cases that AI didn’t consider and reviewers didn’t catch because the PRs were too large and there were too many to review thoroughly.

The Business Case Is Breaking Down

Michelle laid out the cost calculation. Let me add the product side:

Cost of AI coding (per engineer/year):

  • Tool subscription: $50/month = $600/year
  • Senior review overhead: ~$25K/year
  • Quality issues and rework: ~$40K/year
  • Total: ~$65K/year per engineer

Value of AI coding (per engineer/year):

  • 10% velocity improvement = ~$15K in faster delivery
  • Minus customer-facing quality issues = -$10K (conservative)
  • Net value: ~$5K/year per engineer

We’re spending $65K to get $5K of value. That’s a terrible ROI.

And this doesn’t even account for the opportunity cost: What could seniors be working on if they weren’t validating AI code?

The Provocative Question

Here’s the thing I keep coming back to: What if AI coding is like premature optimization?

Premature optimization: You make the code faster before you know what’s slow. You add complexity and reduce maintainability. Later, you realize you optimized the wrong thing.

AI coding: You make code writing faster before you know that’s the bottleneck. You add review overhead and reduce quality. Later, you realize code writing wasn’t the constraint.

Maybe for some teams, in some contexts, code generation is the bottleneck. But for most teams I talk to, the constraints are:

  • Requirements clarity (what are we actually building?)
  • Design decisions (how should this work?)
  • Cross-team coordination (who owns what?)
  • Customer feedback loops (is this the right solution?)

AI doesn’t help with any of that. It just makes it faster to build the wrong thing.

What Product Wants from Engineering

I don’t care if you use AI or not. What I care about:

  1. Ship working features (not just code)
  2. Ship features customers want (not just features we specced)
  3. Ship sustainable code (not technical debt time bombs)
  4. Preserve the ability to adapt (not lock us into rigid architectures)

If AI helps with that, great. If it’s creating more rework and burning out seniors, then we’re going backwards.

The Metrics We Should Actually Track

Instead of:

  • PRs merged
  • Lines of code
  • Sprint velocity

Track:

  • Time to validated customer value (idea → users successfully using the feature)
  • Feature success rate (% of shipped features that meet adoption goals)
  • Unplanned work ratio (% of engineering time spent on bugs/incidents vs new features)
  • Engineering capacity for strategic work (% of senior time on architecture, technical strategy, mentorship)

By these metrics, AI coding is failing. At least at our company.
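
For anyone who wants to start tracking these, here’s a minimal sketch of how two of them could fall out of an issue-tracker export. The field names are placeholders I made up; map them onto whatever your Jira/Linear/GitHub data actually looks like:

```python
# Sketch: "time to validated customer value" and "unplanned work ratio" from
# a hypothetical issue-tracker export. Field names are illustrative only.
from datetime import date
from statistics import median

issues = [
    {"type": "feature",  "created": date(2026, 1, 5),  "validated": date(2026, 2, 10), "hours": 120},
    {"type": "bug",      "created": date(2026, 1, 20), "validated": None,              "hours": 30},
    {"type": "feature",  "created": date(2026, 2, 1),  "validated": date(2026, 2, 20), "hours": 80},
    {"type": "incident", "created": date(2026, 2, 3),  "validated": None,              "hours": 25},
]

# Time to validated customer value: idea -> users successfully using the feature.
lead_times = [(i["validated"] - i["created"]).days
              for i in issues if i["type"] == "feature" and i["validated"]]
print("median days to validated value:", median(lead_times))

# Unplanned work ratio: share of engineering hours going to bugs and incidents.
unplanned = sum(i["hours"] for i in issues if i["type"] in ("bug", "incident"))
print(f"unplanned work ratio: {unplanned / sum(i['hours'] for i in issues):.0%}")
```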

The Conversation We Need to Have

Maya, Luis, Michelle, Keisha—this thread is exactly what we need. Cross-functional reality check.

Engineering says: “We’re faster!”
Product says: “I don’t see it.”
Leadership says: “Where’s the ROI?”
People managers say: “Seniors are burning out.”

Everyone’s right. And that means the problem isn’t individual performance—it’s the system.

We need to stop asking “How do we use AI coding better?” and start asking “What business outcomes are we trying to achieve, and is AI coding actually helping?”

For some use cases (boilerplate, tests, documentation), yes.
For core business logic and complex features, I’m increasingly skeptical.

Curious what other product leaders are seeing. Are you measuring customer-facing outcomes differently with AI coding? Or are we all flying blind on this?