AI Makes Us Code Faster, But Review, Testing, and Deployment Are Still Slow—Where's the Real Bottleneck?

Building on the productivity measurement discussion, I want to dig into something Luis and Michelle both mentioned: pipeline bottlenecks.

The 5% Problem

According to an Axify analysis, AI coding assistants speed up the coding portion of software delivery, which represents roughly 5-10% of the total pipeline from idea to production.

The other 90-95%?

  • Requirements clarification and design
  • Code review
  • Testing and QA
  • Integration and deployment
  • Documentation and knowledge transfer

We optimized 5% of the pipeline and wondered why the whole system didn’t speed up.

This is basic systems thinking: a pipeline is only as fast as its slowest step. Speeding up a step that isn’t the constraint barely helps if everything else is unchanged.
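One way to put numbers on this is an Amdahl's-law style bound: if a step takes a share p of total elapsed time and becomes s times faster, the whole pipeline becomes at most 1 / ((1 - p) + p / s) times faster. Here is a minimal sketch, assuming the 5-10% coding share quoted above and a hypothetical 2x coding speedup (the 2x is an illustration, not a measured figure):

```python
def overall_speedup(step_share: float, step_speedup: float) -> float:
    """Amdahl-style bound: speedup of the whole pipeline when one step,
    taking `step_share` of total time, becomes `step_speedup` times faster."""
    if not 0.0 <= step_share <= 1.0:
        raise ValueError("step_share must be between 0 and 1")
    if step_speedup <= 0:
        raise ValueError("step_speedup must be positive")
    return 1.0 / ((1.0 - step_share) + step_share / step_speedup)


def time_saved_pct(step_share: float, step_speedup: float) -> float:
    """Percentage reduction in total elapsed time."""
    return (1.0 - 1.0 / overall_speedup(step_share, step_speedup)) * 100.0


if __name__ == "__main__":
    for share in (0.05, 0.10):
        print(f"coding share {share:.0%}, 2x faster coding: "
              f"{time_saved_pct(share, 2.0):.1f}% less total time")
    # Even an arbitrarily fast coding step cannot save more than its own share.
    print(f"ceiling at 10% share: {time_saved_pct(0.10, 1e9):.1f}%")
```

This assumes the other steps are unaffected and ignores queueing between steps, so it is a ceiling on the gain, not a forecast.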

Where I’m Seeing the Backlog Build

In our design systems work, we’re now seeing:

Before AI:

  • Dev time: 40% of cycle time
  • Review time: 25%
  • Testing: 20%
  • Deployment: 15%

After AI:

  • Dev time: 25% of cycle time (improved!)
  • Review time: 35% (bottleneck shifted here)
  • Testing: 25% (longer queue because more code)
  • Deployment: 15% (unchanged)

Result: Overall cycle time only improved 8%, even though coding time improved 37%.

The faster coding created a backlog in review and testing that we weren’t prepared for.
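The before/after figures above are shares of cycle time, so they hide the absolute change per stage. Here is a rough sketch that rescales them, assuming the pre-AI cycle is 100 arbitrary units and the post-AI cycle is 92 (the 8% improvement); the helper names are illustrative:

```python
BEFORE_SHARES = {"dev": 0.40, "review": 0.25, "testing": 0.20, "deployment": 0.15}
AFTER_SHARES = {"dev": 0.25, "review": 0.35, "testing": 0.25, "deployment": 0.15}


def absolute_times(shares: dict[str, float], total: float) -> dict[str, float]:
    """Turn per-stage shares of cycle time into absolute time units."""
    return {stage: share * total for stage, share in shares.items()}


def pct_change(before: float, after: float) -> float:
    """Signed percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


before = absolute_times(BEFORE_SHARES, total=100.0)
after = absolute_times(AFTER_SHARES, total=92.0)  # 8% shorter overall cycle
changes = {stage: pct_change(before[stage], after[stage]) for stage in before}

if __name__ == "__main__":
    for stage, change in changes.items():
        print(f"{stage:<10} {before[stage]:5.1f} -> {after[stage]:5.1f}  ({change:+.1f}%)")
```

Under these assumptions, review grows by about 29% and testing by 15% in absolute terms, which is where most of the coding gain goes. The implied coding improvement (about 42%) is close to, but not the same as, the 37% quoted, so the shares are best read as rounded.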

The Uncomfortable Question

Should we:

A) Speed up the rest of the pipeline to match AI coding velocity?

  • Invest in automated code review tools
  • Scale QA team and test automation
  • Improve CI/CD infrastructure for faster deployments

B) Slow down coding to match downstream capacity?

  • Gate AI usage to prevent overwhelming review/test
  • Maintain current process and accept modest gains
  • Let natural equilibrium emerge

C) Rethink the entire pipeline for the AI era?

  • Merge coding and review into single step (pair programming with AI?)
  • Shift testing left (AI generates tests alongside code)
  • Automate deployment gates that are currently manual

I honestly don’t know which is right.

The Systems View

Michelle’s original thread asked if we’re measuring wrong things. But maybe we also need to ask: Are we optimizing the wrong part of the system?

If review and testing are the real bottlenecks, shouldn’t our AI investment focus there instead of making coding (which wasn’t the bottleneck) even faster?

What I’m Curious About

  1. Where are bottlenecks appearing in your delivery pipelines post-AI?
  2. Are you investing to eliminate bottlenecks, or accepting that AI gains will be modest until you do?
  3. What’s working in terms of pipeline reoptimization for the AI era?

From a design perspective, I suspect quality gates are where AI gains will disappear unless we fundamentally rethink how we ensure quality at higher code volumes.

Thoughts?

Maya, you’ve identified the exact problem we’re hitting in fintech.

Code Review Is the New Bottleneck

Pre-AI, our senior engineers spent ~6 hours/week on code review. Post-AI? 12-14 hours/week.

Why the increase?

1. Volume went up
There is more code to review because developers are producing more of it.

2. Cognitive load went up
Reviewing AI-generated code requires different mental effort than reviewing human code.

When I review code written by a developer I know, I can:

  • Predict their patterns and scan for deviations
  • Trust their testing approach based on past work
  • Focus review on business logic, not syntax

When I review AI-generated code:

  • I can’t rely on “this looks like typical Sarah code”
  • I need to verify the AI actually understood the requirement
  • I’m checking for subtle logic bugs that compile fine but fail edge cases
  • I’m validating that it follows our architectural patterns, not just “a pattern”

Every AI PR feels like reviewing a junior developer’s code, even when a senior developer generated it.

The Solution That Didn’t Work

We tried using AI for code review too. GitHub Copilot, ChatGPT analysis, automated PR review tools.

Result: More noise, not less work.

The AI review tools flagged style issues and obvious problems, but missed subtle architectural violations and business logic errors. We still needed human review for anything that mattered.

So now reviewers are reading AI-generated code AND AI-generated reviews. Double the AI, not half the work.

What’s Actually Helping

Option C from your list: Rethinking the pipeline.

We’re experimenting with:

1. AI-specific review checklists

  • Does this code follow our architectural patterns?
  • Are edge cases handled or just the happy path?
  • Is error handling robust or minimal?
  • Does it integrate with existing systems correctly?

This forces reviewers to verify what AI often gets wrong, instead of doing a general “does this look OK?” review.

2. Automated gates BEFORE human review

  • Static analysis for security issues
  • Design system compliance checks
  • Test coverage requirements (AI must generate tests too)
  • Performance benchmarks

This catches the low-value stuff AI screws up, so humans can focus on high-value architectural review.
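The gate idea can be sketched as a list of automated checks that all have to pass before a PR is routed to a person. Everything here is hypothetical: the `PullRequest` fields, the gate names, and the thresholds are placeholders, not anyone's real configuration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PullRequest:
    """Hypothetical summary of a PR; the fields are illustrative only."""
    security_findings: int
    design_system_violations: int
    test_coverage: float      # 0.0 to 1.0
    p95_latency_ms: float


@dataclass
class Gate:
    name: str
    check: Callable[[PullRequest], bool]


# Thresholds are placeholders, not recommendations.
GATES = [
    Gate("static analysis (security)", lambda pr: pr.security_findings == 0),
    Gate("design system compliance", lambda pr: pr.design_system_violations == 0),
    Gate("test coverage", lambda pr: pr.test_coverage >= 0.80),
    Gate("performance benchmark", lambda pr: pr.p95_latency_ms <= 250.0),
]


def failed_gates(pr: PullRequest) -> list[str]:
    """Return the names of every gate the PR fails (empty list means pass)."""
    return [gate.name for gate in GATES if not gate.check(pr)]


def ready_for_human_review(pr: PullRequest) -> bool:
    """A PR reaches a human reviewer only when all automated gates pass."""
    return not failed_gates(pr)
```

Returning the list of failed gates, not just a boolean, lets the author fix everything in one pass before a reviewer is pulled in.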

3. Pair programming with AI as third party

  • Developer + AI writes code
  • Second developer reviews in real-time
  • Faster feedback loop than async PR review

Works well for complex features where requirements need iteration. Doesn’t scale for everything.

The Investment Question

Maya, your question “should we invest in eliminating bottlenecks?” is critical.

My answer: Yes, but carefully.

Invest in automation that handles what AI does poorly:

  • Security scanning (AI can introduce vulnerabilities)
  • Accessibility checks (AI often ignores WCAG)
  • Performance testing (AI tends to optimize for working, not fast)

Don’t invest in scaling manual processes:

  • Hiring more reviewers to handle more PRs is not sustainable
  • Expanding QA team to match increased code volume doesn’t scale

Instead, shift the work left:

  • Developers are responsible for AI-generated quality, not reviewers
  • AI must generate tests alongside code
  • Automated gates catch problems before review

The Uncomfortable Reality

Even with all this, our end-to-end cycle time only improved 12% despite coding time improving 35%.

The bottleneck shifted from coding → review → testing. We addressed review somewhat, but testing is still behind.

The 90% of the pipeline that isn’t coding needs just as much AI reinvention as coding got.

But most AI investment is still focused on coding tools, not testing tools, not deployment automation, not review assistance.

We’re still optimizing the wrong 5%.

Maya and Luis have identified the core issue. Let me add the infrastructure perspective.

Testing Infrastructure Wasn’t Designed for This Volume

Our CI/CD pipeline was built for our pre-AI code velocity. Now we’re hitting infrastructure limits we didn’t know existed.

Concrete example:

  • Pre-AI: ~45 PRs per week, average test suite runtime 18 minutes
  • Post-AI: ~73 PRs per week, same test suite runtime

Problem: CI/CD queue times went from “usually instant” to “30-45 minute waits during peak hours.”

We literally don’t have enough CI/CD runners to handle the increased throughput. Developers are waiting for test results longer than it takes them to write code with AI.

The speedup in coding created a bottleneck in infrastructure.
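The jump from near-instant to long waits is what queueing theory predicts when utilization approaches 100%. Here is a minimal sketch using the standard M/M/c (Erlang C) mean-wait formula. It assumes Poisson arrivals and exponentially distributed job times, and the runner count and peak arrival rates are invented for illustration; only the 18-minute runtime and the 45 to 73 PR ratio come from the numbers above.

```python
import math


def mean_queue_wait_minutes(arrivals_per_hour: float,
                            service_minutes: float,
                            runners: int) -> float:
    """Mean time a job waits for a free runner in an M/M/c queue (Erlang C)."""
    service_rate = 60.0 / service_minutes            # jobs per hour per runner
    offered_load = arrivals_per_hour / service_rate  # work in units of runners
    utilization = offered_load / runners
    if utilization >= 1.0:
        return math.inf  # the queue grows without bound

    # Erlang C: probability that an arriving job has to wait at all.
    waiting_term = (offered_load ** runners / math.factorial(runners)) / (1.0 - utilization)
    busy_terms = sum(offered_load ** k / math.factorial(k) for k in range(runners))
    prob_wait = waiting_term / (busy_terms + waiting_term)

    wait_hours = prob_wait / (runners * service_rate - arrivals_per_hour)
    return wait_hours * 60.0


RUNNERS = 4                                      # hypothetical
PRE_AI_PEAK_RATE = 7.5                           # hypothetical CI jobs/hour at peak
POST_AI_PEAK_RATE = PRE_AI_PEAK_RATE * 73 / 45   # scaled by the PR increase

pre_wait = mean_queue_wait_minutes(PRE_AI_PEAK_RATE, 18.0, RUNNERS)
post_wait = mean_queue_wait_minutes(POST_AI_PEAK_RATE, 18.0, RUNNERS)

if __name__ == "__main__":
    print(f"pre-AI mean wait:  {pre_wait:.1f} min")
    print(f"post-AI mean wait: {post_wait:.1f} min")
```

With these made-up inputs, a 62% rise in arrivals moves utilization from about 56% to about 91%, and the mean wait goes from a few minutes to tens of minutes. The specific values matter less than the shape: wait time grows non-linearly as runners approach saturation.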

The Investment Dilemma

Luis mentioned investing carefully. Here’s the business reality:

Option 1: Scale infrastructure ($120K/year to add CI/CD capacity)
Option 2: Scale testing team ($400K/year for 2 more QA engineers)
Option 3: Do nothing (accept that gains will be limited)

I chose Option 1. Infrastructure scales better than headcount.

But here’s what surprised me: Even after scaling CI/CD, our deployment frequency only improved 15%.

Why? Because the next bottleneck appeared: deployment approval workflows.

Bottleneck Whack-a-Mole

This is the systems problem Maya identified:

  1. Make coding faster → review becomes bottleneck
  2. Speed up review → testing becomes bottleneck
  3. Scale testing → deployment becomes bottleneck
  4. Automate deployment → we’d hit product prioritization as bottleneck

There’s always another bottleneck.
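The whack-a-mole pattern falls out of a very simple model: the steady-state throughput of a serial pipeline is capped by its lowest-capacity stage, so relieving one stage just hands the cap to the next. Here is a toy sketch; the capacities are hypothetical PRs per week, not measurements from this thread.

```python
def bottleneck(capacity: dict[str, float]) -> tuple[str, float]:
    """Return the lowest-capacity stage and its capacity.
    Steady-state throughput of a serial pipeline is capped by this stage."""
    stage = min(capacity, key=capacity.get)
    return stage, capacity[stage]


def relieve_in_turn(capacity: dict[str, float],
                    boost: float,
                    rounds: int) -> list[tuple[str, float]]:
    """Repeatedly multiply the current bottleneck's capacity by `boost`,
    recording which stage was limiting throughput in each round."""
    capacity = dict(capacity)  # work on a copy
    history = []
    for _ in range(rounds):
        stage, throughput = bottleneck(capacity)
        history.append((stage, throughput))
        capacity[stage] *= boost
    return history


# Hypothetical capacities in PRs per week.
PIPELINE = {"coding": 45.0, "review": 50.0, "testing": 55.0, "deployment": 60.0}
history = relieve_in_turn(PIPELINE, boost=2.0, rounds=4)

if __name__ == "__main__":
    for stage, throughput in history:
        print(f"bottleneck: {stage:<10} throughput cap: {throughput:.0f} PRs/week")
```

In this toy setup each round doubles one stage's capacity, yet the throughput cap only creeps from 45 to 60 PRs per week, because the neighbouring stage becomes the limit almost immediately.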

The Strategic Question

If there’s always another bottleneck, where should we invest?

My framework:

High ROI, high impact:

  • Automated security scanning (prevents incidents)
  • Test infrastructure scaling (enables everything downstream)
  • Deployment automation (removes manual gates)

Low ROI, low impact:

  • More code review tooling (noise, not signal)
  • Expanding manual QA team (doesn’t scale)
  • Faster code completion (we’re already fast enough here)

High impact, uncertain ROI:

  • Rethinking the entire SDLC for AI era
  • Retraining teams on AI-native workflows
  • Culture change around quality ownership

I’m betting on the third category, but it’s a multi-quarter investment with no guaranteed payoff.

What’s Actually Working

Luis’s “shift left” approach is right. We’re implementing:

1. Quality gates during coding, not after:

  • AI generates code + tests + documentation simultaneously
  • Developer responsible for all three before PR
  • Automated checks run locally before push

2. Continuous deployment by default:

  • Remove manual approval for low-risk changes
  • Automated rollback on failure
  • Gradual rollout with automatic monitoring

3. Smaller, more frequent deployments:

  • AI makes it easier to break work into smaller PRs
  • Deploy multiple times per day instead of weekly
  • Reduces batch size, which reduces review burden per PR
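The continuous-deployment items above (no manual approval for low-risk changes, automated rollback, gradual rollout) can be sketched roughly as follows. The risk rule, the rollout schedule, and the error threshold are all hypothetical, and `error_rate_at` stands in for whatever monitoring query a real setup would use.

```python
from typing import Callable

ROLLOUT_STEPS = (1, 10, 50, 100)   # percent of traffic; hypothetical schedule
MAX_ERROR_RATE = 0.01              # hypothetical rollback threshold


def needs_manual_approval(files_changed: int, touches_sensitive_paths: bool) -> bool:
    """Toy risk rule: only large or sensitive changes wait for a human."""
    return touches_sensitive_paths or files_changed > 20


def gradual_rollout(error_rate_at: Callable[[int], float]) -> tuple[str, int]:
    """Walk through the rollout steps, rolling back as soon as the observed
    error rate at a step exceeds the threshold.

    Returns ("deployed", 100) or ("rolled_back", percent_where_it_failed).
    """
    for percent in ROLLOUT_STEPS:
        if error_rate_at(percent) > MAX_ERROR_RATE:
            return "rolled_back", percent
    return "deployed", ROLLOUT_STEPS[-1]
```

Keeping the approval rule and the rollout loop separate means the risk rule can be tightened or loosened without touching the rollback logic.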

Result so far: 22% improvement in end-to-end cycle time, versus the 8% we had before process changes.

The lesson: AI coding productivity only translates to delivery productivity if you redesign the system to match the new capabilities.

The Warning

Here’s what worries me: Most organizations are adding AI tools without changing processes.

They’re expecting productivity gains from technology alone. But technology doesn’t remove bottlenecks—it just moves them.

You can’t buy your way to faster delivery. You have to design for it.

How are others approaching this? Are you redesigning processes, or adding AI to existing workflows and hoping for the best?

From a product perspective, this bottleneck discussion is revealing something important: Companies with mature engineering practices are seeing better AI productivity translation.

The Pre-Existing Infrastructure Advantage

I’m noticing a pattern in conversations with other product leaders:

Companies seeing real AI productivity gains:

  • Already had CI/CD maturity (automated testing, continuous deployment)
  • Already measured DORA metrics and optimized them
  • Already had small batch sizes and frequent releases
  • Already had strong engineering culture around quality ownership

Companies seeing the paradox (gains don’t translate):

  • Manual testing and deployment processes
  • Large, infrequent releases
  • Siloed teams (dev throws code over wall to QA)
  • Quality as a separate phase, not built-in

Hypothesis: AI productivity gains are gated by existing engineering maturity.

If your pipeline is already optimized, AI helps. If your pipeline is broken, AI just exposes how broken it is.

Is AI a Diagnostic Tool?

Michelle, your “bottleneck whack-a-mole” comment made me think:

Maybe AI’s real value isn’t making us faster—it’s making our process problems visible.

Pre-AI, slow code review seemed fine because coding was slow too. Post-AI, code review is obviously a bottleneck.

Pre-AI, insufficient CI/CD capacity wasn’t obvious. Post-AI, queue times are painful.

Pre-AI, deployment approval delays seemed necessary. Post-AI, they’re clearly waste.

AI is like a stress test: It pushes volume through your system and reveals where the weaknesses are.

The Product Implications

From a product strategy perspective, this means:

1. AI productivity investment requires process investment

If you budget $200K for AI coding tools, you need to budget $500K for process improvement (CI/CD, automation, training, culture).

Otherwise you’re just buying developer satisfaction, not business outcomes.

2. Mature orgs have an AI multiplier advantage

From the conversations I’ve had, companies that already invested in engineering excellence seem to be getting roughly 2-3× more value from AI than companies that didn’t.

This could widen the gap between high-performing and low-performing engineering orgs.

3. “Fix your pipeline first” might be the right advice

Should struggling teams adopt AI first, or fix their processes first?

My controversial take: Fix the pipeline first, then add AI.

AI will accelerate a good process. It won’t fix a broken one—it’ll just create more output that gets stuck in your broken review/test/deploy pipeline.

What This Means for Prioritization

Maya asked whether to invest in eliminating bottlenecks. I think the answer is obvious now:

Yes—and do it before or alongside AI adoption, not after.

The ROI of AI tools depends entirely on the quality of your delivery infrastructure.

If you have manual deployments, fix that before buying Copilot.
If you don’t have automated testing, fix that before optimizing code generation.
If your review process is slow, fix that before increasing code volume.

Otherwise you’re pouring water into a leaky bucket and wondering why it’s not filling up.

Does this resonate with others? Am I overstating the importance of pre-existing process maturity?