We’re Writing 59% More Code with AI, But Shipping Less to Production. Where’s the Bottleneck?
I need to share something that’s been bothering me for the past few months. My team at our financial services company has fully embraced AI coding assistants—GitHub Copilot, Cursor, you name it. Our developers genuinely feel more productive. They’re writing code faster, trying more experiments, and feeling less stuck on boilerplate.
But here’s the thing: Our deployment frequency hasn’t improved. In fact, it’s gotten slightly worse.
I started digging into our metrics, and then CircleCI’s 2026 State of Software Delivery report landed on my desk. The numbers stopped me cold:
The 70.8% Paradox
Across 28 million CI workflows, CircleCI found that average engineering throughput increased 59% year over year—the biggest jump they’ve ever measured. That’s enormous. AI-assisted development is clearly accelerating code generation.
But here’s the paradox: Main branch success rates dropped to 70.8%, the lowest in over five years. The industry benchmark is 90%. That means nearly 3 out of 10 workflow runs on main are failing, roughly three times the failure rate the benchmark allows.
Even more telling: Feature branch throughput is up 15.2%, but main branch throughput is down 6.8%. We’re creating more on branches but shipping less to production.
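To make the gap concrete, here is a back-of-the-envelope sketch in Python. It only restates the two percentages above (70.8% and the 90% benchmark) as failing runs per 1,000; the function name is my own and nothing here comes from the report's methodology.

```python
def failures_per_1000(success_rate: float) -> int:
    """Expected failing runs out of 1,000, given a success rate in [0, 1]."""
    return round((1.0 - success_rate) * 1000)


reported = failures_per_1000(0.708)   # main branch success rate cited above
benchmark = failures_per_1000(0.90)   # the 90% benchmark cited above

print(f"Failing runs per 1,000 at 70.8%: {reported}")
print(f"Failing runs per 1,000 at 90%:   {benchmark}")
print(f"Ratio: {reported / benchmark:.2f}x")
```

That works out to about 292 failing runs per 1,000 against 100 at the benchmark, so the failure rate is nearly triple what the benchmark allows.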
The Review Bottleneck Theory
I think code review has become the binding constraint. Here’s what I’m seeing with my team of 40+ engineers:
- Same review capacity: We still have roughly the same number of senior engineers doing reviews as we did two years ago
- More code to review: If our volume tracks the industry’s 59% throughput jump, far more code is now flowing through the same pipeline
- Longer wait times: Developers are context-switching while waiting for review, which kills productivity
- Recovery time up 13%: When things do fail, it takes longer to fix (average 72 minutes)
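Why would fixed review capacity hurt this much? A rough sketch, assuming reviews behave like a simple single-queue (M/M/1) system, which is a textbook simplification and not something I've measured. The capacity and arrival numbers are hypothetical placeholders, not my team's data; only the 59% increase comes from the report.

```python
def mean_days_in_review(arrivals_per_day: float, capacity_per_day: float) -> float:
    """Average time a change spends waiting for and in review (M/M/1 model).

    Returns infinity when arrivals meet or exceed capacity, because the
    queue then grows without bound.
    """
    if arrivals_per_day >= capacity_per_day:
        return float("inf")
    return 1.0 / (capacity_per_day - arrivals_per_day)


capacity = 10.0          # hypothetical: PRs the reviewers can clear per day
before = 6.0             # hypothetical: PRs opened per day before AI tools
after = before * 1.59    # same team, 59% more PRs arriving

for label, arrivals in (("before", before), ("after", after)):
    utilization = arrivals / capacity
    days = mean_days_in_review(arrivals, capacity)
    print(f"{label}: utilization {utilization:.0%}, mean time in review {days:.2f} days")
```

Under these assumptions, reviewer utilization goes from 60% to about 95%, and the average time in review goes from 0.25 days to a little over 2 days. The point is the shape of the curve, not the specific numbers: as the queue approaches capacity, wait times grow much faster than the volume does.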
Three Hypotheses
I see three possible explanations:
- We need AI-assisted code review - Maybe the solution is AI reviewing AI-generated code?
- Quality gates are working - Maybe 70.8% is actually good—the review process is correctly filtering out low-quality AI output
- Process mismatch - Our review process was designed for human-speed code generation, not AI-speed
The Financial Services Context
In our world, we can’t just “move fast and break things.” Compliance, security, and regulatory requirements are non-negotiable. But we’re also under pressure to deliver features faster to compete with fintechs.
The painful irony: AI promised to accelerate development, but it’s created a different bottleneck downstream.
Questions for the Community
- Are you seeing similar patterns in your organizations?
- Is code review the bottleneck, or have you identified something else?
- How are you scaling review capacity to match increased code generation?
- Should we even be aiming for 90% merge success in the AI era, or is 70% the new normal?
I’m genuinely curious whether this is a process problem, a tooling problem, or a sign that our quality gates are working as intended and catching problematic AI-generated code.
What’s your take?
Sources: