We're shipping 59% more code with AI, but main branch throughput is down. Are we optimizing the wrong bottleneck?

Last quarter I noticed something strange in our sprint reviews. Every team was crushing their story point targets. PRs were flying. Developers reported feeling more productive than ever. Yet somehow, our production deployment cadence hadn’t budged. We were shipping the same number of customer-facing releases as six months ago — before we rolled out AI coding assistants company-wide.

The data explains why, and it’s more alarming than I expected.

The Productivity Paradox in Numbers

CircleCI’s 2026 State of Software Delivery report analyzed over 28 million builds and found something remarkable: AI-assisted development drove a 59% increase in average engineering throughput. That’s massive. That’s the kind of number that gets CFOs excited about ROI on AI tool investments.

But here’s the twist: while throughput on feature branches increased 15.2%, throughput on the main branch declined 6.8%. Teams are writing more code than ever, but shipping less to production.

Even more concerning:

  • Main branch success rates dropped to 70.8% — the lowest in over five years
  • Recovery time to get back to green climbed to 72 minutes, up 13% from last year
  • Teams saw a 98% increase in merged pull requests but 91% longer review times

We’re creating more code but getting slower at integrating it.

We Optimized the 15% Problem

Here’s what I think happened: coding represents only about 15% of the work involved in shipping software. The other 85% — code review, testing, security scanning, compliance checks, integration, deployment — still relies on fragmented tools and manual processes.

AI coding assistants accelerated the 15%. But they left the 85% untouched. Actually worse — they added more load to those downstream processes.

From a product perspective, this feels like classic premature optimization. We automated the part that was already relatively fast. Meanwhile the real constraints — validation, integration, recovery — got worse because they’re now processing higher volume.

The Trust Tax

There’s another layer here: 46% of developers don’t fully trust AI results. Only 33% say they actually trust the code AI generates.

That trust gap compounds the bottleneck. Engineers aren’t just reviewing AI code — they’re reviewing it more carefully than human-written code. More scrutiny × more volume = review time explosion.

And when AI-generated code that passes review still causes 3 out of 10 main branch builds to fail? That trust deficit seems pretty justified.

The Business Impact We’re Not Measuring

Our finance team loves the AI productivity metrics. But those metrics measure the wrong thing.

We count:

  • Lines of code written
  • PRs merged
  • Story points completed

We don’t count:

  • Features in production
  • Time from commit to customer
  • Recovery time when things break

If I’m honest, we celebrated AI adoption without measuring end-to-end flow. We optimized for looking busy instead of shipping value.

The Real Question

I keep coming back to this: Are we bottlenecked by process, not code?

If validation, integration, and deployment can’t keep pace with AI-generated code volume, then every dollar we spend on better AI coding tools is wasted. Worse than wasted — it makes the bottleneck worse.

Maybe 2026 is the year we stop investing in “write code faster” and start investing in “integrate code faster.” Autonomous validation. Intelligent CI/CD orchestration. Recovery automation.

What percentage of your engineering investment goes to code generation versus delivery systems?

Because right now, we’re spending 80% of our budget accelerating the thing that takes 15% of the time. That’s not strategy. That’s cargo cult productivity theater.


Stats sources: CircleCI 2026 State of Software Delivery, GitLab AI Paradox Analysis, Panto AI Coding Statistics

This resonates so hard from a design systems perspective.

A few years ago we rolled out Figma’s AI features and saw the same pattern — teams could generate variants and components way faster. Felt like productivity heaven. Until we looked at what actually shipped to users.

Turns out we were creating more design artifacts but making slower decisions. More options to review, more internal debate about which AI-generated variant to use, more time validating that the AI actually understood our design tokens and accessibility requirements.

The Trust Problem Compounds Everything

That 46% trust gap you mentioned? I see it in every PR review now.

When I know a teammate wrote something, I trust their judgment. I skim the diff, spot-check the logic, approve.

When I know AI wrote something, I read every line. I check for edge cases. I verify it actually solves the problem instead of just looking like it solves the problem. I question whether it fits our architecture patterns or just pattern-matched from Stack Overflow.

It’s not that AI code is necessarily worse. It’s that I don’t trust the context behind it. Did the engineer understand what they asked for? Did they review what they got? Or did they just accept the first suggestion that compiled?

One of our senior devs told me last week: “I spend more time reviewing AI-generated PRs than I used to spend writing the code myself.”

That’s not productivity. That’s just redistributing the work.

The Design Parallel: More Artifacts ≠ Better Experience

In design, fast prototyping tools created this exact trap a decade ago.

Suddenly anyone could mock up 10 different layouts in an afternoon. PMs loved it. “Look at all these options!”

But more options didn’t mean better user experience. It meant:

  • More decision fatigue
  • Longer design reviews
  • Inconsistent patterns across the product
  • Junior designers skipping research because prototyping felt like progress

We were busy but not effective.

Sound familiar?

Are We Creating Productivity Theater?

Your phrase “cargo cult productivity theater” really stuck with me.

We’re measuring activity (PRs created, lines written, components generated) instead of outcomes (features shipped, user problems solved, technical debt reduced).

And AI tools are optimized to maximize those activity metrics. Of course they are — that’s what gets celebrated in sprint reviews and performance evaluations.

But if those activity metrics don’t correlate with actual business value, we’re just… performing productivity while the real work piles up in code review and QA and production incidents.

Question for the thread: Has anyone actually measured whether AI coding tools improved end-to-end delivery time at their org? Not dev velocity, not PR throughput — actual idea-to-production cycle time?

Because I’m starting to suspect we’re solving the wrong problem entirely.


Great post, David. This needs way more attention than it’s getting in leadership circles.

David, this is the conversation we need to be having in 2026. I’m seeing exactly this pattern across my 40+ person engineering org.

The Reality from the Trenches

Last quarter our teams shipped 3x more features to staging than the same quarter last year. Story points through the roof. Velocity charts going up and to the right.

But customer-facing releases? Basically flat.

The disconnect became obvious when I started tracking what I call “staging-to-production gap” — how long features sit in staging before we ship them to customers.

It’s grown from an average of 4 days to 11 days. And climbing.
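For anyone who wants to track the same thing, here's a minimal sketch of how a staging-to-production gap could be computed from deployment timestamps. The record shape and field names are assumptions for illustration, not any real tracker's schema:

```python
from datetime import datetime
from statistics import mean

# Hypothetical deployment records: when each feature reached staging
# and when it reached production (field names are invented).
features = [
    {"id": "FEAT-101", "staging": "2026-01-02", "production": "2026-01-06"},
    {"id": "FEAT-102", "staging": "2026-01-03", "production": "2026-01-14"},
    {"id": "FEAT-103", "staging": "2026-01-05", "production": "2026-01-19"},
]

def staging_to_prod_days(feature):
    fmt = "%Y-%m-%d"
    delta = (datetime.strptime(feature["production"], fmt)
             - datetime.strptime(feature["staging"], fmt))
    return delta.days

gaps = [staging_to_prod_days(f) for f in features]
print(f"avg staging-to-production gap: {mean(gaps):.1f} days")
# → avg staging-to-production gap: 9.7 days
```

Trending that one number week over week made the bottleneck visible for us in a way velocity charts never did.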

Why? Every reason you listed:

  • More code to review
  • Lower confidence in AI-generated code quality
  • More integration failures
  • Longer recovery cycles when things break

We’re producing more but shipping less. Classic bottleneck shift.

Financial Services Context: Compliance Can’t Be Accelerated

In fintech, the bottleneck is even more pronounced because we can’t just “move fast and break things.”

Every deployment goes through:

  • Security review
  • Compliance validation
  • Risk assessment
  • Regulatory change management

AI coding tools did exactly zero to accelerate those processes. Actually made them harder because reviewers are questioning AI-generated code more heavily.

Your stat about 72-minute recovery times really alarmed me. That’s not just a productivity issue — that’s a quality and architectural issue.

When nearly 3 out of 10 main branch builds fail, it suggests AI is generating code that compiles and passes initial tests but doesn’t integrate cleanly with the broader system.

The Cultural Challenge: Junior Engineers + AI = Technical Debt

Here’s what worries me most: I’m seeing junior engineers trust AI output without really understanding it.

They ask Copilot to implement a feature. It generates plausible-looking code. Tests pass locally. They submit the PR.

But they didn’t actually learn the domain. They can’t explain the tradeoffs. They don’t recognize when the AI made questionable architectural decisions.

Six months ago a junior engineer would ask a senior for guidance, pair program, learn the patterns. Now they ask AI, get an answer, and move on.

We’re accumulating technical debt disguised as velocity.

The Process Question

You asked: “Are we bottlenecked by process, not code?”

Yes. Absolutely yes.

But here’s the harder question: How do we redesign code review processes for the AI-generated code era?

Our existing review processes assume:

  • The author deeply understands the code they wrote
  • Code volume stays relatively consistent
  • Reviewers can trust author judgment on implementation details

None of those assumptions hold anymore.

We need new processes for:

  • Validating that the engineer understands the AI-generated code they’re submitting
  • Automated architectural conformance checking (because reviewers can’t manually verify every pattern)
  • Better integration testing earlier in the pipeline
  • Faster feedback loops when things break
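On the architectural conformance point: one possible shape for it is a CI check that enforces a layering rule by scanning imports, so reviewers don't have to verify every pattern by hand. The layer names and rules below are illustrative assumptions, not a real codebase's policy:

```python
import ast

# Illustrative layering rule: handlers may import services, services may
# import storage, and nothing imports "upward".
ALLOWED = {
    "handlers": {"handlers", "services"},
    "services": {"services", "storage"},
    "storage": {"storage"},
}

def layer_of(module: str):
    """Map a module path to its layer, or None for external modules."""
    top = module.split(".")[0]
    return top if top in ALLOWED else None

def check_source(layer: str, source: str) -> list:
    """Return the imports in `source` that violate the layering rule."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            target = layer_of(name)
            if target and target not in ALLOWED[layer]:
                violations.append(name)
    return violations

# A storage module reaching "up" into services gets flagged:
print(check_source("storage", "from services.billing import invoice"))
# → ['services.billing']
```

The point isn't this specific rule — it's that checks like this run at AI generation speed, while human reviewers don't.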

I’d love to hear how other engineering leaders are tackling this. Because right now we’re trying to fit AI-era code volume into pre-AI processes, and it’s breaking.


Context: Leading engineering teams at a Fortune 500 financial services company. Happy to share more about what we’re trying.

This is one of the most important threads I’ve seen on this forum. Thank you, David, for bringing data to what many of us are feeling intuitively.

The Leadership Blind Spot

Here’s what I’m embarrassed to admit: we celebrated crossing 80% AI coding tool adoption across our engineering org last quarter.

Had a whole all-hands presentation. Showed graphs of increased PR throughput. Talked about being “AI-first.”

We measured:

  • Tool adoption rate
  • PRs created per engineer
  • Lines of code generated by AI

We didn’t measure:

  • Features deployed to production
  • Customer-facing release cadence
  • Mean time to recovery
  • Developer satisfaction with the review process

We optimized for adoption metrics instead of business outcomes.

That 70.8% main branch success rate stat is a huge red flag. That’s not a tooling problem. That’s an organizational health problem.

When 3 out of 10 main branch builds fail, you have systematic quality issues. And quality issues destroy team morale faster than any other metric.

The Team Morale Impact

Here’s what I’m hearing in 1:1s lately:

“I feel productive writing code, but frustrated that nothing ships.”

“I spent more time in code review last sprint than coding.”

“I don’t trust my own PRs anymore because I’m not sure what the AI actually did.”

Engineers feel productive at the individual level but ineffective at the team level. That dissonance is corrosive.

And when they see leadership celebrating velocity metrics while they’re stuck in integration hell? Trust erodes fast.

The Investment Mismatch

You asked what percentage of engineering investment goes to code generation versus delivery systems.

I just looked at our 2025 spending:

  • AI coding tools: K annually (Copilot licenses, Cursor, other tools)
  • CI/CD improvements: K (mostly CircleCI plan upgrade)
  • Code review tooling: $0 (using free tier Gerrit)
  • Testing infrastructure: K
  • Deployment automation: K

We spent 2.5x more on writing code faster than on integrating, testing, and deploying code faster.

That ratio is completely backwards given the bottleneck.

The EdTech Context: Different Skills Required

In our EdTech product, we’ve learned that using AI tools vs. integrating AI output are fundamentally different skills.

Using AI tools requires:

  • Knowing what to ask for
  • Recognizing plausible output
  • Basic testing

Integrating AI output requires:

  • Deep system understanding
  • Architectural judgment
  • Ability to evaluate tradeoffs
  • Debugging skills when things break

We’re training engineers on the first skill set while the second is what actually creates bottlenecks.

The Metrics Shift Needed

Maya asked whether anyone’s measured end-to-end delivery time. We started tracking this last month.

Preliminary data:

  • Time from story assigned to PR created: Down 40% (AI coding tools working!)
  • Time from PR created to PR approved: Up 85% (review bottleneck)
  • Time from merge to production: Up 60% (integration failures)
  • Net change in idea-to-production: Up 22%

We got faster at the part that was already fast. Got slower at everything else. Net result: slower overall.
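If it helps anyone replicate this, the segment breakdown is just differences between stage timestamps. A minimal sketch, assuming a tracker export with per-story event times (the event names are invented):

```python
from datetime import datetime

# Hypothetical per-story timestamps for each pipeline stage.
story = {
    "assigned":   "2026-01-05T09:00",
    "pr_created": "2026-01-06T15:00",
    "approved":   "2026-01-09T11:00",
    "merged":     "2026-01-09T12:00",
    "deployed":   "2026-01-12T10:00",
}

SEGMENTS = [
    ("coding",      "assigned",   "pr_created"),
    ("review",      "pr_created", "approved"),
    ("integration", "merged",     "deployed"),
]

def hours_between(a: str, b: str) -> float:
    fmt = "%Y-%m-%dT%H:%M"
    return (datetime.strptime(b, fmt)
            - datetime.strptime(a, fmt)).total_seconds() / 3600

for name, start, end in SEGMENTS:
    print(f"{name:>12}: {hours_between(story[start], story[end]):.1f} h")
```

Aggregate those per-segment durations across stories each quarter and the shift from coding time to review and integration time shows up immediately.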

Strategic Question for VPs and CTOs

Should VP-level engineering metrics shift from:

  • Story points completed → Features deployed to production
  • PR throughput → Deployment frequency
  • Code written → Code shipped and stable

Because right now we’re incentivizing teams to create code, not ship value.

I’m proposing this shift to our executive team next week. Would love to hear if others have tried reframing metrics this way and what resistance you encountered.


Really appreciate this discussion. Sharing internally with our leadership team.

This thread is required reading for every CTO in 2026.

David’s analysis is spot-on: this is a classic Theory of Constraints problem. We optimized a non-constraint and made the real bottleneck worse.

The Constraint Theory Lens

In manufacturing, when you speed up one part of the assembly line without expanding capacity downstream, you don’t increase output. You just create piles of work-in-progress inventory.

That’s exactly what’s happening with AI coding tools:

  • We accelerated code generation (a non-constraint)
  • We did not expand validation/integration capacity (the actual constraint)
  • Result: work piles up in code review and CI/CD

Every dollar spent on AI coding tools increased pressure on the bottleneck without relieving it.

From a systems perspective, that spending had negative ROI — it made the constraint worse.
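The pileup follows directly from queueing arithmetic: when arrivals outpace a fixed drain rate, work-in-progress grows linearly with time. A toy model, with invented rates (roughly the reported throughput jump against unchanged review capacity):

```python
# Toy backlog model: PRs arrive each day, review capacity drains them.
# Rates are invented for illustration only.
def simulate(arrival_per_day: float, review_capacity_per_day: float,
             days: int) -> float:
    """Return the PR backlog after `days` of constant arrival/drain."""
    backlog = 0.0
    for _ in range(days):
        backlog = max(0.0, backlog + arrival_per_day - review_capacity_per_day)
    return backlog

# Balanced system: 10 PRs/day in, 10/day reviewed — backlog stays flat.
print(simulate(10, 10, 30))   # → 0.0
# Generation jumps ~60% with unchanged review capacity:
print(simulate(16, 10, 30))   # → 180.0 PRs queued after a month
```

No amount of extra generation speed changes that second number; only expanding the drain rate (validation and review capacity) does.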

The Data Validates This Hypothesis

The CircleCI numbers tell the story clearly:

  • Feature branch throughput: +15.2% (we can write code faster)
  • Main branch throughput: -6.8% (but we can’t integrate it faster)
  • Main branch success rate: 70.8% (quality degraded under increased load)
  • Recovery time: +13% (downstream systems overwhelmed)

This is not a tooling problem. This is an architectural problem.

AI tools generate code that passes local tests but fails at integration because:

  1. AI doesn’t understand our specific architecture patterns
  2. AI can’t validate cross-system dependencies
  3. AI optimizes for “code that works in isolation” not “code that integrates cleanly”

The Technical Debt Angle

The increased code churn stat is deeply concerning.

When a significant percentage of code written gets discarded within two weeks, it suggests:

  • Engineers accepting AI suggestions without validating fit
  • Code that solves the immediate problem but creates integration issues
  • Lack of architectural thinking in the generation process

We’re trading short-term velocity for long-term technical debt.

I saw this exact pattern during our cloud migration. Teams could spin up new infrastructure much faster with IaC tools. But governance, security review, cost management, and operational support couldn’t keep pace.

Result: infrastructure sprawl, security gaps, runaway costs. We had to slow down provisioning and invest heavily in governance before we could scale again.

The Business Impact: ROI is Negative

Let me be direct about the business case:

If AI coding tools increase code output by 59% but:

  • Review time increases 91%
  • Nearly 3 in 10 main branch builds fail (the 70.8% success rate)
  • Recovery time increases 13%
  • End-to-end delivery time increases 22% (per Keisha’s data)

Then the ROI on AI coding tool investment is negative.

We’re paying for tools that make us slower.

CFOs are asking me to justify AI tool spending. I can show adoption metrics and code generation volume. But when they ask “did this make us ship faster?” the honest answer is no.

The Framework: Idea-to-Production Time

David asked what percentage of investment goes to code generation vs. delivery systems.

Better question: What’s your idea-to-production cycle time?

That’s the only metric that matters for business value:

  • How fast can we test product hypotheses?
  • How fast can we respond to customer feedback?
  • How fast can we fix production issues?

Everything else is activity theater.

At my current company:

  • 2024 average idea-to-production: 14 days
  • 2025 average (post-AI tool rollout): 18 days

We got slower after adopting AI coding tools.

The 2026 Budget Shift

Based on this analysis, I’m shifting our 2026 engineering budget:

Reduce spend on:

  • Additional AI coding tool licenses (maintain current, don’t expand)
  • Productivity tools that optimize code writing

Increase spend on:

  • Autonomous validation platforms
  • Intelligent CI/CD orchestration (GitLab Ultimate, CircleCI Scale)
  • Automated integration testing infrastructure
  • Recovery automation and observability
  • Architecture conformance tooling

Target: Cut idea-to-production time by 40% by addressing the actual constraint.

Call to Action for CTOs

If you’re a CTO reading this:

  1. Measure end-to-end delivery time, not code generation speed
  2. Identify your actual constraint — probably integration/validation, not coding
  3. Reallocate budget from code generation to delivery systems
  4. Change incentives — reward code shipped and stable, not code written
  5. Invest in autonomous validation — the only way to match AI generation speed

The AI coding boom is real. But we’re investing in the wrong part of the pipeline.

Fix the constraint, then increase throughput. Not the other way around.


Context: CTO at mid-stage SaaS company, previously led engineering at Twilio and Microsoft. Happy to share our budget allocation and metrics framework with anyone interested.

This discussion deserves broader visibility. Tagging our head of engineering finance to review.