CircleCI Reports 59% Throughput Increase From AI, Yet 85% of Orgs See No Team-Level Gains. What Are the Winners Doing Differently?

I’ve been staring at CircleCI’s 2026 State of Software Delivery report for the past week, and there’s a data point that won’t leave my head: across all projects on their platform, daily workflow runs increased 59% year-over-year. On the surface, that’s exactly what we’d expect from AI-assisted development—more code, faster iteration, higher throughput.

But here’s where it gets interesting: the top 5% of teams nearly doubled their throughput (97% increase), while the median team saw just 4%, and the bottom 25% saw no measurable increase at all.

This isn’t just a CircleCI phenomenon. The recent PwC Global CEO Survey covering 4,454 CEOs across 95 countries found that 56% say they’ve gotten “nothing out of” their AI investments, and only 12% report that AI both grew revenues AND reduced costs. We’re seeing a massive divergence between AI winners and everyone else.

The Individual vs. Organizational Productivity Gap

At our company, I’ve watched our engineers adopt AI coding assistants at scale—over 75% of the team is using them daily. When I talk to individual developers, they universally report being more productive. The research backs this up: controlled experiments show 30-55% speed improvements for scoped tasks like writing functions, generating tests, or producing boilerplate.

But when I look at our product delivery metrics? We’re shipping features at roughly the same pace as a year ago. Maybe slightly faster, but nothing close to a 30-55% improvement.

At first, I thought we were doing something wrong. Then I found this stat: 90% of “super productive” workers (the 10% who’ve truly mastered AI tools and save 20+ hours per week) report that AI creates MORE coordination work between team members. That’s when it clicked.

Where the Productivity Gains Are Going

The CircleCI data reveals the quality tax we’re paying: main branch success rates dropped to 70.8%, the lowest in over five years. Nearly 3 out of every 10 workflow runs on the main branch are failing. Recovery times are up 13% year-over-year to an average of 72 minutes.

Individual developers are writing code faster, but that acceleration is being absorbed by:

  • Longer code review queues (senior engineers drowning in PRs)
  • More failed builds and broken tests
  • Increased coordination overhead
  • Sequential handoffs that were designed for a slower pace

One framework I’ve been using: Individual productivity ≠ Organizational productivity. The bottleneck moved—it didn’t disappear.
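To make the "bottleneck moved" idea concrete, here is a minimal sketch that treats delivery as a chain of sequential stages whose sustained throughput is capped by the slowest one. The stage names and weekly capacities are made-up numbers for illustration, not figures from the CircleCI report or from our team.

```python
def pipeline_throughput(capacities: dict[str, float]) -> float:
    """Sustained throughput of sequential stages is capped by the slowest stage."""
    return min(capacities.values())


def bottleneck(capacities: dict[str, float]) -> str:
    """Name of the stage that currently limits throughput."""
    return min(capacities, key=capacities.get)


# Hypothetical capacities in PRs per week for each stage (illustrative only).
before = {"coding": 20.0, "review": 22.0, "ci_and_deploy": 30.0}
# Assume AI makes code writing 40% faster and nothing else changes.
after = dict(before, coding=before["coding"] * 1.4)

gain = pipeline_throughput(after) / pipeline_throughput(before) - 1
print(f"Bottleneck before: {bottleneck(before)}, after: {bottleneck(after)}")
print(f"Coding sped up 40%, delivery sped up {gain:.0%}")
```

With these assumed numbers, a 40% gain in coding speed shows up as only a 10% gain in delivery, because review becomes the limiting stage. A real pipeline has queues and rework, so this is a simplification.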

So What Are the Winners Doing Differently?

The top 5% who nearly doubled their throughput must have figured something out. I’m genuinely curious what separates them from the median performers.

Some hypotheses from my conversations with engineering leaders:

  • They redesigned their development processes for AI velocity (not just added AI to old processes)
  • They invested in validation and testing infrastructure in parallel with AI adoption
  • They changed how they measure success (quality + velocity, not just velocity)
  • They addressed coordination bottlenecks explicitly

But I’m sure there are other factors I’m missing.

For those of you who feel like AI has genuinely improved your team’s delivery velocity (not just individual productivity): What did you change organizationally? What metrics moved? What didn’t work?

For those in the 56% who haven’t seen measurable gains: What do you think is the blocker? Is it a measurement problem, an execution problem, or something else?

This feels like a critical inflection point. The gap between winners and everyone else is only going to widen from here.

This resonates deeply with what we’re experiencing in financial services. The productivity paradox you’re describing is real, and I think you’ve nailed the core issue: the bottleneck moved, it didn’t disappear.

At our organization, we’ve seen similar AI adoption rates—most of our engineers are using Copilot or Cursor daily. Individual velocity is definitely up. But organizationally? We hit a wall almost immediately, and the wall was validation.

Code Review Became the Chokepoint

Your observation about senior engineers “drowning in PRs” is exactly what happened to us. Our PR queue went from manageable to overwhelming in about 3 months. Junior and mid-level engineers were generating code 30-40% faster, but our tech leads and architects couldn’t review it any faster. In fact, review got slower because AI-generated code often requires more careful scrutiny—you can’t just pattern-match against conventions you know.

The CircleCI data about main branch success rates dropping to 70.8% confirms what we saw: AI accelerates code creation, but it doesn’t improve code quality by default. We were catching more bugs in review, seeing more failed CI runs, and spending more time on back-and-forth about implementation details.

The Fintech Quality Tax

In our domain, we can’t just ship faster if compliance and security reviews remain the gate. We learned this the hard way when our security team flagged AI-generated code that inadvertently introduced PII logging. Nothing made it to production, but it created a two-week delay while we audited everything.

That’s when we realized: if you optimize code creation without optimizing validation, you’ve just created a faster way to pile up unshipped work.

What Actually Worked

We made two key investments in parallel with AI adoption:

  1. Automated testing infrastructure: We went hard on property-based testing, mutation testing, and automated security scanning. The goal was to catch issues before human review. This let our AI gains in test writing (which are genuinely impressive, closer to 90% faster for test generation) translate directly into catching issues earlier.

  2. Review tooling upgrades: We implemented AI-powered code review assistants (like Codium) and architectural fitness functions that automatically flag patterns we don’t want. This shifted senior engineers from “reviewing every line” to “reviewing flagged concerns.”
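For anyone unfamiliar with property-based testing, here is a rough sketch of the idea using only the standard library. It is a hand-rolled stand-in for what a library such as Hypothesis does more thoroughly, and `mask_account_number` is a hypothetical helper invented for this example, not code from our systems. Instead of asserting on a few fixed inputs, the check generates many random inputs and asserts properties that must always hold.

```python
import random


def mask_account_number(account: str) -> str:
    """Hypothetical helper: hide all but the last four characters before logging."""
    return "*" * max(len(account) - 4, 0) + account[-4:]


def check_masking_properties(trials: int = 1000, seed: int = 0) -> None:
    """Check invariants of mask_account_number on randomly generated digit strings."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        length = rng.randint(0, 20)
        account = "".join(rng.choice("0123456789") for _ in range(length))
        masked = mask_account_number(account)
        # Property 1: masking never changes the length.
        assert len(masked) == len(account)
        # Property 2: the last four characters are preserved.
        assert masked[-4:] == account[-4:]
        # Property 3: no digit survives outside the last four characters.
        assert not any(ch.isdigit() for ch in masked[:-4])


check_masking_properties()
```

The point is that the properties, not the individual examples, describe the behavior we care about, which makes this kind of test a reasonable gate for machine-generated code.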

The real breakthrough was when we started using AI to help with the parts of development that are slow by nature—tests, refactoring, documentation—rather than just code generation. Those areas have fewer coordination bottlenecks.

The 97% Throughput Teams

I’d bet the top 5% who nearly doubled their throughput aren’t just writing code faster. They likely redesigned their entire delivery pipeline to handle AI-velocity code. That includes:

  • Automated quality gates that can handle 2x the volume
  • Review processes that scale (async, AI-assisted, automated)
  • Architecture that supports rapid iteration without breaking things
  • Clear ownership boundaries so faster code doesn’t create coordination chaos

Question for the group: Has anyone successfully scaled their code review process to match AI writing speed? What did you change?

David, you’re asking the right question, but I think the answer is more fundamental than most organizations want to hear: the top 5% aren’t just using AI differently—they redesigned their entire development process.

I’ve spent the last 18 months watching companies try to bolt AI onto existing workflows, and it rarely works. The data you cited—90% of super productive workers reporting MORE coordination work—tells the story. When individuals accelerate but the system doesn’t, you don’t get 2x throughput. You get 2x friction.

The Process Redesign Reality

At my company, we tried the “add AI tools and let teams adapt” approach first. Exactly what you described happened: engineers felt productive, but delivery velocity plateaued. We were measuring the wrong things and optimizing for the wrong outcomes.

What changed everything was realizing that our approval checkpoints, sequential handoffs, and review cycles were all designed for a world where code took days to write. When code takes hours to write, those processes become the bottleneck instantly.

Here’s what we rewrote:

1. Approval thresholds: We had been treating every change as if it needed full review, so we raised the bar for what needs synchronous review. Anything under 200 lines with passing tests and no security/compliance flags can merge with one async approval. Architectural changes and public API modifications get full committee review.

2. Review SLAs: We set a 4-hour SLA for reviews under 200 lines, 24 hours for everything else. This sounds aggressive, but it’s actually necessary when engineers can generate a PR in 30 minutes. Without the SLA, review latency kills all velocity gains.

3. Deployment frequency targets: We went from deploying 3x/week to continuous deployment with automated rollback. When code is written faster, you need to ship it faster or the integration debt compounds.

4. Ownership boundaries: We restructured teams around service boundaries with clear APIs. This reduced cross-team coordination overhead—the biggest killer of AI productivity gains.
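The approval thresholds and review SLAs above could be expressed as a small routing function. This is only a sketch of the rules as described in this post: the `PullRequest` fields and the track names are hypothetical, and the "standard" track is an assumption for PRs that fit neither rule, since the post does not spell out what happens to them.

```python
from dataclasses import dataclass
from datetime import timedelta

SMALL_PR_MAX_LINES = 200  # "under 200 lines" threshold from the post


@dataclass(frozen=True)
class PullRequest:
    # Hypothetical fields; a real system would derive these from the diff and CI.
    lines_changed: int
    tests_passing: bool
    security_flags: int = 0
    touches_architecture: bool = False
    changes_public_api: bool = False


def review_track(pr: PullRequest) -> str:
    """Pick a review track from the PR's size and risk signals."""
    if pr.touches_architecture or pr.changes_public_api:
        return "committee"
    if (
        pr.lines_changed < SMALL_PR_MAX_LINES
        and pr.tests_passing
        and pr.security_flags == 0
    ):
        return "single_async_approval"
    return "standard"  # assumed fallback, not described in the post


def review_sla(pr: PullRequest) -> timedelta:
    """4 hours for reviews under 200 lines, 24 hours for everything else."""
    if pr.lines_changed < SMALL_PR_MAX_LINES:
        return timedelta(hours=4)
    return timedelta(hours=24)


small = PullRequest(lines_changed=120, tests_passing=True)
api_change = PullRequest(lines_changed=40, tests_passing=True, changes_public_api=True)
large = PullRequest(lines_changed=800, tests_passing=True)
for pr in (small, api_change, large):
    print(review_track(pr), review_sla(pr))
```

Note that in this sketch the SLA depends only on size, as stated above, so a small public API change still gets the 4-hour clock even though it goes to committee review.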

The Architecture Warning

Luis mentioned the quality tax, and he’s absolutely right. But there’s a deeper issue: AI will create technical debt faster than any human team ever could if you don’t have strong architectural guardrails.

We’ve seen:

  • Duplicated logic because AI doesn’t know what already exists elsewhere
  • Inconsistent patterns because AI learned from a messy codebase
  • Security vulnerabilities because AI optimizes for “works” not “secure”
  • Over-engineered solutions because AI tends toward complexity

The solution isn’t to slow down AI usage. It’s to invest in architectural fitness functions, automated documentation, and clear patterns that AI can learn from.
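As one illustration of what an architectural fitness function can look like, here is a minimal sketch that parses Python source with the standard `ast` module and flags imports a layer is not supposed to use. The rule itself (no `flask` or `sqlalchemy` in some hypothetical domain layer) is invented for the example; a real setup would encode your own conventions and run in CI.

```python
import ast

# Hypothetical layering rule: the domain layer must not import these packages.
FORBIDDEN_IMPORTS = frozenset({"flask", "sqlalchemy"})


def forbidden_imports(
    source: str, forbidden: frozenset = FORBIDDEN_IMPORTS
) -> list[tuple[int, str]]:
    """Return (line number, module) for each absolute import of a forbidden package."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            # Compare on the top-level package, e.g. "sqlalchemy.orm" -> "sqlalchemy".
            if name.split(".")[0] in forbidden:
                violations.append((node.lineno, name))
    return sorted(violations)


sample = "import json\nfrom sqlalchemy.orm import Session\nimport flask\n"
print(forbidden_imports(sample))
```

Because the source is only parsed, never executed, the check works even when the flagged packages are not installed.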

The teams seeing 97% throughput increases probably have:

  • Monorepo with clear conventions AI can follow
  • Comprehensive test coverage so broken code fails fast
  • Automated architectural compliance checking
  • Strong ownership boundaries that limit blast radius

The Uncomfortable Truth

Most organizations aren’t willing to do the process redesign work. They want AI to make the old way faster. But the old way was designed for different constraints.

The 56% of CEOs seeing “nothing” from AI investments? I’d bet most of them bought the tools but didn’t change how their teams work. The 12% seeing real ROI? They probably rewrote the playbook.

For those who have redesigned processes: What was the hardest thing to change? What got the most organizational resistance?

Coming at this from the product/design side, and that 90% stat about AI creating more coordination work absolutely resonates with what I’m seeing. 🎯

Engineering keeps telling me they’re moving faster. PRs are flying. Code is shipping. But when I look at feature delivery from kickoff to launch? Same timeline. Sometimes longer.

The Hidden Coordination Tax

Here’s what I think is happening: AI accelerated one part of the pipeline—code writing—but that just exposed all the other parts that were already slow. And in some cases, made them slower.

More PRs = more design review touchpoints. More iterations = more product alignment conversations. Faster code generation = more “wait, this isn’t what I meant” moments when the implementation doesn’t match the spec.

Michelle’s point about processes designed for slower work is spot-on. We’re still doing design review like code takes 2 weeks to write, when now it takes 2 days. That’s where the extra coordination work in that 90% stat comes from: the handoffs didn’t speed up.

What Helped: Design Systems + AI

One thing that’s worked for us: investing heavily in our design system documentation. Not just the component library, but the reasoning behind each component and pattern.

When AI can reference well-documented components with clear usage guidelines, it generates code that actually matches design intent. When it’s guessing from inconsistent examples, it generates code that “works” but looks wrong, feels wrong, or breaks accessibility standards.

We went from 60% of AI-generated UI code needing design revision to about 20%. The difference was documentation quality, not AI capability.

But Are We Optimizing The Wrong Part?

Luis mentioned test generation getting close to 90% faster vs. 30-40% for code generation in general, and said tests, refactoring, and documentation were where AI helped most. That tracks with what I’m seeing in design work too.

AI is amazing at:

  • Generating variants (responsive breakpoints, dark mode, etc.)
  • Accessibility auditing and ARIA attributes
  • Component documentation
  • Test cases and edge cases

AI is only okay at:

  • Novel interaction design
  • Visual hierarchy and information architecture
  • Understanding user intent from requirements

Question: Are the top 5% teams maybe using AI for different tasks than the median teams? Like, are they optimizing the right 20% of the work instead of trying to accelerate everything?

I wonder if part of the paradox is that we’re measuring the wrong productivity gains. Individual engineers writing code faster is one metric. But maybe the real unlock is using AI for the coordination overhead itself—documentation, alignment, review—not just the code.

Curious what others are seeing on the product/design side of this. 🤔

This entire thread is crystallizing something I’ve been wrestling with: the 56% of organizations seeing “nothing” from AI investments probably have a measurement problem as much as an execution problem.

David, you mentioned that only 12% of CEOs report AI both grew revenues AND reduced costs. I think that framing reveals the issue—we’re measuring AI impact using metrics designed for a different type of investment.

What Gets Measured Gets Optimized (Wrongly)

When we first deployed AI coding assistants across our 80-person engineering org, I made the classic mistake: I measured velocity. PRs per week, lines of code, cycle time. All of those improved. I reported success to the board.

Six months later, our CTO asked: “So where’s the product impact?” And I didn’t have a good answer. We were shipping more code, but not necessarily more value.

That’s when I realized: we were optimizing for speed when AI actually enables something more valuable—quality and experimentation.

The Measurement Shift That Worked

We completely rewrote our engineering OKRs:

Old metrics:

  • Ship X features per quarter
  • Reduce cycle time by Y%
  • Increase deployment frequency

New metrics:

  • Reduce production incidents by 40% while maintaining delivery velocity
  • Increase deployment frequency AND maintain <1% rollback rate
  • Improve developer satisfaction scores (our engineers were burning out)
  • Time to value (not just time to ship)
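The paired target ("increase deployment frequency AND maintain <1% rollback rate") is easy to check mechanically. Here is a minimal sketch, assuming you can export one record per deployment; the `Deployment` type and the sample counts are hypothetical, and both periods are assumed to cover the same number of days so that raw counts stand in for frequency.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Deployment:
    rolled_back: bool  # hypothetical field exported from the deploy tool


def rollback_rate(deployments: Sequence[Deployment]) -> float:
    """Fraction of deployments that were rolled back (0.0 for an empty period)."""
    if not deployments:
        return 0.0
    return sum(d.rolled_back for d in deployments) / len(deployments)


def meets_paired_target(
    previous: Sequence[Deployment],
    current: Sequence[Deployment],
    max_rollback_rate: float = 0.01,
) -> bool:
    """True only if frequency went up AND rollback rate stayed under the cap."""
    more_frequent = len(current) > len(previous)  # assumes equal-length periods
    return more_frequent and rollback_rate(current) < max_rollback_rate


# Illustrative counts only, not our real numbers.
last_quarter = [Deployment(rolled_back=False)] * 36
this_quarter = [Deployment(rolled_back=False)] * 199 + [Deployment(rolled_back=True)]
print(f"{rollback_rate(this_quarter):.1%}", meets_paired_target(last_quarter, this_quarter))
```

Tying the two conditions together in one check is the design choice that matters: a team cannot hit the target by shipping more often while rollbacks climb.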

The results were dramatic. Once we stopped optimizing purely for speed, teams started using AI differently:

  • More time on architecture design and planning (AI helps with implementation, so front-load the thinking)
  • More aggressive refactoring (AI makes it less painful)
  • Higher test coverage (AI writes tests fast, so actually write them)
  • Better documentation (AI makes it less tedious)

Our deployment frequency went up 3x, our MTTR went down 40%, and our developer satisfaction scores improved significantly. That’s when we started seeing the ROI.

The Cultural Shift

Michelle mentioned process redesign, and that’s necessary but not sufficient. We also needed a cultural shift around what “productive” means in the AI era.

The 90% stat about super productive workers reporting more coordination work? I think that’s what happens when IC-level optimization isn’t aligned with team optimization.

We explicitly changed our engineering values:

  • “Code quality over code speed” (AI gives you both if you use it right)
  • “Documentation is not optional” (AI removes the excuse)
  • “Continuous improvement over continuous delivery” (ship better, not just faster)

The hardest part? Helping senior engineers transition from “writing code” to “architecting systems and reviewing AI output.” That’s a skill shift, not just a tool shift.

Why The 12% See Real ROI

The companies seeing genuine ROI are probably measuring different things:

  • Time to market for MVPs (AI enables faster experimentation)
  • Technical debt reduction (AI makes refactoring economical)
  • Engineer retention (AI removes toil, increases satisfaction)
  • Quality metrics (defect rates, security vulnerabilities)

Not just:

  • Lines of code
  • PRs merged
  • Velocity points

For the leaders in this thread: What metrics did you change? What did you stop measuring? What did you start measuring?

I’m convinced the productivity paradox exists because we’re measuring 2010-era metrics in a 2026-era workflow. The winners are measuring different things.