CircleCI Reports 59% Throughput Increase, But Most Orgs Are 'Leaving Gains on the Table' Because Systems Haven't Caught Up With AI. What Needs to Change?

Just read CircleCI’s 2026 State of Software Delivery report, and one number jumped out: 59% year-over-year increase in daily workflow runs—the biggest throughput surge they’ve measured in seven years of publishing this report. At first glance, that’s incredible. AI-powered code generation is clearly working. We’re writing more code, faster than ever.

But here’s where it gets uncomfortable.

While the top 5% of teams nearly doubled their throughput (97% increase), the main branch success rate actually dropped to 70.8%—compared to an industry benchmark of 90%. We’re producing more, but we’re shipping less efficiently. The bottleneck has moved.

We’re Hitting the Organizational Ceiling

At our company, we’re living this paradox. We adopted AI coding assistants six months ago. Individual developer productivity metrics are up. PRs per engineer increased by about 40%. Everyone loves the tools.

But our delivery velocity? Essentially flat.

What’s happening is that the systems we built for human-paced development are breaking under AI-generated volume:

  • Code review queues are backing up. Senior engineers are spending 60%+ of their time reviewing instead of building. The volume overwhelmed our review capacity.
  • CI/CD pipelines are choking. Our build infrastructure was designed for 100 PRs a week. Now we’re handling 250-300. Queue times tripled.
  • Quality gates are creating new bottlenecks. Security scans, integration tests, compliance checks—all designed for lower throughput—are now the critical path.
  • Roles and responsibilities haven’t evolved. We’re still organized like we were in 2024. No one “owns” AI code quality. No clear process for reviewing AI-generated changes.

Waydev’s research captures this perfectly: “89% of organizations haven’t updated roles to reflect AI capabilities.” We’re using 2026 tools inside 2024 organizational structures.

The Data Gets Worse

The Cortex 2026 AI Benchmark found that AI-assisted code has 1.7× more issues and 23.7% more security vulnerabilities compared to human-written code. That’s not a tooling problem—that’s a process problem. Our review and testing infrastructure wasn’t designed for this kind of quality distribution.

And here’s the kicker from Workday’s research: Only 45% of organizations have formal AI usage policies, and companies are reinvesting AI savings back into more technology (39%) rather than employee development (30%). We’re compounding the problem.

What Actually Needs to Change?

I don’t think this is a temporary growing pain. I think we’re at an inflection point that requires systemic organizational redesign, not just process tweaks:

Infrastructure Investment:

  • CI/CD capacity needs to scale with AI-driven volume, not human baseline
  • Security and quality tooling needs to run faster, not just catch more issues
  • Observability for AI-generated code (who wrote it, which tool, what prompt context)
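
For the “who wrote it, which tool” part, one lightweight option is a commit trailer. This is a minimal sketch, assuming a team convention where AI-assisted commits carry an `AI-Assisted-By:` trailer. The trailer name and the tool names in the usage note are hypothetical, not a Git or vendor standard.

```python
from collections import Counter
from typing import Optional

TRAILER_KEY = "ai-assisted-by"  # hypothetical team convention, not a Git standard


def ai_tool_from_message(message: str) -> Optional[str]:
    """Return the tool named in the trailer, or None if the commit has no such trailer."""
    # Trailers sit in the last paragraph of a commit message as "Key: value" lines.
    last_paragraph = message.strip().split("\n\n")[-1]
    for line in last_paragraph.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == TRAILER_KEY:
            return value.strip()
    return None


def tool_breakdown(messages: list[str]) -> Counter:
    """Count commits per AI tool; commits without the trailer are counted as 'human'."""
    return Counter(ai_tool_from_message(m) or "human" for m in messages)
```

Feeding it the commit messages for a release gives a rough per-tool breakdown that can be joined against defect or revert data later. Capturing prompt context would need something richer than a trailer.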

Process Redesign:

  • Code review standards that account for AI generation patterns
  • Quality gates that are async and parallel, not sequential blockers
  • Deployment processes that can handle higher frequency with equal safety
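
To make the “parallel, not sequential” point concrete, here is a rough sketch using Python’s `asyncio`. The gate names and durations are placeholders I made up, and `run_gate` just sleeps where a real scan or test run would go. The only point is that wall time approaches the slowest gate rather than the sum of all gates.

```python
import asyncio
import time

# Hypothetical gates with simulated durations in seconds (stand-ins for real tools).
GATES = {"security_scan": 0.03, "integration_tests": 0.05, "compliance_check": 0.02}


async def run_gate(name: str, duration: float) -> tuple[str, bool]:
    await asyncio.sleep(duration)  # placeholder for the real scan or test run
    return name, True              # a real gate would return its actual pass/fail


async def run_gates_in_parallel(gates: dict[str, float]) -> dict[str, bool]:
    # gather() starts every gate at once and waits for all of them to finish.
    results = await asyncio.gather(*(run_gate(n, d) for n, d in gates.items()))
    return dict(results)


def main() -> None:
    start = time.perf_counter()
    results = asyncio.run(run_gates_in_parallel(GATES))
    elapsed = time.perf_counter() - start
    sequential = sum(GATES.values())
    print(results)
    print(f"parallel: {elapsed:.2f}s, sequential would be about {sequential:.2f}s")


if __name__ == "__main__":
    main()
```

This only helps if the gates are genuinely independent. A gate that needs the build artifact from another gate still has to wait for it.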

Organizational Structure:

  • New roles emerging: “AI code validators,” “integration architects,” “AI governance leads”
  • Senior engineers shifting from writing to designing/reviewing/architecting
  • Product and engineering realignment around what “done” means

Governance Frameworks:

  • Formal AI usage policies (we’re in the 55% without one—working to fix this)
  • Clear accountability for AI-generated code quality
  • Metrics that connect throughput to business outcomes, not just engineering activity

The Question

For those of you seeing similar patterns: What’s actually working? What changes have you made that translated individual AI gains into team-level delivery improvements?

And maybe more importantly: What are you trying that’s not working?

I suspect we’re all fumbling through this together. CircleCI’s data suggests most organizations are leaving the majority of AI productivity gains on the table. I’d rather learn from your experiments than repeat your mistakes.


Michelle, this hits so close to home. We’re seeing the exact same pattern in our fintech engineering org.

The CircleCI data matches our experience almost exactly. We’re at about 55% increase in PR volume since rolling out AI assistants last quarter. And yes—our build queue times have tripled. Our CI/CD infrastructure was architected for steady-state human development. It’s buckling under the load.

The Infrastructure Debt Problem

Here’s what’s breaking for us:

Build Capacity:

  • Jenkins build agents that were sized for ~100 concurrent builds now seeing 250-300
  • Queue wait times went from <2 minutes to 15-20 minutes during peak hours
  • This is killing the fast feedback loop that made AI productive in the first place

Test Suite Duration:

  • Integration test suites designed to run sequentially are now the critical path
  • We’re looking at 45-minute test runs that used to be 20 minutes
  • The parallelization we need requires infrastructure investment we didn’t budget for

Security Scanning:

  • Static analysis tools (SonarQube, Snyk) choking on volume
  • Scans that took 3-5 minutes now taking 12-15 minutes
  • More false positives because AI code patterns differ from human baselines

The New Role: AI Code Governance

Your point about “no one owns AI code quality” is spot-on. We created an interim role—“AI Code Steward”—rotating weekly among senior engineers. Responsibilities:

  • Review AI-generated PRs with extra scrutiny for security/compliance
  • Maintain guidelines for what AI should and shouldn’t generate
  • Triage when AI-generated code fails review (fix it or rewrite it)

It’s… not scaling. The person in that role is underwater every week.

What We’re Trying

Short-term:

  • Doubled our build agent capacity (expensive, but necessary)
  • Implemented PR size limits—AI loves generating 2000-line PRs, we’re capping at 400
  • Created “AI-safe” vs “human-review-required” designation for certain code paths
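
For anyone who wants to try the PR size cap, this is roughly the shape of such a check. It is a sketch and not our actual CI job. It assumes the input is the text output of `git diff --numstat` (tab-separated added, deleted, path, with `-` for binary files), and counting added plus deleted lines against the 400 cap is one reasonable reading.

```python
MAX_CHANGED_LINES = 400  # the cap mentioned above


def changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of line counts
        total += int(added) + int(deleted)
    return total


def within_limit(numstat: str, limit: int = MAX_CHANGED_LINES) -> bool:
    return changed_lines(numstat) <= limit
```

In CI you would pipe the numstat output for the PR’s diff range into `within_limit` and fail the job when it returns `False`. Generated files and lockfiles probably need an exclusion list, which this sketch leaves out.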

Medium-term:

  • Investing in test parallelization and smarter test selection
  • Exploring AI-powered code review tools (using AI to review AI—meta but might work)
  • Redesigning our branching strategy to support higher merge frequency

The Question I’m Wrestling With

Who actually reviews AI-generated code at scale?

We can’t hire enough senior engineers to keep up with the review volume. Junior engineers don’t have the pattern recognition to catch AI mistakes. And the AI itself can’t be the reviewer (yet).

Are others creating specialized “code validator” roles? Upskilling mid-level engineers specifically for AI review? Or have you found tooling that actually reduces the human review burden without sacrificing quality?

The 70.8% main branch success rate you mentioned is a warning sign we should all be paying attention to. We’re moving faster individually but shipping less reliably collectively.

Both of you are identifying the technical bottlenecks, but I want to add the people and organizational dimension that’s equally critical.

That Waydev stat Michelle cited (89% of organizations haven’t updated roles to reflect AI capabilities) is us. Completely us. And it’s creating real pain.

The Review Capacity Crisis

Luis, you asked who reviews AI code at scale. Here’s what we’re experiencing:

Senior engineers are drowning:

  • Our most experienced engineers are now spending 65% of their time on code review, up from 30% pre-AI
  • They’re frustrated—hired to design systems, now they’re quality gatekeepers
  • We’re seeing early signs of burnout and disengagement

The skill gap is real:

  • Mid-level engineers can review AI code, but they need different training
  • Spotting AI mistakes takes different pattern recognition than reviewing human-written code
  • We’re having to teach “what does good AI-generated code look like?”

Process breakdown:

  • Daily standups feel absurd—“What did you do yesterday?” Well, AI wrote 500 lines overnight
  • Sprint planning doesn’t account for AI velocity—we’re constantly re-estimating
  • Definition of “done” is unclear when AI generates working code that doesn’t match our standards

The Reinvestment Problem

Michelle’s point about companies reinvesting AI savings into technology (39%) rather than employee development (30%) describes the strategic mistake we’re making industry-wide.

We’re compounding the problem:

  • Saved 20 hours/week per engineer with AI assistance
  • Leadership response: “Great, let’s add more features to the roadmap”
  • Reality: We needed to invest those hours in upskilling, process redesign, governance

What we should be doing:

  • Training senior engineers to be effective AI code reviewers (it’s a different skill)
  • Upskilling mid-levels to take on more architectural responsibility
  • Building “AI literacy” across the organization—product, design, QA need to understand the new dynamics

What’s Actually Working (So Far)

Rotating “AI Code Review Specialist” role:

  • Similar to Luis’s “AI Code Steward” but we formalized it
  • Week-long rotation, compensated with reduced sprint commitments
  • Building institutional knowledge about AI code patterns
  • Still not scaling, but better than ad-hoc

Redefining senior engineer role:

  • Explicitly shifted expectations from “write code” to “design systems and validate implementations”
  • This is uncomfortable for engineers who built their identity around coding
  • But it’s the reality—AI does implementation, humans do architecture and quality

Process experiments:

  • Async code review—not blocking on immediate review, but with SLA
  • AI-assisted review summaries—using AI to flag potential issues for human review
  • Smaller, more frequent merges instead of large PRs

The Uncomfortable Question

Here’s what keeps me up at night: If AI is writing more code and humans are reviewing/validating, do we actually need fewer engineers?

Not asking that to be provocative. Asking because it’s what the CFO is going to ask when the next budget cycle comes around. And I need to have an answer that’s honest about:

  • How roles are changing (more validators/architects, fewer implementers)
  • What new skills we’re building (AI literacy, system design, quality validation)
  • Why headcount should stay flat or grow despite AI productivity

How are you all managing the transition without cratering team morale? Because right now, some of our senior engineers feel like they went from “building cool things” to “babysitting AI output,” and that’s not a sustainable culture.

Coming at this from the product/business side, and I have to say—this entire conversation is invisible to most executives and finance teams. That’s a huge problem.

The CFO Perspective

Michelle mentioned the Workday research showing 25% of AI investments deferred to 2027 pending ROI proof. I’m living that right now. Our CFO is asking:

“You said AI would make engineering 40% faster. Why are we shipping the same number of features per quarter?”

And honestly? I don’t have a good answer. Because the answer is:

  • “We’re running 59% more workflows, but main branch success is down”
  • “We’re writing more code, but review is the bottleneck”
  • “Individual productivity is up, but team delivery is flat”

None of that translates to business value in CFO language.

The Measurement Gap

Here’s what finance sees:

  • ✅ AI tool costs: $50/engineer/month (clear line item)
  • ❌ Productivity gains: “More PRs” (doesn’t connect to revenue)
  • ❌ Infrastructure costs: CI/CD capacity doubled (unexpected expense)
  • ❌ Team velocity: Same features shipped (expected improvement didn’t materialize)

The ROI story is currently negative. We spent money on tools, spent more money on infrastructure, and shipped the same amount of product.

That’s not sustainable politically or financially.

What We’re Missing: Connecting Engineering Metrics to Business Outcomes

The CircleCI data is fascinating to engineers. To finance, it’s meaningless. What would be meaningful:

Leading indicators we should be tracking:

  • Time from concept to production (is that actually faster?)
  • Customer value delivered per sprint (not story points, actual value)
  • Defect rate in production (is faster code better code?)
  • Cost per feature (are we more capital efficient?)
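
Two of these are easy to compute once you record when a feature was accepted and when it reached customers. This is a rough sketch, not our actual reporting. The `Feature` record and its field names are hypothetical, and “cost per feature” here is simply total engineering spend for the period divided by features shipped, which is a crude but explainable definition.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class Feature:
    name: str
    concept: date        # when the idea was accepted onto the roadmap (assumed definition)
    in_production: date  # when it reached customers


def median_lead_time_days(features: list[Feature]) -> float:
    """Median days from concept to production; raises StatisticsError on an empty list."""
    return median((f.in_production - f.concept).days for f in features)


def cost_per_feature(total_engineering_cost: float, features: list[Feature]) -> float:
    """Total spend for the period divided by the number of shipped features."""
    return total_engineering_cost / len(features)
```

Comparing the same two numbers for the quarters before and after the AI rollout is probably the shortest path to an answer finance can read.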

Lagging indicators that matter:

  • Revenue impact of faster iteration
  • Customer satisfaction with product velocity
  • Competitive advantage from faster feature delivery
  • Costs avoided from technical debt reduction (if we’re actually reducing it)

Right now, we’re celebrating activity metrics (more PRs, more builds) without proving outcome metrics (more revenue, happier customers, better margins).

The Stakeholder Confusion

Keisha, your point about the CFO questioning headcount is exactly what I’m worried about. Because here’s how the board is thinking:

“If AI makes engineers 40% more productive, we should need 40% fewer engineers, right?”

That’s wrong for all the reasons you outlined (different skills, validation vs building, etc.). But we haven’t made that case with data. We’re still using engineering-centric metrics that don’t translate to business stakeholders.
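
Worth noting that the arithmetic in that quote doesn’t hold even on its own terms. A quick sketch, assuming output is simply headcount times productivity (the same simplification the quote makes): a 40% productivity gain means the same output needs 1/1.4 of the team, which is roughly 29% fewer people, not 40%.

```python
def headcount_reduction_for_same_output(productivity_gain: float) -> float:
    """Fraction of headcount that could be cut while holding output constant,
    under the simplistic assumption that output = headcount * productivity."""
    return 1 - 1 / (1 + productivity_gain)


print(f"{headcount_reduction_for_same_output(0.40):.1%}")  # prints 28.6%
```

And that is before accounting for the review and validation load discussed above, which this toy model ignores entirely.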

Framework We’re Building

Trying to bridge this gap with a tiered measurement framework:

Tier 1: Engineering Activity (Internal)

  • PR volume, commit frequency, build times, review velocity
  • Useful for engineering optimization, invisible to business

Tier 2: Engineering Outcomes (Cross-Functional)

  • Lead time for changes, deployment frequency, MTTR
  • Better, but still not business language

Tier 3: Business Impact (Executive/Board)

  • Features shipped per quarter (customer-visible)
  • Revenue enabled by engineering velocity
  • Customer satisfaction / NPS improvement
  • Engineering cost as % of revenue

The AI productivity gains need to show up in Tier 3 metrics or finance will stop funding them.

The Question for This Group

What metrics are others using to justify AI investment to non-engineering executives?

Because “59% more workflow runs” isn’t going to cut it in the next board meeting. I need to connect AI-driven engineering improvements to revenue growth, cost reduction, or competitive advantage.

And if we can’t make that connection, we’re going to see AI budgets cut in 2027—even if the technology is actually working.

Reading this thread from the design/quality perspective, and I’m honestly alarmed by something that’s getting lost in the infrastructure and metrics conversation:

That 23.7% more security vulnerabilities stat should be the headline, not the 59% throughput increase.

The Quality Crisis We’re Not Talking About

I lead our design systems, and here’s what I’m seeing that mirrors the engineering bottleneck:

AI generates tons of code, but:

  • Component usage is inconsistent (AI doesn’t always use the design system correctly)
  • Accessibility is an afterthought (AI-generated forms often fail WCAG compliance)
  • UX patterns don’t match our standards (works functionally, feels wrong to users)

The review problem is cross-functional:

  • Engineering reviews for logic/security
  • Design reviews for UX/consistency
  • QA reviews for functional correctness
  • Nobody reviews for the holistic “is this good quality?” question

We’re optimizing for speed without asking whether we’re building the right thing or building it well.

The Junior Developer Concern

Keisha mentioned the role shift, but there’s a scarier downstream effect:

Junior engineers are learning to review AI code, not to write code from scratch.

  • They’re pattern-matching against what AI generates
  • They’re not building the foundational skills of problem decomposition, algorithm design, edge case thinking
  • In 2-3 years, we’ll have mid-level engineers who can validate but not architect

This is a skills crisis in slow motion, and we’re not talking about it because we’re focused on today’s throughput numbers.

The Craft Conversation

Look, I get that this sounds nostalgic or elitist. “Oh, Maya wants everyone to hand-craft artisanal code.” That’s not what I’m saying.

I’m saying: Optimizing purely for velocity produces technical debt, security vulnerabilities, and UX inconsistency.

The CircleCI data showed main branch success rate dropping to 70.8%. That’s a quality signal. More code isn’t better if it doesn’t work, doesn’t ship, or ships with problems.

What I’m Trying (Design Systems as Guardrails)

Since we can’t slow down AI code generation, I’m treating our design system as constraints rather than guidelines:

Technical enforcement:

  • Linting rules that require design system components
  • Automated accessibility checks that block PRs
  • Visual regression testing that catches UX inconsistencies
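
As an illustration of what a blocking accessibility check can look like, here is a deliberately tiny sketch using only Python’s standard `html.parser`. It flags `<img>` tags with no `alt` attribute, which is one small slice of WCAG and nowhere near a full audit. It assumes the check runs against rendered HTML, not JSX source, and it is not the tooling we actually run.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Collects <img> tags that have no alt attribute at all."""

    def __init__(self) -> None:
        super().__init__()
        self.violations: list[tuple[int, str]] = []

    def handle_starttag(self, tag, attrs):
        # alt="" is allowed: an empty alt marks an image as decorative.
        if tag == "img" and "alt" not in dict(attrs):
            line, _column = self.getpos()
            self.violations.append((line, self.get_starttag_text()))


def find_images_without_alt(html: str) -> list[tuple[int, str]]:
    checker = MissingAltChecker()
    checker.feed(html)
    checker.close()
    return checker.violations
```

A CI step would fail the PR when `find_images_without_alt` returns a non-empty list and print the offending tags so the author can fix them without waiting on a design reviewer.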

AI prompting frameworks:

  • Templates for engineers to guide AI toward design system usage
  • “AI-safe” component documentation that’s optimized for LLM consumption
  • Pattern library that AI can reference

Design review SLA:

  • Treating design review like a required gate, not optional feedback
  • 24-hour SLA on design review for AI-generated UI changes
  • Escalation path when AI generates non-compliant code
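
The SLA and escalation pieces are simple to automate. A minimal sketch, assuming review requests are available as records with a request time and an optional review time. The `ReviewRequest` type and its field names are hypothetical, and the 24 hours here are wall-clock hours, not business hours.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

DESIGN_REVIEW_SLA = timedelta(hours=24)  # the 24-hour SLA above


@dataclass
class ReviewRequest:
    pr_id: str                              # hypothetical identifier
    requested_at: datetime
    reviewed_at: Optional[datetime] = None  # None while the review is pending


def needs_escalation(req: ReviewRequest, now: datetime,
                     sla: timedelta = DESIGN_REVIEW_SLA) -> bool:
    """True if the review is still pending and has waited longer than the SLA."""
    return req.reviewed_at is None and now - req.requested_at > sla


def overdue(requests: list[ReviewRequest], now: datetime) -> list[str]:
    """IDs of pending requests that have breached the SLA."""
    return [r.pr_id for r in requests if needs_escalation(r, now)]
```

Running `overdue` on a schedule and posting the result to the design channel is one way to make the escalation path visible instead of relying on someone noticing a stale PR.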

The Philosophical Question

David asked how to connect AI productivity to business outcomes. Here’s my version:

Are we optimizing for output or outcomes?

  • Output: 59% more workflow runs overall, 97% more for the top 5% of teams
  • Outcome: Better products, happier customers, sustainable engineering culture

Right now, we’re measuring output and assuming it correlates with outcomes. But the data suggests otherwise:

  • Main branch success rate down (quality problem)
  • 23.7% more security vulnerabilities (safety problem)
  • Senior engineers burning out on review (culture problem)
  • Junior engineers not learning fundamentals (skills problem)

The Question I’m Sitting With

How do we maintain craft, quality, and long-term thinking in an environment optimized for short-term velocity?

Because if the answer is “we don’t,” then we’re building a very fragile, very fast system that’s going to collapse under its own technical and cultural debt in 2-3 years.

And that’s not productivity. That’s just deferring the cost to future us.