AI Drove 59% Increase in Engineering Throughput, But Most Teams Leave Gains on the Table—What's Blocking the Value Capture?

I’ve been tracking our engineering metrics closely since we rolled out AI coding assistants last year, and the data is both exciting and frustrating. CircleCI’s 2026 State of Software Delivery report shows a 59% increase in average engineering throughput across 28 million data points. That’s massive. But here’s the uncomfortable truth: most organizations—including mine—are leaving the majority of those productivity gains on the table.

The Throughput Paradox

Our team’s experience mirrors what the research shows. We saw a 15.2% increase in throughput on feature branches—developers are clearly moving faster. They’re experimenting more, iterating quicker, and shipping more code to review. But main branch throughput actually declined 6.8%.

That disconnect was a wake-up call.

The problem isn’t the AI tools. The problem is that everything downstream of code generation—pull request reviews, QA validation, security scanning, deployment approvals—was built for a different velocity. When coding accelerates, pull request volume increases, review queues grow, QA becomes saturated, and security validation lags. The entire delivery system needs to adapt, and most of us haven’t done that work yet.

The Measurement Blind Spot

According to Waydev’s 2026 analysis, the strategic question for CTOs and VPs of Engineering isn’t whether to adopt AI, but how to build the organizational visibility required to extract AI’s full value across the entire delivery cycle. The bottleneck has shifted, and most leaders haven’t updated their dashboards to reflect it.

Four metrics belong on every engineering leader’s dashboard right now:

  1. Main branch success rate - This is the clearest signal of whether your delivery system is keeping pace with AI-generated volume. The industry benchmark is 90%. The current average is 70.8%. We’re at 73%, which tells me we have work to do.

  2. PR review time - Research shows PR review time increased 91% even as coding speed improved. That’s your bottleneck screaming at you.

  3. Mean time to recovery (MTTR) - For teams running AI-assisted workflows, this is where productivity gains either hold or disappear.

  4. Deployment frequency - More code should mean more value delivered to customers. If your deployment cadence hasn’t changed, you’re not capturing the gains.
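None of these four metrics requires fancy tooling to start tracking. As a rough sketch, here’s how they might be computed from pipeline event data—the event shape and field names here are my own illustration, not any particular CI vendor’s API:

```python
from datetime import datetime
from statistics import mean

# Hypothetical event records; field names are illustrative assumptions.
def main_branch_success_rate(runs):
    """Fraction of main-branch CI runs that passed."""
    main = [r for r in runs if r["branch"] == "main"]
    return sum(r["passed"] for r in main) / len(main)

def avg_pr_review_hours(prs):
    """Mean hours from review requested to approval."""
    return mean(
        (p["approved_at"] - p["review_requested_at"]).total_seconds() / 3600
        for p in prs
    )

def mttr_hours(incidents):
    """Mean time to recovery across resolved incidents."""
    return mean(
        (i["resolved_at"] - i["detected_at"]).total_seconds() / 3600
        for i in incidents
    )

def deploys_per_week(deploys, weeks):
    """Deployment frequency over a trailing window."""
    return len(deploys) / weeks
```

Even a weekly cron job that computes these and posts them to a dashboard beats waiting for a vendor report.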

What Actually Works

Companies that are successfully capturing AI productivity gains aren’t just tracking different metrics—they’re fundamentally rethinking their delivery systems:

  • Dropbox tracks daily and weekly active AI users, AI tool satisfaction, time saved per engineer, and spend. They’re connecting AI adoption to business outcomes, not just developer satisfaction.

  • Leading teams are investing in hardened security templates, automated validation, and pre-configured guardrails so that security doesn’t become the bottleneck.

  • The most effective organizations are strengthening review practices and retraining teams to understand that AI-generated code still requires the same rigor as human-written code.

The Uncomfortable Questions

Here’s what I’m wrestling with:

  1. Are we measuring the right things? Traditional metrics like story points and velocity don’t capture where AI creates value or where delivery systems break down.

  2. Have we adapted our processes? If developers can write code 30% faster but we’re still shipping at the same pace, the problem isn’t the developers.

  3. What’s the ROI? CFOs are starting to ask hard questions about AI tool spend. Without clear metrics linking AI adoption to delivery outcomes or business results, those budgets are at risk.

  4. Are we building the right things faster, or just building faster? Throughput gains only matter if we’re solving the right problems for customers.

What I’m Curious About

I’d love to hear from other engineering leaders about:

  • What metrics are you tracking to measure AI’s impact on delivery, not just coding?
  • Where have you found the bottlenecks when AI accelerates coding?
  • How are you adapting your review, QA, and deployment processes to match the new velocity?
  • What does “value capture” actually mean for your organization—is it faster releases, better quality, cost savings, or something else?

The 59% throughput increase is real. The question is whether we’re building the organizational systems to actually realize that value.


This resonates deeply with what we’re experiencing at our fintech company. The measurement gap you’re describing is real—and honestly, it caught us off guard.

We rolled out GitHub Copilot and Cursor across our engineering org about 8 months ago. Within weeks, developers were clearly happier and shipping more PRs. Our initial metrics looked great: 60% more pull requests merged per sprint, developers self-reporting 3-4 hours saved per week, positive sentiment in our engineering surveys.

But our release velocity didn’t change. At all.

The Security Bottleneck We Discovered

When we dug into it, we found our bottleneck exactly where you described: security validation couldn’t keep pace with the volume. In financial services, every code change requires security review before it hits production. Our security team was designed to handle maybe 40-50 PRs per week. Suddenly they were getting 80-90.

The queue time for security review went from 1-2 days to 5-7 days. Our MTTR metric—which you correctly identified as critical—actually got worse because when incidents happened, the fix would sit in security review for days.

What We Did About It

We’ve made some progress, though we’re still learning:

  1. Pre-configured security templates - We invested in hardening our service scaffolding with OWASP controls baked in. New services start secure-by-default.

  2. Automated security scanning - We shifted left with static analysis tools (Snyk, Semgrep) that run in CI/CD. If a change passes the automated checks, the security team only spot-checks critical paths.

  3. AI-generated code flagging - This is controversial, but we tag PRs that contain significant AI-generated code for extra scrutiny. Early data showed higher rates of subtle bugs—input validation issues, edge cases missed, that kind of thing.
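For anyone curious what our triage looks like in practice, here’s a rough sketch of the routing logic—the thresholds, labels, and PR shape are illustrative assumptions, not a real tool’s API:

```python
# Illustrative triage: route a PR to the human security queue only when
# automation isn't enough. Paths, thresholds, and fields are assumptions.
CRITICAL_PATHS = ("payments/", "auth/", "ledger/")

def needs_human_security_review(pr):
    """Decide whether a PR needs manual security review."""
    if pr["scan_findings"]:
        # Static analysis (e.g., Snyk/Semgrep) reported issues.
        return True
    if pr.get("ai_generated") and pr["lines_changed"] > 100:
        # Large AI-assisted changes get extra scrutiny.
        return True
    if any(f.startswith(CRITICAL_PATHS) for f in pr["files"]):
        # Always spot-check security-critical paths.
        return True
    return False
```

The point isn’t these exact rules—it’s that an explicit, automatable policy let our security team spend their limited review time where it matters.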

The Cultural Challenge

But here’s what’s harder than the tooling: developers need to understand that AI-generated code still requires the same rigor as code they write by hand.

We’ve had incidents where engineers merged AI-suggested code that “looked right” but had subtle security flaws or performance issues. The speed of AI generation creates a false sense of completion. Just because the code was written fast doesn’t mean it was written correctly.

We’re retraining our teams to slow down on the review side even as AI speeds up the generation side. That’s a mindset shift.

Still Wrestling With Measurement

Your question about what metrics actually matter is the one I’m still figuring out. We’re tracking:

  • PR cycle time by stage (coding, review, security, deployment) to find bottlenecks
  • Deployment frequency (still haven’t improved this enough)
  • Change failure rate (watching for AI-related quality issues)
  • Time-to-remediation for security findings

But I’m not confident these are the right metrics yet. They’re lagging indicators. I’d love to hear from others about leading indicators that predict whether AI productivity will actually translate to business outcomes.

Question for the group: How are others balancing speed with quality gates? Are you seeing higher defect rates from AI-generated code, or is that just a training issue we need to solve?

Okay, I’m going to challenge this from a slightly different angle—and maybe this is my failed-startup trauma talking, but hear me out.

Are we optimizing for the wrong outcome?

The “Build the Wrong Thing Faster” Risk

Michelle, you asked: “Are we building the right things faster, or just building faster?” That question keeps me up at night.

My startup failed because we shipped features FAST. We had a tight, talented engineering team. We iterated quickly. We built everything our early customers asked for. And we still failed—because we were solving problems that didn’t actually matter to a big enough market.

AI coding makes that failure mode even more dangerous. If developers can write code 59% faster, but we haven’t invested equally in customer discovery, product validation, and problem-solution fit… we’re just going to ship the wrong solutions 59% faster.

Speed Without Direction Is Just Motion

Luis mentioned the security bottleneck—that’s real and important. But I’d argue there’s an earlier, more fundamental bottleneck: product strategy and problem validation.

If engineering throughput increases 59%, shouldn’t we also be investing in:

  • 59% better customer research?
  • 59% more rigorous problem validation?
  • 59% stronger prioritization frameworks?

Because if we’re not, we’re just filling the roadmap with features that might not move the needle.

What I Wish We’d Measured at My Startup

Instead of celebrating “features shipped” or “velocity,” I wish we’d tracked:

  • Problem validation rate - How many feature ideas did we kill BEFORE writing code?
  • Feature adoption - What percentage of shipped features actually got used?
  • Customer outcome metrics - Did the feature solve the customer’s problem?
  • Revenue per feature - Which features actually drove business results?

We had great developer throughput. We had terrible product-market fit. AI would have made that problem worse, not better.

A Design Perspective

From a design systems perspective, I see this pattern a lot: teams optimize for building components fast, but they don’t invest in the research to understand what components users actually need.

AI can generate a perfectly functional date-picker component in minutes. But if users don’t need another date-picker—if they need better date-range selection, or if the whole flow should be redesigned to not require date entry at all—then the speed of generation is irrelevant.

User value doesn’t scale with code velocity unless product strategy scales too.

So What Should We Measure?

Michelle’s four metrics are great for measuring delivery system health. But I’d add:

  • Discovery-to-delivery ratio - How much time do we spend validating problems vs building solutions?
  • Feature adoption rate - What percentage of shipped features get meaningful usage within 30 days?
  • Experiment velocity - Can we test assumptions FASTER because coding is faster?
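Of those, feature adoption rate is the easiest to start with. A toy sketch, assuming you can count active users per feature in its first 30 days (the field names and the “meaningful usage” threshold are my assumptions):

```python
# Illustrative adoption metric: share of shipped features that reached a
# minimum number of active users within 30 days of launch.
def adoption_rate(features, min_users=50):
    """Fraction of features with meaningful usage in their first 30 days."""
    adopted = sum(1 for f in features if f["users_within_30d"] >= min_users)
    return adopted / len(features)
```

If that number is low, shipping more features faster is amplifying waste, not value.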

If AI makes coding faster, the leverage is in using that speed to run MORE experiments, validate MORE hypotheses, and kill bad ideas EARLIER—not just in shipping more features.

Question: Is anyone pairing AI coding gains with stronger product discovery practices? Or are we mostly focused on the delivery side?

(Sorry if this is a bit raw—still working through my startup lessons. But I think speed amplifies both good strategy and bad strategy.)

Maya, you’re hitting on something that I’ve been trying to articulate to our executive team for months: more features ≠ more revenue.

And you’re absolutely right that AI coding velocity makes this problem more acute, not less.

The Product Metrics Gap

Michelle’s original post nails the delivery system metrics. Luis added the security angle. Maya just dropped the product strategy challenge. I want to connect these to the business outcomes that our CFOs and CEOs actually care about.

Because here’s what I’m seeing at our Series B fintech startup: we’re shipping more features than ever. Our engineering team is crushing it. But our customer adoption metrics haven’t moved. Revenue per customer is flat. Churn is unchanged. NPS scores are the same.

We’re not capturing the productivity gains because we haven’t solved the prioritization and validation problem.

What Actually Predicts Success

During my time at Google and Airbnb, the highest-performing product teams had one thing in common: they were ruthless about saying “no.”

They killed feature ideas early. They ran cheap validation experiments before committing engineering resources. They measured outcomes (user behavior change, revenue impact) not outputs (features shipped).

AI coding velocity should amplify that discipline. If we can build 59% faster, we should be able to:

  1. Test more hypotheses with lightweight prototypes
  2. Kill bad ideas 59% faster when validation fails
  3. Double down 59% faster when we find product-market fit

But that only works if product strategy evolves at the same pace as engineering throughput.

The Framework I’m Using

I’ve started pushing our team to shift from output metrics to outcome metrics:

Old metrics (output-focused):

  • Features shipped per quarter
  • Story points completed
  • Release frequency

New metrics (outcome-focused):

  • Revenue per feature (which features actually drive $$?)
  • Feature adoption rate within 30 days (are customers using what we built?)
  • Time-to-validated-learning (how fast can we prove/disprove a hypothesis?)
  • Customer outcome achievement (did we solve their problem?)

The Hard Conversation with Engineering

Here’s the uncomfortable part: if engineering can do 59% more, product needs to be 59% better at prioritization.

That means:

  • Stronger discovery practices (customer interviews, usage data analysis, competitive research)
  • Better validation frameworks (A/B tests, beta programs, feature flags)
  • Clearer success metrics BEFORE we write code
  • More discipline to kill features that don’t hit success criteria

If we’re not investing in product capability at the same rate we’re investing in engineering velocity, we’re just going to build a lot of features nobody wants.

Revenue Per PR - A Controversial Metric?

I floated this idea internally: what if we tracked revenue per pull request?

It sounds absurd, but think about it. If we’re merging 60% more PRs (like Luis mentioned) but revenue is flat, that metric screams “we have a prioritization problem.”

Obviously not every PR directly drives revenue. Infrastructure work matters. Tech debt reduction matters. But as a forcing function to connect engineering velocity to business outcomes, it’s provocative.

Question for Engineering Leaders

Michelle, Luis, Maya—when you’re tracking AI productivity gains, are you connecting them to business metrics?

  • Are you tracking which AI-accelerated features actually moved revenue, retention, or customer satisfaction?
  • How are you helping your product teams keep pace with the increased engineering capacity?
  • What does “good prioritization” look like when engineering can do twice as much?

Maya’s right: speed amplifies good strategy and bad strategy. The question is whether we’re building the product muscle to make sure it’s amplifying the good stuff.

This thread is exactly the conversation we need to be having. Michelle started with the measurement blind spot, Luis brought the security reality, Maya challenged us on product validation, and David connected it to business outcomes.

I want to add the organizational and cultural dimension—because I think the biggest blocker to value capture isn’t technical. It’s human.

The Bottleneck Is Organizational, Not Technical

At our EdTech startup, we saw the same pattern everyone’s describing: AI made coding faster, but we weren’t shipping faster. When we dug into it, the bottleneck wasn’t our CI/CD pipeline or our security tools or even our product prioritization.

The bottleneck was that our processes and culture were designed for monthly releases, but AI gave us the capacity for weekly releases.

Our approval workflows assumed scarce engineering capacity. Our review practices assumed we had time for deep, synchronous code review. Our deployment cadence assumed stability windows and planned maintenance.

AI broke all those assumptions—and exposed how much of our “process” was actually friction.

The Cultural Challenge No One Talks About

Here’s the hard part: senior engineers felt threatened by AI velocity.

Not in the “AI will take my job” way. In the “I no longer have time to do deep, thoughtful code review” way.

Before AI: A senior engineer might review 10-15 PRs per week. They could spend 20-30 minutes per review, really understanding the code, suggesting improvements, mentoring junior developers through feedback.

After AI: That same engineer is getting 25-30 PRs per week. They can’t maintain the same review depth. They feel like they’re rubber-stamping code. They’re worried about quality slipping. They’re stressed.

And they’re not wrong to be worried.

What We Had to Change

We couldn’t just tell people to “review faster.” We had to redesign the system:

1. Invested in Automated Quality Gates

  • Comprehensive test coverage requirements (80%+ for new code)
  • Automated security scanning (Snyk, SonarQube)
  • Performance regression testing in CI
  • Architectural linting (enforce design patterns)

This shifted the review burden from humans to automation for the things automation is good at.

2. Redesigned Code Review for AI Era

  • Pairing AI-generated code with junior developers + senior review (teaching moment)
  • Focused senior engineer time on architectural decisions, not syntax
  • Created “AI code” review checklist (edge cases, error handling, performance)
  • Required test coverage BEFORE code review, not after

This preserved senior engineer expertise while adapting to higher volume.

3. Embraced Continuous Delivery

  • Moved from monthly releases to daily deploys with feature flags
  • Invested heavily in observability (monitoring, alerting, tracing)
  • Built rollback capabilities so we could move fast and recover fast
  • Implemented gradual rollout (1% → 10% → 50% → 100%)
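The gradual rollout piece is simpler than it sounds. The core is deterministic bucketing: hash the user and flag together so each user lands in a stable bucket, then admit the first N% of buckets. A sketch of the idea (the flag name and schedule mirror our steps above; everything else is illustrative, not our actual flag service):

```python
import hashlib

def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """Deterministically place a user into the first `percent` of 100 buckets.

    The same (flag, user) pair always maps to the same bucket, so raising
    the percentage (1 -> 10 -> 50 -> 100) only ever adds users, never
    flips existing ones out of the rollout.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent
```

Because bucketing is stable and monotone in the percentage, each stage of the rollout is a superset of the previous one—which is what makes gradual exposure plus fast rollback safe.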

This changed the risk profile so we could deploy faster without increasing blast radius.

4. Addressed the Cultural Resistance Directly

  • Held retrospectives about AI impact on workflow
  • Validated senior engineers’ concerns (they were real)
  • Co-designed new processes WITH the team, not TO the team
  • Celebrated recoveries, not just deployments (destigmatized failure)

This built psychological safety for moving faster.

The Metric That Changed Everything

The metric that finally aligned our team: Mean Time to Customer Value (MTTCV).

We started measuring time from “customer problem identified” → “solution validated” → “code deployed” → “customer confirms value.”
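Mechanically, MTTCV is just the elapsed time across those four milestones, aggregated over recent work items. A minimal sketch, assuming you can timestamp each milestone (the milestone and field names here are illustrative):

```python
from datetime import datetime
from statistics import median

# Milestones mirror the MTTCV definition above; field names are assumptions.
MILESTONES = ["problem_identified", "solution_validated", "deployed", "value_confirmed"]

def mttcv_days(item):
    """Days from problem identified to customer-confirmed value."""
    return (item["value_confirmed"] - item["problem_identified"]).total_seconds() / 86400

def median_mttcv(items):
    """Median is more robust than mean when a few items stall for months."""
    return median(mttcv_days(i) for i in items)
```

We use the median rather than the mean because a handful of stalled items would otherwise dominate the number.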

That metric exposed EVERYTHING:

  • Slow discovery processes (Maya’s point about validation)
  • Security bottlenecks (Luis’s point)
  • Prioritization problems (David’s point)
  • Cultural resistance to fast deployment (my point)

And it gave us a shared goal that everyone—engineering, product, security, leadership—could rally around.

AI Forced Us to Confront Process Debt

Here’s what I’ve realized: AI coding velocity forces you to confront technical debt AND process debt simultaneously.

You can’t just make coding faster. You have to make the entire value stream faster:

  • Customer discovery → Product validation → Engineering → Security → QA → Deployment → Measurement

If any one of those steps doesn’t scale, you won’t capture the gains.

Questions for Other Leaders

I’m curious how other engineering leaders are handling the people side:

  • How are you helping senior engineers adapt to higher review volumes without burning out?
  • What does “good engineering culture” look like in an AI-accelerated workflow?
  • How are you building psychological safety to move fast when AI enables higher velocity?
  • What organizational changes have you made beyond just tooling?

Michelle’s right that the 59% throughput increase is real. But capturing that value requires redesigning how we work—not just how we code.

And that’s a leadership challenge, not a technical one.