"More Code, Fewer Releases": Throughput Up 59% But Deployment Frequency Flat—Are We Optimizing for Activity Over Outcomes?

I need to talk about something that’s been bothering me for the past 6 months—and after digging into the data this weekend, I’m convinced we have a massive leadership blind spot that nobody’s addressing.

The Numbers Don’t Make Sense

Our team’s throughput metrics look incredible:

  • Pull requests up 98% compared to Q4 2025 (before AI coding assistants rolled out)
  • Code commits up 59% across the org
  • Individual developer velocity up 40-60% based on self-reported surveys

Sounds amazing, right? Except when I look at what actually matters:

  • Deployment frequency: basically flat (only up 8% despite all that extra code)
  • Lead time for changes: up 12% (it’s taking longer to ship features)
  • Customer-facing releases: down 3% quarter over quarter

We’re generating mountains of code but shipping less value to customers. How is that even possible?

What I Think Is Happening

After reviewing 3 months of engineering data and talking to 15+ engineers across teams, here’s my theory:

We’re optimizing for activity, not outcomes.

The AI coding assistants (GitHub Copilot, Cursor, etc.) are making it incredibly easy to write code. Junior engineers who used to spend 6-8 weeks ramping up are now productive in 3-4 weeks. Mid-level engineers are cranking out features at senior-level speed.

But all that code has to be reviewed. And our review processes haven’t scaled with the code volume.

According to recent industry data, PR review times are up 91% when AI is heavily involved. AI-generated PRs wait 4.6x longer for review pickup and have only a 32.7% acceptance rate vs 84.4% for human-written PRs.

So we’ve just moved the bottleneck from writing to reviewing.

The Uncomfortable Questions

This raises some questions I don’t have answers to yet:

1. Are we measuring the wrong things?

Our dashboards track lines of code, commit velocity, and PR throughput. But these are activity metrics. They measure effort, not impact.

Should we be tracking:

  • Features shipped to production?
  • Customer value delivered per sprint?
  • Time from idea → customer impact?
  • Main branch success rate? (Industry benchmark is 90%, we’re at 72%)
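For what it's worth, most of these can be computed from data you already have. Here's a minimal sketch in Python, assuming deploy and change records shaped like the dicts below (the field names are illustrative, not any real API):

```python
from datetime import datetime

# Hypothetical record shapes -- field names are assumptions, not a real API.
deploys = [
    {"at": datetime(2026, 1, 5), "succeeded": True},
    {"at": datetime(2026, 1, 9), "succeeded": True},
    {"at": datetime(2026, 1, 16), "succeeded": False},
    {"at": datetime(2026, 1, 20), "succeeded": True},
]
changes = [  # merge time vs. production-release time per change
    {"merged": datetime(2026, 1, 3), "released": datetime(2026, 1, 5)},
    {"merged": datetime(2026, 1, 7), "released": datetime(2026, 1, 16)},
]

def deployment_frequency(deploys, window_days):
    """Deploys per week over a trailing window."""
    return len(deploys) / (window_days / 7)

def median_lead_time(changes):
    """Median merge-to-release gap in days."""
    gaps = sorted((c["released"] - c["merged"]).days for c in changes)
    mid = len(gaps) // 2
    return gaps[mid] if len(gaps) % 2 else (gaps[mid - 1] + gaps[mid]) / 2

def main_branch_success_rate(deploys):
    """Share of deploys that landed cleanly (the 90% benchmark above)."""
    return sum(d["succeeded"] for d in deploys) / len(deploys)
```

The point isn't the code; it's that outcome metrics are no harder to pull than PR counts, so "we only have activity dashboards" is a choice, not a constraint.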

2. Are we creating productivity theater?

When engineers know they’re measured on PRs merged and commits pushed, they start gaming the system:

  • Splitting features into micro-PRs to hit velocity targets
  • Shipping incomplete features just to close tickets
  • Generating code because the AI makes it easy, not because it’s necessary

This is Goodhart’s Law in action: “When a measure becomes a target, it ceases to be a good measure.”

3. What happens when the review bottleneck breaks?

Right now, senior engineers are doing 4-6 hours/week of additional code review to handle the AI-generated volume. They’re burning out.

If we don’t fix this, one of three things happens:

  • Senior engineers start rubber-stamping reviews (quality drops)
  • Senior engineers quit (brain drain)
  • We hire more reviewers (expensive, doesn’t scale)

None of these are good outcomes.

What Actually Matters in 2026?

I keep coming back to this article: “More Code, Fewer Releases: The Engineering Leadership Blind Spot of 2026”. The core insight is that most engineering leaders haven’t updated their dashboards to reflect where bottlenecks have actually shifted.

AI hasn’t made us more productive—it’s just moved the constraint.

The real question is: are we building the right things, and are we building them well?

Not: “How many lines of code did we write this week?”

Looking for Perspectives

I’m curious how other engineering leaders are thinking about this:

  • What metrics are you actually tracking to measure engineering effectiveness in the AI era?
  • How are you handling the review bottleneck when AI is generating 40-60% of your team’s code?
  • Have you seen deployment frequency decouple from code volume like we have?
  • What does “productivity” even mean when AI can write code faster than we can validate it’s correct?

I don’t have this figured out yet. But I’m pretty sure optimizing for code commits in 2026 is like optimizing for email volume in 2015—you’re measuring activity, not accomplishment.

Would love to hear how others are navigating this.

This is hitting way too close to home. We’re 9 months into our AI coding adoption journey at our 120-person org, and I’m seeing exactly the same pattern—but from the executive level, which adds another layer of complexity.

The Board Question I Can’t Answer

Last month our board asked: “You said AI would make engineering 40% more productive. Why is the product roadmap still slipping?”

I showed them the metrics:

  • PRs merged: +98%
  • Deployment velocity: +8%
  • Features shipped to customers: -2%

The CFO said: “So AI made us better at generating work, not delivering value?”

Ouch. But he’s not wrong.

What I’ve Learned: AI Shifts Where the Work Lives

Your point about moving the bottleneck from writing to reviewing is exactly right, but I think there’s more to it.

We’re not just moving bottlenecks—we’re creating new categories of work that didn’t exist before:

  1. AI code review (not just regular review—you need to check for patterns AI gets wrong)
  2. Comprehension debt (code that works but nobody understands why)
  3. Integration work (AI is great at isolated features, terrible at system thinking)
  4. Rework (teams exceeding 40% AI generation face 20-25% increase in rework rates according to recent benchmarks)

We’re losing 7 hours per team member weekly to AI-related inefficiencies according to our time tracking.

The Metrics That Actually Matter

You asked what metrics we’re tracking. After 3 painful quarters, here’s what we landed on:

Old Metrics (Still Track, Less Weight)

  • Deployment frequency
  • Lead time for changes
  • PR velocity

New Metrics (Higher Signal)

  • Main branch success rate (our North Star—went from 68% to 83% after we started measuring it)
  • Feature validation rate (% of features that meet success criteria 30 days post-launch)
  • AI code ratio (% of codebase authored by AI—we cap teams at 35% now)
  • Review burden hours (senior engineer time spent on AI code review)
  • Comprehension score (can 2+ engineers explain how a feature works?)

The comprehension score is subjective but incredibly valuable. If AI wrote it and only one person understands it, that’s a liability.
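The AI code ratio and review burden metrics fall out of PR metadata once you have per-line attribution. A rough sketch, assuming attribution fields like the ones below (in practice they'd come from your assistant's telemetry; the names are made up for illustration):

```python
# Hypothetical PR metadata -- "ai_lines"/"total_lines" attribution fields
# are assumptions; real numbers would come from assistant telemetry.
prs = [
    {"ai_lines": 120, "total_lines": 300, "review_minutes": 45, "senior_review": True},
    {"ai_lines": 0,   "total_lines": 80,  "review_minutes": 20, "senior_review": False},
    {"ai_lines": 200, "total_lines": 220, "review_minutes": 90, "senior_review": True},
]

def ai_code_ratio(prs):
    """Share of merged lines attributed to AI (what the 35% cap measures)."""
    return sum(p["ai_lines"] for p in prs) / sum(p["total_lines"] for p in prs)

def senior_review_hours(prs):
    """Senior-engineer review burden, in hours, over the sample."""
    return sum(p["review_minutes"] for p in prs if p["senior_review"]) / 60
```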

The Uncomfortable Solution

Here’s what we did that actually moved the needle:

We told teams to slow down.

I know that sounds insane when you’ve just invested in AI tooling to go faster. But we implemented:

  1. 35% AI cap per sprint (measured by file-level attribution)
  2. Mandatory “AI archaeology” sessions every 2 weeks where team reviews AI-generated code they don’t understand
  3. Paired review for PRs that are >40% AI-generated (two reviewers required)
  4. Feature freeze Fridays (no new features—refactor, document, or pay down debt)
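Rules 1 and 3 are mechanical enough to enforce as a CI check rather than a norm. A sketch of what that gate might look like, assuming the same hypothetical per-PR attribution fields (nothing here is a real tool; it's the shape of the policy as code):

```python
# Hypothetical policy gate -- field names ("ai_lines", "reviewers") are
# assumptions standing in for whatever your tooling actually exposes.
AI_CAP = 0.35          # rule 1: per-sprint cap on AI-authored lines
PAIR_THRESHOLD = 0.40  # rule 3: paired-review trigger per PR

def check_sprint(prs):
    """Return policy violations for one sprint's merged PRs."""
    problems = []
    share = sum(p["ai_lines"] for p in prs) / sum(p["total_lines"] for p in prs)
    if share > AI_CAP:
        problems.append(f"sprint AI share {share:.0%} exceeds {AI_CAP:.0%} cap")
    for p in prs:
        if p["ai_lines"] / p["total_lines"] > PAIR_THRESHOLD and len(p["reviewers"]) < 2:
            problems.append(f"PR {p['id']} needs paired review")
    return problems

sprint = [
    {"id": 101, "ai_lines": 50, "total_lines": 100, "reviewers": ["ana"]},
    {"id": 102, "ai_lines": 10, "total_lines": 100, "reviewers": ["ana", "raj"]},
]
```

Making the cap a failing check rather than a guideline is what kept it from eroding under deadline pressure.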

Result after 2 months:

  • Deployment frequency: down 12% (expected)
  • Customer feature deliveries: up 18% (unexpected!)
  • Main branch success rate: up 15 points
  • Senior engineer burnout: significantly reduced (qualitative but visible)

We’re shipping less code but more value. Which is the whole point.

The Question for Leadership

Luis, your question “What does productivity even mean?” is the right one.

I think in 2026, productivity ≠ velocity. Productivity = delivering customer value per unit of organizational energy spent.

If AI lets you write code 60% faster but creates 40% more debt and burns out your senior engineers, you’re not more productive—you’ve just financialized technical debt.

The hard part is convincing boards and executives who are measuring us on output, not outcomes. That’s the real leadership challenge.

This thread is fascinating because you’re describing the exact same pattern I’m seeing from the product side—and it’s revealing a massive disconnect between engineering velocity and product outcomes.

The Product Perspective: Fast Code ≠ Fast Learning

Engineering told me 6 weeks ago: “We can ship 40% more features now that we have AI coding assistants.”

Great! So I prioritized an ambitious roadmap with 12 new features for Q1.

Result: We shipped 11 of them. But only 4 met their success criteria. 3 are being deprecated next quarter.

We shipped faster but we didn’t learn faster. And in product, learning is the whole game.

Where the Bottleneck Actually Is

Here’s what I think is happening:

You’re right that AI moved the bottleneck from writing → reviewing. But from a product perspective, the bottleneck isn’t in engineering at all anymore.

The bottleneck is in product discovery.

Our eng team can now build a feature in 2 weeks that used to take 6 weeks. Amazing!

Except:

  • Customer interviews still take 3 weeks
  • Prototype testing still takes 2 weeks
  • Beta validation still takes 4 weeks
  • Post-launch analysis still takes 2 weeks

So we’re shipping features before we’ve validated they solve the right problem. We’re optimizing for delivery speed, not learning velocity.

The Measurement Mismatch

You asked what metrics matter. From product:

Engineering is measured on:

  • Features shipped
  • Story points completed
  • Deployment frequency

Product should be measured on:

  • % features that meet success criteria
  • Time from insight → validated learning
  • Customer problem resolution rate

But nobody tracks that last set systematically.

The Inconvenient Truth

I looked at our feature success rate over the past year:

  • Q1 2025 (pre-AI): 14 features shipped, 9 successful = 64% success rate
  • Q4 2025 (early AI): 18 features shipped, 10 successful = 56% success rate
  • Q1 2026 (mature AI): 23 features shipped, 11 successful = 48% success rate

We’re shipping 64% more features but our success rate dropped 16 points. We’re building faster but choosing worse.
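The arithmetic is worth making explicit, because the two trends pull in opposite directions. Reproducing the quarterly numbers above as data:

```python
# The quarterly numbers above, as (features shipped, features that met criteria).
quarters = {
    "Q1 2025": (14, 9),
    "Q4 2025": (18, 10),
    "Q1 2026": (23, 11),
}

def success_rate(shipped, successful):
    return successful / shipped

for q, (shipped, ok) in quarters.items():
    print(f"{q}: {shipped} shipped, {success_rate(shipped, ok):.0%} successful")

# Volume grew 23/14 - 1, about 64%, while the success rate fell from 64%
# to 48%: more output, worse selection.
```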

Why? Because the feature factory is running so fast that product doesn’t have time to validate whether we’re building the right things.

What Actually Works: Slow Down Product, Speed Up Iteration

Here’s our experiment for Q2:

We’re shipping fewer new features (target: 12 instead of 23) but iterating much faster on each one:

  1. Ship v0.1 to 5% of users in week 1
  2. Analyze data, ship v0.2 in week 2
  3. Expand to 25% in week 3
  4. Full launch only if metrics hit targets
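The percentage gates in steps 1 and 3 are the standard feature-flag bucketing trick: hash each user into a stable bucket so widening the rollout from 5% to 25% only ever adds users. A minimal sketch (our actual flag system is different; this just shows the mechanism):

```python
import hashlib

def in_rollout(user_id: str, feature: str, pct: float) -> bool:
    """Deterministically bucket a user into a percentage rollout.

    Hashing user+feature keeps a user's bucket stable across sessions
    and independent across features, and raising pct from 0.05 to 0.25
    only adds users -- nobody loses the feature mid-experiment.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < pct

# Week 1: ~5% of users see v0.1; week 3 widens the same gate to 25%.
week1 = sum(in_rollout(f"user-{i}", "feature-a", 0.05) for i in range(10_000))
```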

AI’s speed advantage should be in iteration cycles, not feature count.

Instead of:

  • 6 weeks to ship Feature A perfectly

Do:

  • Week 1: Ship Feature A v0.1 (AI-assisted, fast)
  • Week 2: Ship Feature A v0.2 based on real data (AI-assisted, fast)
  • Week 3: Ship Feature A v0.3 or kill it

Optimize for learning velocity, not shipping velocity.

The Strategic Implication

Luis, you said: “Are we building the right things, and are we building them well?”

I’d add a third question: “Do we know whether what we built actually worked?”

Because if engineering can ship 3x faster but product validation takes the same time, we’re just accumulating a backlog of unvalidated experiments.

That’s not productivity. That’s inventory.

Reading this thread, I keep thinking about the human cost of this productivity paradox that nobody’s really talking about.

Yes, the metrics are misaligned. Yes, we’re optimizing for activity over outcomes. But there’s a deeper organizational problem here that goes beyond dashboards.

The People Side: Who Pays for the Velocity Gains?

At our 80-person EdTech company, we rolled out AI coding assistants 8 months ago. The productivity numbers looked great:

  • Deployment frequency: +42%
  • Features shipped: +38%
  • Initial velocity: +60% for teams using AI heavily

But when I looked at the organizational health metrics:

  • Senior engineer burnout indicators: 4 out of 12 showing symptoms
  • Code review queue: +67% in average backlog
  • Junior engineer confidence: -28% based on quarterly surveys
  • Team cohesion scores: lower on AI-heavy teams than human-heavy teams

The velocity gains are being paid for by senior engineers burning out and junior engineers losing learning opportunities.

The Review Bottleneck Is a People Problem

Luis, you asked how we’re handling the review bottleneck. The honest answer: we’re not handling it well, and it’s creating a people crisis.

Here’s what’s happening:

Our teams are generating 42% more PRs than last year. Review times are up 67%. We have the same number of senior engineers.

Do the math:

  • 22-25 hours/week on code review (up from 15 hours)
  • Leaving 10-15 hours for actual engineering work

Three senior engineers have explicitly told me they’re considering stepping back from senior roles because they’ve become “full-time code reviewers.”

That’s a problem. We can’t scale review capacity by hiring more seniors (expensive, slow). We can’t reduce standards (quality death spiral). And we can’t keep burning out our best people.

The Mentorship Crisis Nobody’s Tracking

There’s a second-order effect that’s even more concerning:

Junior engineers are shipping code at mid-level speeds thanks to AI. But they’re not learning the architectural thinking that makes someone a senior engineer.

Traditional model:

  • Junior writes code slowly, makes mistakes, gets detailed feedback
  • Learns why certain patterns work, develops judgment
  • Becomes mid-level in 2-3 years, senior in 5-7 years

AI-accelerated model:

  • Junior asks AI to write code, ships it quickly
  • Gets feedback on output, not understanding
  • Develops surface competence but not deep expertise

We’re creating a two-tier engineering workforce:

  1. People who understand why (pre-AI trained seniors)
  2. People who ship fast but don’t understand why (AI-native juniors)

In 3-5 years when our current seniors retire or burn out, who’s going to be able to do the architectural thinking and complex debugging?

The Inclusion Angle That’s Getting Worse

There’s an uncomfortable equity dimension here too:

The engineers who are thriving with AI tools tend to be:

  • Already confident (seniors who know what to ask for)
  • English-fluent (AI works better in English)
  • Familiar with US tech patterns (AI trained on US codebases)

The engineers struggling:

  • Juniors who don’t know what they don’t know
  • Non-native English speakers (asking AI the wrong questions)
  • People trained in different coding paradigms

AI is accidentally amplifying existing knowledge gaps instead of closing them.

What We’re Trying: Redefine “Senior” Work

We implemented a controversial change last month:

Senior engineers spend 30% of their time reviewing AI-generated code. This is in their job description and in their promotion criteria.

We also created “AI archaeology” sessions (borrowing Michelle’s term above) where the team collectively examines AI-generated code and explains:

  • What it does
  • Why it works (or doesn’t)
  • What edge cases might break it
  • How to maintain it

This serves two purposes:

  1. Catches AI errors before production
  2. Turns review into teaching moments for juniors

Early results (2 months in):

  • Review quality: up
  • Junior learning: qualitatively better
  • Senior burnout: not solved, but at least acknowledged

The Uncomfortable Question for Leadership

Here’s what keeps me up at night:

If AI-generated code requires 70% more review time and creates 40% more comprehension debt, are we actually more productive? Or are we just shifting the work to people we’re not measuring?

I think we’re optimizing for Q1 2026 velocity at the expense of:

  • Senior engineer retention (people cost)
  • Junior engineer development (future cost)
  • Team cohesion (cultural cost)
  • Organizational learning (competitive cost)

Those costs don’t show up in deployment frequency dashboards. But they’ll show up in 18-24 months when we can’t ship because we’ve burned through our senior talent and failed to develop the next generation.

Productivity that destroys your team’s capacity to be productive next year isn’t productivity—it’s technical debt dressed up as velocity gains.