We predicted 25% AI productivity gains. Reality delivered 30-50%. Why aren't we talking about this?

I’ve been tracking our engineering team’s AI adoption for the past year, and something fascinating emerged from our data: we consistently underestimated the impact.

When we first adopted AI coding tools in Q1 2025, our conservative internal forecast projected 20-25% productivity gains for routine tasks. Leadership was skeptical—some thought even that was optimistic. Fast forward to today: our actual measurements show 30-50% improvements in scoped tasks like test generation, refactoring, and boilerplate code.

The Numbers That Changed My Mind

McKinsey’s February 2026 study across 4,500 developers found a 46% reduction in time spent on routine coding tasks. That’s nearly double what most organizations predicted when they started their AI journey.

Yet here’s what bothers me: despite these gains, our organizational delivery velocity improved only 8-12%. That’s the gap that keeps me up at night.

What’s Eating 38% of Our Productivity Gains?

The bottlenecks migrated downstream:

  • Code review queues grew 40% longer (more PRs, same reviewers)
  • Security findings increased 1.7× (AI-generated code needs careful oversight)
  • Integration testing became the new constraint
  • Knowledge transfer suffered (junior devs copy-paste without understanding)

We saved time coding but spent it elsewhere. The system absorbed the gains.
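The "system absorbed the gains" observation is, in effect, the theory of constraints: end-to-end throughput is capped by the slowest stage, so accelerating one stage mostly moves the bottleneck. A minimal sketch — with purely hypothetical stage capacities in PRs per week, not numbers from this thread — shows how a 40% coding speedup can surface as only about a 12% pipeline gain once review becomes the constraint:

```python
# Illustrative sketch: steady-state throughput of a delivery pipeline
# is limited by its minimum stage capacity (theory of constraints).
# All capacities below are hypothetical, in PRs per week.

def pipeline_throughput(stage_capacity: dict[str, float]) -> float:
    """End-to-end throughput is capped by the slowest stage."""
    return min(stage_capacity.values())

before = {"coding": 40, "review": 45, "integration": 48}
# AI makes coding 40% faster; review and integration are unchanged.
after_ai = {"coding": 40 * 1.4, "review": 45, "integration": 48}

print(pipeline_throughput(before))    # 40 — coding was the constraint
print(pipeline_throughput(after_ai))  # 45 — review is now the constraint
# A 40% coding speedup yields 45/40 - 1 = 12.5% more pipeline throughput.
```

Under these assumed capacities, the large task-level gain and the small organizational gain are both "real" — they just measure different stages of the same pipeline.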

The Harder Question

If we’re getting better results than we predicted at the individual level but seeing diminished returns at the organizational level, are we measuring the wrong things? Or are we simply not redesigning our processes to capture AI-era productivity?

I’m seeing 84% adoption across the industry but hearing surprisingly few conversations about organizational adaptation. Everyone’s focused on which tool to choose—Copilot vs Cursor vs Claude Code—but fewer teams are asking: “How do we redesign code review, testing, and integration when AI writes 40% of our code?”

My Current Hypothesis

AI coding tools are working better than expected, but we’re running them through organizational pipelines designed for human-only workflows. It’s like buying a Tesla and driving it on dirt roads—you get some benefit, but you’re not unlocking the full potential.

What are you seeing at your organizations? Are you hitting similar bottlenecks, or have you found ways to translate individual gains into team velocity?


Research sources: McKinsey AI Code Study 2026, Developer Productivity Statistics 2026, AI Coding Assistant Statistics

Michelle, this resonates deeply with what we’re experiencing in financial services. Your 38% productivity “leakage” number is eerily close to ours—we’re seeing 32-40% of gains disappear into what I call the “organizational friction layer.”

What Changed When We Stopped Fighting It

For the first 6 months, we tried to “fix” our processes to capture more gains. Faster code reviews, automated security scans, parallel integration testing. We got incrementally better but hit diminishing returns fast.

Then we reframed: what if 10-15% organizational improvement IS the right number when you account for necessary governance in regulated industries?

In fintech, that 1.7× increase in security findings you mentioned isn’t overhead—it’s catching vulnerabilities before they become compliance violations. The “lost” productivity is actually risk mitigation that we weren’t doing well before.

The Junior Developer Question

Your knowledge transfer point hits hard. We’re seeing junior engineers complete tickets faster but struggle to explain their solutions in design reviews. They’ve become effective at directing AI tools but less effective at understanding systems.

My controversial take: Maybe 30-50% task-level gains with 10-15% org-level improvements is the correct equilibrium when you factor in learning, governance, and institutional knowledge. We’re not leaving money on the table—we’re investing it in sustainability.

Are we optimizing for the wrong metric if we chase individual velocity at the expense of team capability?

Coming at this from the design/product side, and honestly? I think you’re both describing the exact same pattern we saw with design systems adoption 5 years ago.

The Design Systems Parallel

When we built our first component library:

  • Designers created mockups 40-60% faster (individual productivity 📈)
  • Time-to-market for features improved only 15% (organizational velocity 📉)
  • Why? Engineers now spent time customizing components instead of building from scratch

We weren’t “losing” productivity—we shifted where the work happened. The bottleneck moved from creation to integration and customization.

What Actually Worked

We stopped measuring “time to create design” and started measuring “time to validated user experience.” Completely different metric, completely different optimization strategy.

For AI coding, maybe the question isn’t “how do we capture lost gains?” but “what are we actually trying to optimize?”

If the goal is shipping reliable, maintainable code that junior devs understand and senior devs can review efficiently, then 10-15% improvement might be exactly right when you include all the invisible work: understanding, reviewing, teaching, maintaining.

The Uncomfortable Possibility

What if AI coding tools are working perfectly and the bottleneck was never writing code in the first place? What if it was always communication, understanding, and coordination—and now that’s just more visible?

Michelle, your “dirt roads” analogy is perfect, but maybe we need to ask: are we building highways, or do we need entirely different transportation? 🚁

Michelle and Luis—both of you are circling something critical that I’ve been wrestling with at the VP level: we’re still using pre-AI organizational design.

The Organizational Design Mismatch

Your code review bottleneck, Michelle? That’s a span of control problem dressed up as a tooling problem. When AI increases PR volume by 40-60%, your review structure designed for human-paced output becomes the constraint.

We tried three approaches at my EdTech company:

  1. Hire more reviewers → Didn’t scale, diluted code quality standards
  2. Automate reviews with AI → Caught syntax issues, missed architectural problems
  3. Reorganize around AI-augmented workflows → Reduced review burden by 35%

Only #3 worked long-term. We created “AI-aware” team structures:

  • Dedicated review rotation for AI-heavy PRs (different standards)
  • Senior engineers shifted from “review everything” to “review architectural decisions”
  • Junior engineers paired for AI-generated code learning sessions

Luis’s Risk Mitigation Point is Gold

That 1.7× security finding increase? In our case, it exposed vulnerabilities our pre-AI process wasn’t catching at all. We weren’t “losing” productivity—we were finally paying our security debt.

The Real Conversation We’re Not Having

The industry talks about AI productivity gains in percentages. But percentage gains are meaningless without asking: gains in what, measured by whom, optimized for which outcome?

If we’re still using story points, sprint velocity, and lines-of-code metrics from 2020, we’re measuring with the wrong ruler.

What if the 30-50% individual gains are real, but our management metrics are misaligned with AI-era workflows? What would AI-native engineering metrics even look like?

As someone who sits between engineering and business, this thread is fascinating because you’re all describing the same phenomenon from different angles—and it explains why my product roadmaps keep getting… weird.

The Product Planning Paradox

Engineering tells me: “We can build features 30-50% faster with AI tools!”
Reality: Our release velocity increased 12%.

For months I thought engineering was sandbagging estimates. This thread made me realize: the constraint moved, but our planning didn’t.
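The 30-50%-versus-12% gap also falls out of a back-of-envelope, Amdahl's-law-style calculation: if coding is only a fraction of the feature cycle, speeding up just that fraction yields a much smaller overall gain. The coding fraction below is an illustrative assumption, not a number from this thread:

```python
# Back-of-envelope sketch (illustrative assumptions): Amdahl's-law-style
# estimate of the end-to-end speedup when only the coding portion of the
# feature cycle accelerates.

def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    """Whole-cycle speedup when only the coding fraction gets faster."""
    return 1 / ((1 - coding_fraction) + coding_fraction / coding_speedup)

# Assume coding is 35% of the cycle and AI makes it 40% faster.
s = overall_speedup(0.35, 1.4)
print(f"{(s - 1) * 100:.0f}% overall")  # 11% overall — close to the observed 12%
```

Under these assumptions, no one is sandbagging: a 40% coding speedup and an ~11-12% release-velocity gain are the same fact viewed at different scopes.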

What Changed (And What Didn’t)

With AI coding:

  • ✅ Feature implementation is faster
  • ✅ Bug fixes ship quicker
  • ❌ Architectural decisions take the same time
  • ❌ Cross-team coordination is unchanged
  • ❌ Product validation still requires human judgment

We optimized the “building” phase but left the “deciding” and “validating” phases untouched. The result? Engineers spend less time coding and more time in meetings, refinement sessions, and architecture discussions.

Keisha’s Question About Metrics Hits Different

From a product perspective, I don’t actually care if engineers write code 50% faster. I care if we:

  1. Ship the right features faster
  2. Reduce time-to-learning (validate assumptions quicker)
  3. Maintain technical flexibility (avoid AI-generated tech debt)

Maya’s design systems analogy is perfect: we’re measuring “time to mock” when we should measure “time to validated user outcome.”

The Uncomfortable Product Truth

If AI lets engineers code faster but doesn’t help us decide what to build, then the productivity gain becomes… more output of potentially wrong things? 😬

Michelle, your question about redesigning processes should probably start with: “What’s the actual constraint in shipping value to customers?” In most orgs I’ve worked with, it’s not coding speed—it’s decision quality and coordination overhead.

Are we solving the wrong problem entirely?