93% of us use AI coding assistants. Productivity is still only 10% better. What are we missing?

I need to share something that’s been bothering me for months. My team has embraced AI coding assistants—GitHub Copilot, Cursor, Claude Code, you name it. Our adoption rate mirrors the industry: 93% of developers now use these tools. Yet when I look at our sprint velocity, deployment frequency, and actual feature delivery, we’re seeing maybe a 10% improvement. Maybe.

I thought I was missing something, until I found the research.

The Perception Gap

A METR study ran a controlled experiment with experienced developers. The result? Developers using AI were 19% slower on average. But here’s the kicker: before starting, they predicted AI would make them 24% faster. After finishing—even with objectively slower results—they still thought AI had sped them up by about 20%.

This isn’t just a measurement problem. It’s a perception problem.

The Data Paints a Messy Picture

Let’s be honest about what the research shows:

  • 93% adoption, 10% productivity gain - That’s a massive disconnect
  • AI-assisted code has 1.7× more issues and 9% more bugs per developer
  • PR sizes increased 154% on average with AI tools
  • Bain & Company described real-world savings as “unremarkable”
  • Meanwhile, GitHub, Google, and Microsoft’s early studies claimed 20-55% faster task completion

Someone’s measuring the wrong thing. Or maybe we all are.

What Are We Actually Measuring?

Here’s my theory: We’re measuring coding speed, not problem-solving speed. We’re counting commits and PRs, not customer value delivered. We’re tracking lines of code written, not bugs prevented or technical debt avoided.

AI tools are incredible at autocompleting boilerplate, generating tests, and converting comments into code. They make typing faster. But typing was never the bottleneck.

In our team, developers spend maybe 32% of their time actually writing code. The rest is meetings, code reviews, debugging, understanding context, aligning with product, waiting for CI/CD, and dealing with the friction of organizational complexity.

If AI makes that 32% twice as fast, we’ve gained… 16% overall. And that assumes zero quality tradeoff, which the data suggests isn’t true.
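
To make that arithmetic concrete, here’s a minimal back-of-the-envelope sketch (plain Amdahl’s-law math; the 32% split is our team’s estimate, not a universal constant):

```python
# Back-of-the-envelope: speed up only the coding slice of a developer's week.
coding_share = 0.32     # fraction of time spent actually writing code (our estimate above)
coding_speedup = 2.0    # generously assume AI doubles raw coding speed

# Remaining time as a fraction of the original week
new_total = (1 - coding_share) + coding_share / coding_speedup   # 0.68 + 0.16 = 0.84

time_saved = 1 - new_total        # 0.16 -> 16% of total time saved
overall_speedup = 1 / new_total   # ~1.19x throughput

print(f"time saved: {time_saved:.0%}, overall speedup: {overall_speedup:.2f}x")
```

Even an infinite speedup on that coding slice would cap out at a 32% saving, which is why the headline number stays stubbornly small.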

So What Should We Measure?

I’m genuinely curious what others think. If lines of code, commits, and PR velocity aren’t the right metrics, what is?

Some candidates:

  • Time to value - How long from idea to production?
  • Deployment frequency - Are we shipping more often?
  • Change failure rate - Are we shipping more bugs?
  • Mean time to recovery - Can we fix issues faster?
  • Cognitive load - Are developers less stressed and more focused?
  • Defect density - Quality per feature, not just speed

Or maybe the answer is simpler: developer satisfaction. If people feel more productive and enjoy their work more, does it matter if the spreadsheet doesn’t show a 50% velocity gain?

The Uncomfortable Question

Is the AI coding assistant productivity promise oversold? Or are we just measuring it wrong?

I’m not anti-AI. I use these tools every day. But if 93% of us have adopted something and organizational productivity has barely moved, we need to either:

  1. Figure out what we’re missing in how we measure productivity
  2. Admit that AI coding assistants solve the wrong problem
  3. Accept that 10% is good enough and set expectations (and pricing) accordingly

What are you seeing in your teams? Are you measuring productivity differently? Have you found metrics that actually correlate with AI tool usage?

I’d love to hear what’s working—or not working—for others.

This hits close to home. I’ve been using Cursor and Claude Code for about 8 months now, and I’ve noticed exactly what you’re describing: I feel wildly productive when I’m generating code, but when I look at what actually ships, the velocity hasn’t changed much.

Here’s my theory: We’re optimizing the wrong part of the pipeline.

AI Makes Writing Faster, Not Reviewing Faster

AI tools are incredible at the initial code generation phase. But that’s maybe 20-30% of the actual work. The rest is:

  • Code review - PRs are 154% larger (per the research you cited), and review time has ballooned along with them. My team’s average PR review time went from 4 hours to about 8 hours.
  • Debugging AI-generated code - The 1.7× more issues stat is real. AI generates syntactically correct code that’s semantically wrong more often than human-written code. The bugs are subtler and harder to catch.
  • Refactoring for maintainability - AI optimizes for “code that works,” not “code that’s understandable six months from now.” We’re accumulating design debt faster.

The Productivity Gain Is Front-Loaded

I can scaffold an entire feature in an hour with AI. That feels amazing. But then I spend two days:

  • Fixing edge cases the AI didn’t consider
  • Refactoring auto-generated boilerplate into something our team can actually maintain
  • Writing tests that actually test behavior, not just pad the coverage percentage
  • Explaining to reviewers why the code does what it does (because AI-generated code often lacks clear intent)

The typing was fast. The thinking is still slow. And thinking is the actual work.

Are We Measuring Typing Speed or Problem-Solving Speed?

You nailed it with this framing. Commits, PRs, and lines of code measure typing throughput. But software engineering is problem-solving, not typing.

If we measured:

  • Mean time to fix a production bug - Has AI made us faster here? In my experience, no.
  • Time from bug report to root cause identification - Definitely not faster with AI.
  • Percentage of PRs that require follow-up fixes - This has gone up for us.

The productivity gains are real for the generation phase. But they create downstream costs that eat most of the gains.
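
That last one is the easiest to start tracking, because the raw data already sits in your PR history. Here’s a rough sketch of the kind of heuristic I mean (the record fields and the “fix PR touching the same files within a week” rule are my own assumptions, not a standard definition):

```python
# Hypothetical merged-PR records; fields are illustrative, not a real API's schema.
# A PR "required a follow-up fix" if a later fix-tagged PR touched the same files within a week.
prs = [
    {"id": 101, "merged_days_ago": 20, "files": {"billing.py"}, "is_fix": False},
    {"id": 102, "merged_days_ago": 18, "files": {"billing.py"}, "is_fix": True},
    {"id": 103, "merged_days_ago": 10, "files": {"search.py"}, "is_fix": False},
]

def followup_fix_rate(prs, window_days=7):
    originals = [p for p in prs if not p["is_fix"]]
    fixes = [p for p in prs if p["is_fix"]]
    needed_fix = 0
    for orig in originals:
        for fix in fixes:
            overlap = orig["files"] & fix["files"]
            soon_after = 0 < orig["merged_days_ago"] - fix["merged_days_ago"] <= window_days
            if overlap and soon_after:
                needed_fix += 1
                break
    return needed_fix / len(originals) if originals else 0.0

print(f"{followup_fix_rate(prs):.0%} of PRs needed a follow-up fix")  # 50% in this toy data
```

Any heuristic like this is noisy, but tracked over time it tells you whether the “fast” generation phase is quietly creating rework.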

My Uncomfortable Take

AI coding assistants are optimizing for the wrong metric (code generation speed) because that’s the easiest thing to measure and demo. The hard parts—understanding requirements, designing systems, making tradeoffs, debugging complex interactions—haven’t gotten faster.

We’re measuring what’s easy to measure (commits, PRs, typing speed) instead of what actually matters (time to value, defect rates, maintainability).

Maybe 10% is the real number, and we should just be honest about it.

Luis, you’re asking the right question, but I think the answer is even more fundamental: We’re measuring developer productivity when we should be measuring business outcomes.

Companies Don’t Ship Lines of Code

This is the disconnect I see at the exec level. Engineering teams report faster development cycles, higher commit volumes, more PRs merged. Meanwhile, the product team is asking, “Why aren’t we shipping features faster?” and the business team is asking, “Why isn’t revenue growing faster?”

Because we’re optimizing the wrong layer of the stack.

AI coding assistants make individuals feel more productive. But organizations don’t deliver individual work—they deliver integrated systems. And integration is where the friction lives:

  • Cross-team dependencies
  • Product alignment
  • Testing and validation
  • Deployment coordination
  • Incident response
  • Customer feedback loops

None of these got faster because developers can autocomplete boilerplate.

The Bain Reality Check

When Bain & Company described real-world AI productivity savings as “unremarkable,” they weren’t measuring commits or PRs. They were measuring business impact: time to market, revenue per engineer, feature delivery rate, customer satisfaction.

By those metrics, AI coding assistants have barely moved the needle.

Why? Because typing code was never the bottleneck. The bottlenecks are:

  1. Unclear requirements - AI can’t fix “we don’t know what to build”
  2. Organizational complexity - AI can’t navigate cross-team dependencies
  3. Technical debt - AI actually makes this worse (your 1.7× more issues stat)
  4. Testing and validation - AI generates code faster than QA can validate it
  5. Deployment risk - Larger PRs mean higher blast radius

What Should We Measure Instead?

If I could wave a magic wand and replace every engineering productivity metric with business-aligned metrics, here’s what I’d choose:

  • Lead time for changes - Idea to production, not commit to merge
  • Deployment frequency - Are we shipping value more often?
  • MTTR (Mean Time to Recovery) - When things break, how fast do we fix them?
  • Change failure rate - What percentage of deployments cause incidents?
  • Feature adoption rate - Are customers actually using what we build?
  • Revenue per engineer - The ultimate business metric

Notice what’s missing? Commits, PRs, lines of code, typing speed—all the things AI optimizes for.
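
If anyone wants to try reporting these, most of them fall out of a deployment log plus incident records. Here’s a minimal sketch, with hypothetical field names rather than any particular tool’s schema:

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical deployment log; field names are illustrative placeholders.
deploys = [
    {"idea_opened": datetime(2024, 5, 1), "deployed_at": datetime(2024, 5, 9),
     "caused_incident": False, "recovery": None},
    {"idea_opened": datetime(2024, 5, 3), "deployed_at": datetime(2024, 5, 20),
     "caused_incident": True, "recovery": timedelta(hours=6)},
    {"idea_opened": datetime(2024, 5, 10), "deployed_at": datetime(2024, 5, 17),
     "caused_incident": False, "recovery": None},
]

def summarize(deploys, window_days=30):
    """Compute the delivery metrics from the list above over a reporting window."""
    failures = [d for d in deploys if d["caused_incident"]]
    return {
        # Lead time: idea to production, not commit to merge
        "median_lead_time": median(d["deployed_at"] - d["idea_opened"] for d in deploys),
        "deploys_per_week": len(deploys) / (window_days / 7),
        "change_failure_rate": len(failures) / len(deploys),
        "mttr": median(d["recovery"] for d in failures) if failures else timedelta(0),
    }

print(summarize(deploys))
```

The exact formulas matter less than the inputs: none of them are things autocomplete can inflate.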

The Organizational Bottleneck

Here’s what I’ve observed: AI makes individuals faster, but it creates new organizational bottlenecks.

  • Developers generate more code → reviewers become the bottleneck
  • More PRs get merged → testing becomes the bottleneck
  • More features ship → product prioritization becomes the bottleneck
  • Faster development cycles → deployment safety becomes the bottleneck

You can’t AI your way out of organizational friction. You have to fix the system.

My Take: 10% Is the Real Number

I think the 10% productivity gain is accurate—and it’s measuring the right thing. AI helps individuals be slightly more efficient within a system that’s fundamentally constrained by non-coding factors.

If we want 50% productivity gains, we need to fix:

  • How we define requirements
  • How we coordinate across teams
  • How we test and deploy
  • How we measure success

AI coding assistants aren’t the solution to those problems. They’re a marginal optimization on top of a system that needs structural change.

The good news? 10% is actually pretty good for a tool that costs $20-40/month per developer. We just need to stop pretending it’s a revolutionary 10× improvement and treat it like what it is: a helpful incremental gain.