93% of Developers Use AI, but Productivity Is Up Only 10%. Are We Measuring the Wrong Thing or Building the Wrong Thing?

I just read this CTO’s reflection and it stopped me in my tracks: 93% of developers are using AI coding assistants, but productivity has only improved by 10% since 2023.

Let that sink in. We’ve gone from “AI is a toy” to “everyone’s using it daily” in under 2 years. But the actual, measurable improvement? Barely double digits.

The Numbers Don’t Add Up

Here’s what the data shows:

  • 92.6% of developers now use AI coding assistants (source)
  • 51% use them daily, 75% use them weekly
  • AI-authored code is now 26.9% of all production code, up from 22% just last quarter
  • Code commits are up 13.5%, compilation frequency up 38.4%

And yet… organizations report flat delivery velocity. Features aren’t shipping faster. Roadmaps haven’t accelerated. The business isn’t seeing the 2x-3x gains that the AI tooling vendors promised.

So What’s Actually Happening?

I see three possible explanations:

1. We’re Measuring the Wrong Things

Maybe “productivity” isn’t lines of code or commit velocity. Maybe it’s:

  • Time to working feature (requirements → production)
  • Cognitive load reduction (how exhausted are developers at the end of the day?)
  • Bug reduction (are we shipping higher quality code?)
  • Learning velocity (how fast can new team members contribute?)

Traditional DORA metrics were designed for human-written code. Maybe they don’t capture what AI-assisted development actually optimizes for.

2. We’re Optimizing the Wrong Bottleneck

Code generation might be 20% of the software delivery lifecycle. The rest is:

  • Understanding requirements
  • Designing architecture
  • Writing tests
  • Code review
  • Debugging
  • Deployment
  • Monitoring

If AI makes “writing the initial implementation” 3x faster, but code review now takes 91% longer because of quality concerns (yes, really), we haven’t actually sped up the overall system.
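Back-of-the-envelope check: with some illustrative stage shares (assumed for the sake of the example, not measured) and the 3x / +91% figures above, the whole cycle barely moves:

```python
# Toy pipeline model. The stage shares are illustrative assumptions;
# the 3x and +91% figures are the ones quoted in the post.
stages = {                     # fraction of total cycle time per stage (assumed)
    "implementation":  0.20,
    "code_review":     0.15,
    "everything_else": 0.65,
}
factors = {                    # time multiplier once AI is in the loop
    "implementation":  1 / 3,  # "3x faster"
    "code_review":     1.91,   # "91% longer"
    "everything_else": 1.0,    # untouched
}

# New cycle time as a fraction of the original.
new_cycle = sum(share * factors[name] for name, share in stages.items())
print(f"new cycle time: {new_cycle:.2f} of original")
print(f"net change: {new_cycle - 1:+.1%}")
```

Under these assumed shares, a 3x faster implementation is almost exactly cancelled out by the slower review stage: the overall cycle time is essentially unchanged.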

3. We’re Building the Wrong AI Tools

What if current AI coding assistants are optimized for the wrong use case?

They’re great at:

  • Generating boilerplate
  • Autocompleting obvious patterns
  • Translating between languages

They’re not great at:

  • Understanding business context
  • Making architectural decisions
  • Navigating legacy codebases
  • Debugging subtle integration issues

Maybe we’ve built AI tools that help with “typing code” when the real bottleneck is “understanding what code to write.”

The Quality Tax

Here’s the uncomfortable truth buried in the research: AI-assisted code has 1.7x more issues and 23.7% more security vulnerabilities than human-written code.

So we’re writing code faster, but:

  • Spending more time in code review
  • Finding bugs later in the cycle
  • Creating technical debt we’ll pay for later

Is that “productivity”? Or are we just shifting time from “writing” to “fixing”?

The Question I Can’t Answer

What bothers me most is the 93% adoption rate.

Developers aren’t stupid. If these tools didn’t provide some value, they wouldn’t use them daily. But the organizational metrics aren’t moving.

So either:

  • A) Developers feel more productive (cognitive load, creativity, job satisfaction) but we’re not measuring it
  • B) Developers are using AI to keep up with increased expectations, not to get faster (running faster on the treadmill)
  • C) The productivity gains are real but being absorbed by increased scope (we’re building more ambitious features with the same timeline)
  • D) It’s still too early and we’re in the “computer productivity paradox” phase where the gains won’t show up in aggregate data for years

What Do You Think?

For those of you using AI coding assistants daily:

  • Do you feel more productive?
  • Are you shipping more features?
  • Are your roadmaps moving faster?

And for those of you measuring engineering productivity:

  • Have you changed your metrics for the AI era?
  • Are you seeing the disconnect between “devs say they’re faster” and “velocity hasn’t changed”?

I’m genuinely curious: Are we measuring the wrong thing, building the wrong thing, or just too early to tell?

This resonates hard. I’m seeing this exact pattern with my team.

We introduced GitHub Copilot 8 months ago. Adoption was instant—within 2 weeks, 100% of the team was using it daily. Developers love it. When I ask them in 1-on-1s, they say things like “I can’t imagine coding without it now” and “it makes the boring parts so much faster.”

But when I look at our velocity metrics? Nothing has changed.

  • Sprint velocity: flat
  • Lead time for changes: actually slightly up
  • Deployment frequency: unchanged
  • MTTR: unchanged

The Hypothesis: We’re Measuring Outputs, Not Outcomes

I think you nailed it with “we’re measuring the wrong things.”

Our metrics track:

  • Lines of code committed
  • PRs merged
  • Story points completed

But none of those measure “did we ship the feature faster?”

What I think is happening: AI helps developers write implementation code faster, but we haven’t touched:

  • Requirements clarification (still takes 2-3 days)
  • Design reviews (still weekly cadence)
  • QA cycles (still manual, still slow)
  • Deployment approvals (still gated)

So maybe developers are writing code 2x faster, but if coding is only 20% of the cycle time, the cycle shrinks to 0.8 + 0.2/2 = 0.9 of its original length, roughly a 10% overall improvement. The math checks out, actually.

What I’m Trying Next

I’m proposing we measure “time to working feature” instead of velocity:

  • Time from “requirements approved” to “in production”
  • Broken down by phase: design, implementation, review, QA, deploy
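A minimal sketch of how that breakdown could be computed from phase-transition timestamps. The event names and dates below are hypothetical; in practice they would come from your ticket tracker and CI/CD pipeline.

```python
# Compute per-phase durations for one feature from ordered
# phase-transition events (made-up data for illustration).
from datetime import datetime

events = [
    ("requirements_approved", datetime(2024, 3, 1, 9, 0)),
    ("design_done",           datetime(2024, 3, 4, 17, 0)),
    ("implementation_done",   datetime(2024, 3, 6, 12, 0)),
    ("review_done",           datetime(2024, 3, 8, 15, 0)),
    ("qa_done",               datetime(2024, 3, 12, 10, 0)),
    ("in_production",         datetime(2024, 3, 13, 9, 0)),
]

# Each phase's duration is the gap between consecutive transitions.
durations = {}
for (_, start), (end_name, end) in zip(events, events[1:]):
    durations[end_name] = (end - start).total_seconds() / 3600

for phase, hours in durations.items():
    print(f"{phase}: {hours:.0f}h")

total = sum(durations.values())
print(f"total time to working feature: {total:.0f}h")
```

If the hypothesis holds, tracking this over a few quarters should show only the implementation slice shrinking while the other phases stay flat.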

My bet is we’ll find implementation time dropped 40-50%, but everything else is unchanged. Which means the bottleneck isn’t code generation anymore—it’s everything around code generation.

Has anyone actually measured where time goes in their delivery pipeline? I’d love to see real data on this.