93% of Developers Use AI But Productivity Is Stuck at 10%—Are We Measuring the Wrong Thing or Building the Wrong Tools?

I need to share something that’s been keeping me up at night. My team at our Fortune 500 financial services company has embraced AI coding assistants across the board—93% of our 40+ engineers use them daily. We’re generating more code than ever. Pull requests are up. Individual developers report saving 3-4 hours per week. By every measure we track, AI adoption is a success.

Except for one thing: our actual productivity is basically flat.

Sprint velocity hasn’t budged. Feature completion rates are the same as a year ago. Time-to-production for new capabilities? Unchanged. And when I talk to other engineering leaders, I hear the same story. We’ve hit what one CTO I know calls “the 10% plateau”—productivity went up about 10% when AI tools first took off in 2023, and since then it’s just… stayed there.

Here’s what makes this so confusing: the data says we should be seeing massive gains. According to recent research, 92.6% of developers now use AI coding assistants. AI-authored code makes up 26.9% of production code—up from 22% just last quarter. Developers using Copilot complete 26% more tasks on average. Code commits are up 13.5%, compilation frequency up 38.4%.

So where’s the productivity?

The Numbers Don’t Add Up

Let me break down what we’re seeing in my org:

  • Individual velocity: Up 20-30% for boilerplate and straightforward features
  • Code review time: Up 40-50% because reviewers spend longer validating AI-generated code
  • Bug fix time: Slightly longer—AI code sometimes introduces subtle issues that take time to trace
  • Sprint commitments: Same as before AI adoption
  • Features shipped per quarter: Basically unchanged

It’s like we got faster at writing code, but somehow that didn’t translate to shipping more value.

Three Theories I’m Wrestling With

Theory 1: We’re measuring the wrong things

Maybe lines of code, PR velocity, and task completion are vanity metrics. If AI helps us write 30% more code but only 10% of that extra code delivers business value, did we actually get more productive? Or did we just get better at creating… more code?
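A back-of-the-envelope version of that question, as a sketch only. It assumes (my simplification) that value delivered scales with the amount of code that turns out to be valuable, and that the 10% applies to the extra 30% of code rather than to everything we write:

```python
def value_gain(extra_code_fraction, valuable_share_of_extra):
    """Fractional increase in delivered value under the simplification above."""
    return extra_code_fraction * valuable_share_of_extra

# 30% more code, of which 10% delivers business value
gain = value_gain(0.30, 0.10)
print(f"Code output: +30%, value delivered: +{gain:.1%}")
```

Under those assumptions the dashboards would show a big jump in output while the business sees only a few percent more value.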

Theory 2: AI tools optimize for individual speed, not team outcomes

Copilot makes me faster. But what about the downstream effects? The extra review burden. The context-switching cost when AI suggestions take us down the wrong path. The time spent debugging AI-generated code that “works” but doesn’t match our architecture patterns.

Theory 3: We’re hitting new bottlenecks

Maybe the constraint was never “how fast can we write code”—it was always product clarity, architecture decisions, deployment processes, or cross-team coordination. AI just exposed that by removing the coding bottleneck.
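To make this theory concrete, here's a toy model in which a team's throughput is capped by its slowest stage. The stage names and capacity numbers are made up for illustration (they are not measurements from my org); only the 30% coding speedup echoes the upper end of what we see on straightforward work.

```python
# Toy bottleneck model. All capacities are hypothetical, in features per sprint.

def pipeline_throughput(capacities):
    """A pipeline ships only as fast as its slowest stage."""
    return min(capacities.values())

def bottleneck(capacities):
    """Name of the stage that currently limits throughput."""
    return min(capacities, key=capacities.get)

before = {
    "product_clarity": 10.0,
    "coding": 11.0,
    "review": 12.0,
    "deployment": 10.5,
}
# Speed up coding by 30% and leave every other stage alone.
after = dict(before, coding=before["coding"] * 1.3)

for label, stages in (("before", before), ("after", after)):
    print(label, bottleneck(stages), pipeline_throughput(stages))
```

In this sketch coding gets 30% faster and throughput doesn't move at all, because coding was never the limiting stage.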

What I Really Want to Know

How are you measuring AI’s impact beyond individual developer velocity? Are you seeing the same plateau? And more importantly—what actually matters?

Because right now, I’m spending a lot of time in budget meetings defending our AI tool spend ($150-200/developer/month) while looking at productivity dashboards that show… not much has changed.
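For context, here's the rough budget math as a sketch. It uses 40 engineers as a floor for our "40+", and `HOURLY_COST` is a hypothetical loaded hourly rate I picked for illustration, not our real number:

```python
# Budget sketch. HOURLY_COST is a hypothetical loaded rate, not a real figure.
HOURLY_COST = 100.0

def annual_tool_spend(engineers, monthly_cost_per_dev):
    """Total yearly spend on AI tooling."""
    return engineers * monthly_cost_per_dev * 12

def break_even_hours_per_month(monthly_cost_per_dev, hourly_cost=HOURLY_COST):
    """Hours a developer must save each month for the tool to pay for itself."""
    return monthly_cost_per_dev / hourly_cost

for monthly_cost in (150, 200):
    spend = annual_tool_spend(40, monthly_cost)
    hours = break_even_hours_per_month(monthly_cost)
    print(f"${monthly_cost}/dev/month: ${spend:,}/year, "
          f"break-even at {hours:.1f} h/dev/month")
```

On paper the self-reported 3-4 hours per week clears that bar easily. The catch is that the break-even is measured in individual hours saved, which is exactly the number that isn't showing up in features shipped.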

I want AI to work. I believe in it. But I also need to be honest about whether we’re chasing the right metrics or building with the right tools.

What am I missing here? What should I be measuring that I’m not?

Luis, I think you’re asking the right questions, but I want to challenge the premise: maybe productivity is the wrong metric entirely.

We deployed AI coding assistants across our 120-person engineering org 9 months ago. And you’re right—by traditional measures, productivity barely moved. Sprint velocity up maybe 8%. Cycle time basically flat.

But here’s what DID change: we’re executing a roadmap that would’ve required 30-40% more headcount a year ago.

We’re not shipping faster. We’re shipping bigger. More ambitious features. More complex integrations. Problems we would’ve answered with “we don’t have capacity for that” in 2024, we’re now building in 2026 with the same team size.

The Capability Expansion Lens

I think the 10% plateau you’re seeing is because we’re measuring “faster horses” when AI gave us the ability to build airplanes.

When email first came out, did it make people write letters 10% faster? Or did it fundamentally change what kinds of communication were possible? We’re in that transition moment with AI and code.

Here’s my contrarian take: The fact that productivity only went up 10% might actually be GOOD news. It means we’re using AI to tackle harder problems, not just crank out more of the same work.

Three months after AI deployment, my team leads started proposing features they would’ve previously scoped out as “Q3 2027 at earliest.” That’s the real signal.

What I’m Measuring Instead

We track:

  • Scope ambition: Are teams proposing more complex solutions than they did pre-AI?
  • Headcount avoidance: How many engineers did we NOT hire because AI expanded existing team capacity?
  • Technical debt reduction velocity: Are teams tackling refactors they put off for years?

On those metrics? We’re up 40-60%.

The Uncomfortable Question

If AI lets your team do the work of a team 1.4× its size, did productivity go up 40%… or did it stay flat because you’re now doing 40% more ambitious work?

I realize I’m defending a budget line item similar to yours ($180/developer/month), but I’m not defending it on productivity grounds. I’m defending it on capability expansion grounds. We avoided hiring 3 engineers this year. That’s $450K in fully loaded costs.
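The arithmetic behind that argument, as a rough sketch. It assumes all 120 engineers hold a license for a full 12 months, which overstates the spend for a rollout that is only 9 months old:

```python
def annual_tool_spend(engineers, monthly_cost_per_dev):
    """Yearly AI tooling cost, assuming every engineer is licensed all year."""
    return engineers * monthly_cost_per_dev * 12

def net_savings(avoided_hiring_cost, tool_spend):
    """Avoided hiring cost minus what the tooling costs."""
    return avoided_hiring_cost - tool_spend

tool_spend = annual_tool_spend(engineers=120, monthly_cost_per_dev=180)
avoided = 450_000  # 3 engineers at $150K fully loaded each
net = net_savings(avoided, tool_spend)

print(f"Tool spend: ${tool_spend:,}")
print(f"Avoided hiring cost: ${avoided:,}")
print(f"Net: ${net:,}")
```

The result is only as good as the claim that those 3 hires were genuinely avoided because of AI, which is the hardest number here to verify.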

The question I keep asking my board: Would you rather have a team that ships 10% faster, or a team that can build things they couldn’t build at all last year?

This conversation is hitting close to home for me, because I’m living the AI productivity paradox from the IC trenches.

I use Copilot and Cursor every single day. They’re genuinely helpful for boilerplate, repetitive patterns, and “write me 20 test cases” scenarios. But I want to be honest about something nobody’s saying out loud:

Debugging AI-generated code often takes longer than if I’d just written it from scratch.

The “Almost Right” Tax

Here’s what my day looks like now:

  1. AI suggests a solution that’s 85% correct
  2. I spend 10 minutes tweaking it to actually fit our architecture
  3. Code review catches a subtle bug because the AI pattern doesn’t match our error handling conventions
  4. I spend another 20 minutes refactoring to align with team standards

Versus the old way:

  1. I write it myself in 30 minutes
  2. It matches our patterns from the start
  3. Review takes 5 minutes

The AI version feels faster because I get to “working code” in 15 minutes. But speed to first draft ≠ speed to done.
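Adding up the minutes from the two lists above, as a rough sketch. I'm reading "working code in 15 minutes" as covering steps 1 and 2 of the AI path. The AI path's review time isn't in my list, so `ai_review_minutes` is a placeholder parameter, and the 5 minutes used below is just the old-way figure reused as an assumption:

```python
def manual_path_minutes(write=30, review=5):
    """Old way: write it myself, then a short review."""
    return write + review

def ai_path_minutes(ai_review_minutes, to_working_code=15, refactor=20):
    """AI way: draft plus tweak, review, then refactor to team standards."""
    return to_working_code + ai_review_minutes + refactor

manual = manual_path_minutes()
ai = ai_path_minutes(ai_review_minutes=5)  # assumed, not measured

print(f"Manual: {manual} min to done")
print(f"AI:     {ai} min to done")
```

With these numbers the AI path only ties the manual one if review takes zero minutes, so any real review time puts it behind.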

Design Systems Are My Nightmare

I lead our design system, and AI-generated components are becoming a real problem:

  • Developers use AI to generate React components that technically work
  • But they don’t use our design tokens
  • They reinvent patterns we already solved
  • They introduce inconsistencies that compound over time

I spent 2 hours last week fixing a form component someone generated with AI because it didn’t use our established validation patterns. The initial PR looked great—comprehensive, well-tested. But it was technical debt disguised as productivity.

Where AI Actually Helps Me

Don’t get me wrong—AI is genuinely valuable for:

  • Boilerplate I used to copy-paste
  • Test case generation
  • “How do I do X in this framework?” quick answers
  • Pair programming on unfamiliar tech stacks

But the productivity gain is way smaller than the hype suggests.

My Theory on the Plateau

I think we’re optimizing for the wrong stage of the process. AI makes drafting code faster. But software development is:

  • 20% writing initial code
  • 30% debugging and refinement
  • 30% code review and iteration
  • 20% dealing with deployment/integration issues

If AI only speeds up that first 20%, the most time it can save is… 20%, which by Amdahl's law caps the overall speedup at 1.25×. And that’s if it doesn’t create extra work in the other 80% (which, in my experience, it often does).
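This ceiling is Amdahl's law applied to the breakdown above. A minimal sketch: the 20% share comes from my rough split, while the 2× drafting speedup and the 10% overhead on the remaining work are illustrative assumptions, not measurements:

```python
import math

def overall_speedup(accelerated_share, stage_speedup, overhead_on_rest=0.0):
    """Amdahl's law, with an optional slowdown applied to the untouched work.

    accelerated_share: fraction of total time AI speeds up (0.2 here)
    stage_speedup:     how much faster that fraction gets (math.inf = instant)
    overhead_on_rest:  fractional extra time added to the remaining work
    """
    rest = (1.0 - accelerated_share) * (1.0 + overhead_on_rest)
    return 1.0 / (rest + accelerated_share / stage_speedup)

ceiling = overall_speedup(0.2, math.inf)      # drafting becomes free
realistic = overall_speedup(0.2, 2.0)         # drafting twice as fast
with_tax = overall_speedup(0.2, 2.0, 0.10)    # plus 10% more work downstream

print(f"ceiling: {ceiling:.2f}x, 2x drafting: {realistic:.2f}x, "
      f"with 10% downstream overhead: {with_tax:.2f}x")
```

Even with drafting made instantaneous the whole process tops out at 1.25×, and a modest "almost right" tax on the other 80% eats most of what a 2× drafting speedup buys.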

@eng_director_luis Your theory about “new bottlenecks” resonates. We got faster at the thing that was never actually the constraint.