In 2026, AI coding assistants write 41% of all code, and 84% of developers use these tools daily. That’s not a pilot program anymore; it’s production infrastructure for how we build software.
But here’s where it gets interesting: the productivity numbers tell wildly different stories depending on who’s measuring.
The Vendor Story
GitHub, Google, and Microsoft—all vendors of AI coding tools—published studies showing developers completing tasks 20% to 55% faster. GitHub Copilot users report feeling more productive. Google’s internal studies show significant velocity gains. Microsoft’s research highlights developer satisfaction improvements.
Those are compelling numbers. If your engineers could ship 50% faster, that would fundamentally change your roadmap capacity, right?
The Independent Reality Check
Bain & Company’s analysis paints a different picture. Their research shows teams using AI assistants see 10% to 15% productivity boosts—and critically, that time saved often isn’t redirected toward higher-value work. So even those modest gains don’t translate into positive returns.
Why the gap? Jue Wang, a partner at Bain, points out that developers spend only 20% to 40% of their time coding. Even a significant speedup there translates to more modest overall gains.
More striking: writing and testing code accounts for about 25% to 35% of the time from initial idea to product launch. Speeding up these steps does little to reduce time to market if other stages remain bottlenecked.
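Run that math and the gap between the two stories mostly disappears. Here’s the back-of-the-envelope calculation (Amdahl’s law applied to the developer’s week); a sketch using midpoints of the ranges above, not measured values:

```python
# Amdahl's-law-style estimate: overall speedup is capped by the share
# of total time the accelerated stage occupies. The inputs below are
# midpoints of the cited ranges, chosen for illustration.

def overall_speedup(stage_fraction: float, stage_speedup: float) -> float:
    """Overall speedup when one stage, occupying stage_fraction of total
    time, gets stage_speedup times faster and nothing else changes."""
    return 1 / ((1 - stage_fraction) + stage_fraction / stage_speedup)

coding_fraction = 0.30    # devs spend ~20-40% of their time coding
vendor_speedup = 1.5      # the most optimistic vendor claim: 50% faster

s = overall_speedup(coding_fraction, vendor_speedup)
print(f"{(s - 1) * 100:.0f}% overall")  # ~11%, inside Bain's 10-15% range
```

Take the most optimistic vendor figure, weight it by how much of the week coding actually occupies, and you land almost exactly on Bain’s number. The same arithmetic applies to the idea-to-launch pipeline, where writing and testing code is only 25% to 35% of the timeline.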
The Paradox Nobody’s Talking About
Here’s what really keeps me up at night: analysis of over 10,000 developers across 1,255 teams shows developers using AI complete 21% more tasks and merge 98% more pull requests.
Yet company-wide delivery metrics for throughput and quality show no improvement.
Individual developers are working faster. But companies aren’t shipping better software any faster. Where are those productivity gains going?
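One hypothesis, and to be clear this is my own illustration rather than anything from the cited analysis: delivery is gated by the most constrained stage of the pipeline, so speeding up authoring alone just moves the queue downstream. A toy model:

```python
# Toy model (my assumption, not from the 10,000-developer analysis):
# organizational delivery is capped by the slowest pipeline stage, so
# faster authoring without more review/QA capacity changes nothing
# downstream.

def delivery_rate(authoring: float, review: float, qa: float) -> float:
    """Features shipped per week = throughput of the slowest stage."""
    return min(authoring, review, qa)

before = delivery_rate(authoring=10, review=8, qa=9)        # review-bound
after = delivery_rate(authoring=10 * 1.21, review=8, qa=9)  # +21% authoring
print(before, after)  # 8 8 -- individual output up, delivery flat
```

If review or QA was already the constraint, making authors 21% faster produces exactly the observed pattern: individual metrics climb while organizational delivery stands still.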
The Measurement Methodology Problem
I think we’re measuring the wrong things. Vendor studies measure task completion velocity in controlled environments. But production software development isn’t a series of isolated coding tasks.
Independent studies like METR’s randomized controlled trial found developers take 19% longer when using AI tools once you include review and debugging time. That’s the opposite of vendor claims.
The overhead of prompting, waiting, reviewing AI suggestions, and debugging generated code can exceed the coding speedup. Cursor data shows only 39% of AI generations were accepted—that’s a lot of friction.
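To see how that friction can swamp the speedup, here’s a rough expected-time model for a single suggestion. Only the 39% acceptance rate comes from the Cursor data; every other number is an assumption I picked for illustration:

```python
# Expected minutes per AI-assisted suggestion. ACCEPT_RATE is from the
# Cursor data above; all time estimates are illustrative assumptions.

ACCEPT_RATE = 0.39

def expected_minutes(prompt=2.0, wait=0.5, review=3.0,
                     fix_after_accept=2.0, rewrite_by_hand=8.0):
    """You always prompt, wait, and review; then you either accept the
    suggestion (and lightly fix it) or reject it and write it yourself."""
    accepted = ACCEPT_RATE * fix_after_accept
    rejected = (1 - ACCEPT_RATE) * rewrite_by_hand
    return prompt + wait + review + accepted + rejected

print(f"{expected_minutes():.1f} min vs ~8 min writing it by hand")
# ~11.2 min: under these assumptions the overhead exceeds the saving
```

With these particular inputs the assisted path comes out slower, which is directionally consistent with METR’s finding. Shrink the assumed review time or raise the acceptance rate and the sign flips, which is exactly why vendor and independent numbers can both be honestly measured and still disagree.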
The Questions I’m Wrestling With
As someone responsible for product velocity and delivery, I need better frameworks for evaluating AI productivity claims:
- What should we actually measure? Task completion speed? Time to production? Change failure rate? Developer satisfaction? (A minimal measurement sketch follows this list.)
- Are we optimizing for individual velocity or organizational throughput? Because those appear to be different outcomes.
- How do we account for the systemic effects? If AI speeds up coding but creates downstream bottlenecks in code review and testing, did we actually gain anything?
- Is the value in speed or quality? Maybe AI’s real benefit isn’t velocity but enabling better architecture or more thorough testing.
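On that first question, here’s a minimal sketch of what measuring both levels at once could look like. The record shapes and field names are hypothetical; adapt them to whatever your PR and deployment exports actually contain:

```python
# Measure individual velocity and organizational throughput from the
# same data so divergence between them is visible. Field names are
# hypothetical placeholders, not any specific tool's export format.
from dataclasses import dataclass
from datetime import datetime
from statistics import median

@dataclass
class PullRequest:
    author: str
    opened: datetime
    merged: datetime

@dataclass
class Deploy:
    shipped: datetime
    caused_incident: bool  # rollback or hotfix triggered by this deploy

def individual_velocity(prs: list[PullRequest], dev_count: int) -> float:
    """PRs merged per developer: the number AI tools move easily."""
    return len(prs) / dev_count

def lead_time_days(prs: list[PullRequest]) -> float:
    """Median open-to-merge days: closer to organizational throughput."""
    return median((pr.merged - pr.opened).days for pr in prs)

def change_failure_rate(deploys: list[Deploy]) -> float:
    """Share of deploys causing incidents: the quality counterweight."""
    return sum(d.caused_incident for d in deploys) / len(deploys)

# If individual_velocity rises quarter over quarter while lead_time_days
# and change_failure_rate stay flat, you are seeing the paradox above
# in your own data.
```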
I’d love to hear from this community:
- How are you measuring AI coding assistant impact at your organizations?
- What metrics matter for evaluating these tools?
- Have you seen the vendor-claimed productivity gains translate to actual business outcomes?
Because right now, it feels like we’re either measuring the wrong things or selling tools that optimize for the wrong part of the development process.
Sources for the curious: