I’ve been thinking about the AI coding assistant conversation, and I keep coming back to two data points that seem impossible to reconcile.
The Quality Problem
Recent research tracking 8.1 million pull requests found that AI-generated code contains 1.7 times as many issues as human-written code (10.83 issues per PR vs. 6.45). Logic errors appear 1.75x as often. Security vulnerabilities show up 1.57x as often. And across teams using AI assistants heavily, there’s a 9% increase in bugs per developer.
That’s… not great. These aren’t edge cases—this is systematic quality degradation at scale.
The Perception Gap
But here’s where it gets stranger. Before adopting AI tools, developers expected to be 24% faster. After using them for months, they still believed they were 20% faster.
Measured reality? Tasks took 19% longer to complete.
That’s a 43-percentage-point gap between the expected speedup (+24%) and the measured slowdown (−19%), one of the largest perception-reality disconnects in software engineering research. We feel faster while the clock says we’re slower.
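The arithmetic is easy to misread, since three numbers are in play: the forecast, the belief after use, and the measurement. A quick sketch with the figures from the study cited above (sign convention: positive means faster):

```python
# Figures from the study discussed above (positive = % faster, negative = % slower).
expected = +24   # developers' forecast before adopting AI tools
perceived = +20  # developers' belief after months of use
measured = -19   # measured change in task completion speed

# The headline 43-point gap compares expectation against measured reality;
# perception vs. reality is a slightly smaller (but still large) 39 points.
expectation_gap = expected - measured   # 24 - (-19) = 43 percentage points
perception_gap = perceived - measured   # 20 - (-19) = 39 percentage points

print(f"Expectation vs. reality: {expectation_gap} points")
print(f"Perception vs. reality:  {perception_gap} points")
```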
The Bain Reality Check
Then Bain drops their Technology Report 2025 with this line: “Software coding was one of the first areas to deploy generative AI, but the savings have been unremarkable.”
Two out of three software firms rolled out AI coding tools. Among those, adoption is high. Teams report 10-15% productivity boosts. But the actual business impact? Underwhelming.
Why? Developers spend only 20-40% of their time actually writing code. Even significant speedups in code generation translate to modest overall gains when most of our day is meetings, alignment, debugging, and coordination.
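This is essentially Amdahl's law applied to the workday: if coding is only a fraction of total time, speeding up that fraction caps the overall gain. A minimal sketch (the 30% coding share and 50% coding speedup below are illustrative assumptions, not figures from the Bain report):

```python
def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    """Amdahl's-law-style estimate: only the coding fraction of the
    workday benefits from faster code generation; the rest (meetings,
    review, coordination) proceeds at the old pace."""
    return 1.0 / ((1.0 - coding_fraction) + coding_fraction / coding_speedup)

# Assume 30% of the day is coding and AI makes that part 50% faster:
gain = overall_speedup(0.30, 1.5)
print(f"{(gain - 1) * 100:.0f}% overall productivity gain")  # prints "11% ..."
```

Even a generous 2x coding speedup at a 40% coding share tops out around a 25% overall gain, which is why double-digit per-task improvements can still read as "unremarkable" on the balance sheet.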
The Bottleneck We Created
Here’s the kicker: developers complete 21% more tasks and merge 98% more pull requests with AI assistance. Sounds great, right?
Except AI-generated PRs wait 4.6 times longer before code review begins. Overall review time increases 91%. Average PR size jumps 154%.
We’ve created a review bottleneck. Human approval can’t scale with AI velocity. And when reviewers get fatigued from 5-10x volume, they start skimming—precisely when AI code has 1.7x more issues that need catching.
So Who’s Right?
Are we measuring the wrong things? Quality vs velocity—is this a necessary tradeoff, or are we simply deploying AI wrong?
The research shows individual developers produce more code. But organizations see “unremarkable” savings. Quality degrades. Review becomes the limiting factor. Time saved coding doesn’t redirect to higher-value work.
Maybe the answer isn’t “who’s right” but “what are we optimizing for?”
I’m curious: For those using AI assistants on your teams—what patterns are you seeing? Is the quality-velocity tradeoff real, or is this a deployment and process problem?