I need to share something that’s been bothering me for months now, and I suspect I’m not alone.
My entire engineering team—40+ developers—has adopted AI coding tools. GitHub Copilot, Claude Code, you name it. When I ask individuals about their experience, they’re enthusiastic. “Saves me 3-4 hours a week.” “Makes boilerplate trivial.” “Helps me explore new frameworks faster.”
But here’s the thing that keeps me up at night: our sprint velocity hasn’t budged. Our cycle time metrics are essentially flat. Our deployment frequency is the same as it was 18 months ago, before anyone touched an AI assistant.
The Data Doesn’t Add Up
I started digging into the research, and it gets even stranger:
- 84% of developers say they use or plan to use AI tools
- 51% use them daily
- Individual developers report 25-55% productivity gains
- They claim to save an average of 3.6 hours per week
But when you zoom out to company-level metrics? Productivity gains haven’t budged past 10%. In some rigorous studies, the correlation between AI adoption and actual outcomes disappears entirely at the organizational level.
Even more troubling: A July 2025 study by METR showed that while experienced developers believed AI made them 20% faster, objective tests revealed they were actually 19% slower.
Where Are the Gains Disappearing?
In financial services (my domain), I’ve identified several black holes:
1. The Code Review Bottleneck
AI writes code fast. Humans review code slowly. We’ve essentially moved our constraint from “writing” to “reviewing.” My senior engineers are drowning in review queues, and they’re frustrated because AI-generated code requires more careful scrutiny.
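The bottleneck shift can be made concrete with Little's Law (average wait = items in queue / throughput): if AI raises the PR arrival rate while review capacity stays fixed, the queue grows without bound. A minimal sketch — the arrival and review rates below are hypothetical, not our actual numbers:

```python
def review_wait_days(arrival_rate: float, review_rate: float,
                     backlog: float, days: int) -> float:
    """Simulate a review queue; return the implied wait in days at the end.

    Little's Law: average wait W = queue length L / throughput.
    """
    for _ in range(days):
        backlog += arrival_rate               # new PRs opened per day
        backlog -= min(backlog, review_rate)  # PRs reviewed, capped by capacity
    return backlog / review_rate

# Before AI: 10 PRs/day in, 10 reviewed/day -> the queue stays flat.
before = review_wait_days(arrival_rate=10, review_rate=10, backlog=5, days=30)
# After AI: 14 PRs/day in, same 10 reviewed/day -> wait time balloons.
after = review_wait_days(arrival_rate=14, review_rate=10, backlog=5, days=30)
```

The point of the toy model: even a modest increase in code-writing speed, with no change in review capacity, turns into unbounded queue growth rather than faster delivery.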
2. The “Almost Right” Tax
66% of developers say the most common frustration is that AI code is “almost right, but not quite.” That “almost” is expensive. We’re spending hidden time debugging, refactoring, and correcting AI suggestions. This time doesn’t show up in “time saved coding” metrics.
3. Quality Degradation
The numbers here are alarming:
- 9% increase in bugs per developer using AI
- 154% increase in average PR size
- 23.7% more security vulnerabilities in AI-assisted code
In a regulated environment like ours, these quality issues trigger additional compliance reviews that completely negate any speed gains.
4. The Measurement Illusion
We’re measuring the wrong thing. We measure “time to write code,” but what actually matters is “time to ship quality, compliant, reviewed code that solves the customer’s problem.” AI might accelerate the writing step while slowing down every step after it: review, testing, compliance, and deployment.
The Hard Question We’re Not Asking
Are we adopting tools without changing our processes?
I suspect the real issue is organizational, not technical. We’ve given individuals productivity superpowers, but our systems—code review workflows, testing practices, compliance frameworks, deployment pipelines—weren’t designed for AI-accelerated output.
It’s like giving everyone a Formula 1 race car but keeping the same 35 mph speed limit and the same traffic lights. The car’s potential doesn’t matter if the system is the constraint.
What Should We Actually Be Measuring?
Gartner predicts that by 2026, engineering organizations will measure creativity and problem-solving ability rather than raw velocity. That makes intuitive sense, but how do you quantify “creative output”?
Some questions I’m wrestling with:
- Should we measure “time to customer value” instead of “time to code”?
- Should we track “problems solved” rather than “features shipped”?
- Should we measure code quality, maintainability, and security alongside velocity?
- Should we measure “strategic thinking time” vs. “execution time”?
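If “time to customer value” is the metric, one place to start is computing lead time end-to-end from version-control and deployment timestamps rather than from self-reported “time saved coding.” A minimal sketch with made-up timestamps (the PR records are illustrative, not real data):

```python
from datetime import datetime

def lead_time_days(first_commit: str, deployed: str) -> float:
    """Lead time in days from first commit to production deployment."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    delta = datetime.strptime(deployed, fmt) - datetime.strptime(first_commit, fmt)
    return delta.total_seconds() / 86400

# Hypothetical PR records; in practice these would come from your
# version-control and deployment tooling.
prs = [
    {"first_commit": "2025-03-01T09:00:00", "deployed": "2025-03-08T09:00:00"},
    {"first_commit": "2025-03-02T09:00:00", "deployed": "2025-03-05T09:00:00"},
]

avg_lead_time = sum(
    lead_time_days(p["first_commit"], p["deployed"]) for p in prs
) / len(prs)
```

The design choice matters: lead time measured this way captures review queues, compliance checks, and deployment delays — exactly the stages where, per the sections above, AI-era gains seem to disappear.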
The Bottom Line
84% adoption. Less than 10% measurable impact.
I’m not saying AI tools are useless. I’m saying we’re in the “tools without transformation” phase. We’ve adopted assistants without industrializing the practice. We’ve accelerated individual work without adapting organizational systems.
For those of you seeing real, measurable productivity gains at the team/company level—what changed beyond just rolling out tools? What processes did you redesign? What metrics actually moved?
And for those in the same boat as me—let’s talk honestly about the gap between the hype and the reality we’re seeing in our teams.
Context: Leading 40+ engineers at a Fortune 500 financial services company. We’ve had near-universal AI tool adoption for 18 months now, and I’m still searching for the promised productivity breakthrough.