I need to share something that’s been keeping me up at night. My team at our Fortune 500 financial services company has embraced AI coding assistants across the board: 93% of our 40+ engineers use them daily. We’re generating more code than ever. Pull requests are up. Individual developers report saving 3-4 hours per week. By every adoption and activity measure we track, AI adoption is a success.
Except for one thing: our actual productivity is basically flat.
Sprint velocity hasn’t budged. Feature completion rates are the same as a year ago. Time-to-production for new capabilities? Unchanged. And when I talk to other engineering leaders, I hear the same story. We’ve hit what one CTO I know calls “the 10% plateau”—productivity went up about 10% when AI tools first took off in 2023, and since then it’s just… stayed there.
Here’s what makes this so confusing: the data says we should be seeing massive gains. According to recent research, 92.6% of developers now use AI coding assistants. AI-authored code makes up 26.9% of production code—up from 22% just last quarter. Developers using Copilot complete 26% more tasks on average. Code commits are up 13.5%, compilation frequency up 38.4%.
So where’s the productivity?
The Numbers Don’t Add Up
Let me break down what we’re seeing in my org:
- Individual velocity: Up 20-30% for boilerplate and straightforward features
- Code review time: Up 40-50% because reviewers spend longer validating AI-generated code
- Bug fix time: Slightly longer—AI code sometimes introduces subtle issues that take time to trace
- Sprint commitments: Same as before AI adoption
- Features shipped per quarter: Basically unchanged
It’s like we got faster at writing code, but somehow that didn’t translate to shipping more value.
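A back-of-the-envelope model shows how these numbers can coexist. The sketch below assumes a hypothetical 30/20/50 split of a feature's hands-on time across coding, review, and everything else (that split is an illustration, not something we measured). It then applies the midpoints of the ranges above, reading "velocity up 25%" as 25% more output per coding hour and "review time up 45%" as review taking 45% longer.

```python
# Hypothetical share of hands-on time per feature; illustrative only, not measured.
BASELINE_SHARE = {"coding": 0.30, "review": 0.20, "everything_else": 0.50}

# Midpoints of the ranges listed above.
CODING_SPEEDUP = 0.25    # output per hour up 25%, so coding time is divided by 1.25
REVIEW_SLOWDOWN = 0.45   # review takes 45% longer


def time_after_ai(share):
    """Return per-feature time after AI adoption, relative to a baseline of 1.0."""
    coding = share["coding"] / (1 + CODING_SPEEDUP)
    review = share["review"] * (1 + REVIEW_SLOWDOWN)
    return coding + review + share["everything_else"]


relative_time = time_after_ai(BASELINE_SHARE)
throughput_change = 1 / relative_time - 1
print(f"time per feature: {relative_time:.2f}x baseline")
print(f"throughput change: {throughput_change:+.1%}")
```

Under these assumed shares, the time saved on coding (0.30 down to 0.24) is more than offset by the extra review time (0.20 up to 0.29), so time per feature lands at roughly 1.03x baseline. A different split would give a different answer, which is exactly why the split is worth measuring.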
Three Theories I’m Wrestling With
Theory 1: We’re measuring the wrong things
Maybe lines of code, PR velocity, and task completion are vanity metrics. If AI helps us write 30% more code but only 10% of that additional code delivers business value, did we actually get more productive? Or did we just get better at creating… more code?
Theory 2: AI tools optimize for individual speed, not team outcomes
Copilot makes me faster. But what about the downstream effects? The extra review burden. The context-switching cost when AI suggestions take us down the wrong path. The time spent debugging AI-generated code that “works” but doesn’t match our architecture patterns.
Theory 3: We’re hitting new bottlenecks
Maybe the constraint was never “how fast can we write code”—it was always product clarity, architecture decisions, deployment processes, or cross-team coordination. AI just exposed that by removing the coding bottleneck.
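Theory 3 is essentially Amdahl's law applied to delivery: if only one stage gets faster, the overall gain is capped by how much of the total that stage represented. Here is a minimal sketch, assuming coding is some fraction of end-to-end delivery time. The shares used below are hypothetical and chosen only to show the shape of the curve.

```python
def max_overall_speedup(coding_share, coding_speedup):
    """Amdahl's law: overall speedup when only the coding share gets faster.

    coding_share   -- fraction of delivery time spent writing code (0..1)
    coding_speedup -- factor by which coding alone gets faster (2.0 = twice as fast)
    """
    return 1 / ((1 - coding_share) + coding_share / coding_speedup)


# Hypothetical coding shares; replace with your own measurements.
for share in (0.2, 0.3, 0.5):
    doubled = max_overall_speedup(share, 2.0)
    ceiling = 1 / (1 - share)  # limit as coding time approaches zero
    print(f"coding share {share:.0%}: 2x coding -> {doubled:.2f}x overall, "
          f"ceiling {ceiling:.2f}x")
```

If coding were only 20% of delivery time, doubling coding speed would yield about an 11% overall gain, and even instant coding would cap out at 1.25x. That is in the neighborhood of the "10% plateau," though it only holds if the assumed share is close to reality.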
What I Really Want to Know
How are you measuring AI’s impact beyond individual developer velocity? Are you seeing the same plateau? And more importantly—what actually matters?
Because right now, I’m spending a lot of time in budget meetings defending our AI tool spend ($150-200/developer/month) while looking at productivity dashboards that show… not much has changed.
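For the budget conversation, a rough break-even calculation using only the figures in this post ($150-200 per developer per month, 3-4 self-reported hours saved per week) looks like the sketch below. The weeks-per-month conversion is the only number added here, and the function name is made up for illustration.

```python
WEEKS_PER_MONTH = 52 / 12  # about 4.33


def cost_per_hour_saved(tool_cost_per_month, hours_saved_per_week):
    """Dollars of tool spend per self-reported developer hour saved."""
    return tool_cost_per_month / (hours_saved_per_week * WEEKS_PER_MONTH)


# Figures from this post: $150-200 per developer per month, 3-4 hours saved per week.
best_case = cost_per_hour_saved(150, 4)
worst_case = cost_per_hour_saved(200, 3)
print(f"tool spend per hour saved: ${best_case:.2f} to ${worst_case:.2f}")
```

This prints a range of about $8.65 to $15.38 per hour saved. It says nothing about whether those saved hours turn into shipped features, which is the part the dashboards are not showing.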
I want AI to work. I believe in it. But I also need to be honest about whether we’re chasing the right metrics or building with the right tools.
What am I missing here? What should I be measuring that I’m not?