Last week, I sat in a leadership review where my director of platform engineering shared a chart that stopped the conversation cold. Our team’s PR velocity was up 35% year-over-year. Our deploy frequency? Down 18%.
We’re in the middle of what everyone’s calling the AI productivity revolution. Our developers are using Copilot, Cursor, and half a dozen other AI coding assistants. The code is flowing. But somehow, we’re shipping slower.
The Numbers Don’t Add Up
The industry data mirrors what we’re seeing:
- 41% of all code written today is AI-generated (source)
- 76% of developers use or plan to use AI coding tools
- Yet 76% don’t use AI for deployment and 69% skip it for planning (source)
In our organization, individual developers report completing 21% more tasks. But our team throughput? Basically flat. Our main branch success rate dropped from 87% to 71% over the last six months as AI adoption increased.
Where the Wheels Come Off
The bottleneck isn’t in writing code anymore. It’s everywhere else:
Code review has become our new constraint. Our senior engineers are spending 40% more time in review than they were a year ago. PR review time is up 91% across teams with high AI adoption (source). The AI writes fast, but the code needs more scrutiny. Subtle bugs. Edge cases the AI didn’t consider. Code that works but doesn’t follow our architectural patterns.
Deployment risk has increased. Teams that use AI tools very frequently see a 22% rollback rate - meaning more than one in five deployments gets rolled back, hotfixed, or causes a customer incident (source). That's making us more conservative about releases, not less.
Quality issues are surfacing later. Projects with heavy AI-generated code showed a 41% increase in production bugs (source). We're catching some in review, but the ones that slip through are expensive. Last month, an AI-generated API integration missed a critical error-handling path. It passed our tests. It broke in production under load. Three hours to diagnose because the code pattern was unfamiliar to the engineer who wrote it - because they didn't really write it.
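To make that failure mode concrete, here's a hedged sketch - not our actual integration; the names `fetch_balance`, `call_api`, and `UpstreamError` are invented for illustration - of the kind of error-handling path that's easy for generated code to skip. The upstream call and the payload access are both wrapped, so timeouts and malformed responses surface as one diagnosable error instead of failing silently under load:

```python
class UpstreamError(Exception):
    """Raised when the upstream service fails after all retries."""


def fetch_balance(call_api, account_id, retries=2):
    """Fetch an account balance via an injected call_api function.

    call_api is passed in (rather than hardcoded) so the failure path
    can be exercised in tests without a real network dependency - the
    exact gap that let our happy-path-only tests pass.
    """
    last_err = None
    for _ in range(retries + 1):
        try:
            payload = call_api(account_id)   # may raise TimeoutError under load
            return payload["balance"]        # may raise KeyError on a bad payload
        except (TimeoutError, KeyError) as err:
            last_err = err
    raise UpstreamError(f"upstream failed for {account_id}") from last_err
```

The point isn't this specific wrapper; it's that the unhappy path has to be written and tested deliberately, because it's exactly what AI-generated happy-path code tends to omit.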
The Leadership Dilemma
Here’s what keeps me up at night: We’re measuring the wrong things.
We celebrate PRs merged. Lines of code committed. Tasks moved to “Done.” But our customers don’t see any of that. They see features shipped. Bugs fixed. Value delivered.
The real question isn’t “How much code can AI help us write?” It’s “How much value can we deliver to production with AI in the mix?”
And right now, I don’t have good answers.
What We’re Trying
I’m experimenting with a few things on my teams:
- Separate metrics for AI-assisted code - We tag PRs that used AI heavily and track their review time, bug rates, and production success separately
- Required design artifacts before AI implementation - For complex features, engineers must write a technical design doc before letting AI generate code
- AI code review training - Teaching our senior engineers patterns to look for in AI-generated code
- Measuring "commit to customer" time instead of "start to commit" time - Tracking the full cycle from merged code to customer-visible release
Early days, but we’re seeing some improvement in PR quality at least.
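As a rough illustration of the first and last experiments above, here's a minimal sketch of splitting metrics by AI usage and computing commit-to-customer time. The data shape is hypothetical - `ai_assisted`, `review_hours`, `merged_at`, and `deployed_at` stand in for whatever your Git host and deploy tooling actually expose:

```python
from datetime import datetime
from statistics import median

# Hypothetical PR records; in practice these would come from your Git
# host's API, with an "AI-assisted" tag applied by the author or reviewer.
prs = [
    {"ai_assisted": True,  "review_hours": 9.0,
     "merged_at": datetime(2025, 1, 6), "deployed_at": datetime(2025, 1, 10)},
    {"ai_assisted": True,  "review_hours": 6.5,
     "merged_at": datetime(2025, 1, 7), "deployed_at": datetime(2025, 1, 9)},
    {"ai_assisted": False, "review_hours": 3.0,
     "merged_at": datetime(2025, 1, 6), "deployed_at": datetime(2025, 1, 8)},
]


def summarize(prs):
    """Median review time and merge-to-deploy lag, split by AI usage."""
    out = {}
    for flag in (True, False):
        group = [p for p in prs if p["ai_assisted"] == flag]
        if not group:
            continue
        out["ai" if flag else "human"] = {
            "median_review_hours": median(p["review_hours"] for p in group),
            "median_commit_to_customer_days": median(
                (p["deployed_at"] - p["merged_at"]).days for p in group
            ),
        }
    return out


print(summarize(prs))
```

The value isn't the code; it's that once the two cohorts are separated, questions like "is AI-assisted code actually reaching customers faster?" become answerable instead of anecdotal.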
The Uncomfortable Question
Are we optimizing for the wrong metrics?
When my CEO asks “Why haven’t we accelerated our roadmap with all this AI investment?”, what’s the honest answer?
I'm curious - are others seeing this same disconnect between individual productivity and team throughput? What are you measuring to understand whether AI is actually making your organization faster, not just your developers busier?
This isn’t about being anti-AI. I believe in the tools. But I think we’re in the messy middle of a transition - our workflows, our processes, our metrics haven’t caught up to what AI makes possible. And until they do, we’re going to keep seeing this paradox: more code, fewer releases.
What are you seeing in your organizations?