Last week I sat in our sprint retro and someone asked: “We’re all using Claude, Cursor, Copilot… why does it still feel like we’re moving at the same pace?”
The silence was loud.
Here’s what I’ve been thinking about: 41% of all code written in 2025 is AI-generated. That’s not a prediction; it’s already happened. 82% of developers use AI tools weekly. Some of us are running three or more tools in parallel. And yeah, we’re definitely coding faster: studies report 30-55% speed improvements on scoped tasks.
But our delivery velocity? Basically unchanged.
The bottleneck just moved
I started tracking this on my team. Developers finish features 40% faster than last year, and pull requests get opened far more often; someone mentioned a 98% increase in PR volume across high-adoption teams.
But guess what else happened? PR review time increased 91%.
The bottleneck didn’t disappear. It just migrated downstream.
Now we’re all waiting on reviewers. And reviewers are drowning because AI-generated code tends to be… verbose. More lines to review. More edge cases to think through. And here’s the kicker: only 33% of developers say they actually trust AI-generated code. So reviewers are reading everything twice.
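If you want to track this yourself, the basic measurement is simple: for each PR, split elapsed time into “waiting for first review” and “review to merge.” A minimal sketch, using made-up timestamps (the data and field layout are hypothetical, not from my actual tracker):

```python
from datetime import datetime
from statistics import median

# Hypothetical PR records: (opened, first_review, merged) ISO timestamps.
prs = [
    ("2025-06-02T09:00", "2025-06-03T15:00", "2025-06-04T11:00"),
    ("2025-06-02T10:30", "2025-06-05T09:00", "2025-06-06T16:00"),
    ("2025-06-03T14:00", "2025-06-04T10:00", "2025-06-04T17:00"),
]

def hours_between(a, b):
    """Elapsed hours between two ISO-8601 timestamps."""
    return (datetime.fromisoformat(b) - datetime.fromisoformat(a)).total_seconds() / 3600

wait_for_review = [hours_between(opened, review) for opened, review, _ in prs]
review_to_merge = [hours_between(review, merged) for _, review, merged in prs]

print(f"median wait for first review: {median(wait_for_review):.1f}h")
print(f"median review-to-merge:       {median(review_to_merge):.1f}h")
```

With real data from your Git host’s API, the median wait-for-review number is usually the one that makes people wince.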
From a design perspective, I’m seeing the same pattern in UX review. Engineers prototype UIs faster with AI assistance, but the designs are often inconsistent with our system, use deprecated patterns, or ignore accessibility. So design review has become the new bottleneck.
Are we optimizing for the wrong metrics?
Individual velocity is up. That’s real. But organizational throughput is flat, or worse, because we’ve added coordination overhead.
It’s like giving everyone on an assembly line faster tools, but not widening the conveyor belt. You just create a pile-up.
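The conveyor-belt analogy can be made concrete with a toy two-stage model (all rates here are illustrative numbers I picked, not measurements): throughput is capped by the slower stage, so speeding up only the writing stage just grows the review queue.

```python
# Toy pipeline: "write" feeds "review". Rates are items per day (made up).
def throughput_and_queue_growth(write_rate, review_rate):
    """Steady-state throughput, plus how fast the review queue grows per day."""
    throughput = min(write_rate, review_rate)          # slower stage wins
    queue_growth = max(0.0, write_rate - review_rate)  # excess piles up
    return throughput, queue_growth

print(throughput_and_queue_growth(5.0, 4.0))  # before AI tools
print(throughput_and_queue_growth(7.0, 4.0))  # 40% faster writing, same review
```

In both cases throughput is 4.0 items/day; the only thing the faster tools changed is that the pile-up now grows three times as fast.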
I keep thinking about what happens when 50%+ of code is AI-generated by late 2026 (current trajectory). If we don’t fix the downstream bottlenecks—review, QA, security validation, integration testing—we’re just making the pile-up bigger.
What if we’ve been measuring the wrong thing all along?
Instead of “how fast can one person write code,” maybe the question is “how fast can value flow through the entire system?”
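One way to put a number on “how fast value flows” is flow efficiency: the share of an item’s end-to-end lead time spent in active work rather than waiting in queues. A quick sketch with hypothetical hours shows why a 40% coding speedup barely moves the end-to-end number when queue time dominates:

```python
# Flow efficiency: active work time as a share of total lead time.
# All hour figures below are made up for illustration.
def flow_efficiency(active_hours, wait_hours):
    """Share of end-to-end lead time spent actively working."""
    return active_hours / (active_hours + wait_hours)

# Coding gets 40% faster (10h -> 6h) but queue time stays at 50h:
before_lead = 10 + 50  # 60h end to end
after_lead = 6 + 50    # 56h end to end: under 7% faster overall
print(f"flow efficiency before: {flow_efficiency(10, 50):.0%}")
print(f"flow efficiency after:  {flow_efficiency(6, 50):.0%}")
```

A 40% local speedup buys under 7% end-to-end improvement, and flow efficiency actually drops (from about 17% to about 11%) because an even larger share of each item’s life is now spent waiting.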
What should we actually measure?
I’m genuinely curious what y’all are seeing:
- Are your teams shipping faster with AI tools, or just coding faster?
- Where are your bottlenecks showing up now?
- What metrics are you tracking beyond individual velocity?
- Has anyone successfully restructured their review/QA processes to match the new pace?
Because right now it feels like we’re all optimizing for local maxima while the global system stays stuck.
What am I missing here?