Eight months ago we rolled out GitHub Copilot and other AI coding assistants across our engineering teams. The individual feedback has been overwhelmingly positive—developers report feeling more productive, shipping code faster, and spending less time on boilerplate. Our internal surveys show 78% of engineers say AI tools make them “significantly more productive.”
But here’s what’s puzzling me: our sprint velocity hasn’t changed. Features still take the same amount of time to ship. Our DORA metrics are flat. Time-to-production for new capabilities looks identical to a year ago.
The Data Says There’s a Disconnect
I started digging into the research, and we’re not alone. Recent studies show:
- Developers using AI complete 26% more tasks and merge 60% more pull requests (IT Revolution research)
- AI coding tools save an average of 3.6 hours per week per developer
- 84% of developers report using AI tools, with 51% using them daily
Yet the same research reveals a paradox: companies report NO measurable improvement in delivery velocity or business outcomes despite these individual productivity gains (Faros AI Productivity Paradox Report).
Are We Measuring Activity Instead of Value?
I think we’re tracking the wrong things. We celebrate:
- More commits pushed
- More PRs opened
- Faster code completion times
- Lines of code written per hour
But none of these are business outcomes. They’re activity metrics, not value metrics.
A developer can use AI to refactor a module 3× faster than before—that shows up as productivity. But if that refactoring doesn’t enable new features or improve customer experience, did we actually become more productive as an organization?
Where Do the Gains Disappear?
My hypothesis: AI speeds up 5-10% of our delivery pipeline—the coding part—while the other 90-95% remains unchanged.
The bottlenecks are still:
- Code review (now arguably harder because reviewers need to verify AI-generated patterns)
- Testing and QA (same capacity, but more code to validate)
- Integration and deployment (unchanged process)
- Product refinement and requirement clarity (no AI help here)
We optimized one step in a multi-step pipeline and expected the entire pipeline to speed up. That’s not how systems work.
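The math behind this intuition is just Amdahl's law: the end-to-end speedup is capped by the fraction of the pipeline you actually accelerated. A toy calculation, using assumed numbers (coding as ~7.5% of total delivery time, a 3× local speedup from AI), shows why flat sprint velocity is exactly what we should expect:

```python
def overall_speedup(fraction_accelerated: float, local_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only part of a pipeline gets faster."""
    return 1.0 / ((1.0 - fraction_accelerated) + fraction_accelerated / local_speedup)

# Assumption: coding is ~7.5% of total delivery time, and AI makes it 3x faster.
# The rest of the pipeline (review, QA, deployment, refinement) is unchanged.
print(round(overall_speedup(0.075, 3.0), 3))  # ~1.05, i.e. about a 5% gain
```

A ~5% end-to-end improvement is well within the sprint-to-sprint noise of most teams' DORA metrics, which would explain why we can't see it.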
The Real Question: What Should We Measure?
If individual coding speed isn’t translating to business results, what should we be tracking?
I’m wrestling with this question for our next board meeting. Some possibilities:
Outcome-focused metrics:
- Features delivered per sprint (not story points, actual customer-facing capabilities)
- Time from customer request to production deployment
- Business KPI impact (activation, retention, revenue) per engineering sprint
- Customer problems solved, not tickets closed
System health metrics:
- End-to-end cycle time (idea → production)
- Change failure rate and rollback frequency
- Technical debt accumulation vs reduction
- Code quality trends (not just code volume)
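To make a couple of these concrete, here is a minimal sketch of computing end-to-end cycle time and change failure rate from deployment records. The record shape and field names (`requested`, `deployed`, `rolled_back`) are assumptions for illustration, not any real tool's schema:

```python
from datetime import datetime
from statistics import median

# Hypothetical deployment records; field names are assumptions, not a real API.
deployments = [
    {"requested": "2024-01-02", "deployed": "2024-01-20", "rolled_back": False},
    {"requested": "2024-01-05", "deployed": "2024-02-01", "rolled_back": True},
    {"requested": "2024-01-10", "deployed": "2024-01-25", "rolled_back": False},
]

def parse(d: str) -> datetime:
    return datetime.strptime(d, "%Y-%m-%d")

# End-to-end cycle time: customer request -> production, in days.
cycle_times = [(parse(r["deployed"]) - parse(r["requested"])).days for r in deployments]
median_cycle_time = median(cycle_times)

# Change failure rate: fraction of deployments that were rolled back.
failure_rate = sum(r["rolled_back"] for r in deployments) / len(deployments)

print(median_cycle_time, round(failure_rate, 2))
```

The point of tracking the median (rather than the mean) is that one stuck feature shouldn't mask whether typical requests are moving faster—if AI tools are helping at the system level, this is the number that should shrink.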
But I’m not confident these capture it either.
Your Perspectives?
How are you measuring AI productivity impact at your organizations?
Are you seeing the same disconnect between individual developer velocity and team delivery outcomes?
What metrics have you found that actually correlate with business value instead of just engineering activity?
I’d love to hear what’s working—or what’s failing—for others navigating this. Because right now, I’m celebrating productivity gains I can’t actually demonstrate to the business.