There’s been an explosion of research on AI coding assistants: one study finds GitHub Copilot saves developers 3.6 hours per week, another reports 55% productivity gains, and AWS touts significant time savings.
Everyone’s celebrating.
But here’s what I’m noticing at our fintech startup: our engineering team is shipping the same number of features as last year, despite widespread AI adoption.
The Productivity Paradox
Our engineers are writing code faster. PRs are getting created more quickly. But velocity—measured in customer value delivered—hasn’t budged.
What’s going on?
I think we’re measuring the wrong things.
Time Saved ≠ Value Delivered
“3.6 hours saved per week” measures inputs (time spent coding). But product success depends on outputs (features shipped, customer problems solved, revenue enabled).
What if AI is making us:
- Write code faster, but the wrong code?
- Ship features faster, but features nobody uses?
- Increase individual productivity while decreasing team effectiveness?
The Metrics I Think We Should Track Instead
Instead of “time saved,” what if we measured AI impact on:
1. Customer value delivered
- Features shipped per quarter (not PRs created)
- Customer satisfaction scores
- Time from feature idea to customer impact
2. Quality and reliability
- Bug rates in AI-generated code vs. human-written code
- Incidents caused by rushed AI-assisted features
- Technical debt accumulation rate
3. Innovation time
- How much time do developers spend on creative problem-solving vs. routine coding?
- Are we using AI time savings for experimentation, or just shipping faster?
4. Team dynamics
- Cross-team collaboration quality
- Knowledge sharing and mentorship (does AI reduce this?)
- Junior developer growth rates
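To make a couple of these metrics concrete, here’s a minimal sketch of how a team might compute them from change records. The data, field names, and thresholds are hypothetical assumptions of mine; in practice you’d pull this from your issue tracker and version control, and attributing “AI-assisted” to a change is itself a messy judgment call.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median

@dataclass
class Change:
    idea_date: date      # when the feature idea was filed
    shipped_date: date   # when it reached customers
    ai_assisted: bool    # whether AI tooling produced most of the diff
    bugs_filed: int      # defects traced back to this change
    loc: int             # lines of code in the change

def lead_time_days(changes):
    """Median days from feature idea to customer impact."""
    return median((c.shipped_date - c.idea_date).days for c in changes)

def bug_rate_per_kloc(changes):
    """Defects per thousand lines of code."""
    total_bugs = sum(c.bugs_filed for c in changes)
    total_loc = sum(c.loc for c in changes)
    return 1000 * total_bugs / total_loc

# Hand-entered example records, purely illustrative.
changes = [
    Change(date(2024, 1, 2), date(2024, 1, 30), ai_assisted=True,  bugs_filed=3, loc=1200),
    Change(date(2024, 1, 5), date(2024, 2, 20), ai_assisted=False, bugs_filed=1, loc=800),
    Change(date(2024, 2, 1), date(2024, 2, 25), ai_assisted=True,  bugs_filed=2, loc=600),
]

ai = [c for c in changes if c.ai_assisted]
human = [c for c in changes if not c.ai_assisted]

print(f"Median lead time (all changes): {lead_time_days(changes)} days")
print(f"Bug rate, AI-assisted: {bug_rate_per_kloc(ai):.2f} per KLOC")
print(f"Bug rate, human-written: {bug_rate_per_kloc(human):.2f} per KLOC")
```

The point of the sketch isn’t the arithmetic; it’s that both numbers are answerable from data most teams already have, unlike “hours saved.”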
Why This Matters
If we optimize for “time saved” without tracking value delivered, we risk:
- Shipping faster to nowhere
- Accumulating technical debt at AI speed
- Training a generation of developers who can’t code without AI
But if we measure AI impact on actual business outcomes, we might discover:
- AI is amazing for boilerplate, terrible for system design
- Time “saved” is being spent on rework and bug fixes
- The real value is freeing senior engineers for architectural work
My Ask for This Community
Engineering leaders: What metrics are you using to evaluate AI coding tools beyond “time saved”?
CTOs: How do you connect individual AI productivity gains to team-level velocity?
Product people: Are you seeing customer value increase in proportion to the code velocity gains everyone’s reporting?
I’m not anti-AI. I’m pro-measurement. Let’s make sure we’re measuring what actually matters.