Our CFO pulled me aside last week after the board meeting. “Michelle, we’ve invested 00K in AI coding tools this year. What’s the ROI?” I had plenty of individual developer testimonials—engineers saying they’re 30-40% faster, shipping features in days instead of weeks. But when I looked at our DORA metrics? Nothing. Flat. Some actually worse.
I’m not alone in this. The research is painting a confusing picture:
The Individual vs. Organizational Paradox
Individual level: Studies show 30-60% speed improvements for scoped programming tasks. GitHub’s data shows developers completing 21% more tasks, merging 98% more pull requests[1].
Organizational level: But deployment frequency? Same. Lead time for changes? Same. Change failure rate? Actually increased in some teams[2].
What gives?
Where the Gains Disappear
I think I’ve figured out where our productivity gains are going: They’re getting absorbed by downstream bottlenecks we never optimized.
The data is brutal:
- PR review time up 91%[1] - Our senior engineers are drowning in review queues because AI tripled code output but we didn’t scale review capacity
- Testing bottlenecks - AI writes code fast, but our test suites weren’t designed for this volume
- Deployment pipeline unchanged - Same release processes, approval gates, deployment windows
It’s Amdahl’s Law playing out in real time: when you speed up only one stage, the overall gain is capped by the share of time spent in everything you didn’t speed up. We accelerated one part (code writing) without modernizing the rest (review, testing, deployment).
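To make the Amdahl’s Law point concrete, here is a rough sketch of the arithmetic in Python. The 25% figure for how much of lead time is spent writing code is a made-up assumption for illustration, not a number from our data, and `amdahl_speedup` is just a helper name I picked.

```python
def amdahl_speedup(accelerated_fraction: float, local_speedup: float) -> float:
    """Overall speedup when only a fraction of the work gets faster.

    accelerated_fraction: share of total lead time spent in the stage
        that was sped up (here: writing code), between 0 and 1.
    local_speedup: how many times faster that one stage became.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    untouched = 1.0 - accelerated_fraction
    return 1.0 / (untouched + accelerated_fraction / local_speedup)


# ASSUMPTION (hypothetical): coding is 25% of idea-to-production lead time;
# review, testing, and deployment make up the other 75% and are unchanged.
CODING_SHARE = 0.25

for coding_speedup in (1.4, 3.0, 10.0):
    overall = amdahl_speedup(CODING_SHARE, coding_speedup)
    print(f"coding {coding_speedup:>4}x faster -> delivery {overall:.2f}x faster")

# Even with infinitely fast coding, the ceiling is 1 / (1 - CODING_SHARE).
print(f"upper bound: {1.0 / (1.0 - CODING_SHARE):.2f}x")
```

Under that assumed split, even a 10x gain in code writing moves end-to-end delivery by less than 1.3x. The formula also treats the other stages as unchanged, so it is optimistic for us: it does not account for review getting slower as volume goes up.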
The Measurement Crisis
Here’s the hard part: My CFO doesn’t care about individual velocity. She cares about:
- Are we shipping features to customers faster?
- Are we reducing incidents?
- Are we delivering more business value per engineering dollar?
And honestly? I don’t have good answers yet.
The traditional metrics—DORA, velocity, throughput—were designed for a different era. They assume humans write code at human speed. When AI 10x’s code generation but we’re still measuring deployment frequency, we’re missing the real story.
What I’m Trying
We’re experimenting with a three-layer measurement approach[3]:
- Usage metrics: AI tool adoption rates, prompt usage, code acceptance rates
- System metrics: PR throughput, review latency, merge-to-deploy time
- Business metrics: Feature time-to-market, customer-reported defects, support ticket reduction
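For the system-metrics layer, this is a minimal sketch of how review latency and merge-to-deploy time could be computed from pull request timestamps. The `PullRequest` fields and the sample data are hypothetical; in practice the timestamps would come from whatever your Git host and deploy tooling export.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class PullRequest:
    # Hypothetical record shape; field names are assumptions for this sketch.
    opened_at: datetime
    first_review_at: datetime
    merged_at: datetime
    deployed_at: datetime


def hours_between(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600.0


def system_metrics(prs: list[PullRequest]) -> dict[str, float]:
    """Median hours spent in each stage across a set of pull requests."""
    return {
        "review_latency_h": median(
            [hours_between(p.opened_at, p.first_review_at) for p in prs]
        ),
        "merge_to_deploy_h": median(
            [hours_between(p.merged_at, p.deployed_at) for p in prs]
        ),
        "open_to_deploy_h": median(
            [hours_between(p.opened_at, p.deployed_at) for p in prs]
        ),
    }


# Made-up sample data, only to show the shape of the calculation.
sample_prs = [
    PullRequest(datetime(2025, 1, 6, 9), datetime(2025, 1, 6, 13),
                datetime(2025, 1, 7, 9), datetime(2025, 1, 9, 9)),
    PullRequest(datetime(2025, 1, 6, 10), datetime(2025, 1, 7, 10),
                datetime(2025, 1, 7, 12), datetime(2025, 1, 9, 12)),
    PullRequest(datetime(2025, 1, 7, 9), datetime(2025, 1, 7, 11),
                datetime(2025, 1, 7, 15), datetime(2025, 1, 8, 15)),
]

metrics = system_metrics(sample_prs)
for name, hours in metrics.items():
    print(f"{name}: {hours:.1f}")
```

I’m using medians rather than means here because a handful of PRs that sit for weeks would otherwise dominate the number. Tracking these per stage, before and after AI adoption, is one way to see where the time is actually going.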
Early hypothesis: AI is making us busy but not necessarily effective. We’re writing more code, but are we solving more customer problems?
Questions for This Community
For engineering leaders: What metrics are you actually using to prove AI ROI? What’s resonating with your CFO/board?
For product leaders: How are you connecting AI developer productivity to business outcomes?
For anyone measuring this: Are we fundamentally measuring the wrong things? Should we abandon DORA metrics in the AI era?
I need to have real answers by Q2 earnings. Help me think through this.