Last week our CFO walked into my office and said: “Michelle, I need you to cut your AI tooling budget by 25% for 2027. Show me the ROI or we’re scaling back further.”
I wish I could say I had a compelling response ready. I didn’t.
The Paradox We’re All Living
Here’s what makes this so frustrating: 91% of engineering organizations have adopted AI coding tools, but productivity gains are stuck at 10%. We’re spending an average of $800 per developer per year on AI assistants, code completion, and automated testing, yet when leadership asks “what are we getting for this?”, most of us are scrambling.
The data I’ve been reading paints an uncomfortable picture:
- 56% of CEOs report zero measurable ROI from AI investments in the past 12 months
- Only 29% of executives can measure ROI confidently
- In one METR study, developers believed AI assistance made them 20% faster, but objective measurement showed they were actually 19% slower
That last one keeps me up at night. Are we measuring the right things? Or worse—are we fooling ourselves?
What I’m Seeing on My Own Teams
We’ve deployed GitHub Copilot, Claude Code, and automated testing tools across 85 engineers. Developers love them. Our AI code acceptance rate is 41%—right in line with industry averages. PR volume is up 98% year-over-year.
But here’s the problem: our release velocity hasn’t meaningfully changed.
We’re producing more code, faster, but the bottleneck has shifted to code review, QA, and integration testing. Our code churn is projected to increase significantly, and delivery stability actually decreased 7.2% last quarter.
So when my CFO asks for ROI, what do I say? “We’re writing more code”? That’s not a business outcome. That’s activity.
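The “more PRs, same releases” pattern is what you’d expect from a serial delivery pipeline, where end-to-end throughput is capped by the slowest stage. Here’s a toy model to make that concrete. Every stage capacity in it is a made-up number, the `pipeline_throughput` helper is hypothetical, and the assumption that AI roughly doubles authoring capacity only loosely mirrors our 98% PR growth. It also ignores queueing, rework, and churn, so treat it as an intuition aid rather than a forecast.

```python
# Toy model with HYPOTHETICAL numbers: a delivery pipeline as serial stages,
# each with a fixed weekly capacity in PRs. End-to-end throughput is capped
# by the slowest stage (the bottleneck). Ignores queueing, rework, and churn.

def pipeline_throughput(capacities: dict[str, float]) -> tuple[str, float]:
    """Return the bottleneck stage and the pipeline's max weekly throughput."""
    bottleneck = min(capacities, key=capacities.get)
    return bottleneck, capacities[bottleneck]


# Hypothetical weekly capacities (PRs/week) before AI tooling.
before = {"authoring": 100, "code_review": 104, "qa": 110, "integration": 120}

# Assumption: AI nearly doubles authoring capacity (loosely, "PR volume up 98%"),
# while review, QA, and integration capacity stay the same.
after = dict(before, authoring=before["authoring"] * 1.98)

for label, caps in (("before", before), ("after", after)):
    stage, rate = pipeline_throughput(caps)
    print(f"{label}: bottleneck={stage}, throughput={rate:g} PRs/week")
```

With these invented numbers, nearly doubling authoring capacity lifts end-to-end throughput from 100 to 104 PRs per week, a 4% gain, because the constraint simply moves to the next-slowest stage (here, code review).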
The Infrastructure Tax Nobody Talks About
Then there’s the data foundation issue. I recently came across an estimate that for every $1 spent on AI tools, you need to invest $20 in data architecture to make them work properly. And the cost isn’t just LLM APIs and SaaS subscriptions: it’s observability, metrics platforms, experimentation frameworks, and the engineering time to instrument everything.
Our CFO sees the line item for Copilot licenses. She doesn’t see the hidden costs of making measurement possible.
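To see how lopsided that is, here’s a back-of-envelope sketch. It treats the $800 industry average as if it were our own per-developer spend and takes the 1:20 ratio at face value; both are assumptions, not figures from our actual budget, and `annual_ai_cost` is just an illustrative helper.

```python
# Back-of-envelope sketch. ASSUMPTIONS (not measured figures from our org):
# - per-developer tooling spend equals the ~$800/year industry average
# - the "$20 of data architecture per $1 of AI tooling" estimate holds as stated
ENGINEERS = 85
TOOL_SPEND_PER_DEV = 800             # USD per developer per year (assumed)
INFRA_DOLLARS_PER_TOOL_DOLLAR = 20   # ratio taken at face value


def annual_ai_cost(engineers: int, per_dev: float, infra_ratio: float) -> dict[str, float]:
    """Split annual AI spend into the visible tooling line and the hidden infra cost."""
    tools = engineers * per_dev      # what shows up as license line items
    infra = tools * infra_ratio      # data/measurement work finance doesn't see
    return {"tools": tools, "infra": infra, "total": tools + infra}


costs = annual_ai_cost(ENGINEERS, TOOL_SPEND_PER_DEV, INFRA_DOLLARS_PER_TOOL_DOLLAR)
for item, usd in costs.items():
    print(f"{item:>5}: ${usd:,.0f}")
print(f"visible share: {costs['tools'] / costs['total']:.1%}")
```

Under those assumptions, the $68,000 in licenses is under 5% of a roughly $1.43M total, which is exactly the part a license-focused budget review never sees.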
What Metrics Actually Matter?
I’ve been researching frameworks (DORA metrics, SPACE, DX Core 4, AI-specific KPIs), and honestly, I’m overwhelmed. I’m not alone: 86% of engineering leaders report feeling uncertain about which tools provide the most benefit, and 40% say they don’t have enough data on adoption and impact to build an ROI story.
Should we measure:
- Velocity metrics? (PR throughput, cycle time, deployment frequency)
- Quality metrics? (defect rates, code churn, production incidents)
- Developer experience? (satisfaction surveys, cognitive load, retention)
- Business outcomes? (time-to-market for features, customer satisfaction, revenue impact)
The honest answer is probably “all of the above,” but that feels impossibly complex when you’re staring down a 25% budget cut and need to make a case this quarter.
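The velocity and quality rows may not need separate systems, though. As a minimal sketch, assuming a hypothetical `Deployment` record (not any real tool’s schema) that links each production deploy to its commit time and to whether it caused a failure, three of the four DORA metrics fall out of a single log: deployment frequency, lead time for changes, and change failure rate. Recovery time after failures is left out, as are the developer-experience and business-outcome rows.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    # HYPOTHETICAL record shape; real data would come from CI/CD and incident tooling.
    committed_at: datetime   # when the change was committed
    deployed_at: datetime    # when it reached production
    caused_failure: bool     # e.g., led to a rollback, hotfix, or incident


def dora_snapshot(deploys: list[Deployment], period_days: int) -> dict[str, float]:
    """Deployment frequency, median lead time, and change failure rate for one period."""
    if not deploys:
        raise ValueError("need at least one deployment in the period")
    lead_times_h = [(d.deployed_at - d.committed_at) / timedelta(hours=1) for d in deploys]
    return {
        "deploys_per_week": len(deploys) / (period_days / 7),
        "median_lead_time_hours": median(lead_times_h),
        "change_failure_rate": sum(d.caused_failure for d in deploys) / len(deploys),
    }


# Hypothetical two-week sample.
t0 = datetime(2026, 1, 5, 9, 0)
deploys = [
    Deployment(t0, t0 + timedelta(hours=20), False),
    Deployment(t0 + timedelta(days=2), t0 + timedelta(days=3), True),
    Deployment(t0 + timedelta(days=7), t0 + timedelta(days=7, hours=6), False),
    Deployment(t0 + timedelta(days=9), t0 + timedelta(days=11), False),
]
print(dora_snapshot(deploys, period_days=14))
```

Comparing snapshots from before and after a tool rollout is one way to check whether extra PR volume ever turns into faster, safer delivery, rather than just more activity.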
The Question I’m Wrestling With
Here’s what I keep coming back to: if only 25% of AI initiatives have delivered their expected returns (IBM, May 2025), are we the exception or the rule?
Are we investing in the right AI tools? Should we kill the portfolio of experiments and double down on what’s working? Or are we measuring the wrong things entirely—and the value is real but invisible to traditional metrics?
What I Need from This Community
I know many of you are facing similar pressure. Some of you have probably already had these conversations with your CFOs and boards. I’d love to hear:
- What metrics have you used to prove AI ROI? What resonated with finance leadership?
- Have you killed any AI tools? What made you decide they weren’t worth it?
- How do you separate signal from noise? Developers feel more productive—but measurement is ambiguous.
- What’s your budget strategy for 2027? Are you defending current spend, cutting, or doubling down?
The stakes feel high. If we can’t prove value now, we risk losing the opportunity to invest in tools that might actually transform how we build software. But if we’re spending money on theater and vanity metrics, we deserve the budget cuts.
Where’s the truth in all this?