CFOs Are Killing 25% of AI Investments—Are We Measuring What Actually Matters?

I just got off a board call where our CFO announced we’re cutting 25% of our AI tool budget. The reason? “We can’t measure the ROI.”

Here’s what’s keeping me up at night: We’re measuring the wrong things.

The Numbers Don’t Lie (But They Don’t Tell the Whole Story Either)

A recent CFO survey found that only 14% of finance leaders see clear, measurable impact from AI investments. Meanwhile, 61% of CEOs are under pressure to show returns. The disconnect is real, and it’s costing us.

But here’s the paradox that should concern every engineering leader: Developers say they’re 20-40% more productive with AI coding assistants. Yet companies with high AI adoption aren’t shipping faster or more reliably.

In fact, one study I reviewed last week found that developers using AI tools actually took 19% longer to complete tasks than they did without them, even though they estimated they were 20% faster. We’re not just bad at measuring AI impact. We’re systematically wrong about what’s happening.

We’re Optimizing for the Wrong Metrics

The problem isn’t AI. It’s that we’re measuring individual output instead of organizational outcomes.

CFOs want to see:

  • Cost per feature
  • Lines of code per engineer
  • Time to close tickets

But these metrics miss what actually matters:

  • Time to onboard new engineers (we’ve cut this nearly in half with AI pair programming)
  • Quality of architectural decisions (engineers explore more options, make better tradeoffs)
  • System reliability improvements (AI helps catch edge cases we used to miss)
  • Competitive positioning (we can tackle problems that would’ve required 2x the headcount)

The hardest conversation I’ve had recently was with our CFO about why our “productivity” metrics look flat despite every engineer loving our AI tools. The answer: The bottleneck moved. We’re not blocked on writing code anymore. We’re blocked on code review, integration testing, and cross-functional alignment.

AI didn’t make those problems worse—it just revealed them as the actual constraints.

What’s Actually Working

Organizations seeing real ROI from AI share three things:

  1. Baseline measurements before rollout - You can’t measure improvement if you don’t know where you started
  2. Workflow redesign before tool adoption - Research from MIT, McKinsey, and Wharton says the same thing: transformation fails when it’s treated as a technology rollout
  3. Alignment on what success means - 65% of orgs lack agreement among CFO, CTO, and business leaders on how to measure AI success

At my previous company, we treated AI tool adoption like an ERP decision: clear business case, defined success metrics, post-implementation review. It worked. Here, we let teams pick their own tools and hoped for bottom-up ROI. It didn’t.

The Question I’m Wrestling With

How do we measure things that matter to both engineering excellence and financial accountability?

I’m starting to think the answer is: Stop measuring productivity. Start measuring capability.

Can we solve problems we couldn’t solve before? Can smaller teams tackle bigger challenges? Are we making better decisions faster? Are we building more defensible competitive advantages?

These are harder to quantify. But they’re also what boards actually care about when they ask “What are we getting for our AI investment?”

What metrics are you using to measure AI impact? What’s working? What conversations are you having with your CFO?

I’d love to hear how other technical leaders are navigating this. Especially if you’ve found frameworks that resonate with finance teams.

This hits home. I’ve been wrestling with the exact same paradox from the product side.

Individual Velocity ≠ Product Velocity ≠ Customer Value

Michelle, your point about measuring the wrong things resonates deeply. We’re seeing the same pattern in product delivery:

  • Engineers ship features faster with AI assistance
  • But our time-to-market for new capabilities hasn’t improved
  • Yet customer satisfaction with releases is actually higher

I think we’re missing a fundamental shift: AI changed what’s valuable to measure.

The old framework was: More features faster = Better product = Happy customers

The new reality seems to be: Better decisions about which features + Higher quality execution = Better product = Happy customers

Outputs vs. Outcomes

Here’s what I’m starting to track instead of feature velocity:

  • Time to validate a hypothesis (AI helps us test ideas faster, kill bad ones sooner)
  • Quality of product decisions (engineers using AI explore more options, surface better tradeoffs)
  • Customer satisfaction with releases (fewer bugs, better-thought-out features)
  • Competitive positioning (we can tackle challenges that would’ve been out of scope)

The CFO conversation is hard because these metrics don’t show up as pure cost reduction. They show up as competitive advantages that are tough to quantify.
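
Tough to quantify, but not impossible to instrument. For “time to validate a hypothesis,” we log when a hypothesis enters validation and when it gets validated or killed, then watch the median gap. A toy sketch with made-up records (the layout is just ours, not a standard):

```python
# Toy sketch: median time to validate or kill a product hypothesis.
# Records are made-up; in practice they come from your product tracker.
from datetime import date
from statistics import median

hypotheses = [
    # (opened, resolved, outcome)
    (date(2025, 1, 6),  date(2025, 1, 17), "validated"),
    (date(2025, 1, 13), date(2025, 1, 20), "killed"),
    (date(2025, 2, 3),  date(2025, 2, 21), "validated"),
    (date(2025, 2, 10), date(2025, 2, 14), "killed"),
]

days_to_decision = [(r - o).days for o, r, _ in hypotheses]
days_to_kill = [(r - o).days for o, r, outcome in hypotheses if outcome == "killed"]

print(f"median days to a decision: {median(days_to_decision)}")
print(f"median days to kill a bad idea: {median(days_to_kill)}")  # sooner is better
```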

The Question That Keeps Me Up

How do we help CFOs understand that AI ROI might show up in competitive positioning rather than just cost reduction?

If AI lets us ship a differentiated feature that our competitors can’t match without doubling their team size, what’s that worth? Not the same as “we reduced headcount by 10%”—but potentially much more valuable strategically.

I’d love to hear how others are framing this conversation with finance teams. Especially if you’ve found ways to quantify strategic value that don’t rely on traditional productivity metrics.

Michelle, this is exactly what I’m seeing on the ground with my teams. Your insight about the bottleneck moving is spot-on.

The Reality Check

Here’s our data from Q1 after rolling out GitHub Copilot to the entire engineering org:

  • Initial code writing time: Down 35% (developers love it)
  • PR cycle time: Up 15% (wait, what?)
  • Overall delivery velocity: Essentially flat
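
(Sidebar for anyone who wants to pull the same numbers: PR cycle time is straightforward to compute from the GitHub API. A rough sketch assuming PyGithub and a GITHUB_TOKEN env variable; the repo name is a placeholder.)

```python
# Rough sketch: median PR cycle time (opened -> merged) for one repo.
# Assumes PyGithub (pip install PyGithub); "your-org/your-repo" is a placeholder.
import os
import statistics

from github import Github

gh = Github(os.environ["GITHUB_TOKEN"])
repo = gh.get_repo("your-org/your-repo")

cycle_hours = []
for pr in repo.get_pulls(state="closed", sort="updated", direction="desc")[:200]:
    if pr.merged_at is not None:  # skip PRs that were closed without merging
        cycle_hours.append((pr.merged_at - pr.created_at).total_seconds() / 3600)

if cycle_hours:
    print(f"{len(cycle_hours)} merged PRs sampled")
    print(f"median cycle time: {statistics.median(cycle_hours):.1f}h")
```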

My team leads were confused. The developers are happy. The code is getting written faster. So why aren’t we shipping faster?

The Bottleneck Shifted

What we discovered: AI changed what engineers spend their time on.

Before AI tools:

  • 60% writing code
  • 20% code review
  • 10% testing/debugging
  • 10% architecture discussions

After AI tools:

  • 30% writing code (AI-assisted)
  • 25% code review (more code to review!)
  • 20% testing/debugging (more experimental paths tried)
  • 25% architecture discussions (exploring more options)

The code gets written faster, but now we’re:

  • Reviewing 40% more PRs because engineers try more approaches
  • Spending more time in testing because the scope of “what’s possible” expanded
  • Having deeper architecture discussions because AI surfaces options we wouldn’t have considered

The Cultural Shift

Here’s what I didn’t expect: AI didn’t just make coding faster. It changed what engineers think is possible.

Two quarters ago, if a feature seemed complex, we’d descope it. Now, engineers prototype it with AI, discover it’s feasible, and we’re tackling bigger problems with the same team size.

Should we be measuring learning and quality improvements instead of pure speed?

What I’m Telling My CFO

“We’re not optimizing for the same thing anymore. AI didn’t reduce costs—it expanded capability.”

The metric that’s working for me: Team capacity to take on strategic initiatives. Last year we could handle 2 major projects simultaneously. This year, same team size, we’re handling 4.

That’s hard to capture in traditional productivity metrics. But it’s real business value.

Pro tip for anyone rolling out AI tools: Establish baseline measurements BEFORE you start. Document current cycle times, quality metrics, team capacity. Otherwise you’re trying to prove ROI with gut feelings instead of data.
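
To make that concrete, here’s the shape of the snapshot we capture before a rollout. A minimal sketch; the field names and values are illustrative, not a standard:

```python
# Sketch of a pre-rollout baseline snapshot. Field names and values are
# illustrative; fill them from your tracker, CI, and HR systems.
import json
from dataclasses import dataclass, asdict
from datetime import date

@dataclass
class EngBaseline:
    captured_on: str                    # ISO date of the snapshot
    median_pr_cycle_hours: float        # opened -> merged
    change_failure_rate: float          # share of deploys causing rollbacks
    concurrent_major_initiatives: int   # team-capacity proxy
    onboarding_weeks_to_10th_pr: float  # new-hire ramp time

baseline = EngBaseline(
    captured_on=date.today().isoformat(),
    median_pr_cycle_hours=38.0,
    change_failure_rate=0.12,
    concurrent_major_initiatives=2,
    onboarding_weeks_to_10th_pr=6.0,
)

# Check this file in so the post-rollout comparison is apples to apples.
with open("ai_rollout_baseline.json", "w") as f:
    json.dump(asdict(baseline), f, indent=2)
```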

Building on what Michelle and Luis shared, I think we’re asking the wrong question when we ask “Are we measuring the right things?”

The real question might be: Are we forming the right hypotheses about what AI should enable?

Where I’ve Seen Real ROI

Luis’s data mirrors what I’m seeing, but here’s where we’ve found genuine business impact:

Onboarding new engineers: Time to 10th merged PR went from 6 weeks to 3 weeks. AI pair programming is like having a senior engineer available 24/7 for junior devs. This one metric alone justified our AI tool spend.
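
If you want to track that number yourself, the computation is trivial once you have merge timestamps per new hire. A sketch on hypothetical data; pull the real timestamps from your VCS history:

```python
# Sketch: weeks from start date to an engineer's 10th merged PR.
# Timestamps are hypothetical; pull real ones from your VCS history.
from datetime import datetime

def weeks_to_nth_merge(start: datetime, merges: list[datetime], n: int = 10):
    """Weeks from `start` to the nth merged PR, or None if not reached yet."""
    merged = sorted(t for t in merges if t >= start)
    if len(merged) < n:
        return None
    return (merged[n - 1] - start).days / 7

start = datetime(2025, 1, 6)
merges = [datetime(2025, 1, 6 + 2 * i) for i in range(1, 13)]  # one merge every other day

print(weeks_to_nth_merge(start, merges))  # ~2.86 weeks on this toy data
```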

The capability expansion insight: Last quarter, three teams competed for a strategic initiative that would’ve required spinning up a whole new team. Instead of adding a team, one of them used AI tools to take it on alongside their existing work. We delivered a major competitive feature without hiring.

That’s not productivity improvement. That’s strategic optionality.

Challenging the “Killing Investments” Frame

Here’s my contrarian take: Maybe the problem isn’t that CFOs are “killing” AI investments. Maybe we proposed bad hypotheses.

If the hypothesis was “AI will reduce engineering headcount by 20%,” then yeah, kill that investment. The data doesn’t support it.

But if the hypothesis is “AI will let us punch above our weight class and tackle problems that would otherwise require 2x our team,” then the ROI conversation looks completely different.

A Framework That Works

I’ve started framing AI ROI like this:

ROI = (Quality × Speed × Scope) - (Cost + Risk)

Traditional productivity measures only look at Speed and Cost. But AI’s biggest impact is on Scope—the size and complexity of problems we can tackle.

Example: We’re an 80-person eng org competing against companies with 200+ engineers. AI tools don’t make us 2.5x faster. They make challenges that would’ve required 200 people achievable with 80.

How do you quantify that? You can’t, directly. But you can show:

  • Market share gains in features competitors can’t match
  • Customer retention because we ship quality faster
  • Successful strategic bets that would’ve been off the table
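
The formula is a heuristic, not literal math, but even as a rough scorecard it changes the conversation. A toy sketch with made-up 1-5 scores, purely to show how the Scope term dominates:

```python
# Toy scorecard for the (Quality x Speed x Scope) - (Cost + Risk) heuristic.
# All scores are made-up 1-5 ratings, purely for illustration.
def roi_score(quality, speed, scope, cost, risk):
    return quality * speed * scope - (cost + risk)

# Same tool spend and risk, two different hypotheses about what it buys:
headcount_reduction = roi_score(quality=3, speed=4, scope=2, cost=3, risk=3)   # 18
capability_expansion = roi_score(quality=4, speed=3, scope=5, cost=3, risk=3)  # 54

print(headcount_reduction, capability_expansion)
```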

The Real Question

Are we treating AI like headcount reduction or capability expansion?

If it’s headcount reduction, traditional productivity metrics work fine. And yes, kill those investments if the data doesn’t support them.

If it’s capability expansion, we need different measures: Strategic initiative capacity, competitive feature parity, time-to-market for complex capabilities, team ability to take on stretch goals.

What hypotheses did you use when proposing AI investments? That might explain why CFOs are struggling to see ROI—we might have promised the wrong outcomes.

This conversation is fascinating from the design side, and I think there’s an aspect of AI impact that’s getting overlooked: collaboration quality.

What I’m Seeing From the Design Systems Perspective

I work with engineers every day on our component library and design system implementation. Since our eng team started using AI coding assistants about 6 months ago, something unexpected happened:

Engineers got better at explaining technical tradeoffs to non-engineers.

Not faster. Not more productive in the traditional sense. Better at collaboration.

The Qualitative Shift Nobody’s Measuring

Here’s what changed:

  • Engineers using AI are more willing to iterate on design feedback (because implementation feels less costly)
  • Technical discussions with product and design are higher quality (AI helps them explore options before meetings)
  • Cross-functional decisions happen faster (less “let me go figure out if that’s possible” and more real-time problem solving)
  • Documentation quality improved dramatically (AI helps write better explanations of technical decisions)

None of this shows up in velocity metrics. But it’s hugely valuable.

Questioning the Productivity Frame

David’s point about “competitive positioning vs cost reduction” resonates. But I’d go further:

What if better collaboration is more valuable than faster coding?

I’ve started tracking “cross-functional cycle time” instead of pure engineering velocity:

  • Time from design concept → implementation → feedback → iteration
  • Quality of technical input in early product discussions
  • Designer satisfaction with eng collaboration
  • Reduction in design-eng ping-pong iterations
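
To make these less hand-wavy, we timestamp when a feature hits each stage and compute the gaps. A minimal sketch with hypothetical stage names and dates; in practice the events come from wherever you track stage transitions:

```python
# Sketch: cross-functional cycle time from timestamped stage events.
# Stage names and dates are hypothetical.
from datetime import date
from itertools import pairwise

STAGES = ["design_concept", "implementation", "feedback", "iteration"]

events = {  # one feature's lifecycle
    "design_concept": date(2025, 3, 3),
    "implementation": date(2025, 3, 10),
    "feedback":       date(2025, 3, 12),
    "iteration":      date(2025, 3, 14),
}

for a, b in pairwise(STAGES):
    print(f"{a} -> {b}: {(events[b] - events[a]).days} days")
print(f"total: {(events[STAGES[-1]] - events[STAGES[0]]).days} days")
```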

These metrics are messier. But they capture what’s actually changing.

The Metric That Doesn’t Exist Yet

How do we quantify:

  • “Engineers making better decisions”
  • “Less technical debt because teams explored more options upfront”
  • “Product-eng-design alignment improving”
  • “Junior engineers learning faster from AI pair programming”

Pure cost metrics miss all of this. But these are the things that compound over time into real competitive advantages.

My Worry

If we only measure what’s easy to quantify (cost per feature, velocity), we’ll optimize for the wrong outcomes and kill investments in things that create genuine strategic value.

Keisha’s “capability expansion” frame is exactly right. But I’d add: Collaboration quality expansion might be just as important.

Has anyone found ways to measure the qualitative improvements in how teams work together? Or are we just stuck making the case with anecdotes?