At our last exec meeting, our VP Engineering presented a slide that stopped the conversation cold: “25% of our codebase now AI-assisted, velocity up 10%.” The CFO asked the question I was already thinking: “If a quarter of our code writes itself, why isn’t productivity up 25%?”
Google just reported the same pattern. Sundar Pichai announced 25% of their new code is AI-generated, with engineering velocity gains around 10%. The math doesn’t work—and I think we’re all missing something fundamental about what we’re actually measuring.
The Two Hypotheses
I see two possible explanations, and I’m not sure which worries me more:
Hypothesis 1: AI code carries a quality tax. Maybe AI-generated code requires disproportionate review time. GitHub has reported that Copilot writes roughly 46% of the code in files where it’s enabled, yet developers accept only around 30% of its suggestions. That’s a lot of cognitive overhead: reading, evaluating, and rejecting AI proposals. When I watch our senior engineers work with AI assistants, they spend more time reviewing and refactoring AI suggestions than they save by not typing.
Hypothesis 2: We’re measuring the wrong outputs entirely. Velocity traditionally means “code shipped” or “PRs merged” or “story points completed.” But none of those map cleanly to business value. If AI helps us ship features 10% faster, but those features take 3x longer to maintain, did we actually gain anything? The broader research points the same way: AI-authored code is now 26.9% of production code globally, yet organizational productivity gains are stuck at 8-12%.
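To make that trade-off concrete, here’s a rough back-of-the-envelope sketch. The two and six engineer-week figures are invented for illustration, not numbers from our teams:

```python
# Back-of-the-envelope lifecycle math. The 2 and 6 engineer-week figures are
# illustrative assumptions, not real project data.
build_before, maintain_before = 2.0, 6.0       # engineer-weeks to build / maintain a feature
build_after = build_before * 0.9               # ship 10% faster with AI assistance
maintain_after = maintain_before * 3           # but maintenance takes 3x longer

total_before = build_before + maintain_before  # 8.0 engineer-weeks
total_after = build_after + maintain_after     # 19.8 engineer-weeks

print(f"Lifecycle cost: {total_before:.1f} -> {total_after:.1f} engineer-weeks "
      f"({total_after / total_before - 1:+.0%})")
# Lifecycle cost: 8.0 -> 19.8 engineer-weeks (+148%)
```

Under those assumptions, a 10% win on the part we measure becomes a large loss on the part we don’t.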
What I’m Seeing at Scale
I’ve led engineering teams through two major technology shifts—cloud migration at Microsoft and microservices at Twilio. This AI adoption wave feels different. In previous shifts, productivity dipped before it improved (learning curve, migration costs). With AI, individual developers report 25-39% productivity gains, but our DORA metrics barely moved.
The bottleneck shifted. Our PR queue is 40% larger than last quarter. Code review cycle time increased from 1.2 days to 2.1 days. QA is drowning. Security keeps flagging issues in AI-generated auth code. We optimized coding, but coding was never the constraint.
The Measurement Question
Here’s what keeps me up at night: What if velocity was always the wrong metric, and AI is just making that obvious?
Maybe we should measure:
- Time from “idea proposed” to “customer value delivered” (not code committed)
- Incident reduction and system reliability improvements
- Technical debt accumulation rate
- Developer cognitive load and context-switching overhead
- Customer outcome metrics tied to engineering work
But most engineering dashboards still show commits, PRs, and lines changed. We’re measuring industrial-era outputs in a knowledge-work world.
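For contrast, here’s a minimal sketch of what the first metric on that list might look like as a calculation. Everything in it is hypothetical: the ticket IDs, the proposed_at and value_delivered_at fields, and the assumption that your tracker records when customers actually received the value:

```python
from datetime import datetime
from statistics import median

# Hypothetical issue-tracker export; the field names ("proposed_at",
# "value_delivered_at") are assumptions, not a real tracker schema.
tickets = [
    {"id": "FEAT-101", "proposed_at": "2024-01-08", "value_delivered_at": "2024-02-19"},
    {"id": "FEAT-114", "proposed_at": "2024-01-15", "value_delivered_at": "2024-03-04"},
    {"id": "FEAT-120", "proposed_at": "2024-02-01", "value_delivered_at": "2024-02-26"},
]

def days_between(start: str, end: str) -> int:
    fmt = "%Y-%m-%d"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).days

# The clock starts when the idea is proposed and stops when the customer
# actually receives the value, not when the code merges.
lead_times = [days_between(t["proposed_at"], t["value_delivered_at"]) for t in tickets]
print(f"Median idea-to-value lead time: {median(lead_times)} days")
```

The code is trivial on purpose. The hard part is organizational: deciding that the clock starts at the idea and stops at the customer, which is exactly what commit-and-PR dashboards never capture.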
The Real Question
When we say “AI increased productivity 10%,” what did we actually measure? Code written? Tasks completed? Customer problems solved? Revenue enabled?
I’d love to hear from other engineering leaders: How are you measuring AI’s impact? Are you seeing the same adoption-velocity gap? And more importantly—what should we be measuring instead?
Because right now, we’re celebrating a 25% input increase while scratching our heads about a 10% output increase. That gap isn’t a mystery—it’s a message. We’re just not sure what it’s telling us yet.