The data doesn’t add up, and I can’t stop thinking about it.
93% of developers now use AI coding tools. 41% of all code is AI-generated. But organizational productivity gains? They plateaued at around 10% after an initial spike.
For context: I’m a VP of Product at a Series B fintech startup. If this were a B2B SaaS product—93% adoption, 41% usage intensity, but only 10% measurable value—I’d be calling an emergency product strategy meeting. Something is fundamentally broken in the value chain.
Three Hypotheses for the Plateau
I’ve been wrestling with three possible explanations:
Hypothesis 1: We Hit a Real Ceiling
Maybe AI coding tools are genuinely good at autocomplete and boilerplate but struggle with complex reasoning. The 10% represents the actual value boundary—we’ve extracted all the low-hanging fruit (syntax help, common patterns), and the hard problems (architecture, novel algorithms, business logic) remain human work.
Some evidence: in the METR study, experienced developers expected AI to make them 24% faster, still believed afterward that it had sped them up, and were actually 19% slower in controlled tests. That perception gap is wild.
Hypothesis 2: We’re Measuring Wrong
Traditional productivity metrics—lines of code, PRs merged, velocity points—were designed for a pre-AI world. They don’t capture:
- Quality improvements (fewer bugs, better test coverage)
- Cognitive load reduction (less context switching, fewer Stack Overflow tabs)
- Maintainability gains (clearer code because you spent less time fighting syntax)
- Learning acceleration (junior engineers accessing patterns they wouldn’t find otherwise)
CircleCI reports a 59% throughput increase for teams using AI, but most organizations are “leaving gains on the table” because their systems haven’t caught up. Maybe the 10% plateau is a measurement artifact?
Hypothesis 3: Organizational Systems Lag
AI tools optimized the individual developer. But the surrounding ecosystem (CI/CD pipelines, code review processes, deployment workflows, team collaboration patterns) is still designed for pre-AI productivity levels.
If developers generate 40% more PRs but review capacity stays flat, you’ve just moved the bottleneck. The system is only as fast as its slowest component.
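To make the bottleneck argument concrete, here is a minimal sketch that treats delivery as a serial pipeline whose throughput equals the capacity of its slowest stage. The stage names and weekly capacities are invented for illustration; they are not measurements from any team or study.

```python
def pipeline_throughput(stage_capacity):
    """Return (throughput, bottleneck_stage) for a serial pipeline.

    stage_capacity maps a stage name to the maximum number of PRs
    that stage can process per week.
    """
    bottleneck = min(stage_capacity, key=stage_capacity.get)
    return stage_capacity[bottleneck], bottleneck


# Hypothetical weekly capacities in PRs per week (illustrative numbers only).
before = {"authoring": 50, "review": 55, "ci_and_deploy": 60}

# Assume AI lifts authoring output by 40% while every other stage stays flat.
after = dict(before, authoring=70)

print(pipeline_throughput(before))  # (50, 'authoring')
print(pipeline_throughput(after))   # (55, 'review')
```

In this toy model, a 40% jump in authoring capacity raises shipped throughput by only 10%, because review becomes the limiting stage. Real pipelines have queues, rework, and variability that a simple minimum ignores.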
The Product Lens
Here’s what bothers me from a product strategy perspective:
When we launch a new feature and see 90%+ adoption but <15% value capture, we dig into the “activation gap”—what’s preventing users from extracting the full value?
For AI coding tools, the activation gap is enormous. Developers are using the tools (93%!), generating tons of output (41% of code!), but not seeing proportional productivity gains.
That pattern usually indicates one of three things:
- The product solves the wrong problem
- The product creates downstream friction
- Users lack the skills/context to leverage it fully
I suspect it’s a combination of the second and third for AI coding tools: downstream friction plus a skills and context gap.
What Should We Actually Measure?
If “time to first draft” is the wrong metric (spoiler: it probably is), what should we track instead?
Some ideas:
- Time to confident deployment: Not “how fast can you write code” but “how fast can you ship validated, tested, production-ready code”
- Cognitive load reduction: Are developers spending less mental energy on syntax and more on architecture?
- Error recovery time: When AI gives you wrong code, how long does it take to fix vs. writing from scratch?
- Iteration velocity: How fast can you go from idea → prototype → refined version → shipped feature?
The challenge: These are harder to instrument than “commits per week.”
But if we’re serious about understanding AI’s impact, we need to evolve our measurement frameworks.
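As a starting point, “time to confident deployment” could be approximated from data most teams already have. The sketch below assumes a hypothetical record per change with a first-commit timestamp, a deploy timestamp, and a rollback flag. The field names and sample values are made up for illustration and do not match any particular tool’s schema. It treats “not rolled back” as a rough proxy for “confident.”

```python
from datetime import datetime
from statistics import median

# Hypothetical change records; field names and values are assumptions.
changes = [
    {"id": "A", "first_commit": "2025-01-06T09:00", "deployed": "2025-01-08T09:00", "rolled_back": False},
    {"id": "B", "first_commit": "2025-01-06T10:00", "deployed": "2025-01-07T10:00", "rolled_back": True},
    {"id": "C", "first_commit": "2025-01-07T09:00", "deployed": "2025-01-11T09:00", "rolled_back": False},
]


def hours_between(start, end):
    """Elapsed hours between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 3600


def time_to_confident_deployment(records):
    """Median hours from first commit to a deploy that was not rolled back."""
    durations = [
        hours_between(r["first_commit"], r["deployed"])
        for r in records
        if not r["rolled_back"]
    ]
    return median(durations) if durations else None


print(time_to_confident_deployment(changes))  # 72.0
```

Change B shipped fastest but was rolled back, so it is excluded. A “time to first deploy” metric would have counted it as a win, which is exactly the distortion a draft-speed metric introduces.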
Questions for This Community
I’m particularly curious to hear from engineering leaders who’ve moved beyond the 10% plateau:
- What changed? Was it measurement, systems, culture, or something else?
- What are you actually tracking? Beyond DORA metrics, what KPIs reveal AI tool effectiveness?
- How do you handle the perception gap? If engineers feel faster but data shows they’re not, do you correct the perception or let it be?
And the meta-question: Are we optimizing for the wrong outcome?
What if 10% productivity + 50% job satisfaction improvement is better than 30% productivity + 20% burnout increase?
What if “time saved” is the wrong goal, and “quality of thinking” is the right one?
I don’t have answers, but I’m convinced we’re asking the wrong questions.
Looking forward to your thoughts.
Sources for data cited: