Three months ago, I walked into our Monday leadership meeting with what I thought was great news. “Engineering productivity is up 40%,” I announced. “We’re merging 60% more pull requests than last quarter.”
Our CEO leaned forward. “That’s fantastic. So when do customers get the new features we promised?”
I looked at our VP of Product. He looked at me. Neither of us had a good answer.
The timeline hadn’t changed at all.
The Productivity Paradox We’re All Living
Here’s what the data actually shows across the industry right now:
- Developers save an average of 3.6 hours per week using AI coding tools (DX’s analysis of 135,000+ developers)
- Daily AI users merge approximately 60% more pull requests than developers not using AI
- Yet companies consistently report flat delivery velocity despite these individual gains
At my EdTech startup, we’re living this paradox. Our engineers feel more productive—they’re writing code faster, churning through tickets, celebrating green checkmarks in Jira. But our features still take the same amount of time to reach production.
Where did the gains disappear to?
The Bottleneck Moved, and We Just Didn’t Notice
Recent research from Agoda nailed it: “Coding was never the bottleneck.”
The constraint shifted to the work on either side of coding:
- Specification: What should we actually build? (Still requires human judgment)
- Verification: Is this correct and safe? (Still requires senior review)
- Review load: Review time ballooned by ~91% in high-AI-adoption teams (Faros AI data)
In practical terms: My junior engineers write 5 PRs per day with AI assistance. My senior engineers now spend 80% of their time reviewing AI-generated code—hunting for subtle semantic bugs, architectural anti-patterns, and security vulnerabilities that AI introduces.
The seniors can’t write code anymore because they’re drowning in review queues.
We optimized one bottleneck and created another.
The Quality Tax We’re Not Talking About
Here’s the part that keeps me up at night:
- AI-coauthored PRs show 1.7× more issues than human-only PRs
- The DORA 2025 report found AI correlates with increased instability—teams merge faster but break production more often
- We’re shipping more code, but are we shipping better products?
One of my leads put it this way: “AI makes it cheaper to build the wrong thing really fast.”
Gartner Says We’re Measuring the Wrong Things Entirely
The 2026 Gartner research is challenging everything I thought I knew about engineering effectiveness.
Platform teams struggle to communicate ROI because they’re speaking in the wrong language:
- Traditional metrics: Deployments per day, PR count, velocity points
- Business metrics that matter: Revenue enabled, costs avoided, profit contribution
And here’s the prediction that hit me hardest: Creativity and innovation—not velocity or deployment frequency—will define effectiveness in 2026.
Not “how fast did you ship?”
But “did you build something customers actually value?”
The Questions I’m Wrestling With
I’m re-evaluating everything about how we measure engineering effectiveness:
- Are deployment frequency and PR velocity now misleading vanity metrics? If we’re shipping more but delivering the same value, what are we really measuring?
- How do you measure creativity and innovation? These are the outcomes that matter, but they’re fuzzy and hard to quantify. What’s the operational metric for “we built the right thing”?
- Should we shift from “velocity” to “value delivered”? And if so, how do we define and measure value in a way that engineers can optimize for?
- Is the solution more AI (to review AI code) or better human processes? Are we about to get trapped in an AI-reviewing-AI infinite loop?
What I’m Trying Next
At my company, we’re experimenting with new metrics:
- Customer problem resolution rate (not features shipped)
- Time from insight to customer value (not time to deploy)
- Innovation experiments validated (not story points completed)
- Technical debt reduction (not just feature velocity)
It’s messy and imperfect, but at least we’re asking the right questions.
I’d love to hear from this community:
For engineering leaders: What metrics are you using to measure effectiveness beyond throughput?
For product folks: How are you helping engineering teams understand “value” vs. “output”?
For CTOs: How are you communicating technical productivity to the board when velocity metrics no longer tell the story?
Did we spend the last five years optimizing for the wrong things? And if so, what should we be optimizing for instead?
Keisha Johnson | VP Engineering @ EdTech Startup
Formerly Google, Slack | Building inclusive, high-performing teams