Last quarter, our engineering team merged 98% more pull requests than the previous year. When our CEO asked during our board meeting when customers would actually see the impact of all this “productivity,” I had to explain something that didn’t make sense on the surface: despite shipping more code than ever, our release velocity to production had actually decreased.
This is the paradox of 2026 that nobody warned us about.
The Numbers Don’t Lie (But They Do Mislead)
According to CircleCI’s 2026 State of Software Delivery report, AI-assisted development drove a 59% increase in average engineering throughput. Sounds incredible, right? But here’s what the aggregate numbers hide: feature branch throughput increased 15%, while main branch throughput—where code actually gets promoted to production—fell by 7%.
We’re generating more code, merging more PRs, and closing more tickets. But we’re releasing less frequently to customers.
The Supervision Paradox
Here’s what I’ve learned the hard way: the faster AI generates code, the more human attention is required to ensure that code actually works in the context of a real system with real users and real business constraints.
The production bottleneck didn’t disappear—it moved from writing to understanding. And understanding is much harder to speed up.
At my company, PR review time increased 91% after widespread AI adoption. Our senior engineers now spend 60-70% of their time reviewing code instead of writing it. We can’t just rubber-stamp AI-generated code—we’ve found subtle bugs that would have been production incidents, edge cases that weren’t considered, architectural decisions that didn’t align with our patterns.
We’re Measuring the Wrong Things
This is a classic case of Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.”
We’re tracking:
- Commit counts (vanity metric)
- PRs merged (activity, not outcome)
- Lines of code (actively harmful)
- Velocity points (disconnected from customer value)
We’re not tracking:
- Time from feature kickoff to customer hands
- Deployment frequency paired with failure rates
- Customer-impacting releases
- Actual business outcomes
Commit counts don’t tell you actual value delivered. They’re easy to game and completely disconnected from what customers care about.
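To make the outcome side concrete, here's a minimal sketch of computing two of those metrics—lead time from kickoff to customer hands, and deployment frequency—from delivery events. The event data, feature names, and dates are entirely hypothetical; a real version would pull these from your tracker and deploy logs.

```python
from datetime import datetime

# Hypothetical delivery events: (feature, kickoff, shipped_to_production).
# Names and dates are illustrative, not from any real tracker.
events = [
    ("search-filters", datetime(2026, 1, 5), datetime(2026, 1, 26)),
    ("bulk-export",    datetime(2026, 1, 8), datetime(2026, 2, 16)),
    ("sso-login",      datetime(2026, 1, 12), datetime(2026, 2, 2)),
]

# Lead time: days from feature kickoff until it reaches customers.
lead_times = [(shipped - kickoff).days for _, kickoff, shipped in events]
avg_lead_time = sum(lead_times) / len(lead_times)

# Deployment frequency: releases per week over the observed window.
first = min(kickoff for _, kickoff, _ in events)
last = max(shipped for _, _, shipped in events)
weeks = max((last - first).days / 7, 1)
deploys_per_week = len(events) / weeks

print(f"avg lead time: {avg_lead_time:.1f} days")
print(f"deploy frequency: {deploys_per_week:.2f}/week")
```

Neither number can be gamed by splitting commits or merging more PRs, which is exactly the point.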
The Validation Bottleneck Is Real
The data from Harness’s 2026 report is sobering: 47% of frequent AI tool users report that manual work—QA, remediation, validation—has become more problematic, not less, and 69% experience deployment problems when AI-generated code is involved.
Our deployment systems were designed for a world where writing code was the bottleneck. They weren’t designed for a world where we have 3x the code volume flowing through our pipelines.
Our CI/CD infrastructure can handle maybe 10 deployments per day. AI wants to push 50. Our test suites take 45 minutes to run. Our deployment approval processes are still manual. Our observability tools weren’t scaled for this volume of changes.
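The mismatch is visible in back-of-the-envelope arithmetic using the numbers above, assuming (as a simplification) that test runs are serialized on a single pipeline:

```python
# Pipeline saturation check using the figures above.
# Assumes serialized test runs on one pipeline; real setups vary.
test_suite_minutes = 45
desired_deploys_per_day = 50
day_minutes = 24 * 60  # even running around the clock

# Test minutes demanded per day vs. the hard serialized ceiling.
demanded_minutes = desired_deploys_per_day * test_suite_minutes
max_serial_deploys = day_minutes // test_suite_minutes

print(f"demanded test minutes/day: {demanded_minutes}")   # 2250
print(f"max serialized deploys/day: {max_serial_deploys}")  # 32
```

Fifty deployments demand 2,250 minutes of testing against 1,440 minutes in a day: the queue can never drain without parallelizing tests or shrinking the suite.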
The Human Cost
Here’s the statistic that keeps me up at night: 96% of very frequent AI coding users report being required to work evenings or weekends multiple times per month due to release-related work. Compare that to 66% of occasional users.
We’ve traded writing code during business hours for reviewing code, fixing broken deployments, and firefighting production issues on weekends. Developers report spending 36% of their time on repetitive manual tasks—chasing tickets, rerunning failed jobs, copy-pasting configuration, and waiting on human approvals.
This isn’t sustainable. We’re burning people out.
The Path Forward
The teams that are succeeding aren’t just measuring commits—they’re investing in the entire delivery lifecycle with the same urgency they invested in AI tools:
- Modernizing CI/CD infrastructure to match the new velocity
- Creating dedicated release engineering teams to own the deployment pipeline
- Implementing comprehensive automated testing at every layer
- Building observability into everything to catch issues earlier
- Establishing clear SLAs on review time so PRs don’t pile up indefinitely
- Shifting metrics from activity to outcomes—cycle time, deployment frequency, customer value delivered
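The review-SLA item is the easiest of these to automate. Here's a minimal sketch that flags open PRs breaching a 48-hour review SLA; the PR IDs, timestamps, and threshold are hypothetical, and a real check would pull open PRs from your code host's API instead.

```python
from datetime import datetime, timedelta

# Hypothetical open PRs: (id, opened_at). A real check would fetch
# these from your code host; IDs and times here are illustrative.
now = datetime(2026, 3, 2, 9, 0)
open_prs = [
    (101, datetime(2026, 2, 27, 14, 0)),
    (102, datetime(2026, 3, 1, 16, 30)),
    (103, datetime(2026, 2, 20, 9, 0)),
]

REVIEW_SLA = timedelta(hours=48)  # assumed threshold, tune per team

# Flag PRs waiting longer than the SLA so they don't pile up silently.
breaches = [pr_id for pr_id, opened in open_prs if now - opened > REVIEW_SLA]
print(f"PRs breaching review SLA: {breaches}")
```

Wire something like this into a daily bot post and stale PRs become a visible, shared problem instead of a private queue.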
Top performers treat validation as a first-class engineering investment, not an afterthought.
Questions for the Community
I’m curious about your experiences:
- Has anyone successfully modernized their delivery pipeline to match AI coding velocity? What did you invest in? What worked and what didn’t?
- What metrics are you actually tracking to measure customer value delivery instead of just engineering activity?
- How are you balancing the speed benefits of AI with the increased supervision and validation it requires? Have you changed your team structure or processes?
- What are you doing to prevent developer burnout when the new bottleneck is human review and validation?
The conversation has to shift from “whether to adopt AI” to “how to build systems that can actually deliver the value AI makes possible.” We’re generating more code than ever, but if it’s not reaching customers, we’re just creating expensive inventory.
What are you seeing in your organizations?