My team’s velocity metrics show we’re generating code 3x faster than we were 18 months ago. Our AI coding assistants are humming along, autocompleting functions, scaffolding entire features, even writing tests. But here’s the thing that keeps me up at night: our actual delivery speed only improved by about 15%.
Where did the rest of that speedup go?
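A rough way to frame that gap is an Amdahl-style back-of-the-envelope calculation. The sketch below assumes, as a simplification, that only the code-writing share of delivery time got faster and everything else stayed the same; the function names are hypothetical, and the only inputs are the two figures above (3x generation, roughly 15% faster delivery).

```python
def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    """Amdahl-style estimate: only the coding share of delivery time gets faster."""
    return 1.0 / ((1.0 - coding_fraction) + coding_fraction / coding_speedup)


def implied_coding_fraction(overall: float, coding_speedup: float) -> float:
    """Invert the formula: what share of delivery time must coding have been?"""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / coding_speedup)


# Figures from the post: code generation 3x faster, delivery about 15% faster.
fraction = implied_coding_fraction(overall=1.15, coding_speedup=3.0)
print(f"implied coding share of delivery time: {fraction:.1%}")  # prints 19.6%
```

Under this simplified model, a 3x coding speedup yields a 15% delivery gain if writing code was only about a fifth of total delivery time to begin with. The model ignores any growth in review and QA effort, so it is a lower bound on how much everything outside code generation matters, not a measurement.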
The Verification Bottleneck
After digging into our workflow data, the answer became painfully clear: we’re drowning in verification work. Code review queues are longer than ever. Our QA team is constantly catching subtle bugs that wouldn’t have existed if a human had written the code in the first place. And most telling: our engineers are spending more time reading and validating AI-generated code than they used to spend writing it themselves.
Recent data from Sonar’s 2026 State of Code survey validates what we’re seeing: 96% of developers don’t fully trust the functional accuracy of AI-generated code. Yet here’s the paradox—only 48% say they always check code generated with AI assistance before committing it. We have a massive trust gap, and it’s showing up as increased bug rates, longer review cycles, and frankly, a lot of anxiety among the team.
Even more striking: 38% of developers report that reviewing AI-generated code requires more effort than reviewing human-generated code. Think about that. We built tools to make us faster, but we’re spending more time on verification than we saved on generation.
The Infrastructure Investment Question
This reminded me of a conversation I had with my VP when I was at Intel in the early 2010s. We were debating whether to invest heavily in CI/CD infrastructure. The argument was straightforward: if we’re serious about shipping fast, we need infrastructure that supports fast, safe deployments. Not just scripts—actual infrastructure. Build pipelines, automated testing, deployment automation, rollback mechanisms, observability.
We made that investment, and it paid off massively.
Now I’m looking at our current situation and asking: why aren’t we treating verification infrastructure with the same seriousness?
We have great tools for writing code (Copilot, Cursor, Claude Code, you name it). But our verification tooling feels stuck in 2015. We’re using the same code review processes, the same testing frameworks, the same manual QA cycles. We haven’t scaled our verification capabilities to match our generation speed.
What “Verification Infrastructure” Could Look Like
I’m still figuring this out, but here’s what I’m exploring with my team:
Expanded automated testing: Not just unit tests, but property-based testing, mutation testing, visual regression tests, contract testing. If AI can generate code quickly, can we generate comprehensive test suites just as quickly?
AI-powered review assistants: Tools that specifically look for common AI-generated code issues—overly verbose patterns, subtle logical errors, security vulnerabilities that slip through because the training data included bad practices.
Verification environments: Lightweight staging environments where AI-generated changes can be tested in isolation before code review even starts. Let the machines verify the machines first.
Observability from day one: If we’re less certain about correctness up front, we need better runtime verification. That means more logging, better monitoring, and automated anomaly detection that catches issues in production faster.
Process changes: Maybe verification shouldn’t happen at code review time. Maybe it happens continuously as the AI writes code. Maybe we need pair programming where one person generates and the other verifies in real time.
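To make the property-based testing idea from the list above concrete, here is a minimal hand-rolled sketch using only the Python standard library. It is not how any particular tool works; a dedicated library such as Hypothesis would generate and shrink inputs far better. The function under test (`dedupe_keep_order`), the deliberately wrong variant (`buggy_dedupe`), and the checker (`check_properties`) are all hypothetical names made up for illustration.

```python
import random


def dedupe_keep_order(items):
    """Hypothetical function under test: drop duplicates, keep first occurrences."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def buggy_dedupe(items):
    """A plausible-looking but wrong version: it loses the original order."""
    return sorted(set(items))


def check_properties(func, trials=500, seed=0):
    """Run func on random integer lists; return the first failing input, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(-5, 5) for _ in range(rng.randint(0, 20))]
        out = func(data)
        no_duplicates = len(out) == len(set(out))
        same_elements = set(out) == set(data)
        # Order preserved: output must list values by first occurrence in the input.
        order_kept = out == sorted(set(data), key=data.index)
        if not (no_duplicates and same_elements and order_kept):
            return data
    return None


print(check_properties(dedupe_keep_order))  # None: no counterexample found
print(check_properties(buggy_dedupe))       # a random list that breaks the order property
```

The point of the sketch is that the reviewer states what must always be true (no duplicates, same elements, order preserved) instead of reading every line of a generated implementation, and the machine searches for a counterexample.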
The Real Question
Here’s what I’m grappling with: Is the verification bottleneck a tooling problem, a process problem, or a skills problem?
Are we missing the right tools to verify AI-generated code efficiently? Do we need to redesign our development workflow entirely? Or do we need to train our engineers in a fundamentally new skill—not writing code, but verifying it at scale?
I suspect it’s all three. But I also suspect that teams that figure this out first will have a massive competitive advantage, just as teams that invested early in CI/CD infrastructure pulled ahead in the 2010s.
For those of you dealing with this: What’s working? What verification investments have you made? Where are you seeing the biggest ROI?
I’m especially curious to hear from teams that have actually solved this, not just teams (like mine) still figuring it out.