AI Writes Code in Seconds. Your Team Reviews It for Hours. The Math Isn't Working.
The ROI pitch for AI coding tools is irresistible on paper: developers complete tasks 55% faster in controlled experiments, ship 98% more pull requests, and report saving 3.6 hours per week. But when organizations look at their actual delivery metrics — bug rates, release cycle times, incident frequency — the numbers barely move. Something is absorbing all those gained hours, and it's not hard to find.
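A back-of-envelope sketch makes the absorption visible. All team-level numbers below are illustrative assumptions; the 55% speedup is the headline claim above, and the review inflation factors (39% more PRs, each ~37% slower to review) are one hypothetical decomposition of the 91% review-time increase discussed later.

```python
# Hypothetical back-of-envelope: generation savings vs. review overhead.
# All team-level numbers are illustrative assumptions, not measured data.

devs = 10
coding_hours_per_week = 12.0   # assumed hours each dev spends writing code
review_hours_per_week = 5.0    # assumed hours each dev spends reviewing PRs

# Headline claim: tasks complete 55% faster, so writing time shrinks.
hours_saved = devs * coding_hours_per_week * (1 - 1 / 1.55)

# But more PRs (assume +39%) reviewed more slowly (assume +37% per PR)
# inflates review load by roughly 1.39 * 1.37, i.e. about +91%.
hours_added = devs * review_hours_per_week * (1.39 * 1.37 - 1)

print(f"Generation hours saved/week: {hours_saved:.1f}")
print(f"Review hours added/week:     {hours_added:.1f}")
print(f"Net change:                  {hours_saved - hours_added:+.1f}")
```

Under these assumed numbers the review line item slightly exceeds the generation savings, which is exactly the "barely move" pattern in delivery metrics: the hours are spent, just in a different column.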
AI generates code in seconds. Engineers still review it at the same pace they always have.
That asymmetry — fast generation, slow verification — is the hidden tax on every AI coding productivity claim. The teams that understand this are rebuilding their workflows. The ones that don't are buying faster treadmills and wondering why they aren't getting anywhere.
The 91% Review Time Explosion
When engineers adopt AI coding tools at scale, pull request volume increases dramatically. Cursor's own study found developers merged 39% more PRs after adopting AI agents. GitHub Copilot data shows similar patterns. More code generated means more code to review.
But reading comprehension hasn't gotten 39% faster. Engineers still need to understand context, build mental models of changed systems, consider edge cases, and evaluate security implications. The result is measurable: PR review time has increased 91% in teams with high AI adoption (70%+ AI-generated code), while bug rates climbed 9% compared to teams with lower adoption.
The 2025 DORA report captures the paradox cleanly: AI coding assistants drive a 21% increase in tasks completed and a 98% increase in PRs merged, but organizational delivery metrics — the things that actually matter to shipping software — stay flat. All those additional PRs are flowing into a review queue that hasn't scaled.
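The queue dynamic is the important part: once arrivals exceed fixed review capacity, backlog grows every week rather than settling. A toy sketch with hypothetical capacity and per-PR review times:

```python
# Toy queue sketch: fixed review capacity vs. growing PR arrivals.
# Capacity and per-PR minutes are illustrative assumptions.

CAPACITY_MIN_PER_WEEK = 1000  # assumed reviewer-minutes available per week

def backlog_after(weeks: int, prs_per_week: float, minutes_per_pr: float) -> float:
    """PRs left unreviewed after `weeks` when arrivals exceed capacity."""
    reviewed_per_week = CAPACITY_MIN_PER_WEEK / minutes_per_pr
    overflow = max(0.0, prs_per_week - reviewed_per_week)
    return overflow * weeks

# Before AI adoption: 40 PRs/week at 20 min each = 800 min, under capacity.
print(backlog_after(8, prs_per_week=40, minutes_per_pr=20))  # 0.0, no backlog
# After: ~2x the PRs, each slower to review; the backlog compounds weekly.
print(backlog_after(8, prs_per_week=79, minutes_per_pr=30))
```

The discontinuity is the point: under capacity, extra PRs are free; past capacity, every additional PR adds permanent queue growth until review capacity itself scales.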
This is what makes the ROI calculation so misleading. Measuring "time to write code" captures the easy part. Measuring "total cost to ship correct, secure, maintainable code" tells a different story.
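One way to make that distinction concrete is a simple per-PR cost model. Every coefficient below is hypothetical; the point is the structure: writing is one term among several, and AI shifts cost between terms rather than removing it.

```python
# Hypothetical cost-to-ship model for a single PR (hours).
# All coefficients are illustrative, not measured data.

def cost_to_ship(write: float, review: float, rework_prob: float,
                 rework_hours: float, incident_expected: float) -> float:
    """Total expected cost: writing time plus review, rework, and incidents."""
    return write + review + rework_prob * rework_hours + incident_expected

human = cost_to_ship(write=3.0, review=0.33, rework_prob=0.2,
                     rework_hours=1.0, incident_expected=0.1)
# AI-assisted: writing is much faster, but review time doubles and
# rework/incident risk rises, per the issue-rate pattern discussed below.
ai = cost_to_ship(write=1.0, review=0.66, rework_prob=0.34,
                  rework_hours=1.0, incident_expected=0.2)

print(f"human-written PR: {human:.2f} h")
print(f"AI-assisted PR:   {ai:.2f} h")
```

With these assumed inputs, a 67% reduction in writing time shrinks to roughly a 39% reduction in cost-to-ship. That gap between the headline metric and the total is the hidden tax.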
Why AI-Generated Code Is Harder to Review
It's not just volume. AI-generated code has structural properties that make review slower and riskier than reviewing the same amount of human-written code.
CodeRabbit's analysis found that AI-coauthored pull requests contain 1.7x more issues than human-written code. SonarSource found that 45% of AI-generated code contains OWASP Top 10 security vulnerabilities — 2.74x the rate for human code. Logic and correctness issues are 75% more common; error handling gaps are nearly 2x more common; readability issues are 3x more common.
This matters for review economics because problematic code requires more time to review, not less. Reviewers slow down when something looks off but they can't immediately articulate why. They need to trace more code paths, write more test cases to build confidence, and spend more cycles on back-and-forth in review comments. A PR that takes 20 minutes to review when written by a colleague might take 40 minutes when written by an AI — and reviewers often don't realize this is happening.
The METR randomized controlled trial result, often cited to dismiss AI productivity gains, makes more sense in this light: experienced developers were 19% slower when using AI assistance compared to when they worked without it. The generation speed gains were there. The verification overhead consumed them entirely.
The Task Taxonomy: Where the Math Works and Where It Doesn't
Not all code is equally expensive to verify. The tasks where AI clearly wins on productivity economics share a common property: the correctness criteria are narrow and easy to confirm.
Low verification overhead:
- Boilerplate and scaffolding code (test files, API stubs, config files)
- Documentation and docstring generation
- Dependency updates with clear changelogs
- Code migration following established patterns (e.g., upgrading a library version where the migration guide is unambiguous)
- Simple, isolated bug fixes with a reproduction case
High verification overhead:
- Business logic with non-obvious edge cases
- Security-critical paths: authentication, authorization, data handling
- Architectural changes that affect multiple subsystems
- API contract modifications (breaking vs. non-breaking is often unclear)
- Data model evolution in production systems
- Any code where "looks right" and "is right" are far apart
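One practical response to this taxonomy is triaging PRs by verification overhead before they enter the review queue, so low-overhead changes get a fast lane and high-overhead changes get deliberate scrutiny. A minimal sketch; the category labels and routing rules are hypothetical, not a prescribed process:

```python
# Hypothetical triage: route a PR to a review tier based on the
# verification-overhead categories above. Labels and tiers are illustrative.

LOW_OVERHEAD = {"boilerplate", "docs", "dependency-update",
                "pattern-migration", "isolated-bugfix"}
HIGH_OVERHEAD = {"business-logic", "security-path", "architecture",
                 "api-contract", "data-model"}

def review_tier(labels: set[str], touches_prod_data: bool = False) -> str:
    """Pick a review depth; default to the cautious tier when unsure."""
    if touches_prod_data or labels & HIGH_OVERHEAD:
        return "deep-review"      # senior reviewer, paths traced by hand
    if labels and labels <= LOW_OVERHEAD:
        return "fast-lane"        # lint + CI + single quick approval
    return "standard-review"      # anything ambiguous takes the default path

print(review_tier({"docs"}))                  # fast-lane
print(review_tier({"docs", "api-contract"}))  # deep-review: one label escalates
print(review_tier(set()))                     # standard-review
```

The design choice worth noting: a single high-overhead label escalates the whole PR, because "looks right" and "is right" diverge most in exactly those categories.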
Sources
- https://www.buildwithdc.co/posts/the-new-asymmetry-when-generation-outpaces-verification-in-ai-native-development/
- https://www.sonarsource.com/company/press-releases/sonar-data-reveals-critical-verification-gap-in-ai-coding/
- https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
- https://www.faros.ai/blog/ai-software-engineering
- https://dora.dev/dora-report-2025/
- https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report
- https://www.sonarsource.com/resources/library/ai-code-generation-benefits-risks/
- https://testkube.io/blog/building-trust-in-ai-generated-code-through-continuous-testing
- https://www.qodo.ai/blog/building-the-verification-layer-how-implementing-code-standards-unlock-ai-code-at-scale/
- https://engineering.atspotify.com/2024/12/building-confidence-a-case-study-in-how-to-create-confidence-scores-for-genai-applications/
- https://survey.stackoverflow.co/2025/ai
- https://arxiv.org/html/2511.04427v2
