Two weeks ago, I shipped a seemingly simple PR: a modal component for our design system. Built it in an afternoon using Cursor. Felt like magic.
Then QA tested it.
Twelve accessibility violations. Twelve.
It took three days to fix what took three hours to generate.
And here’s the thing - the AI didn’t write bad code. It wrote code that worked perfectly… if you only use a mouse. If you only have perfect vision. If you’ve never heard of WCAG compliance.
The code compiled. The tests passed. The demo looked beautiful.
It just wasn’t… right.
The Quality Gap in AI-Generated Code
There’s a stat that’s been haunting me since I saw it last week:
Pull requests with AI-generated code have 1.7× more issues than human-written code (source).
Not “slightly more” issues. Not “a few edge cases.” 70% more problems.
And we’re not tracking it separately. We’re lumping AI code and human code into the same metrics, wondering why our overall quality is declining.
What AI Doesn’t Understand: The “Why” Behind the Code
From a design systems perspective, I’m seeing a pattern:
AI optimizes for immediate functionality, not long-term maintainability.
Examples from our codebase:
Design Tokens
- Human approach: Uses `spacing-md`, `color-primary`, `font-size-body`
- AI approach: Hardcodes `16px`, `#2563eb`, `14px` everywhere
- Impact: Works perfectly until we rebrand or need dark mode
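The difference is easy to show in code. A minimal sketch, with token names and values that are illustrative, not our actual system:

```typescript
// Hypothetical token module -- in a real system these live in one shared file.
const tokens = {
  spacingMd: "16px",
  colorPrimary: "#2563eb",
  fontSizeBody: "14px",
} as const;

// Hardcoded (AI-style): every value must be hunted down at rebrand time.
const hardcodedStyle = { padding: "16px", color: "#2563eb", fontSize: "14px" };

// Tokenized (human-style): a rebrand or dark-mode swap touches one file.
const tokenizedStyle = {
  padding: tokens.spacingMd,
  color: tokens.colorPrimary,
  fontSize: tokens.fontSizeBody,
};
```

Both objects render identically today; only one of them survives a theme change.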
Component Composition
- Human approach: Builds modular components that compose predictably
- AI approach: Creates monolithic components that “do everything”
- Impact: Works for the immediate use case, nightmare to maintain
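Here's a toy illustration of that difference (HTML strings instead of a real component framework, and hypothetical names, to keep the contrast visible):

```typescript
// Monolithic (AI-style): one component grows a prop for every variation.
function monolithicModal(opts: {
  title: string;
  showFooter: boolean;
  confirmLabel: string;
}): string {
  let body = `<header>${opts.title}</header>`;
  if (opts.showFooter) {
    body += `<footer><button>${opts.confirmLabel}</button></footer>`;
  }
  return `<dialog>${body}</dialog>`;
}

// Composable (human-style): small parts that callers assemble predictably.
const header = (title: string) => `<header>${title}</header>`;
const footer = (...buttons: string[]) => `<footer>${buttons.join("")}</footer>`;
const modal = (...children: string[]) => `<dialog>${children.join("")}</dialog>`;

// A new variation needs no new props, just a different assembly:
const confirmModal = modal(
  header("Delete file?"),
  footer("<button>Cancel</button>", "<button>Delete</button>")
);
```

The monolith works for today's use case; every future use case adds another boolean prop.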
Accessibility
- Human approach: Keyboard navigation, ARIA labels, focus management from the start
- AI approach: Mouse-first, visual-first, “we’ll add accessibility later”
- Impact: Later never comes, or costs 3× to retrofit
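This is exactly what bit our modal. A sketch of the keyboard logic the generated version lacked, written as a pure function so it's testable without a DOM (real code would read `event.key` and `event.shiftKey`; `"ShiftTab"` here is an illustrative simplification):

```typescript
// Given the focusable elements inside a modal, decide where focus goes next.
// Returns the next index (wrapping, so focus stays trapped inside the dialog)
// or null, meaning: close the modal and restore focus to the trigger element.
function nextFocusIndex(
  key: "Tab" | "ShiftTab" | "Escape" | string,
  current: number,
  count: number
): number | null {
  if (key === "Escape") return null;                       // close, restore focus
  if (key === "Tab") return (current + 1) % count;         // wrap forward
  if (key === "ShiftTab") return (current - 1 + count) % count; // wrap backward
  return current;                                          // other keys: no change
}
```

None of this is visible in a demo, which is precisely why the demo looked fine.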
The 41% Bug Increase Nobody Talks About
@eng_director_luis mentioned this stat in the other thread: projects with high AI code usage saw a 41% increase in bugs (source).
In our design system, I can trace where these bugs come from:
1. Context Blindness
- AI doesn’t know our component hierarchy
- AI doesn’t understand our design token system
- AI doesn’t see the dependencies between components
- AI generates code that works in isolation but breaks in integration
2. Pattern Inconsistency
- AI learns from Stack Overflow, not our style guide
- AI mixes different state management patterns in the same component
- AI uses whatever works, not what’s maintainable
3. Edge Case Ignorance
- AI handles the happy path beautifully
- AI forgets error states, loading states, empty states
- AI doesn’t think about mobile, tablets, screen readers, slow networks
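One structural fix we've leaned on: model those states in the type system so they can't be skipped. A sketch, with illustrative names (a discriminated union forces every renderer to handle error, loading, and empty, not just the happy path):

```typescript
// Every UI state is an explicit variant, not an afterthought.
type FetchState<T> =
  | { kind: "loading" }
  | { kind: "error"; message: string }
  | { kind: "empty" }
  | { kind: "loaded"; data: T };

function render(state: FetchState<string[]>): string {
  // The compiler flags this switch if a variant is ever left unhandled.
  switch (state.kind) {
    case "loading": return "Loading...";
    case "error": return `Error: ${state.message}`;
    case "empty": return "No items yet";
    case "loaded": return state.data.join(", ");
  }
}
```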
The Review Burden is Real
Here’s the uncomfortable truth: AI-generated code creates more work for reviewers, not less.
When I review human-written code:
- 20 minutes to understand the approach
- 10 minutes to check for issues
- 5 minutes to suggest improvements
- ~35 minutes total
When I review AI-generated code:
- 5 minutes to understand (it’s usually straightforward)
- 40 minutes to check for ALL the things AI might have missed:
  - Accessibility
  - Design token usage
  - Component composition
  - Edge cases
  - Mobile responsiveness
  - Performance implications
  - Maintainability
- 15 minutes to document what needs to be fixed
- ~60 minutes total
Our AI-assisted PRs take 2.3× longer to review than human-written PRs.
Where’s the productivity gain?
My Controversial Proposal: Separate Review Queues
What if we treated AI-heavy PRs as a different category?
Not to shame them. To acknowledge they need different review patterns.
Standard PR review checklist:
- Functional correctness
- Test coverage
- Follows architectural patterns
- Code quality
AI-assisted PR review checklist (additional checks):
- Uses design tokens (not hardcoded values)
- Accessibility compliance (keyboard nav, ARIA, focus)
- Edge cases (error states, loading, empty)
- Mobile/responsive (not just desktop)
- Integrates with existing patterns (doesn’t reinvent)
- Performance (AI loves nested loops and unnecessary re-renders)
- Maintainability (will someone understand this in 6 months?)
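Some of these checks don't even need a human. A hypothetical sketch of how the "uses design tokens" item could be automated, flagging hardcoded hex colors and pixel values before review starts (the regex and function name are mine, not an existing tool):

```typescript
// Match hex colors (#fff through #rrggbbaa) and raw pixel values (16px).
const HARDCODED = /#[0-9a-fA-F]{3,8}\b|\b\d+px\b/g;

// Return every hardcoded value found in a style source string,
// so the review comment can list them instead of a human hunting for them.
function findHardcodedValues(source: string): string[] {
  return source.match(HARDCODED) ?? [];
}
```

A check like this won't catch everything, but it turns one 40-minute review item into a CI failure.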
Different tools. Different SLAs. Different expectations.
The Question: Should We Track AI Code Quality Separately?
@cto_michelle proposed tracking AI code as a distinct quality category. I think we need to go further:
Separate metrics for:
- AI code review time vs. human code review time
- AI code bug density vs. human code bug density
- AI code rework rate (how often is it substantially refactored within 30 days?)
- AI code accessibility score vs. human code accessibility score
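Mechanically, this just means tagging each PR and splitting the aggregates. A sketch of the idea, with illustrative field names (not our actual PR schema):

```typescript
// One record per merged PR; aiAssisted is the tag we'd start capturing.
interface PrRecord {
  aiAssisted: boolean;
  bugs: number;          // bugs traced back to this PR
  linesChanged: number;
  reviewMinutes: number;
}

// Bugs per 1,000 changed lines, computed per category instead of blended.
function bugDensity(prs: PrRecord[], aiAssisted: boolean): number {
  const subset = prs.filter((p) => p.aiAssisted === aiAssisted);
  const bugs = subset.reduce((n, p) => n + p.bugs, 0);
  const lines = subset.reduce((n, p) => n + p.linesChanged, 0);
  return lines === 0 ? 0 : (bugs / lines) * 1000;
}

// Average review time per category, for the 2.3x comparison.
function avgReviewMinutes(prs: PrRecord[], aiAssisted: boolean): number {
  const subset = prs.filter((p) => p.aiAssisted === aiAssisted);
  if (subset.length === 0) return 0;
  return subset.reduce((n, p) => n + p.reviewMinutes, 0) / subset.length;
}
```

Two small queries like these would replace "overall quality is declining" with "here is where it's declining."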
Not to stigmatize AI code. To understand its actual cost and value.
Right now we’re celebrating “41% of code is AI-generated!” without asking:
- How much of that 41% shipped to production?
- How much of it had to be significantly reworked?
- How much reviewer time did it consume?
- How much production debt did it create?
What’s Actually Working
After our modal accessibility disaster, we changed our process:
Before AI generates code:
- I create the component spec in Figma
- I document accessibility requirements explicitly
- I list which design tokens must be used
- I map out component relationships
- Then I let AI generate the implementation
Result: AI-generated code that actually integrates with our system.
Still faster than writing by hand. Way slower than letting AI run wild.
But the code that ships is actually maintainable.
The Uncomfortable Question for Everyone
Are we accepting “good enough” more often because AI wrote it?
When humans write code, we scrutinize every decision. When AI writes code, we just check if it works.
That’s a mistake.
AI makes it easy to generate lots of code fast. That doesn’t mean the code is good. And it definitely doesn’t mean we should review it less carefully.
If anything, we should review it more carefully - because the AI doesn’t understand our system, our constraints, our users, or our future maintenance burden.
What are others seeing? Is the 1.7× issue rate real in your codebases? How are you handling the review burden?