I need to share something that’s been controversial internally, and I’m curious how others would handle it.
The Context
Three months ago, we had our fourth production incident caused by AI-generated code. The pattern was always the same: a developer uses AI to refactor something, accepts a large diff without a thorough review, ships it, and something breaks.
After the fourth incident (which took down our payment processing for 45 minutes), I made an unpopular decision: strict review gates for all AI-generated code.
The Changes We Made
- Delta size limits - AI-generated diffs over 100 lines require senior engineer review
- AI disclosure requirement - PRs must indicate which parts used AI assistance
- Mandatory review checklist - Specific items for AI code: verify imports exist, check error handling, test edge cases
- No direct commits - All AI-generated code goes through PR process, no exceptions
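The first two gates are straightforward to automate as a pre-merge check. Here's a minimal sketch of the decision logic, assuming PRs carry an `ai-assisted` label and CI can count changed lines from the diff; the function name, label, and return values are illustrative, not our actual tooling:

```python
# Illustrative pre-merge gate: decide the review level a PR needs.
# Assumes CI supplies the changed-line count and the PR's labels.

def review_gate(changed_lines: int, labels: set[str],
                threshold: int = 100) -> str:
    """Map a PR to the review level required under the gates above."""
    if "ai-assisted" not in labels:
        # Non-AI code follows the normal review process.
        return "standard-review"
    if changed_lines > threshold:
        # Large AI-generated diffs escalate to a senior engineer.
        return "senior-review"
    # Small AI-generated diffs still go through the review checklist.
    return "standard-review-with-checklist"
```

In CI this runs on every PR (no direct commits), with the changed-line count taken from something like `git diff --numstat` against the target branch.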
The Results (After 3 Months)
The bad news: Initial development velocity dropped ~30%. Engineers complained about “red tape.” Some felt we were micromanaging.
The good news:
- Production bugs from AI code: down 60%
- Time spent debugging: down 45%
- Customer-impacting incidents: down from 4/month to 0.5/month
- Engineer satisfaction (after initial dip): back to baseline
The Interesting Part
After about 6 weeks, something shifted. Engineers internalized the discipline. They started prompting AI differently—asking for smaller, focused changes instead of large rewrites. They reviewed code more carefully even when they weren’t required to.
The “slower” feeling started to disappear. Not because we relaxed the gates, but because the gates became habit.
The Trade-Off Question
Here’s what I’m grappling with: We absolutely took a short-term velocity hit for long-term quality. In a startup environment where “ship fast” is the culture, this was a hard sell.
Some engineers loved it (especially those who had been burned by debugging AI code). Others felt we were being too conservative, that we should “trust the process” and iterate.
My questions for this group:
- When is this trade-off worth it? Are there contexts where you should optimize for speed over safety with AI tools?
- How do you sell "slower now, faster later" to leadership? Especially when competitors are bragging about AI productivity gains?
- What's the right balance? Are 100-line deltas too conservative? Should different parts of the codebase have different rules?
The data says this was the right call, but I still get pushback. How do you know when to prioritize quality gates over velocity?