We’re shipping AI-generated code at 2x the speed—but are we securing it at half the pace?
I need to share something that’s been keeping me up at night.
Eight months ago, our design systems team started using GitHub Copilot. The velocity boost was instant—we went from shipping 2-3 components a week to 4-5. Our sprint velocity increased by 40%, and leadership was thrilled. I was thrilled. The AI wrote clean React code, handled edge cases I’d forget, and even suggested accessibility attributes.
Then we shipped three accessibility bugs to production.
The AI had suggested aria-label attributes that looked right but were semantically wrong. Screen readers were announcing misleading information. Our own QA didn’t catch it because the code looked professional and passed the automated tests. It took a customer complaint to surface the issue.
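To make the failure mode concrete, here’s a hypothetical sketch of the shape of that bug (not the actual component we shipped, and the names are invented for this example): the attribute is valid, the markup looks accessible, and the label still tells screen reader users the wrong thing.

```tsx
// Hypothetical illustration of the failure mode, not the component we shipped.
// "ClearSearchButton" and its props are invented for this example.
import * as React from "react";

type Props = { onClick: () => void };

// What the AI-style suggestion looks like: a perfectly valid aria-label that
// describes the icon instead of the action. Linters and automated tests pass,
// but a screen reader announces "Search icon, button" on a control that
// actually clears the query.
export function ClearSearchButtonBuggy({ onClick }: Props) {
  return (
    <button type="button" aria-label="Search icon" onClick={onClick}>
      <svg aria-hidden="true" focusable="false" width={16} height={16} />
    </button>
  );
}

// The fix: label the action the user is taking, not the artwork.
export function ClearSearchButtonFixed({ onClick }: Props) {
  return (
    <button type="button" aria-label="Clear search" onClick={onClick}>
      <svg aria-hidden="true" focusable="false" width={16} height={16} />
    </button>
  );
}
```

Both versions pass a typical automated audit; only a reviewer who knows what the button actually does will catch the difference.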
That moment shook me. If I—someone who teaches accessibility—could miss AI-generated bugs because “the code looked right,” what else are we all missing?
The Numbers Are Terrifying
I started digging into the research, and here’s what I found:
- 45% of AI-generated code contains security flaws (Veracode analysis across 100+ LLMs)
- At least 35 new CVEs in March 2026 alone traced directly to AI-generated code—up from just 6 in January
- The real estimate? 400-700 CVEs across the open-source ecosystem (researchers think detected cases are only 10-20% of reality)
- 70% of security teams report confirmed AI-generated vulnerabilities already in production
And here’s the kicker: Less than 50% of developers review AI-generated code before committing it.
We’re not just shipping faster. We’re shipping blindly faster.
The “Vibe Coding” Problem
There’s this new term I learned: “vibe coding.” It’s when developers trust AI-generated code based on vibes—it feels right, it looks professional, so we merge it.
I’m 100% guilty of this. When Copilot suggests a whole function that handles exactly what I need, complete with error handling and type safety, my brain goes: “This is better than what I’d write myself” and I just… accept it.
The speed advantage creates review pressure. When you can generate 10 components in the time it used to take to write 5, the backlog of “code to review” grows exponentially. We went from struggling to ship fast enough to struggling to review carefully enough.
Our code review queue went from 8 PRs to 23 PRs in two months. The bottleneck shifted from writing to reviewing, and we didn’t adjust our process to match.
What We Changed (And What We’re Still Figuring Out)
After the accessibility incident, we implemented some changes:
- AI-generated PRs get a special label: it forces a conscious review
- Mandatory accessibility audits for any component touching user interaction (a test sketch follows below)
- Slower velocity targets: we accepted that 40% faster shipping wasn’t sustainable if quality suffered
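The audit itself is mostly humans with screen readers, but an automated floor is cheap to add alongside it. Here’s a minimal sketch of that floor, assuming jest-axe and React Testing Library are installed; `SearchInput` is a stand-in name, not one of our real components.

```tsx
// A minimal automated accessibility floor, assuming jest-axe and
// @testing-library/react are installed. "SearchInput" is a stand-in name.
import { render } from "@testing-library/react";
import { axe, toHaveNoViolations } from "jest-axe";
import { SearchInput } from "./SearchInput";

expect.extend(toHaveNoViolations);

test("SearchInput has no detectable accessibility violations", async () => {
  const { container } = render(<SearchInput label="Search products" />);

  // axe catches structural ARIA problems: unknown attributes, missing
  // accessible names, invalid roles. It will not catch a label that is
  // valid but semantically wrong; that still needs a human with a screen reader.
  const results = await axe(container);
  expect(results).toHaveNoViolations();
});
```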
But I’m still wrestling with open questions:
- Are we trading speed for security debt? Our sprint velocity is still up 25%, but how much technical and security debt are we accumulating?
- What governance actually works at the team level? Policies are great, but what tactically prevents vibe coding?
- How do you balance velocity with validation? Rejecting AI help entirely feels like refusing electricity, but accepting it blindly is clearly dangerous.
The Trust Problem
Here’s what really scares me: I work in design systems—the foundation layer that dozens of product teams build on. If I ship a vulnerable component to our shared library, I’m not just risking one feature. I’m potentially compromising every team that consumes that component.
Users don’t care if AI wrote the bug. They blame us. And they’re right to.
From my startup failure days, I learned that security breaches destroy user trust overnight. You can’t design your way out of broken security. You can’t smooth over “your app leaked my data” with better UX.
So… What Are We Doing About This?
I’m genuinely curious how other teams are handling this. Are you:
- Flagging AI-generated code for extra review?
- Using static analysis tools specifically for AI code patterns? (see the sketch below)
- Measuring “net productivity” including security debt?
- Training teams on what AI code smells to watch for?
Because right now, it feels like we’re in that awkward phase where AI coding tools are mainstream enough that everyone’s using them, but governance practices haven’t caught up yet.
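On the static-analysis question, the nearest generic starting point I can sketch is turning the accessibility lint rules all the way up. It isn’t AI-specific, but it narrows what a hurried reviewer has to catch by eye. This assumes eslint-plugin-jsx-a11y and an ESLint flat config where TSX parsing is already set up elsewhere; the globs and severities are illustrative, not a recommendation.

```js
// eslint.config.mjs: an illustrative fragment, not a recommendation.
// Assumes eslint-plugin-jsx-a11y is installed and that TSX parsing is
// already configured elsewhere in the flat config (e.g. typescript-eslint).
import jsxA11y from "eslint-plugin-jsx-a11y";

export default [
  {
    files: ["src/components/**/*.{tsx,jsx}"],
    plugins: { "jsx-a11y": jsxA11y },
    rules: {
      // Reject ARIA attributes that do not exist (typos, invented props).
      "jsx-a11y/aria-props": "error",
      // Reject ARIA attributes with invalid values.
      "jsx-a11y/aria-proptypes": "error",
      // Roles must come with the ARIA props they require.
      "jsx-a11y/role-has-required-aria-props": "error",
      // Redundant roles are a common smell in generated markup.
      "jsx-a11y/no-redundant-roles": "warn",
    },
  },
];
```

None of this catches the semantic mislabeling that bit us; it just raises the floor so review attention can go where a linter can’t.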
I don’t have answers. But I think we need to start talking about this now, before the 400-700 CVE estimate becomes 4,000-7,000.
What’s your team doing?
Sources: