48% of AI-Generated Code Has Security Vulnerabilities - The Numbers Nobody Wants to Talk About

I’ve been tracking the security implications of AI-assisted coding for the past two years, and the data is now clear enough that we need to have an honest conversation about it.

The Hard Numbers

Veracode’s latest research shows that 45-48% of AI-generated code contains security vulnerabilities. But that’s just the headline. The details are worse:

  • 62% of AI-generated solutions contain design flaws or known security vulnerabilities, even when developers use the latest models
  • Java code has the highest security failure rate at over 70%
  • Python, C#, and JavaScript fail at 38-45%
  • Even Claude Opus 4.5, the current leader on BaxBench, produces secure code only 56-69% of the time

The Productivity Paradox

Here’s where it gets interesting. Teams using AI assistants see 20% more PRs per author. But incidents per PR are up 23.5%.

A Stanford study found that 15-25% of the productivity gains from AI are eaten up by rework - much of it security-related.

We’re shipping faster, but we’re also shipping more bugs.
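To see how the two figures above combine, here is a rough back-of-the-envelope sketch. It assumes both numbers describe the same teams and that they compound multiplicatively (incidents per author = PRs per author x incidents per PR). Neither assumption is stated in the cited data.

```python
# Figures quoted above; treating them as applying to the same teams is an assumption.
PR_GROWTH = 0.20          # 20% more PRs per author
INCIDENT_GROWTH = 0.235   # 23.5% more incidents per PR


def total_incident_multiplier(pr_growth: float, incident_growth: float) -> float:
    """Incidents per author = (PRs per author) x (incidents per PR)."""
    return (1 + pr_growth) * (1 + incident_growth)


multiplier = total_incident_multiplier(PR_GROWTH, INCIDENT_GROWTH)
print(f"Incidents per author: {multiplier:.3f}x baseline")  # 1.482x baseline
```

Under those assumptions, incidents per author rise by roughly 48%, which is a larger jump than either number suggests on its own.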

The CVEs Nobody Talked About

In August 2025, Microsoft patched CVE-2025-53773 - a remote code execution vulnerability in GitHub Copilot itself. Through prompt injection, attackers could modify your .vscode/settings.json and achieve full system compromise.

Pillar Security discovered the “Rules File Backdoor” attack - hackers can inject hidden malicious instructions into configuration files that Cursor and Copilot read, causing them to insert malicious code that bypasses typical code reviews.

What I’m Seeing in the Wild

The most common issues I find in AI-generated code:

  1. Missing input sanitization - by far the most frequent
  2. Improper password handling - AI often suggests weak patterns
  3. Insecure object references - authorization bypasses
  4. Missing null checks and guardrails - edge cases ignored

In 45% of test cases, LLMs introduce vulnerabilities classified in the OWASP Top 10.

The Question We Need to Answer

Cursor is generating 1 billion lines of committed code per day. If your organization generates 100,000 lines of AI-assisted code this year and even a quarter of it carries security flaws, that’s 25,000 lines to find and fix.

Most security teams already can’t keep pace with manually written vulnerabilities. What happens when we 3x the vulnerable code output?

I’m not arguing against AI coding tools. I use them. But I’m seeing organizations adopt them without updating their security practices, and it’s creating a growing backlog of vulnerabilities.

How are your teams handling AI code security? What processes have you put in place?

Sam, these numbers are sobering, but I want to push back on the framing a bit.

I use Copilot and Claude Code daily. My experience is that the security issues you’re describing are real, but they’re not evenly distributed. The problems cluster in specific patterns:

Where AI code is actually dangerous:

  • Authentication and authorization flows
  • Input validation at system boundaries
  • Cryptographic operations
  • SQL/NoSQL query construction
  • File system operations with user input

Where it’s mostly fine:

  • UI components and styling
  • Data transformation logic
  • Utility functions
  • Test code (ironically)
  • Configuration and setup

The mistake I see teams making is treating all AI-generated code the same. “We need human review for everything” isn’t practical when you’re generating thousands of lines a day. But “we trust AI for security-critical paths” is obviously insane.

What I actually do:

  1. Security-critical code gets extra scrutiny - I explicitly tell the AI “this touches auth/payments/PII” and ask it to explain the security model
  2. I never accept crypto code without understanding it - AI is terrible at cryptography
  3. I run SAST on every PR - Snyk catches most of the obvious stuff
  4. I keep a mental list of “AI failure patterns” - the OWASP Top 10 violations you mentioned are predictable
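A "mental list of AI failure patterns" can be partly written down. The sketch below is a deliberately tiny line scanner, not a replacement for a real SAST tool. The pattern names and regexes are hypothetical examples chosen for illustration, and a real list would be longer and tuned to your codebase.

```python
import re

# Hypothetical, intentionally small pattern list. A real SAST tool covers far more.
FAILURE_PATTERNS = {
    "sql-fstring": re.compile(r"execute\(\s*f[\"']"),            # query built with an f-string
    "weak-hash": re.compile(r"\bhashlib\.(md5|sha1)\b"),         # weak hash, e.g. for passwords
    "tls-verify-disabled": re.compile(r"verify\s*=\s*False"),    # certificate checks turned off
}


def scan(source: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every line that matches a pattern."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for name, pattern in FAILURE_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, name))
    return findings


sample = (
    'cur.execute(f"SELECT * FROM users WHERE id = {uid}")\n'
    "digest = hashlib.md5(pw).hexdigest()\n"
    'cur.execute("SELECT * FROM users WHERE id = %s", (uid,))\n'
)
for line_no, name in scan(sample):
    print(f"line {line_no}: {name}")
```

The parameterized query on the third line is not flagged, which is the point: the list encodes the specific shortcuts worth a second look, not every database call.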

The productivity gain is real. The security risk is real. The solution is treating AI like a junior developer who’s great at syntax but doesn’t understand threat models.

Would love to hear what SAST tools others are using specifically for AI-generated code.

This is the conversation I’ve been having with my board for the past six months. They see the productivity numbers and want to accelerate AI adoption. I have to explain why that acceleration needs guardrails.

The Executive Challenge

The stats Sam shared (48% vulnerability rate, 23.5% more incidents per PR) translate directly to risk metrics that boards understand:

  • Compliance risk: Every vulnerability in production is a potential audit finding
  • Breach cost: The average data breach now costs $4.88M, and AI-generated vulnerabilities are a new attack surface
  • Reputation risk: “AI-generated code caused the breach” is a headline nobody wants

But I also can’t tell my board “we’re not using AI because it’s risky.” That’s not leadership; that’s fear.

How We’re Framing It

I’ve started presenting AI code security as a managed risk, not a binary choice:

  1. Risk tiering - We classify code by sensitivity. AI-assisted code in our core payment processing gets different treatment than AI-assisted code in our marketing site.

  2. Mandatory gates - Security-critical paths require human review AND automated scanning. Non-sensitive code can flow with just automated checks.

  3. Metrics transparency - We track AI-generated code incidents separately so we can see the actual risk, not the theoretical risk.

  4. Insurance review - Yes, we actually asked our cyber insurance provider how they view AI-generated code. Spoiler: they’re watching closely.
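The tiering and gating in items 1 and 2 could be expressed as a small policy table, sketched below. The path globs and tier names are hypothetical. The only parts taken from the description above are that sensitive paths need human review plus automated scanning, and that everything else flows with automated checks alone.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class Gates:
    automated_scan: bool
    human_review: bool


# Ordered strictest first. Path globs are hypothetical placeholders.
TIERS = [
    ("critical", ["services/payments/*", "services/auth/*"], Gates(True, True)),
    ("standard", ["*"], Gates(True, False)),
]


def required_gates(changed_paths: list[str]) -> tuple[str, Gates]:
    """Return the strictest tier (and its gates) triggered by any changed path."""
    for name, patterns, gates in TIERS:
        if any(fnmatch(path, pat) for path in changed_paths for pat in patterns):
            return name, gates
    return "standard", Gates(True, False)  # nothing matched, e.g. an empty change set


tier, gates = required_gates(["services/payments/charge.py", "web/marketing/index.html"])
print(tier, gates)
```

A PR that touches one critical file is treated as critical as a whole, so a payment change cannot ride along inside a large marketing-site PR.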

The Real Question

Sam’s numbers are accurate, but they’re industry averages. The question for each organization is: what’s YOUR vulnerability rate with YOUR processes?

If you don’t know, you can’t manage the risk. And if you can’t manage the risk, you shouldn’t be accelerating adoption.

We invested in tooling to measure this before we expanded AI usage. That data is what lets me confidently tell the board “yes, we should use AI, AND here’s how we’re controlling the risk.”
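Measuring "YOUR vulnerability rate" can start very simply. This is a minimal sketch that assumes each merged PR is already labeled as AI-assisted or not and linked to any incident it caused. The `PullRequest` record and its fields are hypothetical, and the sample data is made up purely to exercise the function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PullRequest:
    ai_assisted: bool       # hypothetical label, e.g. derived from commit tagging
    caused_incident: bool   # hypothetical link from an incident tracker


def incident_rates(prs: list[PullRequest]) -> dict[str, Optional[float]]:
    """Incidents per PR, split by AI-assisted vs. not. None if a group is empty."""
    rates: dict[str, Optional[float]] = {}
    for label, flag in (("ai_assisted", True), ("other", False)):
        group = [pr for pr in prs if pr.ai_assisted == flag]
        rates[label] = sum(pr.caused_incident for pr in group) / len(group) if group else None
    return rates


# Illustrative data only, not real measurements.
history = [
    PullRequest(ai_assisted=True, caused_incident=True),
    PullRequest(ai_assisted=True, caused_incident=False),
    PullRequest(ai_assisted=False, caused_incident=False),
    PullRequest(ai_assisted=False, caused_incident=False),
]
print(incident_rates(history))
```

Even a crude split like this replaces an industry average with your own baseline, which is what the comparison to the board actually needs.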

Coming from financial services, where regulators are now explicitly asking about AI-generated code in audits, I want to add the compliance dimension.

What Regulators Are Asking

In our last OCC examination, we got questions I’ve never seen before:

  • “What percentage of your production code is AI-generated or AI-assisted?”
  • “How do you identify and track AI-generated code through your SDLC?”
  • “What additional controls exist for AI-generated code in critical systems?”

We didn’t have great answers. Now we do, because we had to build them.

The Financial Services Reality

Sam’s 48% stat becomes existential when you consider:

  • SOX compliance requires demonstrable controls over financial reporting systems
  • PCI-DSS requires secure coding practices with evidence
  • GLBA requires protection of customer financial data

If AI is generating code that touches any of these areas, you need to prove your controls work. “We use SAST” isn’t enough - you need to show the SAST catches the specific vulnerability patterns AI introduces.

Our Current Approach

  1. Explicit tagging - AI-assisted code gets labeled in commit messages. We can query exactly what’s AI-generated.

  2. Elevated review for regulated systems - Any PR touching payment rails, customer data, or financial reporting requires two human reviewers AND security sign-off.

  3. AI-specific test cases - We added test scenarios specifically targeting the OWASP Top 10 patterns Sam mentioned.

  4. Audit trail - We log which AI tool suggested what code, so we can trace back if there’s an incident.
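As an illustration of the explicit tagging in item 1, the sketch below reads a trailer from a commit message. The trailer name `AI-Assisted:` is a hypothetical convention, not something the post specifies. Any consistent, machine-readable label would work the same way.

```python
import re
from typing import Optional

# Hypothetical trailer convention: a line such as "AI-Assisted: copilot".
TRAILER = re.compile(
    r"^AI-Assisted:[ \t]*(?P<tool>\S.*?)[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)


def ai_tool_for(commit_message: str) -> Optional[str]:
    """Return the tool named in the AI-Assisted trailer, or None if untagged."""
    match = TRAILER.search(commit_message)
    return match.group("tool") if match else None


message = "Add refund endpoint\n\nValidates amount and currency.\n\nAI-Assisted: copilot\n"
print(ai_tool_for(message))                 # copilot
print(ai_tool_for("Fix typo in README\n"))  # None
```

Because the label lives in the commit itself, the same parser can answer both the "what percentage" question and the "which tool suggested this" question from repository history.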

Is this overhead? Yes. Is it necessary? Also yes. The alternative is explaining to regulators why your AI-generated code caused a compliance failure.

The uncomfortable truth: In regulated industries, AI code velocity only matters if you can prove security. Otherwise, you’re just generating audit findings faster.

The identity and fraud implications of AI-generated code vulnerabilities are what keep me up at night.

At my fintech, we’ve seen a pattern: AI-generated auth code often works correctly for the happy path but fails catastrophically on edge cases. And in identity systems, edge cases are where attackers live.

Specific Patterns I’ve Caught

  1. Token validation that trusts expiry without checking signature - AI suggested code that validated JWT expiry but didn’t verify the signature. Worked fine in testing, would have been a complete auth bypass in production.

  2. Rate limiting that forgets session context - AI-generated rate limiting that counted requests globally instead of per user. An attacker could lock out every legitimate user just by burning through the shared limit.

  3. MFA bypass through state confusion - AI generated an MFA flow where completing step 1 set a flag that step 2 could read, but the flag wasn’t bound to the session. Replay attack waiting to happen.

These aren’t exotic vulnerabilities. They’re the kind of thing any security engineer would catch in review. But the code looks right. It passes basic tests. It’s exactly the kind of subtle bug that slips through when teams are moving fast.
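Pattern 2 is similarly easy to show. A minimal sliding-window sketch that keys the counter on the user, so one caller exhausting their budget cannot lock out anyone else, might look like this. The class name and parameters are hypothetical, and it is in-memory and single-process only.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class PerUserRateLimiter:
    """Sliding-window limiter with a separate window per user."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # user_id -> timestamps of that user's recent allowed requests
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[user_id]
        # Drop timestamps that have aged out of this user's window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = PerUserRateLimiter(max_requests=2, window_seconds=60)
print([limiter.allow("attacker", now=t) for t in (0, 1, 2)])  # [True, True, False]
print(limiter.allow("victim", now=3))                         # True
```

The global version uses a single deque for everyone, which is why it looks correct in a single-user test and only fails once a second user shows up.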

The Downstream Problem

Sam’s 48% stat is scary, but here’s what’s scarier: AI-generated identity bugs don’t just affect your app. They affect:

  • Every system that trusts your auth tokens
  • Every partner that accepts your identity assertions
  • Every user whose credentials could be compromised

A vulnerability in your payment processing might cost you money. A vulnerability in your identity system might cost you your business.

My Rule

I don’t let AI generate auth code. Period. I’ll use it for everything else, but identity is too critical and the failure modes are too subtle. Humans only for auth flows.