Last month, during a routine security audit at our fintech, I discovered a critical vulnerability in our authentication flow. The code looked clean, the tests passed, and the PR had been approved by two senior engineers. The problem? The entire auth middleware—including the session validation logic—had been generated by an AI coding assistant. And buried in there was a timing attack vulnerability that could leak user credentials.
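For anyone who hasn’t seen this bug class up close, here’s a minimal sketch of how it typically looks in a Node service. This is illustrative, not our actual middleware; the function names and the pepper value are hypothetical. The naive version leaks timing information; the second shows the constant-time pattern a reviewer should be looking for.

```typescript
import { createHmac, timingSafeEqual } from "crypto";

// VULNERABLE (sketch): `===` short-circuits at the first mismatched
// character, so response time correlates with how many leading characters
// of the attacker's guess are correct. Repeated measurements can recover
// a secret incrementally.
function validateSessionNaive(presented: string, expected: string): boolean {
  return presented === expected;
}

// SAFER (sketch): hash both values to fixed-length buffers, then compare
// with a constant-time primitive. Hashing first also satisfies
// timingSafeEqual's requirement that both buffers be the same length.
const PEPPER = "illustrative-key"; // hypothetical; use a real managed secret
function validateSession(presented: string, expected: string): boolean {
  const a = createHmac("sha256", PEPPER).update(presented).digest();
  const b = createHmac("sha256", PEPPER).update(expected).digest();
  return timingSafeEqual(a, b);
}
```

The point isn’t this specific fix; it’s that the vulnerable version looks completely reasonable in a diff, passes tests, and sails through review.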
That incident sent me down a research rabbit hole, and what I found is deeply concerning for our industry.
The Data Doesn’t Lie
According to recent research from Veracode and Georgetown’s CSET, roughly 48% of AI-generated code samples contain potential security vulnerabilities, and some studies report rates as high as 62%, depending on methodology. When researchers tested 30 AI-generated pull requests in real-world conditions, 87% contained at least one vulnerability.
Even more alarming: AI-generated code introduces 15-18% more security vulnerabilities than human-written code, per Opsera’s 2026 AI Coding Impact Benchmark Report.
The most common vulnerabilities mirror OWASP’s greatest hits:
- SQL Injection (found in 31% of projects)
- Cross-Site Scripting/XSS (27% of projects)
- Broken Authentication (24% of projects)
- Prompt injection vulnerabilities
- Command injection risks
The Trust Paradox
Here’s what keeps me up at night: we’re building a circular trust problem.
Teams adopt AI coding assistants to move faster. Fair enough. But when AI writes the code and AI assists in code review, who’s actually verifying correctness? We’ve essentially created a system where the fox is guarding the henhouse, and we hired another fox to watch the first fox.
The data backs this up. Pull requests with AI-generated code average 10.83 issues per PR, compared to 6.45 for human-written code. And these aren’t just style nitpicks—we’re talking security vulnerabilities, logic errors, and edge cases the AI simply didn’t consider.
Why Review Is Harder Now
Traditional code review assumes you can ask the author “why did you do this?” With AI-generated code, that conversation doesn’t exist. You get dense, multi-line suggestions with no rationale, no context about trade-offs considered, no explanation of why one approach was chosen over another.
Reviewers are left parsing logic that nobody on the team actually wrote. We’re debugging black-box output, trying to infer intent from patterns in training data we’ve never seen.
And security review? That requires understanding threat models, attack surfaces, and adversarial thinking. AI assistants are trained on existing code—including code with known vulnerabilities. If string-concatenated SQL queries appear frequently in training data, the AI will cheerfully suggest them to you.
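To make that concrete, here’s a minimal sketch of the pattern in question, assuming a Node service using node-postgres (the `findUser` names are illustrative). The first version is the kind of thing assistants frequently suggest; the second is what a reviewer should insist on.

```typescript
import { Pool } from "pg"; // assumes node-postgres

const pool = new Pool();

// VULNERABLE: user input interpolated directly into the SQL string.
// An email value like  ' OR '1'='1  rewrites the query's meaning.
async function findUserUnsafe(email: string) {
  return pool.query(`SELECT * FROM users WHERE email = '${email}'`);
}

// SAFE: a parameterized query keeps data and SQL separate; the driver
// never treats the input as executable SQL.
async function findUser(email: string) {
  return pool.query("SELECT * FROM users WHERE email = $1", [email]);
}
```

Both compile, both return the right rows in the happy path, and only one of them survives contact with an attacker.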
The Growing Problem
Security debt now affects 82% of companies, up from 74% just a year ago, and the share of vulnerabilities classified as high-risk has climbed from 8.3% to 11.3%. Part of this is the explosion of AI-assisted development: we’re shipping more code, faster, with less human scrutiny of each line.
Meanwhile, 66% of developers cite inaccurate AI code suggestions as their top challenge, and 45% report longer debugging times despite the promised productivity gains.
So What Do We Do?
I’m wrestling with this at my company, and I don’t have all the answers. Here’s what I want to ask this community:
- How are your teams handling code review for AI-generated code? Are you treating it differently than human code? Special review processes?
- Are you running security-specific tooling in CI/CD? SAST/DAST for every commit? Static analysis with security rulesets?
- Who owns the security of AI-generated code? Is it the developer who accepted the suggestion? The reviewer who approved it? The security team? The AI vendor?
- What about training? Are we updating security training to include AI-specific risks? Teaching developers to spot AI-generated vulnerability patterns?
- Should certain codebases restrict AI tools entirely? For highly sensitive systems (auth, payments, PII handling), should we just say “no AI assistance allowed”?
At Auth0 and Okta, we had the luxury of paranoia. Every line of identity code was reviewed by security specialists. But most companies don’t have that luxury. Most teams are small, moving fast, and now they’ve got AI assistants pumping out code faster than humans can review it.
The question isn’t whether AI coding assistants are useful—they are. The question is: who reviews the reviewers when AI writes the code?
I’d love to hear how other teams are solving this. What’s working? What’s failed spectacularly? What practices are you implementing to maintain development velocity without sacrificing security rigor?
Because right now, we’re in a weird transitional period where AI writes the code, humans rubber-stamp it, and security teams discover the problems in production. That can’t be the endgame.
Sources: Veracode AI Code Security, Georgetown CSET Report, Practical DevSecOps Research 2026, Dark Reading Analysis