The Security Implications of Rubber-Stamping AI-Generated PRs

I need to talk about something that the productivity-focused AI coding discussions consistently underweight: the security implications of review fatigue on AI-generated code. Because from a security perspective, we are building a catastrophe.

Here is the threat model in plain terms: AI coding tools dramatically increase code volume. Review capacity stays flat. Reviewers get fatigued and start rubber-stamping. Attackers know this, and they will exploit it.

This is not theoretical. It is already happening in open source, and it is a matter of time before it becomes a widespread problem in internal codebases too.

The Attack Surface Nobody Is Modeling

Let me describe the attack surface that AI-accelerated review fatigue creates:

1. Volume Camouflage

When a team produces 15 PRs per week, a malicious PR stands out. When a team produces 50 PRs per week, the signal-to-noise ratio drops dramatically. A reviewer who is fatigued from reviewing 10 PRs that morning is more likely to miss a subtle vulnerability in PR number 11. Attackers have always relied on overwhelming defenders – AI code volume does this automatically.

2. AI Pattern Exploitation

AI-generated code creates recognizable patterns. Reviewers learn to expect these patterns and develop “AI code blindness” – they start skimming AI code because it “usually looks right.” This is exactly the opening for an attacker (or a compromised AI model) to introduce malicious code that hides behind the expected AI style.

A recent study found that 45% of AI-generated code contains security vulnerabilities. Not all of these are exploitable, but at this volume even a small percentage of exploitable vulnerabilities translates into a significant attack surface.

3. Supply Chain Contamination

This connects to the slopsquatting problem. AI tools sometimes hallucinate package names – 20% of AI-generated code references non-existent packages, and 43% of hallucinated names recur consistently. Attackers register these hallucinated package names and upload malicious code. A developer using AI to generate code pulls in a dependency that the AI invented, and the attacker was waiting for it.

In a world where reviewers are overwhelmed, the chance of someone catching “wait, this package did not exist last week” drops significantly.

4. The Insider Threat Multiplier

Consider a scenario where a malicious insider uses AI to generate a large volume of legitimate-looking PRs to build trust and create review fatigue among their colleagues. After weeks of AI-generated code that passes review easily, they slip in a PR with a subtle backdoor. The reviewers, conditioned to approve their AI-generated output quickly, miss it.

This is not far-fetched. It is a straightforward adaptation of existing insider threat tactics to the new AI-code-generation reality.

What the Data Says About Security Review Quality

GitHub is building a “kill switch” for pull requests because AI-generated contributions are overwhelming open-source maintainers. In the enterprise space, SonarSource’s 2026 developer survey found that 95% of developers report spending effort reviewing and correcting AI output, but 59% rate that effort as only “moderate” – which in security terms means they are probably missing things.

The Cortex 2026 benchmark shows incidents per PR up 23.5% alongside higher AI adoption. Some portion of those incidents are almost certainly security-related. And change failure rates are up 30%, which means more emergency patches, more rushed fixes, more opportunities for the fix-for-the-fix to introduce new vulnerabilities.

My Actual Recommendations

I spend a lot of time doing threat modeling for organizations, and here is what I am recommending to every client right now:

1. Separate security review from code review.

Do not expect the same developer who reviews business logic to also catch security issues in AI-generated code. Have dedicated security review for any PR that touches authentication, authorization, data handling, or external interfaces – regardless of whether it is AI-generated.

2. Mandatory SAST/DAST before human review.

No AI-generated PR should reach a human reviewer without first passing a static application security testing (SAST) scan, with dynamic testing (DAST) running against staging before release. This is table stakes. If your pipeline does not do this, fix it today.

3. Dependency lockfiles and package verification.

Every new dependency introduced by AI should be verified against known registries. Automated checks should flag any dependency that did not exist 30 days ago or has fewer than a threshold number of downloads.
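The registry check described above can be sketched in a few lines. This is a minimal illustration, not a real tool: it assumes you have already fetched each package’s registry metadata (for example from the npm or PyPI JSON API), and the field names and thresholds are invented for the example.

```python
# Sketch of the dependency-verification check described above. The
# metadata schema ("created", "weekly_downloads") and the thresholds
# are illustrative assumptions, not any real registry's API.
from datetime import datetime, timedelta, timezone

MIN_AGE_DAYS = 30           # flag packages registered more recently than this
MIN_WEEKLY_DOWNLOADS = 500  # flag packages below this download count

def flag_suspicious(name, metadata, now=None):
    """Return a list of reasons a new dependency deserves manual review."""
    now = now or datetime.now(timezone.utc)
    if metadata is None:
        # The package does not exist in the registry at all: the classic
        # slopsquatting signature for a hallucinated name.
        return [f"{name}: not found in registry"]
    reasons = []
    created = datetime.fromisoformat(metadata["created"])
    if now - created < timedelta(days=MIN_AGE_DAYS):
        reasons.append(f"{name}: registered {(now - created).days} days ago")
    downloads = metadata.get("weekly_downloads", 0)
    if downloads < MIN_WEEKLY_DOWNLOADS:
        reasons.append(f"{name}: only {downloads} weekly downloads")
    return reasons
```

Wired into CI as a check on every lockfile diff, this turns “wait, this package did not exist last week” from a thing a fatigued reviewer might notice into a thing the pipeline always notices.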

4. AI code provenance tracking.

You need to know which code in your codebase was AI-generated. Not for blame, but for risk assessment. AI-generated code has a different risk profile than human code, and your security posture should reflect that.
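One lightweight way to get this provenance is a commit-trailer convention that tooling can enforce and query. The trailer name below is an invented convention for illustration; the point is that provenance should be machine-readable, not tribal knowledge.

```python
# A minimal sketch of provenance tracking via git commit trailers.
# The trailer name "AI-Generated-By" is a hypothetical convention,
# not an existing standard; adopt whatever your tooling enforces.
import re

TRAILER_RE = re.compile(r"^AI-Generated-By:\s*(?P<tool>.+)$", re.MULTILINE)

def ai_provenance(commit_message):
    """Return the AI tool named in the commit trailer, or None."""
    match = TRAILER_RE.search(commit_message)
    return match.group("tool").strip() if match else None
```

With a convention like this in place, risk assessment becomes a query: you can compute what fraction of a sensitive path was AI-generated before deciding how deep its review needs to go.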

5. Slow down on sensitive paths.

For code that handles money, personal data, authentication, or external API interactions: reject the pressure to review faster. These paths deserve the same review depth they always did, regardless of how the code was generated. If that means those PRs take longer, that is the correct trade-off.

The Uncomfortable Parallel

I keep thinking about the early days of DevOps, when the mantra was “ship faster, break things, fix forward.” Security teams warned that faster deployment without proportional security investment would create problems. Those warnings were largely ignored until we got SolarWinds, Log4Shell, and the continuous parade of supply chain attacks that followed.

We are at the same inflection point with AI-generated code. The industry is focused on productivity acceleration and treating review capacity as an inconvenience to be optimized away. Security teams are warning that degraded review quality creates exploitable gaps. The question is whether we listen this time, or wait for the inevitable major incident.

What are you seeing in your organizations? Are security reviews getting the same attention they did before AI coding tools? Or is the pressure to ship faster winning?

Sam, your threat model is rigorous and I want to add some data to it.

We have been tracking security-related findings in our code review data, and the pattern supports your concern. In Q4 2025, before AI adoption was widespread on our team, security-relevant review comments (authentication issues, injection risks, data exposure, etc.) made up about 8% of all review comments. In Q1 2026, with 75% AI tool adoption, that dropped to 4.5%.

The drop is not because AI code has fewer security issues. It is because reviewers are spending their limited bandwidth on functional correctness and have less capacity for security analysis. Security review is the first casualty of review fatigue because it requires the deepest analysis and is the easiest to rationalize skipping (“the automated scanner will catch it”).

Your recommendation about separating security review from code review is exactly right, and I would go further: security review should be asynchronous and happen after the code is merged but before it hits production. This sounds counterintuitive, but here is the reasoning:

  1. Security review as a merge blocker creates pressure to approve quickly, which degrades quality
  2. A post-merge, pre-production security review removes the time pressure
  3. If the security reviewer finds an issue, the fix is “revert and fix” rather than “hold up the entire team”

This only works if you have a staging environment and a deployment pipeline that supports it. But it decouples security review thoroughness from deployment velocity, which is the tension causing quality degradation.
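The decoupling described above can be modeled as a simple promotion gate: merging is never blocked, but promotion out of staging is. This is a sketch of the policy, with invented names, not a real deployment system.

```python
# Sketch of the post-merge, pre-production security gate described
# above. Merge is never blocked; promotion to production is held for
# sensitive changes until security review completes. All names are
# illustrative assumptions.
from dataclasses import dataclass

@dataclass
class MergedChange:
    sha: str
    touches_sensitive_path: bool  # auth, payments, data handling, ...
    security_review_done: bool

def can_promote_to_production(change: MergedChange) -> bool:
    """Sensitive changes wait in staging until security review is done."""
    if change.touches_sensitive_path and not change.security_review_done:
        return False
    return True
```

The design choice worth noting: the expensive path (“revert and fix”) only triggers when a reviewer actually finds something, so thoroughness no longer taxes every merge.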

Your point about supply chain contamination through hallucinated packages is the most underreported security risk of AI coding. We caught two instances of this in our codebase last month – dependencies the AI suggested that did not exist in npm when the code was written but had been registered by the time we deployed. Fortunately, our dependency lockfile caught it. Teams without lockfile discipline are exposed.

Sam, I want to respectfully push back on part of your argument. The threat model is sound, but I think the severity framing is disproportionate to the actual risk landscape for most organizations.

Yes, 45% of AI-generated code contains “security vulnerabilities.” But that stat includes everything from minor style issues that could theoretically be exploited to actual critical vulnerabilities. When you filter for high and critical severity, the rate drops to about 8-12%, which is comparable to human-written code in codebases without strong security culture.

The insider threat scenario you describe – an attacker building trust with AI PRs before slipping in a backdoor – is theoretically possible but adds AI as an unnecessary layer of complexity. An insider can already build trust with normal code and slip in a backdoor. The attack surface is the insider access, not the AI tooling.

Where I do strongly agree with you is on the dependency verification point. Slopsquatting is a genuinely new attack vector that AI created, and it is easy to defend against but almost nobody is doing it. Checking new dependencies against known registries with minimum age and download thresholds should be a standard CI check. That it is not speaks to how behind most organizations are on basic supply chain security, regardless of AI.

I also think your SAST recommendation is correct but incomplete. SAST tools are notoriously noisy, and in the context of review fatigue, adding SAST noise on top of the review burden can actually make things worse. Teams start ignoring SAST findings the same way they start rubber-stamping reviews. The SAST tooling needs to be tuned aggressively for signal-to-noise ratio, not just deployed and forgotten.
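One common tuning pattern is to gate the pipeline only on high-signal severities and route everything else to a dashboard for asynchronous triage. The findings schema below is illustrative; real tools (Semgrep, CodeQL, etc.) each have their own output format you would adapt this to.

```python
# Sketch of severity-based SAST triage: only high-signal findings
# block the merge; the rest go to a dashboard instead of the reviewer.
# The findings schema here is an illustrative assumption.
BLOCKING_SEVERITIES = {"critical", "high"}

def triage(findings):
    """Split findings into merge-blocking and dashboard-only buckets."""
    blocking, advisory = [], []
    for finding in findings:
        bucket = blocking if finding["severity"] in BLOCKING_SEVERITIES else advisory
        bucket.append(finding)
    return blocking, advisory

def gate(findings):
    """Fail the pipeline only when high-signal findings are present."""
    blocking, _ = triage(findings)
    return len(blocking) == 0
```

The point of the split is exactly the signal-to-noise concern above: a gate that fires on every low-severity lint-adjacent finding trains the team to ignore it, which is rubber-stamping with extra steps.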

The real question I think the industry needs to answer is: what level of residual risk are we willing to accept? Because zero-risk review at AI-scale volume is not achievable. Every proposal in this conversation is a trade-off between speed, quality, and risk. Security professionals need to be part of that trade-off discussion, not just saying “slow down” from the sideline.

Sam, the DevOps parallel at the end of your post really landed for me.

I lived through the “ship fast, fix later” era as a design systems lead, and the security debt from that period took years to clean up. We are making the same mistake with AI-generated code, but at 10x the scale.

What strikes me about this whole conversation across multiple threads is that we are all debating HOW to review AI code, but nobody is asking whether the review process itself is the right mechanism for catching security issues at AI scale.

Code review was designed in an era when:

  • Humans wrote all the code
  • Code volume was manageable for human review
  • The reviewer knew the author and could assess their reliability
  • Security issues were relatively rare in well-written code

None of these assumptions hold for AI-generated code. So why are we trying to fix the review process instead of building a different verification system?

In design, when a tool or process cannot meet its requirements, we do not just make it faster or add more people. We redesign from first principles. What would a code verification system look like if we designed it today, from scratch, for a world where 41% of code is AI-generated?

I think it would look something like:

  • Automated property-based testing that generates edge cases faster than AI generates code
  • Continuous security scanning that runs in the background, not as a gate
  • Behavioral monitoring in production that catches anomalies in real-time
  • Human review focused exclusively on architectural decisions and product requirements
  • No expectation that any human reads every line of AI-generated code

This is not lowering standards. It is redesigning the verification system for the actual reality of 2026 software development. The code review process as we know it was a brilliant solution to a problem that no longer exists in its original form.
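The first item on that list, property-based testing, can be sketched with nothing but the standard library; in practice you would reach for a library like Hypothesis. The sanitizer and its invariant below are invented purely to show the shape of the technique: assert a property over many generated inputs rather than a handful of hand-picked cases.

```python
# A hand-rolled property-based check using only the stdlib; a real
# setup would use a library such as Hypothesis. The sanitizer and its
# invariant are hypothetical examples, not code from this discussion.
import random
import string

def strip_control_chars(s: str) -> str:
    """Hypothetical sanitizer: drop ASCII control chars except tab/newline."""
    return "".join(ch for ch in s if ord(ch) >= 0x20 or ch in "\t\n")

def check_property(trials=1000, seed=0):
    """Property: output never contains a disallowed control character."""
    rng = random.Random(seed)
    alphabet = string.printable + "".join(chr(i) for i in range(0x20))
    for _ in range(trials):
        s = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 40)))
        out = strip_control_chars(s)
        assert all(ord(ch) >= 0x20 or ch in "\t\n" for ch in out), repr(s)
    return True
```

The appeal for AI-scale verification is that the machine generates the adversarial inputs, so test coverage scales with compute rather than with reviewer attention.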

Sam, your recommendations are practical and urgent for today. But I think the 12-month horizon requires something more radical than patching the existing review process.