Anthropic just launched Claude Code Security for automated vulnerability scanning. They’re claiming that Claude Opus 4.6 can find high-severity vulnerabilities that went undetected for decades.
I tested it on a client’s fintech codebase. Here are my unfiltered results.
What I Tested
I ran Claude Code Security on a 50K LOC Node.js/TypeScript codebase for a payment processing platform. This code has already been through:
- Manual security reviews
- Snyk SAST scanning
- CodeQL analysis
- Two pentest rounds
So it’s been pretty thoroughly vetted.
The Results
Found: 3 real issues we hadn’t caught
- A timing attack vulnerability in token comparison
- A subtle race condition in account creation
- An edge case in refund logic that could be exploited
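For readers who haven't run into the account-creation race before, here is a minimal sketch of the general check-then-act pattern. It is not the client's code. The in-memory stores, the class names (`RacyStore`, `AtomicStore`), and the `delay` helper are all hypothetical stand-ins for a real database and its latency. In a real system the fix would more likely be a unique constraint or a transaction than an in-process set.

```typescript
type Account = { email: string };

// Hypothetical helper: simulates I/O latency so concurrent calls interleave.
const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Racy version: the existence check and the write are separated by an await,
// so two concurrent requests can both pass the check before either writes.
class RacyStore {
  rows: Account[] = [];

  async create(email: string): Promise<boolean> {
    const exists = this.rows.some((a) => a.email === email); // check
    await delay(5); // simulated round trip between check and write
    if (exists) return false;
    this.rows.push({ email }); // act
    return true;
  }
}

// Safer version: check and claim happen in one synchronous step, which plays
// the role a UNIQUE constraint would play in a real database (assumption).
class AtomicStore {
  rows: Account[] = [];
  private claimed = new Set<string>();

  async create(email: string): Promise<boolean> {
    if (this.claimed.has(email)) return false;
    this.claimed.add(email); // no await between the check and the claim
    await delay(5);
    this.rows.push({ email });
    return true;
  }
}

// Fires two concurrent sign-ups for the same email against each store
// and reports how many rows each one ends up with.
async function demo(): Promise<{ racy: number; atomic: number }> {
  const racy = new RacyStore();
  const atomic = new AtomicStore();
  await Promise.all([racy.create("a@example.com"), racy.create("a@example.com")]);
  await Promise.all([atomic.create("a@example.com"), atomic.create("a@example.com")]);
  return { racy: racy.rows.length, atomic: atomic.rows.length };
}
```

In this sketch the racy store ends up with a duplicate account while the atomic one rejects the second request. Bugs of this shape are easy to miss in review because each line looks correct on its own.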
False Positives: 12 issues flagged that weren’t actually vulnerabilities
- Overly conservative about input validation
- Flagged some intentional design patterns as risks
- Didn’t understand business logic context
Missed: 1 known vulnerability from our pentest
- A business logic flaw in the multi-factor auth flow
- Spotting it requires understanding the entire authentication system
What It’s Good At
Claude Code Security excels at:
- Finding logic flaws that pattern-based tools miss
- Explaining vulnerabilities in clear language
- Suggesting remediation with context
- Cross-file analysis and data flow tracking
The timing attack it found was impressive: not an obvious pattern match, but actual reasoning about the crypto implementation.
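To illustrate this class of bug, here is a minimal sketch, not the client's code. The function name `tokensMatch` is hypothetical. The usual problem is comparing secrets with `===`, which can return as soon as it hits the first mismatched character and so leak how much of the token was correct. One common fix in Node.js is `crypto.timingSafeEqual`. It throws when the two buffers differ in length, so this sketch hashes both values first to get equal-length inputs.

```typescript
import { createHash, timingSafeEqual } from "crypto";

// Hypothetical helper: compares a caller-supplied token against the expected
// one without an early exit on the first mismatched byte.
function tokensMatch(provided: string, expected: string): boolean {
  // Hash both sides so the buffers are always 32 bytes long;
  // timingSafeEqual throws if the lengths differ.
  const a = createHash("sha256").update(provided, "utf8").digest();
  const b = createHash("sha256").update(expected, "utf8").digest();
  return timingSafeEqual(a, b);
}
```

A call such as `tokensMatch(req.headers["x-api-token"], storedToken)` would then replace a direct `===` check. The header name here is only an example.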
What It Struggles With
Same issues as other AI security tools:
- Incomplete system context leads to overconfidence
- Can’t threat model without understanding attacker motivations
- Misses vulnerabilities that require business logic understanding
- False positive rate higher than traditional SAST (in my run, 12 of the 15 issues it flagged were false positives)
My Take
Claude Code Security is a good first pass, not a replacement for security review.
Use it as:
- First layer in defense-in-depth
- Way to catch obvious and some non-obvious issues
- Educational tool (explanations are excellent)
Don’t use it as:
- Sole security gate
- Replacement for threat modeling
- Substitute for pentest/security review
Questions
Has anyone else tested this? What were your results? Especially interested in comparisons to Snyk, Semgrep, or other AI-enhanced SAST tools.