There are now 42+ startups in the AI code review space, all funded by VCs who see the massive addressable market — because every software team reviews code, and code review is time-consuming. CodeRabbit, Codeium, Greptile, Codacy, DeepSource, Sourcery, and dozens more are all competing for the same engineering teams with remarkably similar pitches: “AI reviews your PRs automatically, catches bugs, suggests improvements, saves reviewer time.” The landing pages blur together. The demo videos are interchangeable. The promise is the same.
After trying five AI code review tools over the past year, across two teams and three codebases, I’ve noticed a pattern that concerns me: they all catch the same things and miss the same things.
What AI Code Review Is Good At
Let me give credit where it’s due. These tools do provide value in specific, well-defined areas:
1. Style enforcement. AI is genuinely good at catching inconsistent naming conventions, formatting issues, and import-ordering problems that fall outside your configured linter rules. Where ESLint enforces rules you’ve explicitly configured, AI can enforce project-specific conventions that are implicit, like “we always use early returns in this codebase” or “we prefer const arrow functions over function declarations.” This is legitimately useful and hard to replicate with traditional tooling.
2. Documentation suggestions. AI is surprisingly good at identifying functions that need documentation and generating reasonable docstrings. It can look at a function’s parameters, return type, and implementation to produce a first-draft docstring that captures the essential behavior. It won’t write great documentation, but it’ll flag the gaps and give you a starting point.
3. Obvious bug patterns. Null pointer dereferences, unused variables, unreachable code, obvious race conditions in concurrent code, off-by-one errors in loop boundaries. These are real catches, but they overlap heavily with static analysis tools (TypeScript’s strict mode, ESLint with appropriate plugins, Semgrep) that are cheaper, faster, and more reliable. In my experience, AI catches maybe 10-15% more issues in this category than a well-configured static analysis pipeline.
4. Boilerplate improvements. Suggesting more idiomatic patterns, simplifying complex conditionals, recommending standard library functions over custom implementations. “You could use Array.findIndex() instead of this manual loop” is a common and helpful suggestion.
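To make item 1 (style enforcement) concrete, here is a hypothetical permission check written two ways. Both versions compile and behave identically, so a formatter or a default lint setup has nothing to say about either, unless someone has written a rule for it. Only a reviewer who knows the team prefers early returns and `const` arrow functions would flag the first. The `Author` type and both function names are invented for illustration.

```typescript
// Hypothetical domain type, for illustration only.
interface Author {
  id: string;
  isBanned: boolean;
  isAdmin: boolean;
}

// Style A: a function declaration with nested conditionals.
function canEditNested(user: Author | null, postOwnerId: string): boolean {
  if (user !== null) {
    if (!user.isBanned) {
      if (user.isAdmin || user.id === postOwnerId) {
        return true;
      }
    }
  }
  return false;
}

// Style B: the same logic as a const arrow function with early returns,
// the kind of implicit team convention no default config encodes.
const canEditEarlyReturn = (user: Author | null, postOwnerId: string): boolean => {
  if (user === null) return false;
  if (user.isBanned) return false;
  return user.isAdmin || user.id === postOwnerId;
};
```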
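For item 2 (documentation suggestions), this is roughly the shape of first-draft docstring an AI reviewer might propose for a small, hypothetical `chunkArray` helper. The summary, parameters, return value, and error case all fall out of the signature and body. What it can’t supply is the context a human would add, such as why the function exists or which callers depend on the short final chunk.

```typescript
/**
 * Splits `items` into consecutive chunks of at most `size` elements.
 *
 * @param items - The array to split. It is not modified.
 * @param size - Maximum chunk length; must be a positive integer.
 * @returns An array of chunks; the last chunk may be shorter than `size`.
 * @throws RangeError if `size` is not a positive integer.
 */
function chunkArray<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    // slice() copies, so the input array is left untouched.
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}
```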
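Item 3’s loop-boundary case is easy to show. In this sketch, `sumBuggy` type-checks under TypeScript’s `strict` mode because indexing a `number[]` yields `number`; at runtime the extra iteration reads `undefined` and the result becomes `NaN`. The separate `noUncheckedIndexedAccess` compiler option (it is not part of `strict`) types `xs[i]` as `number | undefined` and forces an explicit check, which is exactly the kind of cheap, deterministic tooling this category overlaps with. Both function names are hypothetical.

```typescript
// Off-by-one: `<=` runs one iteration past the last index.
// Compiles under `strict`, but xs[xs.length] is undefined at runtime,
// and adding undefined to a number evaluates to NaN.
function sumBuggy(xs: number[]): number {
  let total = 0;
  for (let i = 0; i <= xs.length; i++) {
    total += xs[i];
  }
  return total;
}

// Fixed: iterate the values directly, so there is no index to get wrong.
function sumFixed(xs: number[]): number {
  let total = 0;
  for (const x of xs) {
    total += x;
  }
  return total;
}
```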
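And item 4’s suggestion, sketched with a hypothetical `Order` type: both functions return the index of the first pending order, or `-1` when there is none, because that is how `Array.prototype.findIndex` reports a miss.

```typescript
// Hypothetical order shape, for illustration only.
interface Order {
  id: string;
  status: "pending" | "shipped" | "cancelled";
}

// The manual loop a reviewer might flag.
function firstPendingIndexLoop(orders: Order[]): number {
  for (let i = 0; i < orders.length; i++) {
    if (orders[i].status === "pending") {
      return i;
    }
  }
  return -1;
}

// The suggested replacement: findIndex returns the index of the first
// element that satisfies the predicate, or -1 if none does.
const firstPendingIndex = (orders: Order[]): number =>
  orders.findIndex((order) => order.status === "pending");
```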
What AI Code Review Consistently Misses
This is where it gets concerning:
1. Architectural decisions. Whether this code should exist in this module. Whether this pattern is consistent with the rest of the codebase’s architecture. Whether this approach will cause problems at scale. AI reviews code in isolation — it doesn’t understand that putting this database query in a React component violates your team’s layered architecture, or that this new utility function duplicates logic that already exists in a different package. Architecture requires holistic understanding that current AI code review tools simply don’t have.
2. Business logic errors. The code “works” — it compiles, passes type checks, and does what it says in the function name. But it implements the wrong business rule. AI doesn’t know that a 30-day return window should actually be 14 days for electronics, or that this discount calculation should exclude items already on clearance. Business logic correctness requires domain knowledge that these tools don’t possess.
3. Security vulnerabilities in context. AI can spot eval(userInput) from a mile away. But it misses that a specific API endpoint bypasses authentication middleware because of how the Express route is registered: the route was registered before the auth middleware in the middleware chain, so requests to it never reach the auth check, yet there’s nothing wrong with the code in isolation. Context-dependent security vulnerabilities require understanding the full application flow across multiple files, configurations, and middleware chains.
4. Performance issues at scale. Code that works perfectly fine for 100 users but will crater at 100,000 users. An N+1 query that’s invisible when you have 10 records but devastating with 10 million. A cache invalidation strategy that works in a single-server deployment but causes thundering herd problems in a distributed system. AI doesn’t understand your traffic patterns, data volumes, or deployment topology.
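Item 2 in this list (business logic errors) is easiest to see in code. This is a hypothetical sketch using the two rules from that paragraph: a 14-day return window for electronics instead of the default 30, and no discount on items already on clearance. Every function type-checks and does what its name promises; nothing in the `AsWritten` versions looks wrong unless you already know the policy. The types, constants, and function names are all invented for illustration.

```typescript
// Hypothetical policy, for illustration only:
//   returns: 30 days by default, 14 days for electronics;
//   discounts: never applied to items already on clearance.
type Category = "electronics" | "apparel" | "home";

interface LineItem {
  priceCents: number;
  category: Category;
  onClearance: boolean;
}

const DEFAULT_RETURN_WINDOW_DAYS = 30;
const ELECTRONICS_RETURN_WINDOW_DAYS = 14;

// Type-checks and matches its name, but ignores the electronics exception.
function isReturnableAsWritten(_item: LineItem, daysSincePurchase: number): boolean {
  return daysSincePurchase <= DEFAULT_RETURN_WINDOW_DAYS;
}

// Intended rule: electronics get the shorter window.
function isReturnableIntended(item: LineItem, daysSincePurchase: number): boolean {
  const windowDays =
    item.category === "electronics" ? ELECTRONICS_RETURN_WINDOW_DAYS : DEFAULT_RETURN_WINDOW_DAYS;
  return daysSincePurchase <= windowDays;
}

// Also clean-looking, but discounts clearance items too.
function discountedTotalAsWritten(items: LineItem[], percentOff: number): number {
  const subtotal = items.reduce((sum, item) => sum + item.priceCents, 0);
  return Math.round(subtotal * (1 - percentOff / 100));
}

// Intended rule: clearance items keep their price.
function discountedTotalIntended(items: LineItem[], percentOff: number): number {
  return items.reduce((sum, item) => {
    const price = item.onClearance
      ? item.priceCents
      : Math.round(item.priceCents * (1 - percentOff / 100));
    return sum + price;
  }, 0);
}
```

A reviewer, human or AI, can only flag `isReturnableAsWritten` if someone has told it the electronics rule; the diff alone gives no hint.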
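For item 3 in this list (security vulnerabilities in context), the bug lives in registration order rather than in any single handler. Express runs middleware and routes in the order they are registered, so a route added ahead of `app.use(auth)` answers the request before the auth middleware ever runs. Rather than depend on Express, this sketch uses a hypothetical `ToyApp` that mimics only that ordering rule; `ToyApp`, its methods, and the paths are invented for illustration and are not Express’s API.

```typescript
// Toy model of an Express-style pipeline. NOT Express: it only mimics the
// rule that middleware and routes run in the order they were registered.
type ToyRequest = { path: string; authenticated: boolean };
type ToyHandler = (
  req: ToyRequest,
  respond: (status: number) => void,
  next: () => void,
) => void;

interface ToyLayer {
  path: string | null; // null means "run for every request", like app.use()
  handler: ToyHandler;
}

class ToyApp {
  private layers: ToyLayer[] = [];

  use(handler: ToyHandler): void {
    this.layers.push({ path: null, handler });
  }

  get(path: string, handler: ToyHandler): void {
    this.layers.push({ path, handler });
  }

  // Walks the layers in registration order and returns the status code
  // set by whichever layer responds (404 if none does).
  handle(req: ToyRequest): number {
    let status = 404;
    let index = 0;
    const respond = (code: number): void => {
      status = code;
    };
    const next = (): void => {
      while (index < this.layers.length) {
        const layer = this.layers[index++];
        if (layer.path === null || layer.path === req.path) {
          layer.handler(req, respond, next);
          return; // the matched layer decides whether to call next()
        }
      }
    };
    next();
    return status;
  }
}

const app = new ToyApp();

// Registered BEFORE the auth middleware: responds without auth ever running.
app.get("/internal/export", (_req, respond) => respond(200));

// Auth middleware: rejects unauthenticated requests, otherwise continues.
app.use((req, respond, next) => {
  if (!req.authenticated) {
    respond(401);
    return;
  }
  next();
});

// Registered AFTER the auth middleware: protected as intended.
app.get("/account", (_req, respond) => respond(200));
```

In this toy, the fix is moving the `/internal/export` registration below `app.use`; in real Express you could also pass the auth middleware directly to that route. Either way the handler itself is unchanged, which is why a reviewer looking only at the handler sees nothing wrong.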
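Item 4’s N+1 pattern, sketched against a hypothetical in-memory `FakeDb` that counts round trips instead of talking to a real database. Both functions return the same names; the difference is that the first issues one author query per post, while the batched version issues a single `IN (...)`-style lookup no matter how many posts there are. `Post`, `PostAuthor`, `FakeDb`, and both functions are invented for illustration.

```typescript
// Hypothetical data model, for illustration only.
interface Post {
  id: number;
  authorId: number;
}

interface PostAuthor {
  id: number;
  displayName: string;
}

// In-memory stand-in for a database that counts round trips.
class FakeDb {
  queryCount = 0;
  private readonly authors: Map<number, PostAuthor>;

  constructor(authors: PostAuthor[]) {
    this.authors = new Map(authors.map((a): [number, PostAuthor] => [a.id, a]));
  }

  // Models a single-row lookup: one round trip per call.
  getAuthor(id: number): PostAuthor | undefined {
    this.queryCount++;
    return this.authors.get(id);
  }

  // Models a batched `WHERE id IN (...)` lookup: one round trip total.
  getAuthors(ids: number[]): PostAuthor[] {
    this.queryCount++;
    return ids
      .map((id) => this.authors.get(id))
      .filter((a): a is PostAuthor => a !== undefined);
  }
}

// N+1: one author query per post (plus whatever query loaded `posts`).
function authorNamesNPlusOne(db: FakeDb, posts: Post[]): string[] {
  return posts.map((post) => db.getAuthor(post.authorId)?.displayName ?? "unknown");
}

// Batched: collect the distinct author ids, fetch once, join in memory.
function authorNamesBatched(db: FakeDb, posts: Post[]): string[] {
  const ids = Array.from(new Set(posts.map((post) => post.authorId)));
  const byId = new Map<number, PostAuthor>();
  for (const author of db.getAuthors(ids)) {
    byId.set(author.id, author);
  }
  return posts.map((post) => byId.get(post.authorId)?.displayName ?? "unknown");
}
```

With 10 posts that’s 10 author queries against 1. The per-post version’s query count grows linearly with the result set, which is exactly what stays invisible against a small development database.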
The Real Danger: False Negatives with High Confidence
Here’s what keeps me up at night. When an AI code review tool comments “Approved: no issues found” or shows a green checkmark, teams relax their human review. The AI’s approval creates a false sense of security that subtly shifts human behavior. Reviewers spend less time on PRs that the AI has already “approved.” They skim instead of reading carefully. They assume the AI caught the obvious stuff, so they only need to think about the non-obvious stuff — but they don’t always know what the AI missed.
I’ve personally seen a PR where the AI approved code that had a critical authorization vulnerability. The vulnerability was context-dependent — the function was called with user-controlled input from a different file, through two layers of indirection. The AI reviewed the function in isolation, saw nothing wrong, and gave it a thumbs up. A human reviewer, trusting the AI’s approval, gave it a quick glance and approved it too. It made it to production.
The Bubble Prediction
Of the 42+ startups in this space, I predict 3-5 will survive the next 3 years. The differentiation isn’t in the AI model — they all use similar foundation models (GPT-4, Claude, or fine-tuned open-source models). The differentiation is in integration depth: how well the tool understands your specific codebase, your specific patterns, your specific architecture decisions, and your specific risk profile.
The winners will be the ones that invest in deep codebase understanding — building persistent knowledge graphs of your architecture, learning your team’s conventions from historical PRs, and reasoning about cross-file data flows. The losers will be the ones competing on better prompt engineering and flashier UI, because those advantages are trivially replicable.
So I’ll ask the community: are you using AI code review tools in your workflow? And critically, have they caught issues that your human reviewers actually missed — or do they mostly duplicate what your existing tools already catch?