Skip to main content

AI Reviewing AI: The Asymmetric Architecture of Code-Review Agents

· 12 min read
Tian Pan
Software Engineer

A review pipeline where the author and the reviewer are both language models trained on overlapping corpora is not a quality gate. It is a confidence amplifier. The author writes code that looks plausible to a transformer, the reviewer reads code through the same plausibility lens, both agents converge on "looks fine," and the diff merges with a green checkmark that means nothing about whether the change is actually correct. Recent industry data shows the asymmetry plainly: PRs co-authored with AI produce roughly 40% more critical issues and 70% more major issues than human-written PRs at the same volume, with logic and correctness bugs accounting for most of the gap. The reviewer agents shipped to catch those bugs are, by construction, the ones least equipped to find them.

The teams getting real signal from AI code review have stopped treating "review" as a slightly different shape of "generation" and started designing review as a fundamentally different cognitive task. Generation prompting asks the model to produce something coherent. Review prompting has to ask the model to find what is missing — to inhabit the negative space of the diff rather than the positive one — and that inversion is much harder to elicit than a one-line system prompt suggests.

Why Review Prompting Is Not Generation Prompting in Reverse

Code generation lives in a generative loop: the model proposes tokens, the next-token distribution rewards plausibility, and the output looks like the training distribution because that is what the loss function asked for. Code review needs the opposite stance. The reviewer has to read finished text and ask "what should be here that isn't" — missing null check, absent error path, unhandled state transition, untested edge case, a precondition the function assumes but never validates. None of those failures show up as something to read. They show up as something to not read, and a model trained to predict the next plausible token has weak machinery for noticing absences.

Two practical consequences follow. First, naive review prompts ("review this diff and find bugs") underperform because they leave the model to decide what review even means, and the default behavior is to summarize and praise rather than to interrogate. Second, the obvious fix — long, detailed review prompts that enumerate every check — overcorrects in the other direction: the review agent starts manufacturing concerns to prove it read the diff, and the false-positive rate climbs to the point where engineers ignore the bot. Research on LLM code editing has documented the same pattern: detailed prompts can introduce a bias toward excessive fault-finding, where the model flags non-existent errors in correct code because the prompt rewarded the appearance of thorough review.

The teams that escape this trade-off treat review as a structured analysis task with explicit anchors. Meta's recent work on semi-formal reasoning templates pushed code-review accuracy substantially higher by forcing the model to state premises, trace execution paths against specific test cases, and derive a conclusion from named evidence rather than vibes. The mechanism is simple: when the prompt requires a chain of "if X then Y because Z," hallucinated concerns are visibly weaker than grounded ones, and the model self-suppresses the kind of generic style nits that make AI review feel like noise.

Multi-Pass Architectures: Specification, Invariants, Adversaries

Single-pass review — one model, one prompt, one response — composes badly with the asymmetry above. The pass-rate ceiling is set by what a generation-trained model can notice in one read of changed lines, which is exactly the set of issues least correlated with what bites in production. The architectures that work at scale decompose review into multiple specialized passes, each with a different question and a different context shape.

A specification check runs first and asks only one thing: does the diff implement what the spec or ticket says it should implement? The reviewer is given the issue description, the user-facing requirement, and the diff — and is forbidden from commenting on style, performance, or polish. The pass succeeds or fails on intent fidelity. SpecRover-style approaches that treat tests and specs as executable contracts catch a category of bug — the patch passed CI but didn't actually fix the customer's problem — that no per-line review will ever surface.

An invariant check runs second and asks whether the change preserves the system's existing properties. The context is not just the diff but the surrounding module, the called functions, and the implicit contracts (this function returns non-null, this list is sorted, this state machine never goes from archived to active). Most subtle production bugs are invariant violations the original author considered obvious; the review agent has to be told what they are.

An adversarial probe runs last and asks the most useful question of all: how would an attacker, a malicious input, or a chaos test break this code? Adversarial code-review patterns separate a Builder Agent (optimized for speed and synthesis) from a Critic Agent (optimized for hostile reasoning) and require the Critic to attempt to falsify the Builder's claim that the change is correct. The pattern works because the Critic's success metric is rejection, not approval — its prompt rewards finding a counterexample, not signing off.

When multiple critics run in parallel, a Moderator role becomes structural rather than optional. Three independent passes will produce overlapping concerns, and dumping all of them into the PR thread is exactly the failure mode that trains engineers to ignore the bot. Game-theoretic multi-agent designs handle this by separating analysis (read-only critics) from synthesis (a single moderator that deduplicates, prioritizes by severity, and writes the final review). The moderator is also the natural place to enforce a precision floor: if a finding cannot be linked to a specific line, a specific consequence, and a specific test that would catch the regression, it does not ship.

The Reviewer-Is-Not-the-Author Constraint

The most underrated architectural decision in AI code review is which model reviews which code. When the reviewer agent and the author agent share a base model, they inherit the same blind spots — the same training data, the same alignment quirks, the same things they were never penalized for missing during pretraining. A Claude-authored function reviewed by Claude is a single point of failure dressed up as two.

Loading…
References:Let's stay in touch and Follow me for more thoughts and updates