AI Code Review Drift: When Your LLM Reviewer's Standards Mutate Faster Than the Code

Tian Pan · Software Engineer · 9 min read

The PR-review dashboard has shown green for six weeks. Bot catch rate, comment volume, developer "thumbs up" reactions — all steady. Then a security incident lands in production and the post-mortem points at a missing null-check the bot used to catch and quietly stopped catching about two months ago. Nobody changed the bot. Nobody downgraded the model. The dashboard never moved. The standard moved.

This is the failure mode of automated code review that doesn't show up in any product demo. Teams adopt an LLM reviewer for the consistency win — every PR gets the same checklist, no senior engineer's bad-day variance, fast turnaround for junior contributors — and the consistency is real for about a quarter. Then the system prompt evolves, the model bumps, the few-shot library accumulates, and the bot is reviewing a different codebase against a different rubric using a different model than the one the team validated against. The team's mental model of "what the bot catches" decays into "what the bot caught last week."
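One way to make that kind of drift visible is to fingerprint every input that shapes the bot's judgment and treat any change as an event rather than a silent rubric shift. The sketch below is a minimal illustration, not anything from the post: the `review_config_fingerprint` helper, the field names, and the placeholder values are all hypothetical, and a real system would pull them from its own config store.

```python
import hashlib
import json


def review_config_fingerprint(model_id: str, system_prompt: str,
                              few_shot_examples: list[dict]) -> str:
    """Hash every input that shapes the bot's judgment, so a quiet
    change to any one of them yields a visibly different fingerprint."""
    canonical = json.dumps(
        {
            "model_id": model_id,
            "system_prompt": system_prompt,
            # Sort the serialized examples so reordering alone doesn't
            # change the hash; only real content changes do.
            "few_shot": sorted(
                json.dumps(ex, sort_keys=True) for ex in few_shot_examples
            ),
        },
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Recorded once, when the team last validated the bot against a known
# set of PRs (all values here are illustrative placeholders).
validated = review_config_fingerprint(
    model_id="reviewer-model-2024-06",
    system_prompt="Flag missing null-checks and unchecked error paths.",
    few_shot_examples=[{"diff": "...", "verdict": "missing null-check"}],
)

# Recomputed on every review run; a mismatch means the rubric moved.
current = review_config_fingerprint(
    model_id="reviewer-model-2024-09",  # a quiet model bump
    system_prompt="Flag missing null-checks and unchecked error paths.",
    few_shot_examples=[{"diff": "...", "verdict": "missing null-check"}],
)

if current != validated:
    print("Reviewer config drifted since last validation; "
          "re-run the validation suite before trusting new reviews.")
```

The hash itself is incidental; the point is that a prompt edit, a model bump, or a new few-shot example becomes something a dashboard can surface, instead of something the team discovers in a post-mortem.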