Skip to main content

33 posts tagged with "code-review"

View all tags

The Unmergeable Agentic Refactor: Why Multi-File Diffs Break at the Seam

· 9 min read
Tian Pan
Software Engineer

A 40-file refactor from a coding agent lands on your desk. You open the PR, scroll through the diff, and every hunk looks fine. The rename is consistent, the imports are tidy, the tests compile in isolation. You merge. Forty minutes later, CI on main goes red because two call sites in a sibling package still pass three arguments to a function that now takes four, and the type checker that would have caught it was never part of the agent's inner loop.

This is the most common failure mode in agent-authored refactors today, and it has almost nothing to do with the quality of the individual edits. Each file, reviewed on its own, looks like something a careful human would have written. The bug lives at the seams — the boundaries where edits from different files have to agree. File-level review hides seam-level correctness, and most review workflows were designed around files.

AI Code Review in Practice: What Automated PR Analysis Actually Catches and Consistently Misses

· 9 min read
Tian Pan
Software Engineer

Forty-seven percent of professional developers now use AI code review tools—up from 22% two years ago. Yet in the same period, AI-coauthored PRs have accumulated 1.7 times more post-merge bugs than human-written code, and change failure rates across the industry have climbed 30%. Something is wrong with how teams are deploying these tools, and the problem isn't the tools themselves.

The core issue is that engineers adopted AI review without understanding its capability profile. These systems operate at a 50–60% effectiveness ceiling on realistic codebases, excel at a narrow class of surface-level problems, and fail silently on exactly the errors that cause production incidents. Teams that treat AI review as a general-purpose quality gate get false confidence instead of actual coverage.

AI as a CI/CD Gate: What Agents Can and Cannot Reliably Block

· 9 min read
Tian Pan
Software Engineer

An AI reviewer blocks a merge. A developer stares at the failing check, clicks "view details," skims three paragraphs of boilerplate, and files a "force-push exception" without reading the actual finding. Within a week, every engineer on the team has internalized that the AI gate is background noise — something to dismiss, not engage with.

This is the outcome most teams building AI CI/CD gates actually ship, even when the underlying model is technically capable. The problem is not whether AI can review code. The problem is what you ask it to block, and what you expect to happen when it does.

The Debugging Regression: How AI-Generated Code Shifts the Incident-Response Cost Curve

· 9 min read
Tian Pan
Software Engineer

In March 2026, a single AI-assisted code change cost one major retailer 6.3 million lost orders and a 99% drop in North American order volume — a six-hour production outage traced to a change deployed without proper review. It wasn't a novel attack. There was no exotic failure mode. The system just did what the AI told it to do, and no one on-call had the mental model to understand why that was wrong until millions of customers had already seen errors.

This is the debugging regression. The productivity gains from AI-generated code are front-loaded and visible on dashboards. The costs are back-loaded and invisible until your alerting wakes you up at 3am.

AI Code Review at Scale: When Your Bot Creates More Work Than It Saves

· 10 min read
Tian Pan
Software Engineer

Most teams that adopt an AI code reviewer go through the same arc: initial excitement, a burst of flagged issues that feel useful, then a slow drift toward ignoring the bot entirely. Within a few months, engineers have developed a muscle memory for dismissing AI comments without reading them. The tool still runs. The comments still appear. Nobody acts on them anymore.

This is not a tooling problem. It is a measurement problem. Teams deploy AI code review without ever defining what "net positive" looks like — and without that baseline, alert fatigue wins.

When Everyone Has an AI Coding Agent: The Team Dynamics Nobody Warned You About

· 10 min read
Tian Pan
Software Engineer

A team of twelve engineers adopts AI coding tools enthusiastically. Six months later, each engineer is merging nearly twice as many pull requests. The engineering manager celebrates. Then the on-call rotation starts paging. Debugging sessions last twice as long. Nobody can explain why a particular module was structured the way it was. The engineer who wrote it replies honestly: "I don't know — the AI generated most of it and it seemed fine."

This scenario is playing out at companies everywhere. The individual productivity story is real: developers finish tasks faster, write more tests, and clear backlogs more efficiently. The team-level story is more complicated, and most organizations aren't ready for it.

Prompt Diff Review as a Discipline: What Reviewers Actually Need to Ask

· 11 min read
Tian Pan
Software Engineer

A one-line change to a system prompt landed in production last quarter at a mid-sized AI startup. The diff looked harmless: an engineer tightened the instructions around response length. The reviewer approved it in two minutes, as they would a variable rename. Within 48 hours, support tickets spiked. The model had started truncating answers mid-sentence on complex queries, and the edge cases the old phrasing had been silently handling for months were now failing. The original instruction hadn't just controlled length — it had implicitly anchored the model's judgment about when a topic was complete. Nobody had captured that. Nobody had looked for it.

This is the core problem with prompt review today: we're applying code review instincts to a medium where those instincts are mostly wrong. Code review works because the artifact being reviewed is deterministic and the semantics are recoverable from syntax. A prompt is neither. Its meaning is distributed across the model's weights, its training data, and the stochastic sampling that runs at inference time. The diff you see on screen is a fraction of the change you're approving.

The AI Code Review Trap: Why Faster Reviews Are Making Your Codebase Worse

· 10 min read
Tian Pan
Software Engineer

Your team ships more code than ever. PR velocity is up, cycle time is down, and the backlog is shrinking. On every dashboard that a manager looks at, things look great. Meanwhile, your incident count per PR is quietly climbing 23.5% year over year.

This is the AI code review paradox. AI tools make engineers faster at writing code and faster at approving it — but the defects that matter most are slipping through at a higher rate than before. The two sides of this paradox compound each other, and most teams are not measuring the right things to notice it.

Your Code Review Process Is Optimized for the Wrong Failure Mode

· 8 min read
Tian Pan
Software Engineer

Your code review checklist was designed for a world where the primary defect was a misplaced semicolon or a forgotten null check. That world is gone. AI-generated code rarely has typos. It almost always compiles. And it is quietly degrading your codebase in ways your review process was never built to catch.

Analysis of hundreds of thousands of GitHub pull requests reveals that AI-generated code creates 1.7x more issues than human-written code — roughly 10.8 issues per PR versus 6.5. But the defect distribution has shifted fundamentally. Logic errors are up 75%. Performance issues appear nearly 8x more often. Security vulnerabilities are 1.5–2x more frequent. The bugs that matter most are exactly the ones your traditional review gates miss.