
The PR-Bot That Never Sleeps: When Your Reviewers Become the Rate Limiter

11 min read
Tian Pan
Software Engineer

For two decades the bottleneck in software engineering was writing code. We optimized IDEs, autocompletion, refactoring tools, and frameworks to make typing cheaper. We won. Now the bottleneck has moved one step downstream: writing is cheap, and reading is expensive. The PR-bot can spin up ten implementation attempts in parallel and open ten pull requests against your repo before you finish your morning coffee. Your reviewers cannot.

The rate limiter for AI-assisted software delivery is no longer the model's tokens per second. It is the number of human eyes you can put on a diff per day. And when those eyes get overwhelmed, you do not get graceful degradation; you get rubber stamps. Code gets merged with an LGTM 🚀 even though nobody actually read it. A senior engineer approves an AI-written patch that another AI tool already reviewed, and three weeks later a data-inconsistency bug eats forty hours of someone's life. Surface correctness is not systemic correctness, and a green pipeline is not understanding.

This is the unglamorous part of agentic engineering: the agent does its job, the throughput goes up, and the integrity of the codebase quietly degrades because the human side of the loop never got a capacity plan.

The Bottleneck Migrated, It Did Not Disappear

The numbers from the past year tell a consistent story. GitHub's Copilot code review has processed over sixty million reviews, growing roughly ten-fold in under a year, and more than one in five code reviews on the platform now involves an agent on at least one side of the diff. PR volume on AI-assisted repositories is up around 29% year over year. Ramp has publicly described an internal Inspect agent handling roughly 30% of their PRs across frontend and backend. Researchers consistently report that AI generates code five to seven times faster than humans can comprehend it.

These numbers do not describe an industry where reviewers got faster. They describe an industry where the work coming at reviewers got bigger and more frequent, while the cognitive throughput of the humans on the other side stayed exactly where it was. Capacity planning that assumes a steady review velocity is planning for a world that does not exist anymore.

The honest framing is that you have a queue. The arrival rate is now set by a fleet of agents that do not sleep, do not get distracted, and do not feel social pressure to slow down. The service rate is set by a fixed number of humans whose attention is finite and whose context-switching cost is real. Queueing theory tells you what happens next: utilization climbs, then latency explodes, then the system finds an unhealthy equilibrium by silently dropping quality. In code review, that "silently dropping quality" looks like LGTM on diffs that were skimmed, not read.
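To make the queue argument concrete, here is a minimal Python sketch that assumes the textbook M/M/c model: PRs arrive as a Poisson process, review times are exponentially distributed, and `c` interchangeable reviewers pull from one shared queue. Real review queues violate all three assumptions, so treat the output as a shape, not a forecast. The function names and example rates are hypothetical; the math is the standard Erlang C waiting probability.

```python
import math


def erlang_c(servers: int, offered_load: float) -> float:
    """Probability that an arriving PR has to wait (Erlang C formula for M/M/c)."""
    c, a = servers, offered_load  # a = arrival rate / per-reviewer review rate
    if a >= c:
        raise ValueError("unstable queue: arrivals meet or exceed total review capacity")
    top = (a ** c / math.factorial(c)) * (c / (c - a))
    bottom = sum(a ** k / math.factorial(k) for k in range(c)) + top
    return top / bottom


def mean_queue_wait(arrival_rate: float, review_rate: float, reviewers: int) -> float:
    """Average time a PR sits in the queue before a reviewer picks it up.

    Both rates must use the same time unit (e.g. PRs per hour); the result is in that unit.
    """
    p_wait = erlang_c(reviewers, arrival_rate / review_rate)
    return p_wait / (reviewers * review_rate - arrival_rate)


if __name__ == "__main__":
    # Hypothetical team: 4 reviewers, each clearing 1 PR per hour.
    for rate in (2.0, 3.0, 3.6, 3.9):
        utilization = rate / 4.0
        wait = mean_queue_wait(rate, 1.0, 4)
        print(f"{rate:.1f} PRs/h  utilization {utilization:.0%}  avg wait {wait:.2f} h")
```

In this hypothetical team, raising arrivals from 2 to 3.9 PRs per hour less than doubles the load but stretches the average wait from roughly five minutes to more than nine hours, over a hundredfold. That nonlinearity near full utilization is the pressure that pushes reviewers toward skimming.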

Why Reading at the Same Speed Just Lowers Quality

The natural reflex is to push reviewers to be faster. This is the wrong answer, and it is a particularly dangerous wrong answer because it works in the short term. Reviewers can speed up by skimming. They can speed up by trusting that the linter and the tests caught the important things. They can speed up by approving the diff and moving on with their day. Throughput improves; the dashboards look great; the codebase begins accumulating a kind of debt that nobody has named on the engineering side.

Addy Osmani has been calling it comprehension debt — the gap between code that exists in your repository and code that any human currently in the org actually understands. AI-generated code is unusually good at hiding this gap because it tends to be syntactically correct, well-formatted, and superficially idiomatic. All the historical signals that triggered merge confidence are intact. The signal that has decayed is the only one that matters: did a competent reviewer build a mental model of what this change does to the system?

The other failure mode is more insidious. When the rubber-stamp habit takes hold, it removes the one thing code review was actually good for besides catching bugs: it was where junior engineers learned the codebase by watching senior engineers think out loud about it. If the senior engineer's contribution to the review is an emoji, the junior engineer learns nothing. Five years from now you will need someone to make a judgment call no model can make, and you will discover you never trained that person, because an agent was doing the part of the job where the training used to happen.

A reviewer at human reading speed cannot keep up with a bot at agent generation speed, and trying to make them keep up does not buy you safety. It buys you the appearance of safety on top of an unread codebase.

Risk-Tiered Auto-Merge: A Hot Path and a Cold Path

The first structural fix is to stop pretending all PRs deserve the same level of human attention. A markdown typo fix is not the same kind of risk as a change to authentication middleware, and treating them as equally deserving of a human reviewer is how you ensure that the authentication change gets the same fifteen-second skim as the typo.

A reasonable risk score draws from things you already have: cyclomatic complexity, files touched, total diff size, presence of changes to security-critical paths, modifications to migrations or schema, and whether tests were added or just adjusted. You bin PRs into tiers: documentation and trivial fixes go through with automated checks only, single-file isolated changes with good test coverage go through with a lightweight review or an AI-only review, and anything touching shared abstractions or sensitive surfaces requires a real human read with sign-off.
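The tiering rule above can be sketched in a few dozen lines of Python. Everything here is an assumption for illustration: the glob patterns, weights, thresholds, and the `PullRequest` shape are hypothetical, and the complexity delta is assumed to come from whatever static analyzer you already run. Per-file line counts are read from `git diff --numstat`, which prints added lines, deleted lines, and the path separated by tabs, with `-` for binary files. Hard overrides run first, so a sensitive path always goes to a human regardless of its score.

```python
import fnmatch
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    AUTO = "automated checks only"
    LIGHT = "lightweight or AI-only review"
    HUMAN = "human read with sign-off"


# Hypothetical glob patterns; tune these to your own repository layout.
SENSITIVE = ("*auth*", "*middleware*", "*migrations/*", "*.sql", "*schema*")
DOCS = ("*.md", "*.rst", "docs/*")
TESTS = ("*tests/*", "*_test.*", "test_*", "*/test_*")


@dataclass
class PullRequest:
    changes: dict[str, tuple[int, int]]  # path -> (lines added, lines deleted)
    complexity_delta: int = 0  # largest cyclomatic-complexity increase (assumed input)


def _matches(path: str, patterns: tuple[str, ...]) -> bool:
    return any(fnmatch.fnmatch(path, pat) for pat in patterns)


def parse_numstat(text: str) -> dict[str, tuple[int, int]]:
    """Parse `git diff --numstat` output; binary files ("-") count as 0 lines."""
    changes = {}
    for line in text.strip().splitlines():
        added, deleted, path = line.split("\t", 2)
        changes[path] = (0 if added == "-" else int(added),
                         0 if deleted == "-" else int(deleted))
    return changes


def _source_files(pr: PullRequest) -> list[str]:
    return [p for p in pr.changes if not _matches(p, TESTS + DOCS)]


def _test_growth(pr: PullRequest) -> int:
    # Net new test lines: a crude proxy for "tests added" versus "tests adjusted".
    return sum(a - d for p, (a, d) in pr.changes.items() if _matches(p, TESTS))


def risk_score(pr: PullRequest) -> int:
    """Additive score; the weights are illustrative, not calibrated."""
    source = _source_files(pr)
    size = sum(a + d for a, d in pr.changes.values())
    score = 2 * len(source) + size // 50 + pr.complexity_delta
    if source and _test_growth(pr) <= 0:
        score += 5  # code changed without net new test lines
    return score


def classify(pr: PullRequest) -> Tier:
    paths = list(pr.changes)
    if any(_matches(p, SENSITIVE) for p in paths):
        return Tier.HUMAN  # sensitive surfaces always get a human
    if all(_matches(p, DOCS) for p in paths):
        return Tier.AUTO  # documentation-only change
    source = _source_files(pr)
    size = sum(a + d for a, d in pr.changes.values())
    if len(source) <= 1 and size <= 5 and pr.complexity_delta == 0:
        return Tier.AUTO  # trivial fix: tiny diff, no new branching
    if len(source) == 1 and _test_growth(pr) > 0 and risk_score(pr) <= 8:
        return Tier.LIGHT  # isolated single-file change that adds tests
    return Tier.HUMAN  # shared abstractions, large or untested changes
```

The broad `*auth*` pattern will also catch files like `author.py`. In this design a false positive costs one extra human review, while a false negative lets an authentication change through with automated checks only, so erring broad is the cheaper mistake. Renamed files appear in `--numstat` output as a single `old => new` path string, which the sketch does not special-case.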
