
The PR Description Your Coding Agent Cannot Write

· 10 min read
Tian Pan
Software Engineer

Your coding agent finished the task. The diff is small, the tests are green, the lint is clean, and the PR body says, in its entirety, "Fixes the bug in module X." A reviewer six time zones away opens the page, reads the diff in isolation, sees nothing wrong with it, and approves a technically correct change that solves the wrong problem. The change ships. Two days later a customer asks why the workaround they had been relying on stopped working, and you discover that the bug your agent fixed was not the bug the ticket was about.

The code was fine. The reviewer was conscientious. The agent did exactly what it was asked. The artifact between them — the pull request — was empty of everything that would have caught the mistake.

This is the rationale gap, and it is the part of agent-assisted development that the throughput numbers do not warn you about. Faros AI's 2026 telemetry across 22,000 developers found that teams with heavy AI adoption merge 98% more pull requests, but the PRs are 154% larger and take 91% longer to review. An empirical study of 567 agent-generated PRs found that 83.77% are eventually accepted versus 91.01% for human-authored ones — but 45.1% required human revision for correctness, documentation, or code style. AI-authored PRs wait, by one industry measurement, roughly 4.6 times longer than human ones for a reviewer to pick them up. The bottleneck has moved from writing code to reading inferences that the agent left implicit.

What a PR body is actually for

A pull request is two artifacts glued together. The diff answers "what does this code do?" The description answers "why does this code exist?" These are different questions with different sources. The first can be mechanized from the patch; an LLM can read a diff and produce a competent summary of it. The second cannot, because the answer lives in places the diff does not reach: the ticket that motivated the work, the user complaint that triggered the ticket, the Slack thread where you and a teammate decided the obvious fix was the wrong one, the eval that started failing last Thursday, the rejected earlier attempt the agent threw away after the first review pass.

Human authors weave this into a PR body without thinking about it, because they were the people the context happened to. They write "this is the third time we've seen this in incidents, last fix was XYZ, this approach is different because…" and the reviewer in another time zone has everything they need to evaluate the change asynchronously. The reviewer never sees the conversation, but the conversation is in the artifact.

An agent had all of this context too — usually more of it. It had the prompt that started the run, the ticket text it was given, the failing eval it was told to reproduce, the rejected earlier diff from an aborted attempt, the tool calls it made to read the codebase, and a written record of every decision it considered. Then, at the moment of opening the PR, it threw all of that away and wrote "Fixes the bug in module X." The context was there at runtime. It was discarded at handoff time.

Why agents default to thin PR bodies

This is not a model-capability problem. The same model that wrote the patch could write a five-paragraph rationale section if asked. The reason most agent PRs are thin is structural, and it is worth being precise about why.

The agent's harness does not score the PR body. It scores whether tests pass, whether lints pass, whether the diff applies cleanly, and whether the task instruction was satisfied. The PR body is the last step before submission, often produced from a generic "summarize the changes" subprompt. There is no eval that flunks an agent for shipping a description that reads like the changelog of a different PR. So you get the cheapest description the agent can produce: a literal summary of the diff, with everything load-bearing about the "why" stripped out.
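
To make the missing eval concrete, here is a minimal sketch of a check a harness could run on the PR body before submission. The section names and the function name missing_rationale are assumptions for illustration; nothing here is a standard template or an existing harness API.

```python
import re

# Hypothetical section names; pick whatever your team's template uses.
REQUIRED_SECTIONS = ("why", "alternatives considered", "assumptions to challenge")


def missing_rationale(body: str) -> list[str]:
    """Return the required sections that are absent or empty in a PR body."""
    # Group body text under markdown-style headings such as "## Why".
    sections: dict[str, str] = {}
    current = None
    for line in body.splitlines():
        match = re.match(r"^#{1,6}\s+(.*\S)\s*$", line)
        if match:
            current = match.group(1).lower()
            sections[current] = ""
        elif current is not None:
            sections[current] += line.strip()
    # A section counts as missing if it has no heading or no text under it.
    return [name for name in REQUIRED_SECTIONS if not sections.get(name)]
```

A harness could treat a non-empty result the same way it treats a failing lint: the run is not done until the list is empty. This only checks that the sections exist, not that what they say is true, so it is a floor rather than a quality score.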

The harness also tends to lose the context that would be most useful in the PR body. The prompt that started the work is usually consumed by the first reasoning turn and then summarized out of the agent's context window as the run progresses. The agent's intermediate "I tried X and backed out because Y" reasoning is held in scratchpads or sub-agents that do not survive to the final tool call. By the time the agent reaches create_pull_request, the loudest things in its context are recent tool outputs and the final diff — not the original ticket and not the path it took to get there. The PR body it produces reflects that: it describes the destination, not the journey, because the journey has already fallen out of the window.
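
One way to keep the journey from falling out of the window is to write it down outside the window. The sketch below assumes a hypothetical RunLedger class and a JSON-lines file; the entry kinds ("ticket", "backed_out") are illustrative names, not part of any real agent framework.

```python
import json
import tempfile
from pathlib import Path


class RunLedger:
    """Append-only record of a run, kept on disk rather than in the context window."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def record(self, kind: str, text: str) -> None:
        # One JSON object per line, so entries survive context summarization.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"kind": kind, "text": text}) + "\n")

    def entries(self, kind: str) -> list[str]:
        # Read everything back at handoff time and filter by entry kind.
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            rows = [json.loads(line) for line in f if line.strip()]
        return [row["text"] for row in rows if row["kind"] == kind]


# Usage: record as the run progresses, read back when writing the PR body.
with tempfile.TemporaryDirectory() as tmp:
    ledger = RunLedger(Path(tmp) / "run.jsonl")
    ledger.record("ticket", "Original ticket text goes here.")
    ledger.record("backed_out", "Tried X and backed out because Y.")
    backed_out = ledger.entries("backed_out")
```

The design choice that matters is that the ledger is written by the harness at the moment each event happens, so the step that writes the PR body reads from the file instead of relying on whatever is still in context.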

The third reason is review feedback that does not loop back into the harness. When a human reviewer asks "why did you not do X instead?" in a comment, the agent gets that as a fresh message and answers it inline, in the PR thread. The next agent that opens a similar PR has no signal that the previous one needed to pre-empt that question in the body. The deficiency is corrected in conversation each time and never lifted into the harness as a pattern.
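
Lifting that pattern into the harness can start very small. The following is a rough sketch, assuming you can export past review comments as plain strings; the function name questions_to_preempt and the threshold are hypothetical, and the matching is deliberately naive (case and whitespace only).

```python
from collections import Counter


def questions_to_preempt(review_comments: list[str], min_count: int = 2) -> list[str]:
    """Return reviewer questions that recurred across past PRs."""
    counts = Counter(
        " ".join(comment.lower().split())  # normalize case and whitespace only
        for comment in review_comments
        if comment.strip().endswith("?")  # treat only questions as signals
    )
    return [question for question, n in counts.most_common() if n >= min_count]
```

The returned questions could be appended to the subprompt that writes the PR body, so the next description answers them before a reviewer has to ask. Real comments rarely repeat word for word, so a production version would need fuzzier grouping than this.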

What async review actually needs

Asynchronous code review is a low-bandwidth channel. The reviewer cannot tap you on the shoulder. They have the diff, the description, the linked ticket if there is one, and CI output. Everything they need to decide whether to approve has to be in that bundle, because the cost of asking a question is a round-trip that may take a day.

Human authors learn this and over-explain on purpose. They include the alternatives they rejected so the reviewer does not raise them. They list the assumptions a reviewer should challenge so the conversation goes where it is useful. They link the eval failure or the customer ticket so the reviewer can verify they are solving the right problem. None of this is in the diff. All of it is what makes async review tractable.
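
The same over-explaining can be given a fixed shape. This is a minimal sketch, assuming a hypothetical Rationale record filled in during the run; the field and heading names are illustrative, not a prescribed format.

```python
from dataclasses import dataclass, field


@dataclass
class Rationale:
    """What an async reviewer needs that the diff cannot supply."""
    why: str
    rejected_alternatives: list[str] = field(default_factory=list)
    assumptions: list[str] = field(default_factory=list)
    evidence_links: list[str] = field(default_factory=list)


def render_pr_body(r: Rationale) -> str:
    """Render the rationale as a PR description with one section per field."""
    def bullets(items: list[str]) -> str:
        # An explicit placeholder makes an empty section visible to the reviewer.
        return "\n".join(f"- {item}" for item in items) or "- (none recorded)"

    return "\n\n".join([
        "## Why\n" + r.why,
        "## Alternatives considered\n" + bullets(r.rejected_alternatives),
        "## Assumptions to challenge\n" + bullets(r.assumptions),
        "## Evidence\n" + bullets(r.evidence_links),
    ])
```

Printing "(none recorded)" instead of omitting a section is deliberate: a reviewer who sees an empty "Alternatives considered" knows to ask, while a reviewer who sees no such section may not think of it.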
