The Second-Draft Agent Pattern: Why Explore-Then-Commit Beats Self-Critique
When a single-pass agent stops being good enough, the default move is to wrap it in a self-critique loop. Generate, critique, revise, repeat. Most teams I talk to assume the eval lift will be roughly linear with the number of revision rounds and stop there. The numbers rarely cooperate. By the third round of self-critique, accuracy is up two or three points and token cost is up 3–4x, and the failure modes that didn't get caught in round one mostly don't get caught in round three either — because the same context that produced the wrong answer is the one being asked to spot the wrongness.
A different shape works better and costs less: let the first pass be wasteful exploration, throw it away, and run a second pass from a clean context with just the lessons learned. Call it the second-draft pattern, or explore-then-commit. The first draft is permitted to be sloppy, to take dead ends, to dump scratch artifacts, to chase hypotheses that turn out to be wrong. The second draft is constrained — it gets the distilled findings and produces a clean execution. On the kinds of tasks where self-critique is tempting (multi-step reasoning, code that touches several files, research syntheses), this two-pass shape often beats n-of-k self-critique on both quality and cost.
The pattern looks almost trivial once you've seen it, but the operational discipline of running it correctly — what gets carried forward, what gets thrown away, what the second pass is allowed to do — is where most implementations fall apart. The eval delta against single-pass and against self-critique is what tells you whether your particular task earns the second-draft budget. It is not free, and it is not always worth it.
Why Self-Critique Disappoints
Self-critique loops fail in a predictable way. The model that produced the answer is the one being asked to find what's wrong with it. Its context window already contains the framing that led to the answer. Its attention is anchored on the same retrieval results, the same partial plans, the same tool calls. Asking it to critique its own output is like asking someone to proofread a document twenty minutes after they wrote it — they read what they meant to write, not what's on the page.
You see this in evals as a flat-then-collapse curve. Round one of self-critique usually catches the obvious mistakes: a forgotten step, a misnamed variable, a citation that doesn't exist. Round two catches a few stragglers. By round three, the model is mostly inventing problems to justify making changes, and your eval scores drift sideways or down. The cost curve, of course, is monotonic.
There's a second failure mode that's harder to see in aggregate eval scores but obvious in trace-level review. Self-critique compounds bad framings. If pass one went down a wrong solution path, pass two doesn't usually backtrack to a different path — it polishes the wrong path. The output gets more confident, more articulate, more wrong. The model can fix the surface of the answer but rarely re-questions the premise it built up over several thousand tokens of reasoning.
A clean second pass with constrained context avoids both failure modes. It doesn't read its own draft. It reads a short synthesis of what the first pass found and didn't find, and then it does the work from scratch. The reasoning that locked in the wrong premise doesn't survive the handoff.
What Explore-Then-Commit Actually Looks Like
The two passes have different jobs and different prompts. Treating them as the same agent with two turns is the most common mistake — it collapses back into self-critique.
Pass one is exploration. Its job is to learn the shape of the problem. Permitted activities: read more files than will end up in the answer, run more retrievals than will end up cited, try a hypothesis, abandon it, try another. The first pass is graded on what it discovers, not what it produces. Its output is not a draft of the final answer — it's a compact set of findings.
The findings payload is the load-bearing artifact. Keep it short and structured: which approaches were tried, which were ruled out and why, which files or sources turned out to be relevant, what the actual shape of the problem is now that exploration has reduced uncertainty. The findings should fit in a few hundred tokens. If your first pass produces a 4,000-token findings document, the second pass is going to inherit the framing and the pattern collapses back to self-critique.
Pass two is execution. Its prompt is the original task plus the findings payload, and nothing else from the first pass. No reasoning chain, no intermediate outputs, no retrieved-but-discarded snippets. The second pass has a constrained scope and a clean context, which is the whole point. It produces the answer directly, with the findings serving the role that a coworker's stand-up update would serve: not "here is my draft, fix it," but "here is what I learned about the problem, now do the work."
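A minimal sketch of that handoff, assuming nothing more than a `complete(prompt) -> str` callable standing in for whatever model call your stack uses; the prompt text and function names are illustrative, not a prescribed API:

```python
# Sketch of the two-pass orchestration. `complete(prompt) -> str` is an assumed
# stand-in for your LLM call; prompts are illustrative, not a fixed template.

EXPLORE_INSTRUCTIONS = """\
Explore the task below. Read widely, try and discard approaches freely.
Do NOT write the final answer. Return only a short, structured findings report:
what you tried, what you ruled out and why, which files or sources matter,
and what the problem actually is now that you have looked at it."""

COMMIT_INSTRUCTIONS = """\
Produce the final answer for the task below. A prior exploration pass produced
the findings included here; treat them as a colleague's notes, not a draft."""


def second_draft(task: str, complete) -> str:
    # Pass one: exploration in its own context. Its reasoning chain, tool calls,
    # and discarded retrievals are thrown away; only the findings string survives.
    findings = complete(f"{EXPLORE_INSTRUCTIONS}\n\nTASK:\n{task}")

    # Pass two: a fresh context with the original task plus the distilled findings,
    # and nothing else from pass one.
    return complete(f"{COMMIT_INSTRUCTIONS}\n\nTASK:\n{task}\n\nFINDINGS:\n{findings}")
```

The only thing that crosses the boundary is the findings string; everything else from the first pass stays behind.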
The handoff format matters. The findings payload should be machine-readable enough that its shape can be enforced (so the first pass can't smuggle in a full draft) and human-readable enough to inspect when things go wrong. Pre-defined sections (approaches_tried, dead_ends, relevant_artifacts, problem_shape) give the first pass something to fill in without inviting it to write the answer. A schema with empty slots is far better than a free-form summary prompt.
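As a sketch, the schema can be as small as a stdlib dataclass; the field names below mirror the sections above, and the word budget is an illustrative threshold, not a recommendation:

```python
# Sketch of a findings schema with a crude size guard. Stdlib only; the
# 250-word budget is an assumed, illustrative limit.
from dataclasses import dataclass, field


@dataclass
class Findings:
    approaches_tried: list[str] = field(default_factory=list)
    dead_ends: list[str] = field(default_factory=list)           # what was ruled out, and why
    relevant_artifacts: list[str] = field(default_factory=list)  # files, sources, symbols that matter
    problem_shape: str = ""                                       # what the task actually is, post-exploration

    def check_budget(self, max_words: int = 250) -> None:
        # Crude guard against the first pass smuggling in a full draft:
        # if the payload reads like an answer, reject it and re-run exploration.
        words = sum(len(str(v).split()) for v in vars(self).values())
        if words > max_words:
            raise ValueError(f"findings payload too long ({words} words); looks like a draft")
```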
The Eval Delta That Tells You When It Earns Its Budget
A second-draft pattern costs roughly 1.5–2.5x the tokens of a single-pass run, depending on how much exploration the first pass actually does. Whether that's worth it is not a vibes question. Run three variants on the same eval set and look at the curves:
- Single-pass baseline: one shot, no critique.
- N-of-k self-critique: generate, critique, revise, with up to k = 3 revision rounds.
- Explore-then-commit: pass one produces findings, pass two produces the answer.
The diagnostic you're looking for is where the second-draft pattern's quality-per-token sits relative to the others. On tasks where the answer shape is obvious and the cost of being wrong is low, single-pass wins on cost-adjusted quality and the second draft is overkill. On tasks where exploration genuinely reduces uncertainty (multi-file refactors, debugging where the root cause isn't where the symptom is, research syntheses across several sources), the explore-then-commit pattern shows a quality lift that self-critique can't match, at a similar or lower token cost than two or three rounds of critique.
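A sketch of that comparison, assuming you already have per-run scores and token counts for each variant; the record shape and variant names are illustrative:

```python
# Sketch of a quality-per-token comparison across the three variants.
# Input record shape is assumed: {"variant": ..., "score": ..., "tokens": ...}.
from statistics import mean


def quality_per_token(runs: list[dict]) -> dict:
    by_variant: dict[str, list[dict]] = {}
    for r in runs:
        by_variant.setdefault(r["variant"], []).append(r)

    report = {}
    for variant, rs in by_variant.items():
        avg_score = mean(r["score"] for r in rs)
        avg_tokens = mean(r["tokens"] for r in rs)
        report[variant] = {
            "avg_score": round(avg_score, 3),
            "avg_tokens": round(avg_tokens),
            # The diagnostic: how much quality each thousand tokens buys.
            "score_per_1k_tokens": round(avg_score / (avg_tokens / 1000), 4),
        }
    return report
```

If explore-then-commit doesn't lead clearly on either absolute score or score per thousand tokens for your task, it hasn't earned its budget.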
The interesting cases are in the middle, where the eval delta is small enough that the cost discipline matters. Watch for these signals:
- High variance across runs on a single-pass baseline (suggests the model is finding different starting framings, which exploration could collapse); a quick check for this is sketched after this list.
- Self-critique flattens out fast (suggests the same-context problem is dominant).
- Trace-level review of self-critique runs shows the model polishing the surface of wrong answers rather than backtracking (suggests the framing-lock failure mode).
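For the first signal, run the single-pass baseline several times per eval task and flag the tasks where scores spread. A minimal sketch, with an illustrative threshold:

```python
# Sketch of the variance check. The 0.15 threshold is an assumed, illustrative
# cutoff; tune it to your eval's score scale.
from statistics import pstdev


def flag_high_variance_tasks(scores_by_task: dict[str, list[float]], threshold: float = 0.15) -> list[str]:
    """scores_by_task maps task_id -> scores from repeated single-pass runs on that task."""
    return [
        task_id
        for task_id, scores in scores_by_task.items()
        if len(scores) > 1 and pstdev(scores) > threshold
    ]
```

Tasks flagged here are the ones where exploration is most likely to pay for itself.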
