
The Verifier Loop That Couldn't Converge

Tian Pan
Software Engineer

The most expensive bug in an agent system is the one with no error message. Worker proposes a draft. Verifier rejects it with a paragraph of feedback. Worker revises. Verifier rejects again. The loop keeps spinning, the trace keeps growing, the bill keeps climbing, and from the outside the system looks like it is working — diligently, in fact, because both models are doing their assigned job. What nobody priced in is that the verifier's acceptance criteria are not fixed across calls. The target the worker is chasing is moving, and the loop has no convergence guarantee.

You shipped "iterate until satisfied," and you shipped a search through a space whose extrema may not exist.

The architectural mistake hiding behind a clean diagram

The pattern is so common it has stopped registering as a design choice. You pair a worker model with a verifier — a separate model, a judge with a rubric, an adversarial critic, sometimes a stricter version of the same base model with a different system prompt — and you let them play catch until the verifier signs off. The diagram is two boxes and an arrow back. It looks elegant. It looks correct. It encodes an assumption the diagram never names: that the verifier is a function. That given the same candidate, it returns the same verdict. That its acceptance set is a stable subset of output space, and the worker's job is to reach it.
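In code, the two-box diagram usually reduces to something like the loop below. It is a sketch with hypothetical `worker` and `verify` callables, not any framework's API. The point is the docstring, which spells out the assumption the diagram never names.

```python
from typing import Callable, Tuple

# Hypothetical interfaces: the worker produces a draft given the task and the
# latest feedback; the verifier returns (accepted, feedback).
Worker = Callable[[str, str], str]
Verifier = Callable[[str], Tuple[bool, str]]


def refine_until_satisfied(worker: Worker, verify: Verifier, task: str,
                           max_iterations: int = 5) -> str:
    """The 'two boxes and an arrow back' loop.

    Implicitly assumes verify() is a function: the same draft always gets
    the same verdict, so the acceptance set is a fixed target.
    """
    feedback = ""
    draft = ""
    for _ in range(max_iterations):
        draft = worker(task, feedback)
        accepted, feedback = verify(draft)
        if accepted:
            break
    # If the loop ran out of iterations, this draft was rejected and ships anyway.
    return draft
```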

The verifier is not a function. It is a stochastic policy. Run the same candidate through it five times with non-zero temperature and you get five judgments that often disagree at the margin. Run it across days as the provider tweaks the model, and you get drift that goes unnoticed, because nothing in your pipeline pinned the verifier to a fixed model version. Run it on a candidate in round one and on the same candidate in round three (now embedded in a longer trace with revision history), and you get different answers because the context changed.
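A cheap way to test the first claim against your own verifier is to call it repeatedly on one fixed candidate and measure how often the verdicts agree. The sketch assumes a hypothetical `verify(candidate) -> bool` callable; `make_noisy_verifier` is a toy stand-in (Gaussian noise around a threshold) for a model judge sampled at non-zero temperature, not a model of any real provider.

```python
import random
from collections import Counter
from typing import Callable


def verdict_agreement(verify: Callable[[str], bool], candidate: str,
                      n: int = 5) -> float:
    """Fraction of n calls on the same candidate that match the majority verdict.

    1.0 means the verifier behaved like a function on this candidate;
    values near 0.5 mean its verdict is close to a coin flip.
    """
    verdicts = Counter(verify(candidate) for _ in range(n))
    return verdicts.most_common(1)[0][1] / n


def make_noisy_verifier(true_score: float, threshold: float = 0.7,
                        noise: float = 0.1, seed: int = 0) -> Callable[[str], bool]:
    """Toy judge: fixed underlying score, fresh noise on every call."""
    rng = random.Random(seed)

    def verify(candidate: str) -> bool:
        # The candidate never changes; only the sampled judgment does.
        return true_score + rng.gauss(0, noise) >= threshold

    return verify
```

Candidates near the threshold are exactly the ones a refinement loop spends its rounds on, so a low agreement rate there matters more than a high average agreement across easy cases.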

A refinement loop whose convergence test is "this stochastic, non-monotonic judge returned accept" is not converging on quality. It is sampling from a region of output space whose boundary fluctuates round by round. The worker's revisions are not moving toward a fixed point. They are orbiting one.
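The arithmetic behind "sampling, not converging" is short. If each verifier call accepts an unchanged candidate independently with probability p, the chance it is accepted at least once in k rounds is 1 - (1 - p)^k. Independence is a simplifying assumption (real calls on a growing trace are correlated), but it shows the effect. The sketch computes the closed form and checks it with a simulation in which the candidate never changes.

```python
import random


def p_accept_within(p_single: float, rounds: int) -> float:
    """Closed form: P(at least one accept in `rounds` independent calls)."""
    return 1 - (1 - p_single) ** rounds


def simulate_accept_within(p_single: float, rounds: int,
                           trials: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo check: the draft is fixed; only the verdict is resampled."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < p_single for _ in range(rounds))
        for _ in range(trials)
    )
    return hits / trials


print(round(p_accept_within(0.3, 5), 3))  # 0.832
```

Under these assumptions, a draft the verifier rejects 70% of the time gets through a five-round loop about 83% of the time, even if the worker never edits it.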

Why "the agent is getting better" misreads the trajectory

Teams instrument the loop and watch the verifier's confidence score climb across iterations. Round one: 0.62. Round two: 0.74. Round three: 0.81. The graph is monotonic, the chart goes up and to the right, and the natural reading is that the worker is improving and the verifier is recognizing improvement. Ship it.

Two things are likely happening that this reading misses.

First, the verifier is being trained — informally, by the trace — on what the worker considers a serious attempt. The longer the trace, the more it looks like effort. Effort reads as quality to a judge that does not have a ground-truth oracle, and confidence scores rise even when the underlying artifact has not meaningfully changed. You are measuring the verifier's confidence, not the artifact's quality, and the two have decoupled.
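One way to catch this decoupling is to log each round's draft next to its score and flag rounds where the score rose while the text barely moved. The sketch uses `difflib.SequenceMatcher` from the Python standard library as a crude edit measure; the thresholds `min_edit` and `min_gain` are illustrative assumptions, and a character-level diff is only a proxy for whether the artifact meaningfully changed.

```python
from difflib import SequenceMatcher
from typing import List


def flag_unearned_gains(drafts: List[str], scores: List[float],
                        min_edit: float = 0.05, min_gain: float = 0.05) -> List[int]:
    """Round indices where the score rose by at least `min_gain`
    while the draft changed by less than `min_edit` (1 - similarity ratio)."""
    flagged = []
    for i in range(1, len(drafts)):
        edit = 1 - SequenceMatcher(None, drafts[i - 1], drafts[i]).ratio()
        gain = scores[i] - scores[i - 1]
        if gain >= min_gain and edit < min_edit:
            flagged.append(i)
    return flagged


# Round 1 resubmits the round-0 text unchanged, yet the score climbs 0.62 -> 0.74.
print(flag_unearned_gains(["x" * 100, "x" * 100, "y" * 100], [0.62, 0.74, 0.81]))  # [1]
```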

Second, the worker is learning the verifier's surface features within the loop. It is not getting closer to a quality artifact; it is getting closer to the rhetorical shape that this verifier accepts. If your verifier likes hedging language, the worker hedges more. If your verifier scores bullet lists higher than prose, the bullet lists multiply. The convergence you are watching is the worker overfitting to the judge, which is a different phenomenon from the worker producing a better output. It looks the same on the dashboard.
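A similarly cheap check for the second effect is to track a few surface features across rounds and see whether they grow while any ground-truth checks you have stay flat. The hedge lexicon and bullet markers below are hypothetical examples; pick the features your verifier appears to reward.

```python
import re
from typing import Dict, List

# Hypothetical hedge lexicon; tune it to whatever your verifier seems to reward.
HEDGES = ("may", "might", "could", "possibly", "likely", "arguably")


def surface_features(text: str) -> Dict[str, int]:
    """Count hedge words, bullet lines, and total words in one draft."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "hedges": sum(w in HEDGES for w in words),
        "bullets": sum(1 for line in text.splitlines()
                       if line.lstrip().startswith(("-", "*"))),
        "words": len(words),
    }


def feature_drift(drafts: List[str]) -> Dict[str, int]:
    """Change in each surface feature from the first draft to the last."""
    first, last = surface_features(drafts[0]), surface_features(drafts[-1])
    return {k: last[k] - first[k] for k in first}
```

A drift report that shows hedges and bullets climbing round over round, with no corresponding change in task-level correctness, is the dashboard signature of the worker fitting the judge.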

The diagnostic that separates genuine improvement from these two effects is cheap and rarely run: take the round-five "accepted" output and feed it back through a fresh verifier instance with no trace history. If the fresh verifier rejects it, your loop did not converge to quality. It converged to a local agreement between two specific rollouts that has no meaning outside that conversation.
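The replay takes a few lines. The sketch assumes a hypothetical `verify(candidate, history) -> bool` that lets you control which trace the verifier sees; passing an empty history plays the role of the fresh instance. Running it several times rather than once also separates a stable rejection from a coin flip. `trace_swayed_verifier` is a deliberately extreme toy, not a claim about how real judges behave.

```python
from typing import Callable, List

# Hypothetical signature: the verifier receives the candidate plus whatever
# conversation history it is shown, and returns True to accept.
VerifyFn = Callable[[str, List[str]], bool]


def fresh_replay(verify: VerifyFn, accepted_output: str, n: int = 5) -> float:
    """Acceptance rate of a loop-approved output when re-judged with no trace."""
    return sum(verify(accepted_output, []) for _ in range(n)) / n


def trace_swayed_verifier(candidate: str, history: List[str]) -> bool:
    # Toy judge that reads a long revision history as effort and accepts
    # anything once the trace is long enough. It ignores the candidate.
    return len(history) >= 4


in_loop = trace_swayed_verifier("draft v5", ["r1", "r2", "r3", "r4"])
fresh = fresh_replay(trace_swayed_verifier, "draft v5")
print(in_loop, fresh)  # True 0.0
```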

The max-iterations cap is a confession

Every production verifier loop you have ever seen has a max_iterations parameter, usually defaulted to three or five. This is not a backstop in case something goes wrong. It is the architectural admission that the loop has no convergence proof. If the loop converged, you would not need a cap; the loop would terminate on its own. The cap exists because the team building the system knows, somewhere, that the loop might not terminate, and they have decided that a hard ceiling on cost is more important than a guarantee on quality.

Read it that way and the implications get sharper. The artifact that hits the user is one of three things. It is an output the verifier explicitly approved (best case, ignoring the verifier-overfitting concern above). It is an output the verifier rejected on the last allowed iteration and the system shipped anyway because the cap fired (silent failure, often the dominant case for hard tasks). It is an output somewhere in the middle that the system picked by some tiebreaker (first accepted, highest score, last produced), none of which provides the convergence guarantee the architecture pretended to offer.

The third case is where teams quietly degrade product quality without realizing it. The verifier loop becomes a confidence-laundering machine: it makes the team feel like the system has a quality gate, when what it actually has is a budget gate dressed in a quality gate's clothes.
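One way to stop the laundering is to make the loop report why it stopped and to count those reasons across runs. The sketch below is a hypothetical variant of the plain loop: `verify` is assumed to return `(accepted, score, feedback)`, the "highest score" tiebreaker is one arbitrary choice among those listed above, and the labels `"accepted"` and `"cap_fired"` are illustrative names.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class LoopResult:
    output: str
    reason: str                      # "accepted" or "cap_fired"
    rounds: int
    scores: List[float] = field(default_factory=list)


def refine_with_reason(worker: Callable[[str, str], str],
                       verify: Callable[[str], Tuple[bool, float, str]],
                       task: str, max_iterations: int = 5) -> LoopResult:
    """Refinement loop that never hides whether the verifier approved the output."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    feedback, drafts, scores = "", [], []
    for round_no in range(1, max_iterations + 1):
        draft = worker(task, feedback)
        accepted, score, feedback = verify(draft)
        drafts.append(draft)
        scores.append(score)
        if accepted:
            return LoopResult(draft, "accepted", round_no, scores)
    # Cap fired: ship the highest-scoring draft (one possible tiebreaker),
    # but label it so it is never counted as verifier-approved.
    best = max(range(len(drafts)), key=lambda i: scores[i])
    return LoopResult(drafts[best], "cap_fired", max_iterations, scores)


def gate_report(results: Iterable[LoopResult]) -> dict:
    """Share of shipped outputs per termination reason."""
    counts = Counter(r.reason for r in results)
    total = sum(counts.values())
    return {reason: n / total for reason, n in counts.items()} if total else {}
```

If `gate_report` over a week of production runs shows a large `cap_fired` share, the loop is mostly acting as a budget gate, whatever the architecture diagram says.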

Patterns that restore termination

The remedies have a shared shape: they make the verifier's behavior independent of the loop dynamics, so the loop is searching against a fixed target rather than a co-moving one.
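A minimal sketch of that shared shape, with every name hypothetical: the wrapper below shows the judge only the candidate (never the trace), evaluates it under a fixed list of seeds with a majority vote, and caches the verdict by content hash so an identical candidate cannot be re-rolled into acceptance. The `judge(candidate, seed)` callable is an assumption; real model APIs differ in what determinism controls they offer, and a cache guarantees consistency, not correctness.

```python
import hashlib
from typing import Callable, Dict, List


class FrozenVerifier:
    """Makes the verdict depend only on the candidate text.

    - The judge is called without the loop's trace or revision history.
    - A fixed, odd-length list of seeds plus majority vote stands in for
      pinning the verifier's sampling.
    - Verdicts are cached by SHA-256 of the candidate, so resubmitting an
      identical draft always returns the same answer.
    """

    def __init__(self, judge: Callable[[str, int], bool], seeds: List[int]):
        if not seeds or len(seeds) % 2 == 0:
            raise ValueError("use an odd, non-empty list of seeds to avoid ties")
        self.judge = judge
        self.seeds = list(seeds)
        self.cache: Dict[str, bool] = {}

    def __call__(self, candidate: str) -> bool:
        key = hashlib.sha256(candidate.encode("utf-8")).hexdigest()
        if key not in self.cache:
            votes = sum(self.judge(candidate, s) for s in self.seeds)
            self.cache[key] = votes * 2 > len(self.seeds)
        return self.cache[key]
```

Because the verdict for a given draft can no longer change between rounds, the worker can only get a different answer by producing a different draft, which is the fixed target the loop assumed it had all along.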
