
When LLMs Review LLMs, Errors Get Laundered, Not Caught

May 17, 2026 · 10 min read

Software Engineer

Trace the path of a single quality signal through a modern AI pipeline. An agent drafts a response. A second model reviews it and scores it 9 out of 10. That score gets logged. At the end of the quarter, the logged scores become the new eval set, and the next model is tuned to do well against it. Now ask the obvious question: where in that loop did a human ever look at the actual output?

In a lot of pipelines, the honest answer is nowhere. The agent that does the work is reviewed by another agent, and that reviewer's verdict feeds the next round of evaluation. The loop is closed. It runs continuously, it produces a dashboard, and the dashboard is green. What it does not contain, at any point, is a measurement against reality.
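To make the closed loop concrete, here is a minimal sketch of an audit that asks the question above: of the records feeding the next eval set, how many did a person ever check? Everything in it is an assumption made for illustration: the `Record` shape, the `human_label` field, the `build_eval_set` and `ground_truth_coverage` helpers, and the score threshold of 8 are hypothetical and not taken from any particular pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """One logged item in a hypothetical pipeline."""
    output: str                          # what the agent drafted
    reviewer_score: int                  # score from the reviewing model, 0-10
    human_label: Optional[bool] = None   # set only if a person checked the output


def build_eval_set(records: list[Record], min_score: int = 8) -> list[Record]:
    """The closed loop: logged reviewer scores decide what becomes eval data."""
    return [r for r in records if r.reviewer_score >= min_score]


def ground_truth_coverage(records: list[Record]) -> float:
    """Fraction of records that a human actually checked."""
    if not records:
        return 0.0
    checked = sum(1 for r in records if r.human_label is not None)
    return checked / len(records)


# Three logged drafts, all scored by a model, none seen by a person.
logged = [
    Record("draft A", 9),
    Record("draft B", 9),
    Record("draft C", 4),
]

eval_set = build_eval_set(logged)
print(len(eval_set))                    # 2
print(ground_truth_coverage(eval_set))  # 0.0
```

The eval set here is built entirely from model-assigned scores, so its coverage is zero no matter how high those scores are. A number like this will not show up on the dashboard unless someone adds it.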

About Tian Pan