When LLMs Review LLMs, Errors Get Laundered Not Caught

10 min read
Tian Pan
Software Engineer

Trace the path of a single quality signal through a modern AI pipeline. An agent drafts a response. A second model reviews it and scores it 9 out of 10. That score gets logged. At the end of the quarter, the logged scores become the new eval set, and the next model is tuned to do well against it. Now ask the obvious question: where in that loop did a human ever look at the actual output?

In a lot of pipelines, the honest answer is nowhere. The agent that does the work is reviewed by another agent, and that reviewer's verdict feeds the next round of evaluation. The loop is closed. It runs continuously, it produces a dashboard, and the dashboard is green. What it does not contain, at any point, is a measurement against reality.
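
One way to make that gap visible is to ask, for any eval set, what fraction of its records a human ever checked. The sketch below is a minimal illustration under stated assumptions: the `EvalRecord` type, its `judge_score` and `human_label` fields, and the `ground_truth_coverage` helper are all hypothetical names invented here, not part of any particular eval framework, and the sample records are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalRecord:
    """One logged item in a hypothetical eval set."""
    output: str
    judge_score: int                    # score assigned by the reviewing model
    human_label: Optional[bool] = None  # None means no human ever checked it


def ground_truth_coverage(records: list[EvalRecord]) -> float:
    """Return the fraction of records that carry a human-verified label."""
    if not records:
        return 0.0
    checked = sum(1 for r in records if r.human_label is not None)
    return checked / len(records)


# Illustrative data only: four model-scored drafts, one of which a human reviewed.
records = [
    EvalRecord("draft A", judge_score=9),
    EvalRecord("draft B", judge_score=8),
    EvalRecord("draft C", judge_score=9, human_label=False),
    EvalRecord("draft D", judge_score=7),
]

coverage = ground_truth_coverage(records)
print(f"human-checked: {coverage:.0%}")
```

In a fully closed loop this number is zero no matter how high the judge scores run, which is exactly what a green dashboard built on those scores cannot show.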