The Fine-Tune Dataset You Accidentally Built While Debugging
The thumbs-down button on your staging UI was supposed to do one thing: tell the on-call engineer which response looked bad so they could go investigate. Six months later, somebody on the modeling team pulled "all production feedback with corrections attached" into a Parquet file and ran an SFT job against it. The eval set improved on three metrics and regressed quietly on five. Nobody could explain why until somebody scrolled through the labels and found a row that read, in the corrections column, "this is fine but I hate how it phrases it." The model learned that opinion. Then it learned forty-thousand more of them.
This is the failure mode where the debugging surface and the curation surface are the same surface. Engineers click "bad" because something is broken, because something looks weird, because they were about to file a ticket, because the formatting offends them, because they were checking whether the button works. The signal that flows out of that click is a mix of "this output is wrong," "this output is right but ugly," "I don't like this," and "I was bored." Treated as a single label, it certifies nothing. Trained against, it teaches the model the union of all those moods.
