The Feedback Provenance Gap: Why Your Training Signal Might Not Be What You Collected
Most teams have excellent instrumentation on the feedback capture side. Thumbs-down clicks are logged. Star ratings flow into dashboards. Human annotation jobs write every preference pair to a table. The intake is clean, timestamped, and queryable.
What happens between that capture and the next model update is, for most teams, a black box.
The data gets filtered. Some annotations get weighted higher than others. Rare categories get upsampled. Near-duplicates get dropped. A prompt template change makes last month's labels inconsistent with this month's, but the merge happens anyway. By the time the signal reaches a reward model or fine-tuning job, it has passed through six transformation steps with no audit trail, no version pinning, and no way to trace a regression in model behavior back to a specific corruption point in the pipeline.
This is the feedback provenance gap: teams know where feedback enters the system, but not what it becomes before it shapes model behavior.
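To make "audit trail" concrete, here is a minimal sketch of what per-step provenance could record for a feedback pipeline: each transformation logs its name, its parameters, and content hashes of its input and output, so the final training set can be traced back through the chain. Every name in it (tracked_step, ProvenanceLog, the example filters) is hypothetical, not an existing library API.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Callable


@dataclass
class StepRecord:
    step_name: str
    params: dict
    input_rows: int
    output_rows: int
    input_hash: str
    output_hash: str
    ran_at: str


@dataclass
class ProvenanceLog:
    steps: list = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps([asdict(s) for s in self.steps], indent=2)


def _content_hash(rows: list[dict]) -> str:
    # Hash the serialized rows so any silent change to the data is detectable.
    blob = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:16]


def tracked_step(log: ProvenanceLog, name: str, fn: Callable, rows: list[dict], **params) -> list[dict]:
    # Run one transformation and append an auditable record of what it did.
    out = fn(rows, **params)
    log.steps.append(StepRecord(
        step_name=name,
        params=params,
        input_rows=len(rows),
        output_rows=len(out),
        input_hash=_content_hash(rows),
        output_hash=_content_hash(out),
        ran_at=datetime.now(timezone.utc).isoformat(),
    ))
    return out


# Stand-ins for real pipeline steps: confidence filtering and near-duplicate dropping.
def drop_low_confidence(rows, min_confidence=0.5):
    return [r for r in rows if r.get("confidence", 1.0) >= min_confidence]


def dedupe_by_prompt(rows):
    seen, out = set(), []
    for r in rows:
        if r["prompt"] not in seen:
            seen.add(r["prompt"])
            out.append(r)
    return out


if __name__ == "__main__":
    feedback = [
        {"prompt": "p1", "label": "good", "confidence": 0.9},
        {"prompt": "p1", "label": "good", "confidence": 0.8},
        {"prompt": "p2", "label": "bad", "confidence": 0.3},
    ]
    log = ProvenanceLog()
    rows = tracked_step(log, "drop_low_confidence", drop_low_confidence, feedback, min_confidence=0.5)
    rows = tracked_step(log, "dedupe_by_prompt", dedupe_by_prompt, rows)
    print(log.to_json())  # the audit trail to pin alongside the training set
```

The point of the sketch is not the specific hashing scheme; it is that a record like this, pinned next to the dataset that a reward model or fine-tuning job consumed, is what lets you walk a degraded model back to the step where the signal changed.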
