Annotation Workforce Engineering: Your Labelers Are Production Infrastructure
Your model is underperforming, so you dig into the training data. Halfway through the audit you find two annotators labeling the same edge case in opposite ways — and both are following the spec, because the spec is ambiguous. You fix the spec, re-label the affected examples, retrain, and recover a few F1 points. Two months later the same thing happens with a different annotator on a different edge case.
This is not a labeling vendor problem. It is not a data quality tool problem. It is an infrastructure problem that you haven't yet treated like one.
Most engineering teams approach annotation the way they approach a conference room booking system: procure the tool, write a spec, hire some contractors, ship the data. That model worked when you needed a one-time labeled dataset. It collapses the moment annotation becomes a continuous activity feeding a live production model — which it is for almost every team that has graduated from prototype to production.
