Fine-Tuning Data Saturation: When Adding Examples Makes Your Model Worse

· 9 min read
Tian Pan
Software Engineer

There's a pattern that repeats across almost every fine-tuning project that runs past the initial demo: the team hits a quality plateau, decides they need more data, adds 50% more examples, retrains, and discovers the model is either identically mediocre or measurably worse. The instinct to add data is correct for most software problems — more signal generally helps. But fine-tuning has a saturation regime that pre-training does not, and most practitioners don't recognize when they've entered it.

A 2024 study testing LLM fine-tuning on the Qasper dataset found that expanding the training set from 500 to 1,000 examples caused Mixtral's accuracy score to drop from 4.04 to 3.28 and completeness from 3.75 to 2.58. This wasn't a hyperparameter bug. It was data saturation: the model had begun memorizing distribution noise rather than learning generalizable patterns. The team added fuel after the engine had already flooded.
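To make the pattern concrete, here is a minimal sketch of how you might watch for this in your own sweeps: fine-tune at several dataset sizes, evaluate each checkpoint on the same held-out set, and flag the first size where the score stops improving. The function and the numbers below are illustrative placeholders shaped like the Qasper result above, not the study's actual pipeline.

```python
# Minimal data-saturation check: given eval scores at increasing training-set
# sizes, return the first size where adding examples stopped helping.
# Scores here are placeholders, not reproduced study outputs.

def find_saturation_point(
    sizes: list[int],
    scores: list[float],
    min_gain: float = 0.0,
) -> int | None:
    """Return the first training-set size whose score failed to improve on the
    previous size by at least `min_gain`, or None if scores kept rising."""
    for prev_score, curr_score, size in zip(scores, scores[1:], sizes[1:]):
        if curr_score - prev_score < min_gain:
            return size
    return None


# Illustrative accuracy numbers: gains up to 500 examples, then a drop at 1,000.
sizes = [250, 500, 1000]
accuracy = [3.90, 4.04, 3.28]

print(find_saturation_point(sizes, accuracy))  # -> 1000: more data made it worse
```

The useful habit is the sweep itself: if you only ever retrain on the full, growing dataset, the saturation point shows up as an unexplained regression instead of a visible inflection in a curve.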