The Fine-Tune That Erased the Alignment You Inherited
You picked the base model "because it was the safer one." Six months later your team has shipped a domain-tuned checkpoint that answers customer questions about wealth products with reassuring fluency, passes the task eval at 94%, and — somewhere between epoch one and epoch four — quietly forgot how to refuse anything. Nobody noticed because your launch eval suite never measured what fine-tuning removed. The capabilities it stripped were never in your task distribution, so they were never on the dashboard.
This is the most under-reported failure mode in production LLM systems right now: post-training alignment is not a property of a model family. It is a property of one specific checkpoint, and supervised fine-tuning corrodes it by default. The team that fine-tuned has not shipped a tuned version of the model they reviewed. They have shipped a different model — one whose model card describes weights nobody is serving.
