LoRA Adapter Composition in Production: Running Multiple Fine-Tuned Skills Without Model Wars
The promise sounds clean: fine-tune lightweight LoRA adapters for each specialized skill — one for professional tone, one for JSON formatting, one for medical terminology, one for safety guardrails — then combine them at serving time. Teams ship this design, it works fine in development, and then it falls apart in production once two adapters start fighting over the same weight regions and output quality collapses to something indistinguishable from the untuned base model. Not slightly worse. Completely untuned.
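To make the failure mode concrete, here is a minimal sketch of the naive approach: reconstruct each adapter's low-rank update and sum every update into the same base weight matrix. This is illustrative only; the adapter names, shapes, and mixing weights are hypothetical, and the scaling follows the standard LoRA convention of alpha over rank.

```python
import torch

def lora_delta(A: torch.Tensor, B: torch.Tensor, alpha: float, rank: int) -> torch.Tensor:
    """Reconstruct the full weight update encoded by one LoRA adapter.

    A has shape (rank, in_features), B has shape (out_features, rank),
    scaled by alpha / rank as in the original LoRA formulation.
    """
    return (alpha / rank) * (B @ A)

def naive_merge(base_weight: torch.Tensor, adapters: list[dict]) -> torch.Tensor:
    """Naively sum every adapter's delta into one shared weight matrix.

    Because all deltas land on the same parameters, adapters trained for
    different skills can cancel or distort each other here.
    """
    merged = base_weight.clone()
    for ad in adapters:
        merged += ad["mix"] * lora_delta(ad["A"], ad["B"], ad["alpha"], ad["rank"])
    return merged

# Hypothetical example: a 4096x4096 projection with two rank-16 adapters.
d, r = 4096, 16
base = torch.randn(d, d) * 0.02
tone_adapter = {"A": torch.randn(r, d) * 0.01, "B": torch.randn(d, r) * 0.01,
                "alpha": 32, "rank": r, "mix": 1.0}
json_adapter = {"A": torch.randn(r, d) * 0.01, "B": torch.randn(d, r) * 0.01,
                "alpha": 32, "rank": r, "mix": 1.0}
merged = naive_merge(base, [tone_adapter, json_adapter])
```

Nothing in this sketch keeps the two deltas out of each other's way, which is exactly the interference the rest of this post digs into.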
This post is about what happens when you compose adapters in practice, why naive merging fails so reliably, and what strategies actually work at production scale.
