LLM Model Routing Is Market Segmentation Disguised As A Cost Optimization
The cost dashboard makes the case for itself. Sixty percent of traffic is "easy," a quick eval shows the smaller model lands within a couple of points on the global accuracy metric, and the routing layer ships behind a feature flag the same week. The graph bends. Finance is happy. The team moves on.
What nobody tracks is that the customer who hit the cheap path on Tuesday afternoon and the expensive path on Wednesday morning is now using two different products. The two models fail differently. They format differently. They refuse different things. They handle ambiguity, follow-up questions, and partial inputs with different defaults. From the customer's seat, the assistant developed amnesia overnight and nobody can tell them why — because internally, the change was filed as a finops win, not a product release.
