The 20% Problem in Model Routing: When Cost Optimization Creates Second-Class Users
Your routing system works exactly as designed. Eighty percent of queries go to the cheap model; twenty percent escalate to the capable one. Latency is down, costs are down 60%, and leadership is happy. Then someone pulls the data by user segment, and you see it: users writing in non-native English are escalated at half the rate of native speakers, and their satisfaction scores are 18 points lower. The router treated its query-complexity signal as neutral, but it wasn't. It was a proxy for language proficiency, and for months you've been giving a systematically worse product to a specific group of users.
This is the 20% problem. It's not a bug in the router. It's an emergent property of any cost-optimized routing system, and one that nobody measures until it's too late.
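Measuring it does not take much. As a minimal sketch, assuming your routing logs can be reduced to (segment, escalated) pairs, the per-segment escalation rate and its ratio against a reference group could be computed like this. The function names, the segment labels, and the sample counts are all hypothetical; the counts are chosen only to mirror the "half the rate" scenario above and are not real data.

```python
from collections import defaultdict


def escalation_rates(records):
    """Return {segment: fraction of queries escalated}.

    `records` is an iterable of (segment, was_escalated) pairs.
    """
    totals = defaultdict(int)
    escalated = defaultdict(int)
    for segment, was_escalated in records:
        totals[segment] += 1
        if was_escalated:
            escalated[segment] += 1
    return {seg: escalated[seg] / totals[seg] for seg in totals}


def disparity_ratio(rates, group, reference):
    """Escalation rate of `group` relative to `reference`; 1.0 means parity."""
    if rates[reference] == 0:
        raise ValueError("reference segment was never escalated")
    return rates[group] / rates[reference]


# Hypothetical log: 100 queries per segment, illustrative counts only.
log = (
    [("native", True)] * 24 + [("native", False)] * 76
    + [("non_native", True)] * 12 + [("non_native", False)] * 88
)

rates = escalation_rates(log)
ratio = disparity_ratio(rates, "non_native", "native")
print(rates, ratio)
```

A ratio well below 1.0 for any segment is a prompt to look closer, not a verdict on its own: segments can differ in what they ask, so the comparison is most meaningful between queries of similar difficulty.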
