Your Model Router Is a Load Balancer That Cannot See the Load
A load balancer in front of a web fleet works because every machine reports back: CPU, queue depth, error rate, latency. The balancer reads the load and routes accordingly. A model router does not get that telemetry. It decides which model handles a query by looking only at the query, before the model has done anything. The router predicts difficulty from the prompt. Real difficulty only shows up in the answer. By the time the signal exists, the routing decision is already three seconds old and the cheap model has already shipped a confident, wrong reply to your user.
This is the structural defect at the center of model routing, and most teams ship a router without ever framing it this way. They frame it as a classifier: train a model to label queries as "easy" or "hard," validate it on a held-out set, ship when accuracy clears 90%. The classifier metaphor is wrong in a way that matters. A classifier predicts a label that already exists. The router is predicting a label that does not exist yet, will not exist until the routed model has answered, and may never exist in a form clean enough to learn from.
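To make the timing problem concrete, here is a minimal sketch in Python. Every name in it (`RoutingRecord`, `route`, `record_outcome`, `labeled_examples`, the word-count difficulty heuristic, the 0.5 threshold) is a hypothetical stand-in for illustration, not any particular router's API. The point is only that the label field is empty when the decision is made, and may stay empty.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RoutingRecord:
    prompt: str
    routed_to: str                       # "cheap" or "strong" (hypothetical tiers)
    answer: Optional[str] = None         # does not exist at routing time
    was_correct: Optional[bool] = None   # the "label"; may never be filled in


def route(prompt: str,
          predict_hard: Callable[[str], float],
          threshold: float = 0.5) -> RoutingRecord:
    """Pick a model from the prompt alone. No answer exists yet to inspect."""
    model = "strong" if predict_hard(prompt) >= threshold else "cheap"
    return RoutingRecord(prompt=prompt, routed_to=model)


def record_outcome(rec: RoutingRecord,
                   answer: str,
                   was_correct: Optional[bool] = None) -> RoutingRecord:
    """Attach what was observed after the routed model replied.

    was_correct stays None when nobody (user, grader, eval) judged the answer.
    """
    rec.answer = answer
    rec.was_correct = was_correct
    return rec


def labeled_examples(records: List[RoutingRecord]) -> List[RoutingRecord]:
    """Only records with an observed outcome can serve as training data."""
    return [r for r in records if r.was_correct is not None]


# Toy difficulty predictor (an assumption): longer prompts score as harder.
def toy_predictor(prompt: str) -> float:
    return min(len(prompt.split()) / 50, 1.0)


rec = route("What is 2+2?", toy_predictor)
# At this point rec.routed_to is decided, but rec.was_correct is still None.
```

Note that even once `was_correct` is filled in, it describes only the model that was actually chosen. The sketch has no way to record how the other model would have answered the same prompt.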
