When the Cheap Model Is the Expensive One
A finance team flags that the LLM bill is up 18% this quarter. An engineer pulls the usage dashboard, sees that 70% of traffic now hits the budget model instead of the frontier one, and is briefly confused: the routing change was supposed to cut spend. The per-token price went down exactly as the spreadsheet promised. The bill went up anyway.
This is not a billing error. It is one of the most common ways a cost optimization quietly inverts itself. The spreadsheet that justified the downgrade priced one thing, tokens, while the production system pays for something else entirely: finished tasks. A weaker model does not just produce cheaper tokens. It changes the behavior of every component around it, and those second-order effects land on the same invoice.
The trap is seductive because the first-order math is genuinely correct. A budget model can be 10x to 30x cheaper per token than a frontier model, and for a large fraction of traffic it returns an answer that is indistinguishable in quality. The mistake is not the routing decision. The mistake is measuring the routing decision at the wrong boundary.
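To make the boundary concrete, here is a minimal sketch that prices two models per finished task instead of per token. Every number in it is hypothetical, chosen only so that the budget model is 15x cheaper per token (inside the 10x to 30x range above). The function name `cost_per_finished_task` and the retry model (independent attempts, repeated until one succeeds) are assumptions for illustration, not a description of any real billing system.

```python
def cost_per_finished_task(price_per_million_tokens: float,
                           tokens_per_attempt: int,
                           success_rate: float) -> float:
    """Expected spend to obtain one successful task.

    Assumes attempts are independent and retried until one succeeds,
    so the expected number of attempts is 1 / success_rate.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = price_per_million_tokens * tokens_per_attempt / 1_000_000
    expected_attempts = 1.0 / success_rate
    return cost_per_attempt * expected_attempts


# Hypothetical prices in dollars per million tokens (15x gap).
FRONTIER_PRICE = 15.0
BUDGET_PRICE = 1.0

# Easy workload (hypothetical): both models use similar tokens and rarely fail.
frontier_easy = cost_per_finished_task(FRONTIER_PRICE, 1_000, 0.98)
budget_easy = cost_per_finished_task(BUDGET_PRICE, 1_000, 0.95)

# Hard workload (hypothetical): the budget model burns more tokens per
# attempt and succeeds far less often, so it needs many more attempts.
frontier_hard = cost_per_finished_task(FRONTIER_PRICE, 3_000, 0.90)
budget_hard = cost_per_finished_task(BUDGET_PRICE, 12_000, 0.20)

print(f"easy: frontier=${frontier_easy:.4f}  budget=${budget_easy:.4f}")
print(f"hard: frontier=${frontier_hard:.4f}  budget=${budget_hard:.4f}")
```

With these made-up inputs the budget model wins comfortably on the easy workload and loses on the hard one, even though its per-token price never changed. The sketch counts only retries and longer attempts; any other second-order effect would widen the gap further.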
