The Rate-Limit Headers Your Provider Returned That Disagreed With The Actual Throttle
The response header said you had 480,000 tokens-per-minute of headroom. The 429 arrived after you spent 240,000. Your scheduler had been autoscaling against a number the runtime was never going to honor, and the burndown chart on the wall was reading the documentation while the throttler was enforcing something else entirely.
This is one of those failures that takes a long time to even notice, because every component along the path is doing exactly what it advertised. The provider returns a header. Your client parses it. Your scheduler reads it. Your dashboard plots it. None of these layers is broken. What is broken is the assumption that the header is a contract.
