Provider Rate Limits Are a Capacity Plan You Never Wrote
The first time your application hits a 429 from a model provider, something important happens, and almost nobody notices it. Not the error itself — the line of code that runs next. Maybe your HTTP client retries with exponential backoff. Maybe it falls back to a smaller model. Maybe it queues the request, or drops it, or surfaces a spinner that never resolves. Whatever it does, that behavior is now your capacity policy. It decides which users get served and which get degraded when demand exceeds supply.
And you almost certainly didn't write it. It was authored by whoever wrote the SDK wrapper, the retry decorator, or the three-line try/except someone copied from a tutorial. The most consequential decision in your system under load — what to do when you can't do everything — is being made by code nobody reviewed as a capacity decision.
This post is an argument for treating that code as what it actually is: a load-shedding policy and a capacity plan. Not an error handler. The 429 is not the problem. The problem is that you have outsourced the design of your system's behavior under contention to a library default.
