
The 429 Whose Body Said OK And Your Client Believed The Body

· 9 min read
Tian Pan
Software Engineer

The outage started at 14:03 with a 429 from the provider and a JSON body that said {"status": "ok", "data": null}. The client library was written in a hurry six months ago by someone who had been burned twice before — once by a gateway that returned HTTP 200 with an error field, and once by a provider that returned HTTP 500 on a request that had actually succeeded. So the library learned to trust the body, not the status. The status said throttle. The body said proceed. The client believed the body, fired the next request, got another 429 with another ok, fired again, and by 14:11 the provider's circuit breaker had blacklisted the account for the rest of the hour.

The provider hadn't lied, exactly. The 429 was real. But somewhere in the response pipeline a default envelope had been merged over the rate-limit payload — a generic {"status": "ok"} from a wrapper service that filled missing fields, applied on top of an error the wrapper didn't recognize. The status code was correct, the headers were correct, the body was wrong, and the body was the part the client read.

This is the failure mode that hides behind every "we have retries" claim. Retries assume the client can read the error. The error is split across three places (status code, headers, body), and any one of those three can drift independently when a provider ships a change. If your parser pulls signal from only one of them, you'll be wrong the first time that one drifts out of sync with the other two.

The Three Truths That Are Supposed To Agree

A well-behaved error response has three parts saying the same thing. The HTTP status code is the coarse signal: 429 means slow down. The Retry-After header is the timing signal: wait this many seconds. The body is the diagnostic signal: here is which limit you hit and why.
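As a rough illustration of reading the timing signal, here is a minimal Python sketch that turns a Retry-After value into a delay. The header may carry either a number of seconds or an HTTP date; the function name retry_after_seconds and the choice to return None for unusable values are assumptions made for this example, not part of any particular client library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Return the delay a Retry-After header asks for, or None if unusable."""
    if value is None:
        return None
    value = value.strip()

    # Form 1: a non-negative integer number of seconds.
    if value.isascii() and value.isdigit():
        return float(value)

    # Form 2: an HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # Malformed header: let the caller pick a default backoff.
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)

    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", never a negative wait.
    return max(0.0, (when - now).total_seconds())
```

Returning None instead of guessing keeps a missing or malformed header distinguishable from a genuine "wait zero seconds", so the caller can fall back to its own backoff schedule.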

The three are supposed to be redundant. In practice they are produced by different layers. The status is set by the gateway. The header is set by the rate-limiter, which may live a hop away from the gateway. The body is set by the application, which may not know a rate limit was hit at all — its handler never ran. When a provider ships a refactor that moves rate limiting from one layer to another, one of the three parts goes stale. The other two carry the truth alone, and the client that was reading the stale one is now flying blind.
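One way to stop depending on a single layer is to let each part of the response vote and then act on the most pessimistic vote. The sketch below is only an illustration under assumed conventions: the Verdict type, the classify function, the three action names, and the body shapes it checks ("error" key, "status": "ok") are hypothetical, chosen to match the examples in this post rather than any real provider contract.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping, Optional

# Higher number wins when the signals disagree.
SEVERITY = {"proceed": 0, "backoff": 1, "fail": 2}


@dataclass
class Verdict:
    action: str          # "proceed", "backoff", or "fail"
    signals_agree: bool  # False when status, headers, and body disagree


def classify(status: int,
             headers: Mapping[str, str],
             body: Optional[Mapping[str, Any]]) -> Verdict:
    votes: List[str] = []

    # Signal 1: the status code always votes.
    if 200 <= status < 300:
        votes.append("proceed")
    elif status in (429, 503):
        votes.append("backoff")
    else:
        votes.append("fail")

    # Signal 2: a Retry-After header votes for backoff; its absence abstains.
    if any(name.lower() == "retry-after" for name in headers):
        votes.append("backoff")

    # Signal 3: the body votes only if it has a shape we recognize (assumed).
    if body:
        if "error" in body:
            votes.append("fail")
        elif body.get("status") == "ok":
            votes.append("proceed")

    action = max(votes, key=SEVERITY.__getitem__)
    return Verdict(action=action, signals_agree=len(set(votes)) == 1)
```

Fed the response from the incident, classify(429, {"Retry-After": "30"}, {"status": "ok", "data": None}) yields a "backoff" action with signals_agree set to False. The disagreement flag is the useful part: it is a cheap thing to log and alert on, since it usually means a layer on the provider side has drifted.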

Lambda integrations are the classic example. When a Lambda function raises an exception, the Lambda Invoke API still returns HTTP 200, with an X-Amz-Function-Error header and an error body, and an API Gateway non-proxy integration with no error mapping configured likewise wraps the error body in a 200. The HTTP status is success. The header and the body say failure. A naive client that retries only on non-2xx will never retry. A naive client that retries whenever response.error is set will retry forever. Neither is wrong in isolation; they are wrong against the specific envelope this specific gateway produces, and the gateway didn't tell anyone in advance.
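For the 200-with-an-error envelope described above, a client can refuse to call a response successful until the status, the header, and the body all agree. This is a minimal sketch: invocation_failed is a hypothetical helper, and the "errorMessage" body key is an assumption about the error payload's shape that should be checked against the actual responses you receive.

```python
from typing import Any, Mapping, Optional


def invocation_failed(status: int,
                      headers: Mapping[str, str],
                      body: Optional[Mapping[str, Any]]) -> bool:
    """True if any of the three signals reports a failed invocation."""
    # HTTP header names are case-insensitive, so normalize before looking.
    names = {name.lower() for name in headers}

    if not 200 <= status < 300:
        return True
    if "x-amz-function-error" in names:
        return True
    # Assumption: error payloads carry an "errorMessage" key.
    return bool(body) and "errorMessage" in body
```

Whether a detected failure should be retried is a separate question. An unhandled exception in the function is often deterministic, so a capped retry count matters more here than the detection itself.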

The same shape recurs everywhere. OpenRouter passes upstream provider errors through as {"error": {"code": 429, "message": "Provider returned error: ..."}} wrapped in an outer HTTP status that may be 502 or 503 depending on which layer caught the failure. The outer status says the gateway is unhappy. The inner code says the model provider is rate-limiting you. A retry policy that reads only the outer status will treat a permanent quota exhaustion as a transient gateway burp and pound the same endpoint.
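A retry policy for this nested shape can look inside the body before deciding. The sketch below assumes the {"error": {"code": ...}} envelope quoted above; effective_status and retry_policy are hypothetical names, and the mapping from status to action is one plausible choice rather than a recommendation from any provider.

```python
from typing import Any, Mapping, Optional


def effective_status(outer_status: int,
                     body: Optional[Mapping[str, Any]]) -> int:
    """Prefer a plausible inner error code over the outer HTTP status."""
    error = (body or {}).get("error")
    if isinstance(error, dict):
        code = error.get("code")
        # Only trust integers that look like HTTP error statuses.
        if isinstance(code, int) and not isinstance(code, bool) and 400 <= code <= 599:
            return code
    return outer_status


def retry_policy(outer_status: int,
                 body: Optional[Mapping[str, Any]]) -> str:
    status = effective_status(outer_status, body)
    if status == 429:
        return "backoff"   # rate limited somewhere in the chain: slow down
    if status in (502, 503, 504):
        return "retry"     # gateway-level trouble: a quick retry is reasonable
    if 200 <= status < 300:
        return "proceed"
    return "fail"
```

With this in place, an outer 502 wrapping an inner 429 is treated as a rate limit rather than a gateway hiccup, which is the distinction the outer status alone cannot make.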

Why The Client Trusts The Wrong Half

Most clients pick one source of truth because the alternative is writing a decision matrix, and decision matrices are the part of error handling that nobody volunteers for. The picked source is usually the body, because the body is where the application can put structured fields, and structured fields are easier to test against than HTTP semantics.

This works until it doesn't. The body is the part of the response most likely to drift under a refactor. The status code is locked down by load balancers and gateways and HTTP libraries; changing it requires touching infrastructure. The body is owned by whoever wrote the handler, and the handler gets rewritten every quarter. The first refactor that changes the body's shape silently breaks every client that depended on it. The status code, meanwhile, hasn't moved in five years.

Streaming responses make this worse. In Server-Sent Events the HTTP status is sent when the connection opens, before any of the body exists. Once the stream has started with a 200, that status can never change, no matter what happens afterward. Errors that happen mid-stream arrive as data events with shapes like {"error": {...}} embedded between content chunks. A client that decides "success" based on the opening status code will accumulate partial output and call the result complete. There are documented cases of SDKs crashing on mid-stream error events because their parser assumed every event would have a choices field, and cases where empty SSE meta-events (retry: directives, comment lines) raise JSONDecodeError and abort the stream halfway through. In each case the HTTP status promised one thing, the body delivered another, and the client had no model for the gap.
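A stream reader that has a model for that gap might look like the following sketch. It takes an iterable of already-decoded text lines, skips comment lines and non-data fields such as retry:, and raises as soon as an error event shows up instead of assuming every event has a choices field. The StreamError class, the iter_sse_json name, and the "[DONE]" end-of-stream sentinel are assumptions for this example; the sentinel in particular is a convention some providers use, not part of the SSE format.

```python
import json
from typing import Any, Iterable, Iterator, List


class StreamError(Exception):
    """Raised when an error event arrives after the stream opened with 200."""


def iter_sse_json(lines: Iterable[str]) -> Iterator[Any]:
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")

        if line == "":
            # A blank line ends the current event; dispatch it if it had data.
            if not data_parts:
                continue  # meta-only event (e.g. just "retry:"): nothing to parse
            payload = "\n".join(data_parts)
            data_parts = []
            if payload == "[DONE]":  # assumed end-of-stream sentinel
                return
            event = json.loads(payload)
            if isinstance(event, dict) and "error" in event:
                raise StreamError(event["error"])
            yield event
            continue

        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive

        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "data":
            data_parts.append(value)
        # Other fields (retry, event, id) are deliberately ignored here.
```

A caller that collects output from this generator should treat a StreamError as "the partial output is not a result", and should separately decide how to handle a stream that ends without the sentinel, since that is the other way a truncated response can look complete.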
