The Rate Limit You Set for Humans an Agent Saturates in Three Seconds
The rate limit was never a fairness primitive. It was a sales-engineering quote that grew up — a number a solutions engineer typed into a docs page during onboarding three years ago, copied into a tier definition, and never revisited because no one ever hit it. The limit said "100 requests per minute" and it meant "more than any sane integration will ever need," because every integration on the platform was a backend service driven by a human at a keyboard, and humans do not type a hundred times a minute.
Then a paying tenant pointed an agent at the endpoint. The agent did not type. It did not pause to read responses. It did not have a UI to render between requests. It executed a planning loop that called the API once per reasoning step, and one reasoning step took the model about thirty milliseconds of wall time to formulate. The agent hit the per-minute ceiling in three seconds, the per-hour ceiling in three minutes, and the daily quota before the on-call engineer's coffee had cooled. The support escalation landed before the throttle dashboard had updated.
