Rate Limit Hierarchy Collapse: When Your Agent Loop DoSes Itself
The bug report says the service is slow. The dashboard says the service is healthy. Token-per-minute usage is at 62% of the tier cap, well inside the green band. Then you open the traces and see the shape: one user request spawned a planner step, which emitted eleven parallel tool calls, four of which were search fan-outs that each triggered sub-agents, which each called three tools in parallel — and that single "request" is now pounding your own token bucket from forty-seven different workers at the same time. The other ninety-nine users of your product are stuck behind it, getting 429s they never earned. Your agent is DoSing itself, and the rate limiter is doing exactly what you told it to.
This is rate limit hierarchy collapse. You bought a perimeter defense designed for HTTP APIs where one request equals one unit of work, then wired it in front of a system where one request means a tree of unknown depth and unbounded branching factor. The single-bucket model doesn't just fail to protect — it fails invisibly, because your aggregate numbers never breach anything. The damage happens in the tails, in correlated bursts, and in the heads-down users who happen to be adjacent in time to a heavy one.
