3 posts tagged with "rate-limits"

The Nightly Batch That Starved Your Interactive Traffic After a Quota Window Rewrite

· 11 min read
Tian Pan
Software Engineer

A cron job that ran cleanly for ten months is the most dangerous job in your system. Nothing in the job changed, nothing in your code changed, and the only thing that did change was a sentence in someone else's release notes that nobody on your team reads. The nightly embedding refresh was textbook: it kicked off at 00:05 UTC, drained its work queue in under ten minutes, and went back to sleep. It coexisted with daytime interactive traffic by saturating the minute quota for a few minutes just after midnight, before users woke up, and by staying well under the daily allotment for the rest of the day. Then the provider rewrote how the daily window was accounted, kept the minute window unchanged, and left every signature your client tested against intact. The batch kept running clean. The interactive surface started returning 429s at 00:13 UTC every night. The team spent a week chasing an upstream maintenance window that wasn't happening.

The bug was never in your code. The bug was that "a daily limit" stopped meaning what it had meant the day before, and your scheduler was pinned to a wall-clock boundary that aligned with the old meaning. This post is about rate-limit accounting as a contract the provider can revise without breaking any signature, about how two independently correct schedules compose into a denial-of-service pattern, and about the architectural moves that make a cron job stop being a time bomb wired to someone else's clock.
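One of those moves fits in a few lines: stop relying on a batch job's start time to keep it out of users' way, and instead reserve a slice of every quota window that batch traffic is never allowed to spend. This is a minimal, single-process sketch, not the post's implementation; `ReservingWindowLimiter` and its parameters are hypothetical, and the fixed-window model is itself an assumption about accounting that your provider is free to change.

```python
from dataclasses import dataclass, field


@dataclass
class ReservingWindowLimiter:
    """Fixed-window token limiter that keeps a floor of every window for
    interactive callers. Batch callers may spend only what lies above the
    reserve; interactive callers may spend the whole window.

    Hypothetical sketch: limit, reserve, and window length are assumptions
    you would set from your provider's documented limits.
    """

    limit: int                 # tokens allowed per window
    interactive_reserve: int   # tokens batch traffic may never spend
    window_seconds: float = 60.0
    _window_start: float = field(default=0.0, init=False)
    _used: int = field(default=0, init=False)

    def _roll(self, now: float) -> None:
        # Once the current window has elapsed, start a new one aligned to
        # window boundaries and forget what was spent.
        elapsed = now - self._window_start
        if elapsed >= self.window_seconds:
            self._window_start = now - (elapsed % self.window_seconds)
            self._used = 0

    def try_acquire(self, tokens: int, now: float, interactive: bool) -> bool:
        """Spend `tokens` if the caller's ceiling allows it; never blocks."""
        self._roll(now)
        ceiling = self.limit if interactive else self.limit - self.interactive_reserve
        if self._used + tokens > ceiling:
            return False
        self._used += tokens
        return True


if __name__ == "__main__":
    lim = ReservingWindowLimiter(limit=1000, interactive_reserve=300)
    print(lim.try_acquire(700, now=0.0, interactive=False))  # True
    print(lim.try_acquire(1, now=1.0, interactive=False))    # False: batch ceiling hit
    print(lim.try_acquire(300, now=2.0, interactive=True))   # True: reserve intact
```

Because the reserve applies in every window, the batch can start at 00:05 or at noon without touching the interactive floor. In production the counter would have to live in a store shared by every process holding the key, since they all draw from the same provider quota, and `now` would come from a monotonic clock such as `time.monotonic()`.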

Your Eval Suite Is a Production Workload: When Nightly Tests Starve Live Traffic

· 11 min read
Tian Pan
Software Engineer

A team's most successful AI feature went dark at 2:14 AM on a Tuesday. The pager said the model API was returning 429s in steady state. The model was healthy. The provider was healthy. The team's own production traffic was nominal. What was eating the quota was the nightly eval suite — the same suite the team had been proudly expanding the previous week. The eval and the product shared an organization key, and on that night the eval was the noisy neighbor that broke its own roommate.

The eval wasn't misbehaving. It was doing exactly what its authors designed: a thousand cases against the production model identifier, on a nightly schedule everyone had forgotten about because it had been quiet for two years. The expansion that finally pushed it over the limit added three hundred cases. The PR was reviewed by the eval owner and the prompt owner. Nobody on the review thread thought to ask: how much of the daily token quota does this consume?
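The reviewer's missing question is arithmetic, and it can run in CI. A minimal sketch, assuming you can estimate average tokens per eval case (input plus output) from a previous run; the function names, the 10% budget, and the example numbers are hypothetical.

```python
def eval_quota_share(
    cases: int,
    avg_tokens_per_case: float,
    runs_per_day: int,
    daily_token_quota: int,
) -> float:
    """Fraction of the shared daily token quota an eval suite consumes."""
    if daily_token_quota <= 0:
        raise ValueError("daily_token_quota must be positive")
    return cases * avg_tokens_per_case * runs_per_day / daily_token_quota


def check_eval_budget(share: float, max_share: float = 0.10) -> None:
    """Fail the CI job if the suite would exceed its agreed slice."""
    if share > max_share:
        raise SystemExit(
            f"eval suite would use {share:.1%} of the daily quota "
            f"(budget {max_share:.1%})"
        )


if __name__ == "__main__":
    share = eval_quota_share(
        cases=1_000, avg_tokens_per_case=2_000, runs_per_day=1,
        daily_token_quota=10_000_000,
    )
    check_eval_budget(share)  # exits: 20.0% of the quota against a 10.0% budget
```

With these hypothetical numbers the suite uses 20% of a 10-million-token daily quota, so the check fails the build before the PR merges rather than at 2:14 AM.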

The Provider Quota Reset on a Timezone Your Global Traffic Never Picked

· 8 min read
Tian Pan
Software Engineer

Your monthly token quota resets at 00:00 UTC. Your largest customer is in Tokyo and hits peak load at 21:00 UTC, which is 6:00 AM the next morning in Tokyo. By the time the reset arrives, Tokyo's traffic has already spent the last six hours of the cycle on quota-exhaustion fallback. The 429s look "occasional" because the UTC calendar axis on your dashboard hides the reset boundary inside an ordinary timestamp.

This is not a rate-limit bug. It is a calendar bug. The provider chose a reset clock for their bookkeeping convenience, and the geography of your traffic decided which customers got the empty end of the cycle. The team that priced the quota as a uniform resource is rationing it on a calendar the user never sees.
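The fix starts with looking at the reset on the customer's clock instead of the provider's. A minimal sketch, assuming a monthly quota that resets at 00:00 UTC on the first of each month (the provider's actual rule may differ) and a fixed +09:00 offset for Tokyo, which is exact only because Japan does not observe daylight saving time; for other regions `zoneinfo.ZoneInfo` is the safer choice. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed +09:00 offset: correct for Tokyo only because Japan has no DST.
TOKYO = timezone(timedelta(hours=9), "JST")


def next_monthly_reset(now_utc: datetime) -> datetime:
    """Next 00:00 UTC on the first of a month (assumed reset rule)."""
    if now_utc.month == 12:
        return datetime(now_utc.year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(now_utc.year, now_utc.month + 1, 1, tzinfo=timezone.utc)


def on_customer_clock(instant: datetime, tz: timezone) -> str:
    """Render an aware instant as the customer would see it."""
    return instant.astimezone(tz).strftime("%Y-%m-%d %H:%M %Z")


if __name__ == "__main__":
    peak = datetime(2024, 5, 31, 21, 0, tzinfo=timezone.utc)
    reset = next_monthly_reset(peak)
    print("peak: ", on_customer_clock(peak, TOKYO))   # 2024-06-01 06:00 JST
    print("reset:", on_customer_clock(reset, TOKYO))  # 2024-06-01 09:00 JST
```

On Tokyo's clock the provider's "midnight" lands at 9:00 AM, roughly the start of the business day, which is the calendar the customer actually lives on.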