
The Nightly Batch That Starved Your Interactive Traffic After a Quota Window Rewrite

· 11 min read
Tian Pan
Software Engineer

A cron job that ran cleanly for ten months is the most dangerous job in your system, because nothing in it changed and nothing in your code changed and the only thing that did change was a sentence in someone else's release notes that nobody on your team reads. The embedding refresh that kicked off at 00:05 UTC every night, drained its work queue in under ten minutes, and went back to sleep was textbook. It coexisted with daytime interactive traffic by occupying the freshly-reset minute quota for a few minutes before users woke up, and by staying well under the daily allotment for the rest of the day. Then the provider rewrote how the daily window was accounted, kept the minute window unchanged, and left every signature your client tested against intact. The batch kept running clean. The interactive surface started returning 429s at 00:13 UTC every night. The team spent a week chasing an upstream maintenance window that wasn't happening.

The bug was never in your code. The bug was that "a daily limit" stopped meaning what it had meant the day before, and your scheduler was pinned to a wall-clock boundary that aligned with the old meaning. This post is about rate-limit accounting as a contract the provider can revise without breaking any signature, about how two independently-correct schedules compose into a denial-of-service pattern, and about the architectural moves that make a cron job stop being a time bomb wired to someone else's clock.

A daily quota is not a noun, it is an accounting policy

When a provider tells you the daily limit is one million tokens, they have not told you what "daily" means. They have told you a number. The number is a budget; the policy that decides when the budget refills is a separate object, and the documentation often leaves the second one implicit. Two providers can both say "one million tokens per day" and one of them resets the counter at 00:00 UTC and the other one tracks a rolling twenty-four-hour bucket where every token you spent at 03:00 yesterday frees up at 03:00 today. Your application sees the same headline number. Your batch sees two completely different worlds.
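To make the difference concrete, here is a minimal sketch of the two refill rules as functions that answer one question: when does a given spend stop counting against you? The function names and the example date are illustrative assumptions, not any provider's API.

```python
from datetime import datetime, timedelta, timezone

def release_time_calendar(spent_at: datetime) -> datetime:
    """Calendar policy: the spend is forgiven at the next 00:00 UTC."""
    next_day = spent_at.astimezone(timezone.utc).date() + timedelta(days=1)
    return datetime(next_day.year, next_day.month, next_day.day, tzinfo=timezone.utc)

def release_time_rolling(spent_at: datetime) -> datetime:
    """Rolling policy: the spend is forgiven exactly 24 hours after it happened."""
    return spent_at + timedelta(hours=24)

# Hypothetical spend at 03:00 UTC on an arbitrary date.
spend = datetime(2026, 3, 10, 3, 0, tzinfo=timezone.utc)
print(release_time_calendar(spend))  # 2026-03-11 00:00:00+00:00
print(release_time_rolling(spend))   # 2026-03-11 03:00:00+00:00
```

Same headline number, same spend, and a three-hour gap in when the budget comes back. For a spend at 23:00 the gap grows to twenty-three hours.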

Both providers in 2026 publish rolling-window documentation on the minute-level RPM and TPM ceilings — sixty seconds of continuously-sliding accounting where every request leaves a smaller footprint as time passes. The daily ceilings, the per-organization spend caps, and the burst allowances are where the policy ambiguity lives. Some are calendar-aligned. Some are rolling. Some used to be one and quietly became the other. A few are billed against a credit pool that itself replenishes on the calendar, like Anthropic's June 2026 shift to a separate monthly credit for automated workflows that draws down independently of the message-window allowance interactive users see.

Treat the unit as a policy, not a noun. The contract is not the daily number. The contract is the daily number plus the rule that decides when it refills, and you only own one of those.

Why two correct clocks make a wrong system

Your cron and the provider's quota meter are both reasonable on their own. Yours fires at 00:05 UTC because the upstream data partition is guaranteed to be settled by then and earlier was producing partial refreshes. The provider's minute window resets at the top of every minute because aligning to wall-clock minutes is the cheapest way to operate a global rate-limit metering tier. Each clock independently produces a useful behavior. They were not designed to be composed.

The composition only worked because both clocks defined "daily" the same way. When the provider tightened daily accounting into a rolling twenty-four-hour bucket while leaving the minute window calendar-aligned, the two clocks stopped being symmetric. The batch consumed the minute-quota generously and drained the rolling daily allotment in the first eight minutes of its run, because all of those tokens were now spent against the next twenty-four hours instead of against today. From 00:13 onward the daily bucket was empty for every other workload sharing the key. Eight minutes of batch did not consume eight minutes of capacity; it consumed twenty-four hours of capacity.
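One way the arithmetic can play out is sketched below. The traffic figures are invented for illustration, and the sketch assumes the previous day's interactive spend is what still sits in the rolling bucket when the batch starts. Only the one-million-token limit and the 00:05 and 00:13 timestamps come from the story above.

```python
from datetime import datetime, timedelta, timezone

DAILY_LIMIT = 1_000_000  # headline number; every traffic figure below is made up

def used_calendar(events, now):
    """Tokens counted against today when the meter resets at 00:00 UTC."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return sum(tokens for ts, tokens in events if midnight <= ts <= now)

def used_rolling(events, now):
    """Tokens counted against a trailing 24-hour window."""
    return sum(tokens for ts, tokens in events if now - timedelta(hours=24) < ts <= now)

utc = timezone.utc
# Hypothetical day: interactive traffic spends 50k tokens per hour from 08:00 to 19:00.
events = [(datetime(2026, 3, 10, h, 0, tzinfo=utc), 50_000) for h in range(8, 20)]
# Hypothetical batch: 50k tokens per minute for eight minutes starting at 00:05.
events += [(datetime(2026, 3, 11, 0, 5 + m, tzinfo=utc), 50_000) for m in range(8)]

now = datetime(2026, 3, 11, 0, 13, tzinfo=utc)
print(DAILY_LIMIT - used_calendar(events, now))  # 600000
print(DAILY_LIMIT - used_rolling(events, now))   # 0
```

Under the calendar rule, yesterday's 600k vanished at midnight and the batch leaves plenty of headroom. Under the rolling rule that spend is still on the books, so the same batch lands on top of it and the bucket is empty at 00:13.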

The 429s told you nothing useful, either. They pointed at the synchronous surface because that was the surface still trying to make requests. The batch reported a successful run because it had finished its work before the bucket emptied. The two systems that knew the truth disagreed about whose problem it was: the batch said "I am fine," the interactive surface said "the upstream is broken," and no single dashboard saw both as facets of one event.

The implicit-default failure mode and its cousins

This pattern has a family. The collision between your cron and the provider's reset is one instance of a broader bug class: behavior that depends on a value the provider owns the right to change, where your client never asserted the value and your tests never verified it. The implicit max_tokens default, the silent tokenizer upgrade, the auto-router that re-tunes its complexity classifier — all of these are the same shape. You signed a contract whose surface is the JSON request and response, but the contract's behavior depends on rules that live behind the surface and the provider revises them on their cadence.
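Asserting the value can be as small as checking your own spend ledger against the policy you believe is in force whenever a daily-limit rejection arrives. The sketch below is one hypothetical way to do that. The policy names, the ledger shape, and the helper are assumptions for illustration, and it only works if you record every spend on the key yourself.

```python
from datetime import datetime, timedelta, timezone

# Each candidate policy maps "now" to the start of the window the meter would count.
WINDOW_START = {
    "calendar_utc": lambda now: now.replace(hour=0, minute=0, second=0, microsecond=0),
    "rolling_24h": lambda now: now - timedelta(hours=24),
}

def policies_explaining_rejection(ledger, rejected_at, daily_limit):
    """Return the policies under which our own ledger had reached the limit."""
    explained = set()
    for name, start_of in WINDOW_START.items():
        start = start_of(rejected_at)
        used = sum(tokens for ts, tokens in ledger if start <= ts <= rejected_at)
        if used >= daily_limit:
            explained.add(name)
    return explained

ASSUMED_POLICY = "calendar_utc"  # what the scheduler was designed around

utc = timezone.utc
ledger = [
    (datetime(2026, 3, 10, 12, 0, tzinfo=utc), 700_000),  # hypothetical daytime spend
    (datetime(2026, 3, 11, 0, 6, tzinfo=utc), 300_000),   # hypothetical batch spend
]
rejected_at = datetime(2026, 3, 11, 0, 13, tzinfo=utc)

explained = policies_explaining_rejection(ledger, rejected_at, 1_000_000)
print(explained)                    # {'rolling_24h'}
print(ASSUMED_POLICY in explained)  # False: the 429 contradicts the assumed policy
```

A rejection that your assumed policy cannot explain is the signal worth paging on. It says the rule behind the number has moved, which a bare 429 never does.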

Rate-limit accounting is the most dangerous member of the family because it composes with time. A change to a response field shows up the moment the next request lands. A change to how a quota window is counted shows up only when the workload pattern that exploited the old semantics runs again. If your batch fires nightly and the provider rolled out the change on a Tuesday, you have until Wednesday at 00:05 UTC to notice. The first night looks like noise. The second night looks like a pattern. By the fourth night the on-call rotation is hunting an upstream maintenance window because that is the shape the symptom has, and the actual root cause is one sentence in a release-notes file, listed under "improved daily-usage smoothing."

A useful lens here: the shared-key starvation pattern shows the same dynamic in space, where parallel workloads on one key starve each other through the minute window. The cron collision is the temporal analogue, where one workload's burst just after midnight starves the others sharing the key for hours afterward. Both share the underlying flaw: a single quota pool that has no notion of which workload deserves which fraction of the budget.
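Giving the pool that notion can start on the client side. The following is a minimal sketch, assuming you can route every request on the key through one bookkeeping object. The class name, the percentage split, and the workload labels are hypothetical, and the sketch deliberately leaves out when spends are released, which is exactly the policy question discussed earlier.

```python
from dataclasses import dataclass, field

@dataclass
class PartitionedBudget:
    """Client-side split of one shared daily limit into per-workload shares."""
    daily_limit: int
    share_pct: dict  # workload name -> integer percent of the limit
    spent: dict = field(default_factory=dict)

    def __post_init__(self):
        if sum(self.share_pct.values()) > 100:
            raise ValueError("shares add up to more than the whole budget")

    def try_spend(self, workload: str, tokens: int) -> bool:
        """Record the spend and return True only if the workload stays in its share."""
        cap = self.daily_limit * self.share_pct[workload] // 100
        used = self.spent.get(workload, 0)
        if used + tokens > cap:
            return False  # caller should defer or slow down instead of sending
        self.spent[workload] = used + tokens
        return True

budget = PartitionedBudget(1_000_000, {"batch": 30, "interactive": 70})
print(budget.try_spend("batch", 250_000))        # True
print(budget.try_spend("batch", 100_000))        # False: would exceed the 300k share
print(budget.try_spend("interactive", 100_000))  # True: unaffected by the batch
```

The point of the split is that the batch hits its own ceiling before it can touch the interactive share, whatever the provider decides "daily" means this month.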
