The Nightly Batch That Starved Your Interactive Traffic After a Quota Window Rewrite
A cron job that ran cleanly for ten months is the most dangerous job in your system, because nothing in it changed and nothing in your code changed and the only thing that did change was a sentence in someone else's release notes that nobody on your team reads. The nightly embedding refresh that kicked off at 00:05 UTC every night, drained its work queue in under ten minutes, and went back to sleep was textbook. It coexisted with daytime interactive traffic by occupying the freshly-reset minute quota for a few minutes before users woke up, and by staying well under the daily allotment for the rest of the day. Then the provider rewrote how the daily window was accounted, kept the minute window unchanged, and left every signature your client tested against intact. The batch kept running clean. The interactive surface started returning 429s at 00:13 UTC every night. The team chased an upstream maintenance window that wasn't happening for a week.
The bug was never in your code. The bug was that "a daily limit" stopped meaning what it had meant the day before, and your scheduler was pinned to a wall-clock boundary that aligned with the old meaning. This post is about rate-limit accounting as a contract the provider can revise without breaking any signature, about how two independently-correct schedules compose into a denial-of-service pattern, and about the architectural moves that make a cron job stop being a time bomb wired to someone else's clock.
