The Token Forecast That Mistook a Holiday Trough for the New Baseline
A capacity planner walks into the quarterly budget review with a token forecast built from a clean trailing four-week window. Three of those four weeks happened to span a regional holiday. Daily active sessions were down 40% across that span. The forecast lands 35% under what Q+1 actually consumes, the rate-limit dashboard flatlines red on day one of the new quarter, and the postmortem finds the model behaved exactly as specified: it averaged the most recent four weeks of demand and projected forward. The model was not wrong. The window was.
This is not a story about a bad forecaster. It is a story about treating LLM token spend as if it were the same shape as the EC2 bill it shares a cost center with. The EC2 bill is governed by infrastructure decisions you control: provisioned instances, reserved capacity, scaling policies that respond to load. The token bill is governed by users who decided to take a long weekend. The first is engineering output. The second is consumer demand. A planner who confuses the two will keep building forecasts on windows the calendar guarantees are non-stationary.
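To make the window effect concrete, here is a minimal sketch in Python. It assumes a flat underlying baseline and made-up weekly totals; the function names and the `holiday` flag list are hypothetical, not taken from any real forecasting tool. It compares a plain trailing four-week mean against the same mean computed only over weeks not flagged as holiday-affected. The numbers are illustrative and are not meant to reproduce the 35% shortfall above, which would also reflect whatever else changed between quarters.

```python
from statistics import mean


def trailing_mean(weekly_tokens, window=4):
    """Naive forecast: average of the most recent `window` weeks."""
    return mean(weekly_tokens[-window:])


def trailing_mean_excluding(weekly_tokens, flagged, window=4):
    """Average of the most recent `window` weeks that are not flagged."""
    clean = [t for t, f in zip(weekly_tokens, flagged) if not f]
    if len(clean) < window:
        raise ValueError("not enough unflagged weeks to fill the window")
    return mean(clean[-window:])


# Hypothetical weekly token totals (in millions): a flat baseline of 100,
# with the last three weeks down 40% because of a holiday.
weekly = [100.0, 100.0, 100.0, 100.0, 100.0, 60.0, 60.0, 60.0]
holiday = [False, False, False, False, False, True, True, True]

naive = trailing_mean(weekly)                        # (100 + 60 + 60 + 60) / 4
adjusted = trailing_mean_excluding(weekly, holiday)  # last four unflagged weeks
shortfall = 1 - naive / adjusted                     # how far the naive forecast undershoots

print(f"naive={naive:.0f} adjusted={adjusted:.0f} shortfall={shortfall:.0%}")
# prints: naive=70 adjusted=100 shortfall=30%
```

Dropping flagged weeks is only one possible correction, and a crude one. The point of the sketch is narrower: the averaging step is identical in both functions, and the only thing that changes the answer is which weeks are allowed into the window.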
