The Token Forecast That Mistook a Holiday Trough for the New Baseline

June 2, 2026 · 10 min read

Software Engineer

A capacity planner walks into the quarterly budget review with a token forecast built from a clean trailing four-week window. Three of those four weeks happened to span a regional holiday. Daily active sessions were down 40% across that span. The forecast lands 35% under what Q+1 actually consumes, the rate-limit dashboard flatlines red on day one of the new quarter, and the postmortem finds the model behaved exactly as specified — it averaged the most recent four weeks of demand and projected forward. The model was not wrong. The window was.

This is not a story about a bad forecaster. It is a story about treating LLM token spend as if it were the same shape as the EC2 bill it shares a cost center with. The EC2 bill is governed by infrastructure decisions you control: provisioned instances, reserved capacity, scaling policies that respond to load. The token bill is governed by users who decided to take a long weekend. The first is engineering output. The second is consumer demand. A planner who confuses the two will keep building forecasts on windows the calendar guarantees are non-stationary.

The window assumption is doing more work than the model

Every trailing-window forecaster (moving average, exponential smoothing, ARIMA fit on the last N points, even an LLM-based foundation forecaster given only the last month of data) encodes the same implicit claim: the distribution that generated this window is the distribution that will generate the next one. That assumption is what licenses extrapolation. It is also what the calendar is constantly violating.

The calendar violates it in obvious ways and subtle ways. The obvious ones: a national holiday week, a school summer break, a regional shutdown around Lunar New Year. The subtle ones: the trailing four weeks happened to include two product launches and one outage; the trailing four weeks happened to be the run-up to a renewal cycle when usage spikes; the trailing four weeks were the post-launch valley after a marketing push. In each case, a forecaster that assumes stationarity sees noise where the data is actually a known shock.

The pathology shows up most clearly when the window length and the shock length are comparable. A one-day holiday is under 4% of a 28-day window and moves the mean by a point or two. A holiday week is 25% of a four-week window; at the 40% drop in sessions from the opening example, it pulls the window mean down by 10%. A three-week regional holiday inside the same window (Lunar New Year in many Asia-Pacific markets, August in much of Europe) means the window has more holiday than baseline, and the same 40% drop leaves the mean 30% under normal demand. The forecaster is no longer averaging trend plus seasonal noise; it is averaging the trough and projecting it forward as the new normal.
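
The arithmetic behind these proportions can be sketched in a few lines, under a simplifying assumption that is not part of the original scenario: holiday days run at a fixed fraction of baseline demand and every other day runs exactly at baseline. The function name `window_mean_shift` is illustrative. A `drop` of 1.0 (a full shutdown) gives the holiday's share of the window as an upper bound; 0.40 matches the session drop in the opening example.

```python
def window_mean_shift(window_days: int, holiday_days: int, drop: float) -> float:
    """Fractional change in the window mean when `holiday_days` of the window
    run at (1 - drop) times baseline and the remaining days run at baseline."""
    if not 0 <= holiday_days <= window_days:
        raise ValueError("holiday_days must be between 0 and window_days")
    return -drop * holiday_days / window_days


if __name__ == "__main__":
    for label, days in [("one day", 1), ("one week", 7), ("three weeks", 21)]:
        full = window_mean_shift(window_days=28, holiday_days=days, drop=1.0)
        partial = window_mean_shift(window_days=28, holiday_days=days, drop=0.40)
        print(f"{label:>11}: full shutdown {full:+.1%}, 40% drop {partial:+.1%}")
```

Under these assumptions, the three-week case leaves the window mean 30% below baseline, so a forecaster that simply projects the window forward inherits most of the shortfall described in the opening example before any growth is accounted for.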

Why LLM spend behaves like consumer demand and not like infra cost

A traditional infra capacity plan can usually get away with a trailing window because the cost it is forecasting is decoupled from user behavior in the short term. You provisioned 100 instances; they cost the same on Monday as on Saturday. The bill responds to autoscaling, but autoscaling responds to load with a lag, and the marginal cost of an idle instance is bounded by its hourly rate. The variance is small compared to the mean.

LLM token spend has the opposite shape. The marginal cost of an idle session is zero (you do not pay for the model when nobody is calling it), but the marginal cost of an active session is a function of how many tokens that session generates, which is a function of how the user prompted it. There is no provisioned floor absorbing the variance. Every fluctuation in session volume, session length, prompt verbosity, and context window depth shows up at full amplitude in the daily bill. With tokens per session held constant, a 40% drop in daily active sessions is a 40% drop in spend. There is no instance you forgot to deprovision smoothing the signal out.
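
A minimal sketch of the contrast, assuming daily token spend factors into sessions, average tokens per session, and a blended per-token price, and that the provisioned fleet does not autoscale within the day. Every name and number below is hypothetical and chosen only to make the proportions visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DailyUsage:
    sessions: int                # active sessions that day
    tokens_per_session: float    # average input + output tokens per session
    price_per_1k_tokens: float   # blended price; placeholder value


def token_spend(u: DailyUsage) -> float:
    """Usage-driven cost: scales linearly with every factor, with no floor."""
    return u.sessions * u.tokens_per_session * u.price_per_1k_tokens / 1000


def provisioned_spend(instances: int, hourly_rate: float) -> float:
    """Provisioned cost: independent of how many users showed up."""
    return instances * hourly_rate * 24


normal = DailyUsage(sessions=10_000, tokens_per_session=6_000, price_per_1k_tokens=0.002)
holiday = DailyUsage(sessions=6_000, tokens_per_session=6_000, price_per_1k_tokens=0.002)

token_drop = 1 - token_spend(holiday) / token_spend(normal)
infra_drop = 1 - provisioned_spend(100, 0.50) / provisioned_spend(100, 0.50)

if __name__ == "__main__":
    print(f"token spend drop on the holiday:       {token_drop:.0%}")
    print(f"provisioned spend drop on the holiday: {infra_drop:.0%}")
```

If holiday sessions are also shorter or longer than normal ones, `tokens_per_session` moves too and the drop in spend no longer matches the drop in sessions one for one.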

The implication for forecasting is uncomfortable: the right reference class for LLM capacity planning is not server capacity planning. It is retail demand planning. Retailers have spent decades learning that you cannot forecast holiday quarters from non-holiday quarters or vice versa, that moving holidays like Lunar New Year and Easter wreck month-over-month comparisons, and that a baseline model trained on a trough will systematically underestimate the peak. The literature on holiday-effect modeling in demand forecasting is not optional reading for the team running the LLM budget; it is the manual.

The forecaster does not know it is reading a trough

The most dangerous version of this failure mode is the one where nothing in the forecast output reveals that the input window was anomalous. The forecaster ingests a time series, fits a model, projects forward, and emits a number. The number does not come with a confidence band wide enough to admit that the window contained a known external event. The planner sees a clean line and a tight range and walks into the budget review with conviction.

Confidence intervals from naive trailing-window methods are computed against the variance observed inside the window, which during a holiday trough is unusually low precisely because everyone is doing the same thing — not using the service. The interval narrows when it should widen. You get a forecast with high apparent precision and zero actual coverage of the regime change that is about to hit.
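
The narrowing can be illustrated with synthetic numbers. The sketch below assumes a two-week window that sits entirely inside the trough, compares it with a normal two-week window that has an ordinary weekday/weekend swing, and uses a simple mean plus or minus 1.96 standard deviations as the naive band. All values are made up, and `naive_interval` is an illustrative helper, not a method from any particular forecasting library.

```python
import statistics


def naive_interval(window: list[float], z: float = 1.96) -> tuple[float, float]:
    """Mean +/- z * sample standard deviation of the window: a naive band
    for the next day's value that assumes the window is representative."""
    mean = statistics.fmean(window)
    sd = statistics.stdev(window)
    return mean - z * sd, mean + z * sd


# Synthetic daily token counts (millions). Normal weeks swing between
# weekdays and weekends; trough weeks are flat because usage is low every day.
normal_window = [100, 104, 102, 98, 96, 70, 68] * 2
trough_window = [60, 61, 59, 60, 62, 58, 60] * 2
post_holiday_actual = 100  # demand once users return; assumed for illustration

normal_lo, normal_hi = naive_interval(normal_window)
trough_lo, trough_hi = naive_interval(trough_window)

if __name__ == "__main__":
    print(f"normal window band: {normal_lo:.1f} to {normal_hi:.1f}")
    print(f"trough window band: {trough_lo:.1f} to {trough_hi:.1f}")
    covered = trough_lo <= post_holiday_actual <= trough_hi
    print(f"trough band covers post-holiday demand: {covered}")
```

The trough band is several times narrower than the normal one and sits nowhere near the post-holiday level, which is the combination of high apparent precision and no real coverage described above.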

This is the same statistical sin retailers have spent decades learning to avoid: trusting a forecast whose error model was estimated on a window the holiday calendar already told you was unrepresentative. The fix is not a better fit. The fix is an external signal that overrides the window when the window is known to be misleading.
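
One simple form such an external signal could take is a calendar check that runs before the forecast is trusted. The sketch below is an assumption about how that might look, not a description of any specific tool: the holiday ranges are placeholder dates, and the 10% tolerance in `window_is_trustworthy` is an arbitrary threshold a team would need to choose for itself.

```python
from datetime import date, timedelta


def holiday_share(window_end: date, window_days: int,
                  holidays: list[tuple[date, date]]) -> float:
    """Fraction of the trailing window's days that fall inside any
    (start, end) holiday range. Both endpoints are inclusive."""
    days = [window_end - timedelta(days=i) for i in range(window_days)]
    hits = sum(any(start <= d <= end for start, end in holidays) for d in days)
    return hits / window_days


def window_is_trustworthy(share: float, max_share: float = 0.10) -> bool:
    """Hypothetical rule: accept the window only if holidays cover at most
    `max_share` of it; otherwise the forecast needs an outside anchor."""
    return share <= max_share


# Placeholder calendar: one three-week regional holiday (dates are invented).
calendar = [(date(2026, 5, 4), date(2026, 5, 24))]
share = holiday_share(window_end=date(2026, 5, 31), window_days=28, holidays=calendar)

if __name__ == "__main__":
    print(f"holiday share of window: {share:.0%}")
    print(f"window trustworthy: {window_is_trustworthy(share)}")
```

The check does not produce a better forecast on its own. It only makes the anomaly visible, so that a window covered 75% by a holiday cannot reach the budget review looking like an ordinary month.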

Year-over-year baseline overlay: anchor against a comparable period

About Tian Pan
