Skip to main content

3 posts tagged with "forecasting"

View all tags

The Cost Forecast Tied to a Pricing Tier You No Longer Qualify For

· 11 min read
Tian Pan
Software Engineer

The usage curve barely moved. The bill went up 38%.

That is the email the finance lead at a mid-sized fintech opened on the first Monday of the quarter. Three months earlier, the engineering org had renegotiated their LLM inference contract and shaved a sizeable percentage off the negotiated unit price by committing to a volume floor. The finance model rolled the new unit price into the FY forecast. Nobody bookmarked the footnote in the pricing schedule that said the discount would lapse if monthly usage fell below the floor for three consecutive months. The seasonal traffic dip in April-May did exactly that. The provider re-tiered the account back to list price. No notification reached engineering, because the notification went to the procurement inbox that nobody had read since the contract was signed.

The Token Forecast That Mistook a Holiday Trough for the New Baseline

· 10 min read
Tian Pan
Software Engineer

A capacity planner walks into the quarterly budget review with a token forecast built from a clean trailing four-week window. Three of those four weeks happened to span a regional holiday. Daily active sessions were down 40% across that span. The forecast lands 35% under what Q+1 actually consumes, the rate-limit dashboard flatlines red on day one of the new quarter, and the postmortem finds the model behaved exactly as specified — it averaged the most recent four weeks of demand and projected forward. The model was not wrong. The window was.

This is not a story about a bad forecaster. It is a story about treating LLM token spend as if it were the same shape as the EC2 bill it shares a cost center with. The EC2 bill is governed by infrastructure decisions you control: provisioned instances, reserved capacity, scaling policies that respond to load. The token bill is governed by users who decided to take a long weekend. The first is engineering output. The second is consumer demand. A planner who confuses the two will keep building forecasts on windows the calendar guarantees are non-stationary.

Inference Cost Forecasting: The Capacity Plan Your Finance Team Wants and You Can't Write

· 12 min read
Tian Pan
Software Engineer

Your finance team will ask for a capacity plan you cannot write. Not because you're inexperienced or because the model is new, but because the two assumptions classical capacity planning rests on — a workload distribution you can measure, and a unit cost stable on a quarter timescale — are both violated by AI workloads. The number you hand them will be wrong on day one, and when the variance hits, the conversation that follows will not be about the bill.

The 2026 State of FinOps report named AI as the fastest-growing new spend category, with a majority of respondents reporting that AI costs exceeded original budget projections — for many enterprises, inference now consumes the bulk of the AI bill. The instinct to manage this with a SaaS-style capacity plan — pick a peak QPS, multiply by a unit cost, add 30% buffer — produces a number with the texture of a forecast and the predictive power of a horoscope. The capacity plan you actually need looks more like a FinOps scenario model than a procurement spreadsheet, and the engineering work to produce it is platform work that competes with feature work until the day finance loses patience.