
The Cost Dashboard Your Finance Team Built That Excluded the Embeddings Re-index

· 10 min read
Tian Pan
Software Engineer

Your finance team built a beautiful AI cost dashboard. Token spend, sliced by feature. Embedding spend, sliced by provider. Every quarter, the per-feature pane gets reviewed in a leadership meeting and somebody asks why the support-chat workflow is up 12%, and a product manager has a defensible answer. Every quarter, the per-provider pane gets reviewed in an infra meeting and somebody asks why OpenAI is up 8%, and a platform engineer has a defensible answer. And every quarter, the line that actually doubles your AI bill — the corpus re-index — lands in a third bucket called "infrastructure" that nobody reviews because nobody owns it.

That bucket is where forty percent of your AI spend goes to die unattributed. The teams who could have optimized it never see it. The teams who see it can't tell you which feature it serves. The dashboard is honest about every cost it can explain and silent about the cost it can't, which is exactly the cost that matters most.

The Two Axes Your Dashboard Is Built On

Per-feature cost attribution made sense as the headline view because it answered the question executives actually ask: "what is the support copilot costing us?" The architectural move that made it possible was small — a feature_id tag on every LLM call, propagated through the gateway, summed nightly. Practitioners writing about LLM FinOps in 2026 keep returning to this one decision because it is the single line of code that determines whether per-feature accounting is possible at all.
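A minimal sketch of that tagging decision, assuming a hypothetical gateway that writes one usage row per call. The `UsageRecord` fields, the `nightly_rollup` helper, the model names, and the per-million-token prices are all illustrative assumptions, not any provider's actual API or price list.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Hypothetical prices in USD per million tokens; real prices vary by model.
PRICE_PER_MILLION = {"chat-model": 5.00, "embed-model": 0.10}


@dataclass(frozen=True)
class UsageRecord:
    day: date
    feature_id: str  # the tag that makes per-feature accounting possible
    provider: str
    model: str
    tokens: int

    @property
    def cost_usd(self) -> float:
        return self.tokens / 1_000_000 * PRICE_PER_MILLION[self.model]


def nightly_rollup(records, day, key):
    """Sum one day's cost along a single axis ('feature_id' or 'provider')."""
    totals = defaultdict(float)
    for r in records:
        if r.day == day:
            totals[getattr(r, key)] += r.cost_usd
    return dict(totals)


today = date(2026, 1, 15)
log = [
    UsageRecord(today, "support-chat", "openai", "chat-model", 2_000_000),
    UsageRecord(today, "search", "voyage", "embed-model", 10_000_000),
    UsageRecord(today, "support-chat", "openai", "embed-model", 5_000_000),
]

by_feature = nightly_rollup(log, today, "feature_id")
by_provider = nightly_rollup(log, today, "provider")
```

The same rows feed both panes: grouping by `feature_id` gives the leadership view, grouping by `provider` gives the procurement view. Anything that never becomes a row appears in neither.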

Per-provider attribution made sense as the second view because the procurement team owns vendor relationships and needs to negotiate against a real number. OpenAI rolls up. Anthropic rolls up. Voyage rolls up. The infra team can compare unit prices, talk to account reps, and decide when to migrate.

Both views are correct. Both views are useful. Neither view sees a quarterly re-index.

The reason is structural: a re-index is not a request. It does not pass through your gateway with a feature_id header. It does not show up in your per-call telemetry. It runs as a batch job — initiated by a platform engineer, often outside business hours, against a corpus owned by no single product team — and it bills as a single fat line item in your provider's invoice on the first of the month. Your per-feature dashboard cannot see it because nothing tagged it. Your per-provider dashboard sees it but cannot say what it was for.
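The blind spot can be made concrete by reconciling the provider invoice against gateway telemetry. The sketch below uses made-up figures and a hypothetical `unattributed_residual` helper: whatever the invoice contains that no tagged call accounts for is the spend the per-feature pane cannot see.

```python
def unattributed_residual(invoice_total_usd, tagged_costs_usd):
    """Return the invoice spend that no feature_id-tagged call explains.

    invoice_total_usd: what the provider billed for the month.
    tagged_costs_usd:  per-feature totals summed from gateway telemetry.
    """
    attributed = sum(tagged_costs_usd.values())
    residual = invoice_total_usd - attributed
    share = residual / invoice_total_usd if invoice_total_usd else 0.0
    return residual, share


# Illustrative numbers only, not taken from any real bill.
tagged = {"support-chat": 9_000.0, "search": 4_500.0, "summaries": 1_500.0}
invoice_total = 25_000.0  # includes a batch re-index that bypassed the gateway

residual, share = unattributed_residual(invoice_total, tagged)
print(f"unattributed: ${residual:,.0f} ({share:.0%} of the invoice)")
# prints: unattributed: $10,000 (40% of the invoice)
```

The residual is a number the per-provider view already contains; what the reconciliation adds is the admission that no feature claims it.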

Why the Re-index Is Where the Money Actually Is

The numbers vary by stack, but the shape is consistent. Re-embedding a corpus on the order of a hundred billion tokens with a frontier embedding model lands somewhere between five and fifteen thousand dollars in API charges alone, which works out to five to fifteen cents per million tokens, before the compute spent reading, chunking, and writing back to the vector store. Do that quarterly because the corpus drifts, do it again whenever you upgrade the embedding model, do it again whenever you change chunking strategy, and you have a recurring expense that exceeds the entire monthly token spend of small features.
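To see how the cadence compounds, here is a back-of-the-envelope sketch. The per-run API cost is the midpoint of the range above; the number of triggers per year and the small feature's monthly spend are assumptions chosen only for illustration.

```python
# All inputs are illustrative assumptions, not measured figures.
API_COST_PER_REINDEX_USD = 10_000  # midpoint of the $5k-$15k range
REINDEX_TRIGGERS_PER_YEAR = {
    "scheduled quarterly refresh": 4,
    "embedding model upgrade": 1,
    "chunking strategy change": 1,
}
SMALL_FEATURE_MONTHLY_TOKEN_SPEND_USD = 3_000  # hypothetical small feature

runs_per_year = sum(REINDEX_TRIGGERS_PER_YEAR.values())
annual_reindex_usd = runs_per_year * API_COST_PER_REINDEX_USD
monthly_equivalent_usd = annual_reindex_usd / 12

print(f"re-index runs per year: {runs_per_year}")
print(f"annual re-index API cost: ${annual_reindex_usd:,}")
print(f"monthly equivalent: ${monthly_equivalent_usd:,.0f}")
print(
    "exceeds the small feature's monthly spend:",
    monthly_equivalent_usd > SMALL_FEATURE_MONTHLY_TOKEN_SPEND_USD,
)
```

Under these assumptions, six runs a year amortize to $5,000 a month, more than the hypothetical feature's $3,000, and that is before any pipeline compute or validation time.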

Reports from teams who have actually paid these bills describe the same pattern: infrastructure costs grow faster than per-call costs because per-call costs are visible and get optimized, while batch costs are invisible and go untouched. Engineers writing about the hidden costs of in-house memory systems point out that the embedding API's unit price is negligible, pennies per million tokens, but that the operational tax of running the re-index pipeline (egress out of the vector store, storage churn, engineer-weeks of validation) dwarfs the API line by an order of magnitude. The headline number on the OpenAI bill is the smallest part of the actual cost.

When you ask the team that runs the re-index how often it happens, you get an answer like "every quarter, give or take." When you ask why every quarter, you get an answer like "because the model rev came out" or "because the index quality dropped" or "because we wanted to try a smaller embedding dimension." None of those answers are tied to a product roadmap. The re-index cadence is decided by the embedding model release schedule and the patience of the platform engineer who notices recall degrading, neither of which is a budget owner.

The Bucket Named "Infrastructure" Is Where Ownership Goes to Hide

There is a recurring pattern in cloud cost reporting that practitioners describe with weary specificity: the costs that get attributed are the costs that someone has a reason to attribute. Per-feature spend gets attributed because product managers need defensible budgets. Per-provider spend gets attributed because procurement needs leverage in negotiations. Everything else lands in a residual bucket — call it "infrastructure," call it "platform," call it "shared" — that is by construction unowned. The bucket exists because the alternative is admitting that a meaningful fraction of your AI spend has no owner, which is awkward in a board deck.

The unowned bucket has a perverse property: it is the easiest line to optimize and the hardest line to motivate optimizing. A retrieval team that knew its re-index cost would aggressively investigate smaller embedding dimensions, drift-adapter patterns that map new query embeddings into legacy space, lazy re-embedding strategies that defer cost to read time. A retrieval team that does not know its re-index cost reruns the same pipeline every quarter with no review because nobody is asking. The dollars are large; the political pressure is zero. That combination is what compounds into a quietly enormous bill.
