"We Got Billed for Peak Usage, Not Average" - Understanding Datadog's Hidden Pricing Model

eng_director_luis · January 30, 2026, 11:53pm

Last quarter, our Datadog bill jumped 340% during a product launch. We were prepared for higher costs - but not that much higher. Here’s what we learned about Datadog’s billing model that nobody explains upfront.

The High-Water Mark Problem

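Datadog bills infrastructure hosts on the 99th percentile of hourly host counts over the billing month. Only the top 1% of hours (roughly 7 hours in a 30-day month) is discarded, which means:

Scale up 50 servers for a traffic spike that outlasts those ~7 hours? You’re billed for those servers all month.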
Spin up test instances for load testing? They count toward your peak.
Auto-scale during an incident? Your bill reflects the crisis, not the resolution.
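
To make the percentile arithmetic concrete, here is a minimal Python sketch. It assumes the bill is based on the highest hourly host count left after the top 1% of hours is discarded. The function name, the round-down of the discard window, and the 720-hour month are illustrative assumptions, not Datadog's published algorithm. The baseline of 24 hosts is our normal count.

```python
import math


def billable_hosts(hourly_counts, discard_fraction=0.01):
    """Return the host count left after ignoring the top 1% of hours."""
    ordered = sorted(hourly_counts)
    # Number of highest-usage hours to ignore (assumed to round down).
    discard = math.floor(len(ordered) * discard_fraction)
    return ordered[len(ordered) - discard - 1]


HOURS = 720  # assumed 30-day month
baseline = 24  # normal host count
spike = baseline + 50  # 50 extra servers during the spike

# Scenario A: the spike lasts 2 hours. Scenario B: it lasts 3 days.
two_hour_spike = [spike] * 2 + [baseline] * (HOURS - 2)
three_day_spike = [spike] * 72 + [baseline] * (HOURS - 72)

print(billable_hosts(two_hour_spike))
print(billable_hosts(three_day_spike))
```

Under these assumptions the 2-hour spike falls inside the discarded hours and bills at the baseline, while the 3-day spike sets the billable count for the whole month.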

Real Numbers from Our Launch

Period	Hosts	What We Expected	What We Got Billed
Normal	24	$2,400/mo	-
Launch week	85 (peak)	~$4,000/mo	$8,500/mo
Post-launch	30	Back to normal	Still $8,500

The 99th percentile billing meant our 3-day scaling event defined our entire month’s cost.

The Custom Metrics Multiplier

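It gets worse with custom metrics. We had 150 custom metrics at baseline. During launch, our instrumentation for new features pushed us to 400+. Custom metrics beyond the plan's allotment are billed as overage. As far as I can tell from the billing docs, they are charged on the monthly average of hourly counts rather than the 99th percentile, so the launch spike gets diluted but still lands on the bill.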

Why This Punishes Good Engineering

The irony: Datadog’s billing model punishes exactly the behaviors you want:

Elastic scaling = billing spikes
Thorough instrumentation = metric cost explosion
Incident response (more hosts for debugging) = higher bills

What We’re Doing About It

Migrated staging environments to self-hosted OpenObserve
Implemented strict custom metric budgets per team
Evaluating SigNoz for production (running parallel for 2 months)
Standardized on OpenTelemetry so we can switch without re-instrumenting
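
For the per-team metric budgets, the check itself can be very small. The sketch below is illustrative only: the team names and the per-team split are hypothetical, chosen so the budgets sum to our 150-metric baseline and the usage sums to the 400 we hit during launch.

```python
def over_budget(usage, budgets):
    """Return {team: overage} for teams above their custom metric budget."""
    return {
        team: count - budgets.get(team, 0)
        for team, count in usage.items()
        if count > budgets.get(team, 0)
    }


# Hypothetical teams and numbers, for illustration only.
budgets = {"checkout": 60, "search": 50, "platform": 40}
usage = {"checkout": 220, "search": 95, "platform": 85}

for team, extra in over_budget(usage, budgets).items():
    print(f"{team}: {extra} metrics over budget")
```

A check like this could run in CI or on a schedule. Where the usage counts come from (an export, an API, a spreadsheet) is left open here.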

Has anyone else been surprised by the high-water mark billing? How are you managing observability costs during scaling events?

cto_michelle · January 30, 2026, 11:53pm

Luis, this is exactly the budget planning nightmare I deal with every quarter.

The CFO Conversation Nobody Wants to Have

Try explaining to finance that your observability costs are “unpredictable by design.” The 99th percentile model means:

Annual budgets are essentially guesses
Any growth initiative comes with hidden observability costs
Incident response becomes a billable event

The Procurement Problem

We negotiated an enterprise agreement with Datadog two years ago. Here’s what we learned:

Commit traps: They want annual commitments, but your usage is unpredictable
SKU complexity: 15+ line items make apples-to-apples comparison impossible
True-up clauses: Exceed your commitment, pay list price for overage
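
A rough sketch of how a commit plus a true-up clause plays out. All rates and host counts here are hypothetical placeholders, not our contract terms or Datadog's price list, and the function name is made up for illustration.

```python
def monthly_cost(billed_hosts, committed_hosts, committed_rate, list_rate):
    """Commitment is always paid; hosts above it are charged at list price."""
    overage = max(0, billed_hosts - committed_hosts)
    return committed_hosts * committed_rate + overage * list_rate


# Hypothetical terms: 30 committed hosts at $80, list price $100 per host.
quiet_month = monthly_cost(24, 30, 80, 100)   # below commitment
launch_month = monthly_cost(85, 30, 80, 100)  # well above commitment

print(quiet_month, launch_month)
```

With these placeholder terms the quiet month still costs the full commitment, and the launch month pays list price on every host above it. That is the trap in both directions.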

What I Now Require for Observability Vendors

Usage-based pricing with monthly caps
Clear cost attribution to teams/services
No penalty for scaling events
Self-serve cost controls that engineers can implement

The Strategic Shift

We’re now treating observability as critical infrastructure, not a managed service. That means:

Platform team owns the stack
CapEx for infrastructure vs OpEx for SaaS
Predictable costs tied to our infrastructure, not vendor pricing models

The 340% spike you experienced would have triggered an executive review here. We can’t run a business on unpredictable infrastructure costs.

product_david · January 30, 2026, 11:53pm

The product launch scenario Luis described is painfully familiar.

The Launch Day Dilemma

When we launch new features, we need more visibility, not less:

Real-time user behavior analytics
Error rate monitoring at granular levels
Performance metrics for new code paths
A/B test instrumentation

But Datadog’s pricing model creates a perverse incentive: cut back on observability exactly when you need it most, because that is when the cost spike hits.

What This Costs Product Teams

Last quarter, we had to choose between:

Full instrumentation for a new checkout flow ($15K additional/month)
Minimal metrics and hope nothing breaks

We chose option 2. Guess what happened? A payment edge case caused 3% of transactions to fail silently for 4 days before we noticed. The revenue impact was $180K.
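
The trade-off in numbers, as a quick sketch. It uses only the two figures above and assumes, for simplicity, that the loss was spread evenly across the 4 days.

```python
instrumentation_per_month = 15_000  # quoted cost of full instrumentation
incident_loss = 180_000             # revenue impact of the silent failures
days_undetected = 4

# How many months of full instrumentation the one incident would have paid for.
months_covered = incident_loss / instrumentation_per_month

# Assumed even daily loss: what each day of earlier detection would have saved.
loss_per_day = incident_loss / days_undetected

print(months_covered, loss_per_day)
```

On those assumptions, the single incident cost as much as a year of the instrumentation we skipped, and catching it one day sooner would have covered three months of it.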

The Hidden Tax on Innovation

Every product initiative now includes an “observability budget” line item. This slows down our experimentation velocity because:

Small experiments need cost justification
Quick MVPs skip instrumentation entirely
We’re flying blind on features we should be learning from

What I Want From Observability

Fixed cost per product team, not per metric
Unlimited experimentation without billing anxiety
Cost scales with value delivered, not data volume

The 98% savings from alternatives Michelle mentioned in the other thread would completely change how we approach product observability.

alex_dev · January 30, 2026, 11:54pm

As the person who actually implements the instrumentation, the cost constraints create real friction in my daily work.

The “Is This Metric Worth It?” Tax

Every time I want to add observability, I have to think:

Is this custom metric going to blow our budget?
Should I use a tag or a separate metric? (Neither is free: every unique combination of tag values counts as its own custom metric.)
Can I sample this at 10% and still get useful data?

This cognitive overhead slows down development. Instrumentation should be a reflex, not a cost-benefit analysis.
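
On the tag question, the mental math looks roughly like this. The sketch assumes billing counts each unique combination of metric name and tag values, and it computes the upper bound where every combination actually occurs. The tag names and value counts are hypothetical.

```python
from math import prod


def max_series(tag_cardinalities):
    """Upper bound on distinct series for one metric name.

    Assumes every combination of tag values occurs at least once.
    """
    return prod(tag_cardinalities.values())


# Hypothetical tags on a single latency metric.
tags = {"endpoint": 20, "status_class": 3}
before = max_series(tags)

tags["customer_tier"] = 4  # adding one tag with only four values
after = max_series(tags)

print(before, after)
```

Even a tag with four values multiplies the count by four, which is why a harmless-looking tag can move the bill.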

Real Examples From This Week

Cache hit rate monitoring - Wanted to track per-key hit rates. Finance said no, we’re at our custom metric limit.
API latency by customer tier - Had to remove the customer_tier tag because every extra tag value multiplies the number of billable custom metrics.
Error sampling - We sample errors at 1% to save costs. Last week we missed a bug affecting 0.5% of requests for 3 days.
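
Why 1% sampling hides a 0.5% failure so easily, as a small sketch. The 100,000 requests per day is a hypothetical traffic figure; the 0.5% failure rate and 1% sample rate are the ones above. Requests are assumed independent.

```python
def expected_sampled_errors(requests, failure_rate, sample_rate):
    """Expected number of failing requests that survive sampling."""
    return requests * failure_rate * sample_rate


def prob_zero_sampled(requests, failure_rate, sample_rate):
    """Probability that no failing request is sampled at all."""
    return (1 - failure_rate * sample_rate) ** requests


REQUESTS_PER_DAY = 100_000  # hypothetical traffic
FAILURE_RATE = 0.005        # bug affecting 0.5% of requests
SAMPLE_RATE = 0.01          # errors sampled at 1%

actual_failures = REQUESTS_PER_DAY * FAILURE_RATE
sampled = expected_sampled_errors(REQUESTS_PER_DAY, FAILURE_RATE, SAMPLE_RATE)
print(actual_failures, sampled)
print(prob_zero_sampled(REQUESTS_PER_DAY, FAILURE_RATE, SAMPLE_RATE))
```

At that traffic level, about 500 real failures a day show up as about 5 sampled events, which is easy to lose in normal error noise. At lower traffic the chance of sampling none at all grows quickly.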

What Good Developer Experience Looks Like

I’ve been playing with SigNoz in a side project:

Add metrics without checking a budget spreadsheet
High-cardinality labels? No problem
Full traces for debugging, not sampled
Actually enjoyable to instrument code

Luis, when you mentioned standardizing on OTel - that’s the key. My instrumentation code works with any backend. When we switch (not if), it’s just a config change.