98% Cost Savings: Why Teams Are Migrating Off Datadog in 2026

When Coinbase’s $65 million Datadog bill went viral on Hacker News, it struck a nerve. But that figure isn’t an outlier.

Mid-sized companies routinely spend $50,000-$150,000 per year on Datadog. Enterprise deployments easily exceed $1 million annually once APM, logs, and RUM are included.

And for many organizations, observability has become the second-highest cost after infrastructure itself.

Something has to give.

The Pricing Breakdown That’s Breaking Budgets

Datadog’s pricing model has several features that combine to create unpredictable bills:

  1. High-water mark billing - You’re billed based on your 99th percentile usage, not your average. That one incident response where you scaled up? You’re paying for it all month.

  2. Dual-cost log management - You pay once to ingest logs, then pay again (at a higher rate) to index them. Want searchable logs? Double the cost.

  3. Custom metrics tax - Premium rates based on unique tag cardinality. OpenTelemetry’s rich tagging model? That’s expensive on Datadog.

  4. Per-host + per-feature stacking - Infrastructure monitoring, APM, Synthetics, RUM - each adds to the bill with different pricing dimensions.
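To make the high-water mark effect concrete, here’s a back-of-the-envelope sketch. The host counts and per-host rate are illustrative assumptions for the math, not Datadog’s actual prices:

```python
# Sketch of why high-water-mark billing hurts: a 4-hour scale-up event
# can set the billable host count for the whole month. All numbers here
# are illustrative assumptions, not published Datadog prices.

BASELINE_HOSTS = 100   # assumed steady-state fleet
INCIDENT_HOSTS = 250   # assumed peak during a short incident
RATE_PER_HOST = 23     # assumed $/host/month for infrastructure monitoring
HOURS_IN_MONTH = 730

# What intuition expects: billing on average usage.
avg_hosts = (BASELINE_HOSTS * (HOURS_IN_MONTH - 4) + INCIDENT_HOSTS * 4) / HOURS_IN_MONTH
avg_bill = avg_hosts * RATE_PER_HOST

# What high-water-mark billing does: the monthly peak sets the bill.
hwm_bill = INCIDENT_HOSTS * RATE_PER_HOST

print(f"average-usage bill:   ${avg_bill:,.0f}")   # ~$2,319
print(f"high-water-mark bill: ${hwm_bill:,.0f}")   # $5,750
```

Four hours of incident capacity more than doubles the month’s bill under these assumptions.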

Concrete Cost Comparisons

Teams running comparable workloads have documented the following costs:

| Solution | Monthly Cost | Savings vs Datadog |
| --- | --- | --- |
| Datadog | $22,303 | - |
| Grafana Cloud | $1,855 | 11x cheaper |
| Coroot (on-prem) | $142-162 | 140x cheaper |
| OpenObserve | $90 | 98% savings |

These aren’t promotional numbers - they’re real comparisons from engineering teams.

Why OpenTelemetry Changes Everything

The game-changer is OpenTelemetry becoming the instrumentation standard. When your application emits OTel data, you can send it anywhere:

  • Today: Datadog
  • Tomorrow: SigNoz, OpenObserve, Grafana, or whatever works better

Your instrumentation investment is protected. The switching cost drops from “re-instrument everything” to “change the exporter configuration.”

If you’re still using Datadog’s proprietary agents, you’re building vendor lock-in into your infrastructure.

What’s Actually Happening

Teams I talk to are following similar patterns:

  1. Standardize on OpenTelemetry - Replace Datadog agents with OTel collectors
  2. Pilot alternatives - Run a parallel backend for non-critical services
  3. Validate parity - Ensure dashboards and alerts can be recreated
  4. Migrate gradually - Move service by service, not big bang
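Steps 1 and 2 can meet at the collector: a single OpenTelemetry Collector pipeline can keep exporting to Datadog while mirroring the same data to a pilot backend. A sketch, assuming the `datadog` and `otlphttp` exporters from collector-contrib; the endpoints and the OpenObserve path are placeholders:

```yaml
# Illustrative collector config: fan out one trace pipeline to the
# incumbent and a pilot backend in parallel while validating parity.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  datadog:
    api:
      key: ${env:DD_API_KEY}
  otlphttp/pilot:
    endpoint: http://openobserve-pilot.internal:5080   # placeholder

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [datadog, otlphttp/pilot]
```

Once parity is validated, cutting over is removing one entry from the `exporters` list.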

The migration is real, and it’s accelerating.

The Uncomfortable Question

If you’re spending significant budget on Datadog, you need to ask: what would it take to switch? And what’s the cost of waiting another year?

What’s your observability spend looking like?

The high-water mark billing nearly killed us last year.

We had a production incident that required spinning up additional capacity for about 4 hours. Incident resolved, capacity scaled back down. Business as usual.

Then the Datadog bill came.

We got charged for the peak capacity for the entire month. Four hours of incident response turned into a ~$15K surprise on the bill.

The conversation with finance was not pleasant.

What we’ve learned since:

  1. Always scope before scaling - Before spinning up capacity, consider the observability cost impact. Yes, this is absurd. Yes, we do it anyway.

  2. Tag cardinality is your enemy - Every unique combination of tags is a separate timeseries. Kubernetes labels? Pod names? Request IDs? They all multiply your metrics bill.

  3. Log sampling is mandatory - We moved to 10% log sampling on high-volume services. Not because we wanted less data, but because we couldn’t afford 100%.

  4. The calculator lies - Datadog’s pricing calculator gives you one number. Reality gives you another. Plan for 40-60% above the estimate.
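Point 2 is worth quantifying. Every unique tag combination is billed as a separate timeseries, so cardinalities multiply. The tag counts below are illustrative assumptions:

```python
# Why tag cardinality multiplies cost: in the worst case, the number of
# billable timeseries for ONE metric is the product of the distinct
# values of each tag. Counts are illustrative assumptions.
from math import prod

tag_values = {
    "service": 30,      # distinct services
    "pod_name": 400,    # Kubernetes pods (churn makes this worse over a month)
    "status_code": 8,
    "region": 4,
}

# Worst case: every combination occurs at least once in the billing period.
timeseries = prod(tag_values.values())
print(timeseries)  # 30 * 400 * 8 * 4 = 384,000 timeseries for one metric
```

Dropping `pod_name` alone would cut that to 960, which is why tag allow-lists are usually the first cost lever teams pull.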

The OTel transition:

We’re now 6 months into standardizing on OpenTelemetry. The instrumentation work was significant, but we can finally see a path to alternatives.

For anyone starting this journey: the OTel collector is your friend. Centralize there first, then you can redirect anywhere.

The security implications of switching observability vendors are significant but manageable with the right approach.

Data Residency Concerns

With Datadog, your telemetry data lives in their cloud infrastructure. Moving to self-hosted alternatives like Coroot or OpenObserve gives you complete control over where sensitive operational data resides. For organizations in regulated industries, this can actually improve your compliance posture.

Vendor Risk Assessment

Any migration requires evaluating:

  • Data retention policies and deletion guarantees
  • SOC 2 Type II compliance status
  • Incident response and breach notification procedures
  • API security and authentication mechanisms

The OpenTelemetry Security Advantage

One underappreciated benefit: OTel instrumentation runs in your environment, giving you complete control over what data leaves your network. With proprietary agents, you’re trusting the vendor’s data collection code.

Migration Security Checklist

  1. Audit current data flows and retention
  2. Verify new platform meets compliance requirements
  3. Plan credential rotation strategy
  4. Test data export and deletion capabilities
  5. Document chain of custody during migration

The 98% cost savings Michelle mentioned are compelling, but make sure your security team is involved from day one of any migration planning.

From a data infrastructure perspective, the Datadog cost problem is fundamentally about data volume economics.

The Data Volume Reality

Modern ML pipelines generate massive telemetry. A single model training run can produce gigabytes of metrics and logs. When you’re paying Datadog’s per-GB ingestion rates plus indexing fees, the math becomes prohibitive fast.
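The dual-cost structure compounds the volume problem. A rough sketch of the math, using assumed rates and event sizes purely for illustration (not Datadog’s published prices):

```python
# Illustrative ingest-plus-index log cost math. All rates and sizes are
# assumptions for the sketch, not actual vendor pricing.
daily_gb = 200              # assumed telemetry from pipelines + serving
ingest_rate = 0.10          # assumed $/GB ingested
index_rate = 2.50           # assumed $/million events indexed
events_per_gb = 2_000_000   # assumed ~500-byte average log event

monthly_ingest = daily_gb * 30 * ingest_rate
monthly_index = daily_gb * 30 * events_per_gb / 1_000_000 * index_rate
print(f"ingest: ${monthly_ingest:,.0f}/mo, index: ${monthly_index:,.0f}/mo")
```

Under these assumptions, indexing dominates ingestion by 50x, which is why “searchable logs” is where bills blow up.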

Why ClickHouse Changes Everything

Both SigNoz and OpenObserve use ClickHouse for storage, and the compression ratios are genuinely impressive - 10-140x depending on data patterns. For observability data, which is highly repetitive, this translates directly into cost savings.

Our ML Team’s Experience

We had to make painful choices about what to instrument:

  • Full traces only for production inference, not training
  • Sampling at 1% for high-volume endpoints
  • Custom metrics limited to top 50 most critical

These compromises undermined our ability to debug model performance issues. With 98% cost savings, we could actually instrument everything.

The Hidden Cost of Data Science Tooling

Datadog’s APM for Python is solid, but their ML-specific features lag behind. Moving to open-source means we can integrate directly with MLflow, Weights & Biases, and our existing Prometheus metrics.

Michelle’s point about OpenTelemetry is crucial for data teams - it means we can emit traces from Spark jobs, Airflow DAGs, and model serving without vendor-specific instrumentation.