After three years of rejected proposals, I finally got approval for a significant observability investment. Here’s the approach that worked, and the mistakes I made along the way.
The Failed Approaches
Attempt #1: The Technical Case (Rejected)
“We need better observability for faster debugging and improved reliability.”
Finance response: “How does that translate to dollars?”
Attempt #2: The Fear Case (Rejected)
“Without this investment, we risk major outages and security incidents.”
Finance response: “We’ve operated fine so far. What’s actually changed?”
Attempt #3: The Benchmarking Case (Deferred)
“Competitors are investing in observability. We need to keep up.”
Finance response: “Interesting. Come back with specifics.”
The Approach That Worked
Step 1: Baseline Current Costs
I partnered with Finance to document every hour spent on incidents over 6 months, then annualized the monthly figures (monthly hours × loaded hourly rate × 12):
| Cost Category | Monthly Hours | Loaded Cost | Annual Impact |
|---|---|---|---|
| Incident response (engineers) | 240 hrs | $150/hr | $432,000 |
| War room participation (leadership) | 40 hrs | $300/hr | $144,000 |
| Customer support escalations | 80 hrs | $75/hr | $72,000 |
| Sales cycle delays (due to reliability concerns) | - | - | $500,000 est. |
| Total | - | - | $1,148,000 |
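Each Annual Impact figure is simply monthly hours × loaded rate × 12, with the sales-delay estimate added as a lump sum. Here is a minimal Python sketch that reproduces the table from those inputs. The `CostLine` type and the function names are illustrative and don't come from any particular tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostLine:
    category: str
    monthly_hours: Optional[float]  # None when the line is a lump-sum estimate
    loaded_rate: Optional[float]    # fully loaded cost per hour, in dollars
    annual_estimate: float = 0.0    # used only for lines without tracked hours

    def annual_impact(self) -> float:
        # Lump-sum estimates (e.g. sales delays) have no hours to annualize.
        if self.monthly_hours is None or self.loaded_rate is None:
            return self.annual_estimate
        # Annualize the tracked monthly hours at the loaded rate.
        return self.monthly_hours * self.loaded_rate * 12


# Inputs copied from the baseline table.
BASELINE = [
    CostLine("Incident response (engineers)", 240, 150),
    CostLine("War room participation (leadership)", 40, 300),
    CostLine("Customer support escalations", 80, 75),
    CostLine("Sales cycle delays (estimate)", None, None, annual_estimate=500_000),
]


def total_annual_cost(lines: list[CostLine]) -> float:
    return sum(line.annual_impact() for line in lines)


if __name__ == "__main__":
    for line in BASELINE:
        print(f"{line.category:<40} ${line.annual_impact():>12,.0f}")
    print(f"{'Total':<40} ${total_annual_cost(BASELINE):>12,.0f}")
```

The estimate sits on its own line, so you can show Finance the total with and without the $500K sales-delay figure. That is the one number not drawn from time tracking.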
Step 2: Establish the Benchmark
Used industry data to set realistic improvement targets:
- Splunk research: 2.6x ROI for observability leaders
- New Relic: $2 return per $1 invested (median)
- Lenovo case study: 85% MTTR reduction
Our target: 50% reduction in incident response time (conservative)
Step 3: Build the Business Case
| Item | Amount |
|---|---|
| Current annual incident cost | $1,148,000 |
| Target reduction (50%) | $574,000 |
| Proposed investment | $350,000/year |
| Net annual benefit | $224,000 |
| ROI (first year) | 64% |
| Payback period | 7.3 months |
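These figures come from three formulas:

- Net benefit = baseline × reduction − investment
- ROI = net benefit ÷ investment
- Payback = investment ÷ gross annual savings × 12 months

A small sketch that reproduces them follows, so the calculation can be rerun if the baseline or the assumed reduction changes. The function name `business_case` is hypothetical.

```python
def business_case(baseline: float, reduction: float, investment: float) -> dict:
    """First-year business case for a given fractional cost reduction."""
    gross_savings = baseline * reduction
    net_benefit = gross_savings - investment
    return {
        "gross_savings": gross_savings,
        "net_benefit": net_benefit,
        "roi": net_benefit / investment,                    # 0.64 means 64%
        "payback_months": investment / gross_savings * 12,  # months to recoup the spend
    }


case = business_case(baseline=1_148_000, reduction=0.50, investment=350_000)
print(f"Savings:  ${case['gross_savings']:,.0f}")
print(f"Net:      ${case['net_benefit']:,.0f}")
print(f"ROI:      {case['roi']:.0%}")
print(f"Payback:  {case['payback_months']:.1f} months")
```

With the post's inputs, this prints $574,000 savings, $224,000 net, 64% ROI and a 7.3-month payback, matching the figures above.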
Step 4: Address the Objections
“Why can’t we just hire more engineers?”
- Showed that the $350K observability investment costs about the same as 2 senior engineers
- But observability multiplies the effectiveness of the entire existing team
- Industry data: 90% reduction in troubleshooting time (IBM Instana)
“What if the improvements don’t materialize?”
- Proposed quarterly ROI reviews
- Defined specific metrics we’d track
- Committed to adjusting investment based on results
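One way to make quarterly reviews concrete is to compute the break-even point up front. The investment pays for itself once the cost reduction reaches investment ÷ baseline. With the figures above, that is about 30.5%, or roughly $87,500 of savings per quarter.

Like the business case, this applies the reduction to the full baseline, including the estimated sales-delay figure. The sketch below assumes savings are measured each quarter against a pro-rated plan. `quarterly_review` and its thresholds are illustrative assumptions, not the author's actual review process.

```python
BASELINE_ANNUAL = 1_148_000   # documented annual incident cost
INVESTMENT_ANNUAL = 350_000   # proposed yearly spend
TARGET_REDUCTION = 0.50       # the "conservative" target


def break_even_reduction(baseline: float, investment: float) -> float:
    """Fractional cost reduction at which savings exactly cover the spend."""
    return investment / baseline


def quarterly_review(measured_savings: float) -> str:
    """Classify one quarter's measured savings (hypothetical thresholds)."""
    quarterly_cost = INVESTMENT_ANNUAL / 4                      # $87,500
    quarterly_target = BASELINE_ANNUAL * TARGET_REDUCTION / 4   # $143,500
    if measured_savings >= quarterly_target:
        return "on target"
    if measured_savings >= quarterly_cost:
        return "paying for itself, below target"
    return "below break-even: revisit the investment"


print(f"Break-even reduction: {break_even_reduction(BASELINE_ANNUAL, INVESTMENT_ANNUAL):.1%}")
for savings in (150_000, 100_000, 60_000):
    print(f"${savings:,} saved -> {quarterly_review(savings)}")
```

The three sample savings values are made up purely to exercise each branch. Real reviews would plug in the tracked metrics.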
“Why this vendor/solution?”
- Prepared comparison matrix with 3 alternatives
- Presented OpenTelemetry’s vendor portability as a risk mitigation
- Included migration cost estimates if we needed to switch
The Presentation That Got Approved
Slide 1: “We’re spending $1.1M annually on firefighting”
Slide 2: “Industry leaders see 2.6x ROI from observability investment”
Slide 3: “Our proposal: $350K investment, $574K savings, 64% ROI”
Slide 4: “Quarterly checkpoints to validate results”
Lessons Learned
- Partner with Finance early: they helped me understand what “ROI” actually means to them
- Use their data: I pulled incident costs from existing time-tracking records, not estimates
- Be conservative: underpromise on benefits, then overdeliver
- Show the exit: OpenTelemetry meant we weren’t locked in to one vendor
- Offer accountability: quarterly reviews gave them confidence
What I Wish I’d Known Earlier
The technical value was never the issue. Finance needed to see:
- Current-state costs (documented, not estimated)
- Industry benchmarks (credible third-party sources)
- Conservative projections (with clear assumptions)
- Risk mitigation (what if it doesn’t work?)
- Accountability mechanism (how will we measure success?)
Who else has successfully navigated the budget approval process? What approaches worked for your organization?