I need to share this while it’s still fresh. We just got leadership approval to migrate from self-hosted Backstage to a managed platform (Roadie). The decision took 18 months longer than it should have.
Here’s what our original ROI calculation completely missed.
The Setup
- Team: 4-person platform team
- Context: 150 engineers, growing fast
- Timeline: Started self-hosted Backstage implementation Q2 2024
- Initial ROI logic: Save $200K/year in licensing costs by building ourselves
The math looked great. Leadership approved. We went all-in.
The Reality: Maintenance Devoured Everything
We thought we’d spend maybe 30-40% of our time on maintenance, 60-70% on building new capabilities.
Reality over 18 months:
- ~200 hours/month on maintenance
- ~50 hours/month on new features
That’s an 80/20 split, roughly the inverse of what we expected.
Maintenance Breakdown
Monthly Backstage core upgrades (40 hours):
- Backstage releases monthly
- Breaking changes are common
- Migration guides exist but aren’t always complete
- Test everything after upgrade (catalog, plugins, auth, RBAC)
Plugin updates and compatibility (30 hours):
- Core upgrade often breaks plugins
- Some plugins lag behind core releases
- Fork and maintain plugins ourselves (catalog pagination, custom integrations)
- Verify compatibility across our 15+ plugins after each core update
Troubleshooting and developer support (60 hours):
- “Portal is slow” (again)
- “Can’t find my service in catalog” (sync issues)
- “Deployment button doesn’t work” (permission config)
- “Documentation is stale” (always)
Infrastructure maintenance (30 hours):
- Database migrations after Backstage upgrades
- Scaling workers for catalog ingestion
- Monitoring and alerting refinements
- Cost optimization for our AWS bill
New feature requests from developers (40 hours):
- “Can we add Datadog integration?”
- “Can we show cost per service?”
- “Can we customize the homepage layout?”
- These sound simple, but each requires plugin development, testing, and deployment
Total: ~200 hours/month = 50 hours/week = 1.25 FTEs
And we’re a 4-person team. That means ~30% of our capacity is just keeping the lights on.
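To sanity-check those numbers, here’s the arithmetic as a small Python sketch. It assumes a 40-hour work week and 4 weeks per month (the same rounding that turns 200 hours/month into 50 hours/week). The dictionary keys are just labels for the categories above.

```python
# Monthly maintenance hours per category, from the breakdown above.
MAINTENANCE_HOURS = {
    "core upgrades": 40,
    "plugin updates and compatibility": 30,
    "troubleshooting and developer support": 60,
    "infrastructure maintenance": 30,
    "developer feature requests": 40,  # the breakdown counts this as maintenance
}
NEW_FEATURE_HOURS = 50  # monthly hours spent on new capabilities

WEEKS_PER_MONTH = 4      # same rounding as "200 hours/month = 50 hours/week"
HOURS_PER_FTE_WEEK = 40  # assumption: a standard 40-hour week
TEAM_SIZE = 4


def summarize():
    """Return (monthly hours, maintenance split, weekly hours, FTEs, share of team)."""
    maintenance = sum(MAINTENANCE_HOURS.values())
    # Share of the hours logged on Backstage work (maintenance + new features).
    split = maintenance / (maintenance + NEW_FEATURE_HOURS)
    per_week = maintenance / WEEKS_PER_MONTH
    fte = per_week / HOURS_PER_FTE_WEEK
    # Share of the whole team's capacity, not just the logged hours.
    share_of_team = fte / TEAM_SIZE
    return maintenance, split, per_week, fte, share_of_team


maintenance, split, per_week, fte, share_of_team = summarize()
print(f"{maintenance} h/month, {split:.0%} maintenance split, "
      f"{per_week:.0f} h/week, {fte:.2f} FTE, {share_of_team:.0%} of team capacity")
```

This prints an 80% split and 31% of team capacity. The two figures use different denominators: the 80/20 split is measured against the ~250 hours/month logged on Backstage work, while ~30% is measured against the whole team’s time.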
But wait, it gets worse.
The Invisible Cost: Opportunity Cost
While we maintained Backstage internals, we didn’t build the capabilities that would actually improve developer productivity:
What We Delayed (By 6-12 Months Each)
- Service onboarding automation: Developers still create repos, set up CI/CD, and configure monitoring by hand. We delayed golden path templates by 9 months because we were stuck debugging React component rendering issues.
- Cost visibility dashboard: Finance kept asking “which services cost the most to run?” We had the data but couldn’t build the UI, because our frontend capacity was consumed by Backstage maintenance.
- Deployment automation: Developers still trigger deployments manually, with no self-service. We delayed building this by 7 months because we had to rebuild catalog pagination from scratch (the default plugin didn’t scale).
- Security compliance automation: Container scanning, secrets detection, and policy enforcement are still manual checks. Delayed 11 months while we dealt with RBAC implementation bugs.
The Productivity Loss Calculation
Without these platform improvements, each developer loses ~5-6 hours/week to:
- Finding documentation (scattered, stale)
- Waiting for approvals (no automation)
- Manual deployments (no self-service)
- Debugging environment issues (inconsistent setups)
If these features had shipped on time, we’d have reduced that to ~1-2 hours/week.
Lost productivity: 4 hours/week saved × 150 engineers × $75/hour = $45,000/week. With features delayed 6-12 months (9 months on average), we conservatively estimate $780,000 in lost productivity during the delays.
That’s 4x our annual “savings” from not paying for a managed platform.
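For anyone who wants to rerun this with their own inputs, here’s a sketch of the formula with the units made explicit: hours saved per engineer per week, times headcount, times a loaded hourly rate, times the number of working weeks the features were late (months converted at 48 working weeks per year). The function and parameter names are illustrative. With the inputs above, the straight product is larger than the $780K estimate even at the 6-month end of the range.

```python
WORKING_WEEKS_PER_YEAR = 48  # same working-year assumption as the estimate above


def lost_productivity(hours_saved_per_week, engineers, hourly_rate, delay_months):
    """Dollar value of engineering time lost while platform features were delayed."""
    delay_weeks = delay_months / 12 * WORKING_WEEKS_PER_YEAR
    weekly_cost = hours_saved_per_week * engineers * hourly_rate
    return weekly_cost * delay_weeks


# 4 hours/week saved (5-6 down to 1-2), 150 engineers, $75/hour.
for months in (6, 9, 12):
    print(months, "months:", f"${lost_productivity(4, 150, 75, months):,.0f}")
```

This prints $1,080,000, $1,620,000, and $2,160,000 for 6, 9, and 12 months of delay, so the $780K figure sits well below what the raw inputs imply.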
The TypeScript Skills Gap Was Brutal
Our platform team background:
- Strong: Python, Go, Terraform, Kubernetes
- Gap: React, TypeScript, frontend build tooling
Backstage is heavily frontend. We struggled:
Hiring: Tried to hire a frontend-focused platform engineer. It took 8 months to fill the role, the better part of a year with the frontend backlog piling up.
Contracting: Brought in a React contractor at $180/hour. Good work, but coordination overhead and knowledge transfer ate time. Added ~$90K annually (not in original budget).
Learning curve: Tried to upskill our existing team. Noble effort, but infrastructure engineers learning React while shipping production features? 3-6 months of reduced productivity.
This skill gap wasn’t in our TCO model. Add another $200K+ in hidden costs.
What We Got Right (Give Us Some Credit)
Not everything was a disaster:
Documentation culture: We treated internal platform docs as product docs. This actually worked well.
Platform as product mindset: Surveyed developers quarterly, tracked adoption metrics, prioritized based on impact.
Strong observability: Instrumented everything, could debug issues quickly when they came up.
Security posture: Full control over auth, data residency, compliance. This was actually valuable for our use case.
What We’d Do Differently: Hybrid Model
If I could go back, here’s what I’d pitch:
Buy foundation: Roadie, Humanitec, or Port for portal, catalog, auth, RBAC
Build differentiation: Golden paths, custom integrations, org-specific automation
We’d spend engineering time on:
- Service templates that encode our architecture patterns
- Deployment automation that integrates with our tools
- Cost dashboards with our specific business logic
- Security compliance automation with our requirements
Instead of:
- Debugging React hooks
- Implementing RBAC from scratch
- Maintaining database migrations
- Keeping plugin compatibility matrix updated
Platform engineers should solve organizational problems, not framework problems.
The Real ROI Calculation
Original calculation:
- DIY: $0 platform cost + 4 engineers = $600K/year
- Buy: $200K platform cost + 2 engineers = $500K/year
- Conclusion: DIY is more expensive but gives control
Actual calculation:
- DIY: $600K/year + $90K contractors + $780K lost productivity = $1.47M/year
- Buy: $200K platform + 3 engineers building capabilities = $650K/year + $0 lost productivity
- Net difference: $820K/year in favor of buy
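Here is the same comparison as a tiny cost model, so the inputs are easy to swap for your own. The $150K loaded cost per engineer is inferred from “4 engineers = $600K/year”; the function name and parameters are illustrative, and lost productivity is plugged in as the $780K estimate from above, not recomputed.

```python
LOADED_COST_PER_ENGINEER = 150_000  # inferred from "4 engineers = $600K/year"


def annual_cost(platform_fee, engineers, contractors=0, lost_productivity=0):
    """Yearly total cost of one option, in dollars."""
    return (platform_fee
            + engineers * LOADED_COST_PER_ENGINEER
            + contractors
            + lost_productivity)


original_diy = annual_cost(0, 4)
original_buy = annual_cost(200_000, 2)
actual_diy = annual_cost(0, 4, contractors=90_000, lost_productivity=780_000)
actual_buy = annual_cost(200_000, 3)

print(f"Original: DIY ${original_diy:,} vs buy ${original_buy:,}")
print(f"Actual:   DIY ${actual_diy:,} vs buy ${actual_buy:,}")
print(f"Net difference: ${actual_diy - actual_buy:,} in favor of buy")
```

Keeping lost productivity as an explicit parameter makes it obvious how much of the gap comes from opportunity cost rather than direct spend.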
We spent 18 months learning we should have spent $200K.
The Migration Decision
Last month, we finally got approval to migrate to Roadie. Leadership question: “Why didn’t you tell us sooner?”
Honest answer: Sunk cost fallacy. We’d invested so much, felt like we could make it work. “Just one more quarter and we’ll get ahead of maintenance.”
We never did.
For Those Evaluating This Decision
Ask yourself:
- Maintenance burden: Are you accounting for actual monthly Backstage upgrade cycles, plugin compatibility, and developer support? Or are you guessing?
- Opportunity cost: What platform capabilities will you not build while maintaining framework code?
- Skills match: Does your platform team have strong React/TypeScript skills? If not, what’s your plan?
- Decision checkpoints: If maintenance exceeds X% of team time after 6 months, will you reevaluate? Or will sunk cost fallacy keep you committed?
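One way to make that checkpoint concrete is to log maintenance vs. feature hours every month and test the split against a threshold agreed on up front. This is a minimal sketch; the function names and the (maintenance, feature) log format are hypothetical, and X is yours to choose. The example uses 40%, the top of our original 30-40% planning assumption, purely for illustration.

```python
def maintenance_share(maintenance_hours, feature_hours):
    """Fraction of logged platform hours spent keeping things running."""
    total = maintenance_hours + feature_hours
    if total == 0:
        raise ValueError("no hours logged")
    return maintenance_hours / total


def should_reevaluate(monthly_log, threshold, months=6):
    """True if the maintenance share over the last `months` months exceeds `threshold`.

    monthly_log: list of (maintenance_hours, feature_hours) tuples, oldest first.
    """
    recent = monthly_log[-months:]
    if len(recent) < months:
        return False  # checkpoint not reached yet
    maintenance = sum(m for m, _ in recent)
    features = sum(f for _, f in recent)
    return maintenance_share(maintenance, features) > threshold


log = [(200, 50)] * 6  # six months at our observed rate
print(should_reevaluate(log, threshold=0.40))
```

With our numbers the share is 80%, so the checkpoint would have fired at month 6 instead of month 18.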
The build vs buy decision isn’t just about upfront costs. It’s about where your engineering capacity goes and what you don’t build as a result.
We learned this the expensive way. You don’t have to.
Update: Migration to Roadie is underway. Timeline: 6-8 weeks. Cost: ~$100K one-time migration + $120K/year platform cost. We’re freeing up 3 FTEs to focus on golden paths and automation.
I’ll report back in 6 months on how it goes. But honestly, I wish we’d done this 18 months ago.
Question for the community: Has anyone successfully calculated the productivity loss from delayed platform features? How do you quantify “what we didn’t build”?