We Self-Hosted Backstage for 18 Months. Here's What the ROI Calculation Missed

I need to share this while it’s still fresh. We just got leadership approval to migrate from self-hosted Backstage to a managed platform (Roadie). The decision took 18 months longer than it should have.

Here’s what our original ROI calculation completely missed.

The Setup

  • Team: 4-person platform team
  • Context: 150 engineers, growing fast
  • Timeline: Started self-hosted Backstage implementation Q2 2024
  • Initial ROI logic: Save $200K/year in licensing costs by building ourselves

The math looked great. Leadership approved. We went all-in.

The Reality: Maintenance Devoured Everything

We thought we’d spend maybe 30-40% of our time on maintenance, 60-70% on building new capabilities.

Reality over 18 months:

  • ~200 hours/month on maintenance
  • ~50 hours/month on new features

That’s an 80/20 split. Inverted from what we expected.

Maintenance Breakdown

Monthly Backstage core upgrades (40 hours):

  • Backstage releases monthly
  • Breaking changes are common
  • Migration guides exist but aren’t always complete
  • Test everything after upgrade (catalog, plugins, auth, RBAC)

Plugin updates and compatibility (30 hours):

  • Core upgrade often breaks plugins
  • Some plugins lag behind core releases
  • Fork and maintain plugins ourselves (catalog pagination, custom integrations)
  • Verify compatibility across our 15+ plugins after each core update

Troubleshooting and developer support (60 hours):

  • “Portal is slow” (again)
  • “Can’t find my service in catalog” (sync issues)
  • “Deployment button doesn’t work” (permission config)
  • “Documentation is stale” (always)

Infrastructure maintenance (30 hours):

  • Database migrations after Backstage upgrades
  • Scaling workers for catalog ingestion
  • Monitoring and alerting refinements
  • Cost optimization for our AWS bill

New feature requests from developers (40 hours):

  • “Can we add Datadog integration?”
  • “Can we show cost per service?”
  • “Can we customize the homepage layout?”
  • These sound simple, but each requires plugin dev, testing, deployment

Total: ~200 hours/month = 50 hours/week = 1.25 FTEs

And we’re a 4-person team. That means ~30% of our capacity is just keeping the lights on.
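For anyone checking the arithmetic, the capacity math is a few lines (a sketch assuming 40-hour weeks and 4 working weeks per month):

```python
# Back-of-envelope capacity math for a 4-person platform team.
# Assumes 40-hour weeks and 4 weeks/month; hours are our rough monthly averages.

TEAM_SIZE = 4
HOURS_PER_WEEK = 40
WEEKS_PER_MONTH = 4

maintenance_hours_month = 200
feature_hours_month = 50

# Share of tracked project time going to maintenance
maintenance_share = maintenance_hours_month / (maintenance_hours_month + feature_hours_month)

# Maintenance expressed as full-time equivalents, and as a share of total team capacity
maintenance_ftes = (maintenance_hours_month / WEEKS_PER_MONTH) / HOURS_PER_WEEK
team_capacity_share = maintenance_ftes / TEAM_SIZE

print(f"{maintenance_share:.0%} of tracked time")     # 80%
print(f"{maintenance_ftes:.2f} FTEs")                 # 1.25
print(f"{team_capacity_share:.0%} of team capacity")  # 31%
```

The 80/20 split and the ~30%-of-capacity figure are both true at once: the remaining team hours went to meetings, on-call, and everything else that isn't tracked project work.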

But wait, it gets worse.

The Invisible Cost: Opportunity Cost

While we maintained Backstage internals, we didn’t build the capabilities that would actually improve developer productivity:

What We Delayed (By 6-12 Months Each)

  1. Service onboarding automation: Developers still manually creating repos, setting up CI/CD, configuring monitoring. We delayed golden path templates by 9 months because we were stuck debugging React component rendering issues.

  2. Cost visibility dashboard: Finance kept asking “which services cost the most to run?” We had the data, couldn’t build the UI because frontend capacity was consumed by Backstage maintenance.

  3. Deployment automation: Developers still manually triggering deployments, no self-service. We delayed building this by 7 months because we had to rebuild catalog pagination from scratch (the default plugin didn’t scale).

  4. Security compliance automation: Still manual checks for container scanning, secrets detection, policy enforcement. Delayed 11 months while we dealt with RBAC implementation bugs.

The Productivity Loss Calculation

Before platform improvements, developers waste ~5-6 hours/week on:

  • Finding documentation (scattered, stale)
  • Waiting for approvals (no automation)
  • Manual deployments (no self-service)
  • Debugging environment issues (inconsistent setups)

If our platform features had shipped on time, we’d reduce that to ~1-2 hours/week.

Lost productivity: 4 hours/week × 150 engineers × $75/hour × 48 weeks, prorated across feature delays averaging 9 months and discounted for staggered rollout ≈ $780,000 while we waited.

That’s 4x our annual “savings” from not paying for a managed platform.

The TypeScript Skills Gap Was Brutal

Our platform team background:

  • Python, Go, Terraform, Kubernetes ✅
  • React, TypeScript, frontend build tooling ❌

Backstage is heavily frontend. We struggled:

Hiring: Tried to hire a frontend-focused platform engineer. Took 8 months to fill. That’s almost a year of frontend work backlog.

Contracting: Brought in a React contractor at $180/hour. Good work, but coordination overhead and knowledge transfer ate time. Added ~$90K annually (not in original budget).

Learning curve: Tried to upskill our existing team. Noble effort, but infrastructure engineers learning React while shipping production features? 3-6 months of reduced productivity.

This skill gap wasn’t in our TCO model. Add another $200K+ in hidden costs.

What We Got Right (Give Us Some Credit)

Not everything was a disaster:

✅ Documentation culture: We treated internal platform docs as product docs. This actually worked well.

✅ Platform as product mindset: Surveyed developers quarterly, tracked adoption metrics, prioritized based on impact.

✅ Strong observability: Instrumented everything, could debug issues quickly when they came up.

✅ Security posture: Full control over auth, data residency, compliance. This was actually valuable for our use case.

What We’d Do Differently: Hybrid Model

If I could go back, here’s what I’d pitch:

Buy foundation: Roadie, Humanitec, or Port for portal, catalog, auth, RBAC
Build differentiation: Golden paths, custom integrations, org-specific automation

We’d spend engineering time on:

  • Service templates that encode our architecture patterns
  • Deployment automation that integrates with our tools
  • Cost dashboards with our specific business logic
  • Security compliance automation with our requirements

Instead of:

  • Debugging React hooks
  • Implementing RBAC from scratch
  • Maintaining database migrations
  • Keeping plugin compatibility matrix updated

Platform engineers should solve organizational problems, not framework problems.

The Real ROI Calculation

Original calculation:

  • DIY: $0 platform cost + 4 engineers = $600K/year
  • Buy: $200K platform cost + 2 engineers = $500K/year
  • Conclusion: DIY costs $100K/year more on paper, but we keep full control, so it's worth it

Actual calculation:

  • DIY: $600K/year + $90K contractors + $780K lost productivity = $1.47M/year
  • Buy: $200K platform + 3 engineers building capabilities = $650K/year + $0 lost productivity
  • Net difference: $820K/year in favor of buy

We spent 18 months learning we should have spent $200K.
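Spelled out in code (all figures in $K/year; the $150K fully loaded cost per engineer is what our $600K for 4 engineers implies):

```python
# Reconstructing the naive vs actual ROI calculations, in $K/year.
# $150K/engineer fully loaded is implied by $600K / 4 engineers.

ENGINEER_COST = 150  # $K/year

# Original (naive) calculation
diy_naive = 0 + 4 * ENGINEER_COST    # no platform fee + 4 engineers = $600K
buy_naive = 200 + 2 * ENGINEER_COST  # platform fee + 2 engineers = $500K

# Actual calculation after 18 months
diy_actual = 4 * ENGINEER_COST + 90 + 780  # team + contractors + lost productivity
buy_actual = 200 + 3 * ENGINEER_COST       # platform + 3 engineers building capabilities

print(f"Naive: DIY ${diy_naive}K vs buy ${buy_naive}K")  # $600K vs $500K
print(f"Actual: DIY ${diy_actual}K vs buy ${buy_actual}K")  # $1470K vs $650K
print(f"Difference: ${diy_actual - buy_actual}K/year in favor of buy")  # $820K
```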

The Migration Decision

Last month, we finally got approval to migrate to Roadie. Leadership question: “Why didn’t you tell us sooner?”

Honest answer: Sunk cost fallacy. We’d invested so much, felt like we could make it work. “Just one more quarter and we’ll get ahead of maintenance.”

We never did.

For Those Evaluating This Decision

Ask yourself:

  1. Maintenance burden: Are you accounting for actual monthly Backstage upgrade cycles, plugin compatibility, and developer support? Or using a guess?

  2. Opportunity cost: What platform capabilities will you not build while maintaining framework code?

  3. Skills match: Does your platform team have strong React/TypeScript skills? If not, what’s your plan?

  4. Decision checkpoints: If maintenance exceeds X% of team time after 6 months, will you reevaluate? Or will sunk cost fallacy keep you committed?

The build vs buy decision isn’t just about upfront costs. It’s about where your engineering capacity goes and what you don’t build as a result.

We learned this the expensive way. You don’t have to.


Update: Migration to Roadie is underway. Timeline: 6-8 weeks. Cost: ~$100K one-time migration + $120K/year platform cost. We’re freeing up 3 FTEs to focus on golden paths and automation.

I’ll report back in 6 months on how it goes. But honestly, I wish we’d done this 18 months ago.

Question for the community: Has anyone successfully calculated the productivity loss from delayed platform features? How do you quantify “what we didn’t build”?

This resonates deeply. We had the exact same experience at my previous company, and your numbers are spot on.

We Measured The “Platform Lag” Impact

Here’s how we quantified the productivity loss you’re describing:

Before good IDP: We surveyed 50 randomly selected developers and asked them to track time spent on “undifferentiated tasks” for 2 weeks:

  • Finding documentation: ~1.5 hours/week
  • Waiting for approvals/access: ~2 hours/week
  • Manual deployment tasks: ~1 hour/week
  • Debugging environment inconsistencies: ~1 hour/week
  • Total: 5-6 hours/week per developer

After managed IDP (6 months post-launch): Same survey methodology:

  • Finding documentation: ~15 minutes/week (catalog is source of truth)
  • Waiting for approvals: ~30 minutes/week (most automated via golden paths)
  • Manual deployment: ~0 minutes/week (self-service)
  • Environment issues: ~15 minutes/week (standardized templates)
  • Total: ~1 hour/week per developer

Impact: 5 hours/week saved × 150 developers × $75/hour × 48 weeks = $2.7M annual productivity gain

That’s the number that convinced our CFO. Your $780K calculation is actually conservative because you’re only counting the delayed features, not the ongoing inefficiency.
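The survey delta works out as follows (a sketch using the same $75/hour loaded rate and 48 working weeks assumed throughout the thread):

```python
# Annualized productivity gain from the before/after developer surveys.
# Assumes $75/hour fully loaded cost and 48 working weeks/year.

HOURLY_RATE = 75  # $/hour
DEVELOPERS = 150
WEEKS = 48

# ~5-6 hours/week on undifferentiated tasks before, ~1 hour/week after
hours_saved_per_week = 5

annual_gain = hours_saved_per_week * DEVELOPERS * HOURLY_RATE * WEEKS
print(f"${annual_gain:,}")  # $2,700,000
```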

The Key Insight Most CFOs Miss

Your opportunity cost calculation—what we didn’t build because of maintenance burden—is the argument that finally broke through to leadership.

We framed it this way:

Platform teams should measure:

  • Time spent on maintenance vs time building capabilities
  • Features shipped vs features deferred
  • Developer productivity gained vs productivity lost waiting

When we showed leadership we were spending 70% of platform team time on maintenance, it was a wake-up call. That’s not sustainable, and it’s not strategic.

Backstage Is A Framework, Not A Product

This is the critical distinction that engineering teams miss:

Framework = Starting point for building your own solution
Product = Ready-to-use solution that solves a problem

Backstage is powerful as a framework. But frameworks require ongoing engineering investment to become products. Most orgs underestimate that investment by 3-5x.

Managed platforms (Roadie, Humanitec, Port) have taken the framework and built the product. They’ve already solved:

  • Authentication and authorization
  • UI components and design system
  • Catalog pagination and search
  • Plugin compatibility and updates
  • Scalability and performance
  • Monitoring and observability

When you DIY Backstage, you’re re-solving all those problems. Why?

The Question That Changed Our Discussion

During our internal debate, I asked the engineering team:

“Would you build your own Kubernetes? Your own Postgres? Your own monitoring system?”

Answer: “No, those are commodity infrastructure. We use managed services.”

“Then why are we building our own developer portal?”

Silence.

Platform portals are commodity infrastructure now. 75% of organizations will have one by end of 2026 (Gartner). The differentiation isn’t in having a portal, it’s in what workflows and golden paths you encode in it.

For Your Leadership Presentation

When you present the migration case, frame it around strategic engineering capacity allocation:

Current state (DIY):

  • 4 engineers × 40% capacity on framework maintenance = 1.6 FTEs maintaining code
  • 4 engineers × 60% capacity on capabilities = 2.4 FTEs building value
  • Value delivery rate: 2.4 FTEs

Future state (Managed + Build Differentiation):

  • $200K platform cost replaces framework maintenance
  • 3 engineers × 90% capacity on golden paths and automation = 2.7 FTEs building value
  • Value delivery rate: 2.7 FTEs + faster time to value + reduced coordination overhead

The question isn’t “DIY vs buy.” It’s “Where do we focus our scarcest resource (engineering time)?”

One More Thing

Your sunk cost fallacy comment hit home. We did the same thing—“just one more quarter.”

The best time to make this decision was 18 months ago. The second-best time is now.

Leadership will respect you more for recognizing and correcting a strategic mistake than for stubbornly defending a failing approach.

Question: Did leadership support the pivot? Or was there pushback about “wasted investment”?

This is exactly the kind of data I need to present to our CFO. Your breakdown of maintenance hours is incredibly valuable.

The Business Case Framing

Your opportunity cost calculation—$780K in lost productivity—is the story that will resonate with finance and exec teams.

Here’s why: Engineering time is an invisible cost. Leadership sees headcount and salary, but they don’t see where that capacity goes.

Your post makes it visible:

  • 200 hours/month on maintenance = work that doesn’t create business value
  • 9-month delay on service onboarding = slower team scaling = delayed hiring = slower revenue growth
  • TypeScript skill gap + contractor costs = hidden budget overrun

How To Quantify “What We Didn’t Build”

You asked how to calculate productivity loss from delayed features. Here’s the product framework I’d use:

Step 1: Identify Delayed Features

You listed:

  1. Service onboarding automation (9 months delayed)
  2. Cost visibility dashboard (7+ months delayed)
  3. Deployment automation (7 months delayed)
  4. Security compliance automation (11 months delayed)

Step 2: Estimate Impact Per Feature

Service onboarding automation:

  • Current state: 8-12 hours per service (manual setup, docs, CI/CD config)
  • With automation: 30 minutes per service
  • Services created in 9 months: ~30 (rough estimate for 150-engineer org)
  • Time saved: 30 services × 10 hours = 300 hours = $22,500
  • But also: Faster team scaling, faster time-to-production for new features
  • Business impact: Delayed 3-4 new team launches by 9 months = delayed revenue/features

Cost visibility dashboard:

  • Without it: Finance asks questions, engineering does manual analysis
  • Estimated manual effort: 10 hours/month across engineering leadership
  • 9 months delay = 90 hours = $6,750
  • But also: Poor cost optimization, over-spending on unused services
  • Business impact: Estimated $10-20K/month in optimization opportunities missed

Deployment automation:

  • Manual deployments: 20-30 minutes per deploy, coordination overhead, errors
  • Engineers doing manual deploys: 150 engineers × 2 deploys/week = 300 deploys/week
  • Time waste: 300 × 0.4 hours = 120 hours/week = $9,000/week = $36K/month
  • 7 months delay = $252,000 in wasted engineering time

Security compliance automation:

  • Manual security checks before production releases
  • Estimated: 2-4 hours per release cycle, affects team velocity
  • Slows down release cadence, creates bottlenecks

Step 3: Total Opportunity Cost

Deployment automation alone = $252K over 7 months
Service onboarding inefficiency = $22.5K + delayed team scaling
Cost visibility = $6.75K + $135K in missed optimization (9 months × $15K)

Conservative total: ~$416K in quantifiable losses

Add your $780K calculation for broader productivity impact, and you’re looking at ~$1.2M in total opportunity cost.

That’s 6x your annual platform subscription cost.
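Reassembled as code (a sketch at the same $75/hour; the $15K/month missed-optimization figure is an assumed midpoint of the $10-20K range):

```python
# Reassembling the Step 2 per-feature estimates into the Step 3 total.
# Sketch only: $75/hour loaded cost throughout; $15K/month for missed
# cost optimization is an assumed midpoint, not a measured number.

RATE = 75  # $/hour

# Deployment automation: 300 deploys/week × ~24 minutes each, 7 months (~4 weeks/month)
deploy_hours_week = 300 * 24 // 60              # 120 hours/week
deploy_cost = deploy_hours_week * RATE * 4 * 7  # $252,000

# Service onboarding: ~30 services × 10 hours of manual setup each
onboarding_cost = 30 * 10 * RATE                # $22,500

# Cost visibility: 10 hours/month of manual analysis × 9 months,
# plus ~$15K/month in missed optimization opportunities
cost_vis_manual = 10 * 9 * RATE                 # $6,750
cost_vis_missed = 15_000 * 9                    # $135,000

total = deploy_cost + onboarding_cost + cost_vis_manual + cost_vis_missed
print(f"Quantifiable losses: ${total:,}")              # $416,250
print(f"With the $780K figure: ${total + 780_000:,}")  # $1,196,250 (~$1.2M)
```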

The Finance Conversation

When you present this to your CFO, frame it around:

1. ROI on platform investment:

  • Current DIY cost: $600K/year (team) + $780K (lost productivity) = $1.38M/year
  • Managed platform cost: $200K/year (platform) + $450K (team building golden paths) = $650K/year
  • Net savings: $730K/year = 112% ROI in year one

2. Strategic capacity allocation:

  • DIY: 4 engineers, 60% on maintaining framework, 40% on capabilities
  • Managed: 3 engineers, 90% on capabilities
  • 2.7 FTEs vs 1.6 FTEs delivering value = 70% more value delivery

3. Time to value:

  • DIY: 18 months to production, delayed features 6-12 months each
  • Managed: 6-8 weeks to production, features ship immediately after
  • 14-16 months faster = $1.1M-$1.5M in early productivity gains
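The ROI framing above reduces to a few lines of arithmetic ($K/year units; a sketch, with the golden-paths team assumed at 3 × $150K):

```python
# Year-one ROI of the managed-platform move, in $K/year (sketch).
diy_cost = 600 + 780          # team cost + lost productivity = $1,380K
managed_cost = 200 + 3 * 150  # platform + 3 engineers on golden paths = $650K

net_savings = diy_cost - managed_cost  # $730K/year
roi = net_savings / managed_cost       # ~1.12

print(f"${net_savings}K/year net, {roi:.0%} ROI")  # $730K/year net, 112% ROI
```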

The Question I’d Ask Your Leadership

“We’ve invested $1M and 18 months. We can continue down this path and invest another $1.1M/year, or we can invest $100K to migrate and $120K/year going forward. Which maximizes our return on the $1M already spent?”

Sunk cost fallacy is real, but good capital allocation is about future returns, not past investments.

How Did You Sell The Pivot?

You mentioned leadership asked “why didn’t you tell us sooner?” How did you frame the migration decision without it feeling like admitting failure?

I’m preparing for this exact conversation and want to get the messaging right. The goal is:

  • Acknowledge the learning
  • Frame it as strategic correction, not failure
  • Focus on future value, not past cost
  • Get buy-in without blame

Any advice on navigating that conversation?

The people side of this story is critical, and I don’t think it’s getting enough attention.

Platform Team Morale and Burnout

Your maintenance burden breakdown—200 hours/month just keeping lights on—isn’t just a capacity problem. It’s a morale problem.

Platform engineering should be energizing work:

  • Solving novel organizational problems
  • Enabling hundreds of developers
  • Seeing direct impact on productivity
  • Building capabilities that didn’t exist

Instead, your team is:

  • Debugging React hooks
  • Chasing plugin compatibility issues
  • Manually upgrading dependencies monthly
  • Fighting the same fires over and over

That’s a recipe for burnout.

The Warning Signs

I’ve seen this pattern before:

Early stage (Months 1-6): “We’re building something cool! This is exciting!”

Middle stage (Months 7-12): “Why are we spending so much time on maintenance? When do we build features?”

Late stage (Months 13-18): “I didn’t sign up for this. I’m looking for a new role.”

Your 18-month timeline maps perfectly to this burnout curve.

Question for you: How did your platform engineers feel about the maintenance burden? Did you see turnover or signs of disengagement?

The Retention Risk Math

Losing one senior engineer = $200K+ replacement cost (recruiting, onboarding, lost productivity, knowledge loss)

If maintenance treadmill causes one engineer to leave, that’s the cost of your annual managed platform subscription.

If you lose two engineers because of burnout from framework maintenance, you’ve now spent more on turnover than you “saved” by not buying a managed platform.

Retention is a financial metric, not just an HR metric.

The Organizational Health Indicators

Healthy platform teams:

  • 70% building new capabilities, 30% maintenance
  • High morale, engineers excited about impact
  • Strong knowledge distribution (multiple people can handle any area)
  • Clear roadmap of capabilities to ship

Unhealthy platform teams:

  • 70%+ maintenance, 30% or less building (you were at 80/20)
  • Morale declining, engineers feeling stuck
  • Knowledge concentration (“only Sarah understands this”)
  • Roadmap constantly slipping

Your team was in the unhealthy category. The question is: How long before people started leaving?

The Downstream Impact

Platform team frustration spreads:

App teams get frustrated:

  • “Why is the portal so slow?”
  • “Why can’t they add [basic feature]?”
  • “Why does every core update break things?”

Leadership gets frustrated:

  • “Why are platform capabilities delayed?”
  • “Why do we need 4 people for this?”
  • “Why aren’t developers adopting the portal?”

Platform team feels squeezed from all sides, blamed for problems that are structural (choosing to maintain framework instead of building capabilities).

Management Lesson

As a VP, here’s the framework I use:

Track platform team ticket composition:

  • Reactive work (bugs, support, upgrades, firefighting)
  • Proactive work (new capabilities, improvements, automation)

Healthy ratio: 30% reactive, 70% proactive.

If reactive work exceeds 40%, that’s a yellow flag.
If reactive work exceeds 50%, that’s a red flag—intervene immediately.

Your 80% maintenance burden is off the charts unhealthy.
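That ticket-composition heuristic is easy to automate against whatever tracker you use. A hypothetical sketch (the thresholds are the ones stated above; the function name and data source are mine):

```python
# Hypothetical helper applying the reactive-work thresholds described above:
# <= 40% reactive is healthy, 40-50% is a yellow flag, > 50% is a red flag.

def reactive_flag(reactive_hours: float, proactive_hours: float) -> str:
    """Classify a platform team's workload mix from tracked hours."""
    total = reactive_hours + proactive_hours
    if total == 0:
        raise ValueError("no tracked hours")
    ratio = reactive_hours / total
    if ratio <= 0.40:
        return "healthy"
    if ratio <= 0.50:
        return "yellow flag"
    return "red flag - intervene immediately"

print(reactive_flag(60, 140))  # healthy (30% reactive)
print(reactive_flag(200, 50))  # red flag (80% reactive, the OP's split)
```

Feed it monthly totals from your ticket labels and you have a trend line leadership can act on before the ratio hits 80%.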

The Talent Opportunity Cost

While your 4 engineers maintained Backstage framework:

  • They weren’t solving organizational problems
  • They weren’t building golden paths
  • They weren’t improving developer experience
  • They weren’t growing their skills in platform engineering

They were growing skills in Backstage framework internals.

Which skillset is more valuable for their careers? Which makes them more marketable? Which do they actually want to develop?

I’d bet your engineers would rather become experts in:

  • Developer experience optimization
  • Platform capabilities and golden paths
  • Workflow automation and integration
  • Organizational patterns and practices

Than experts in:

  • Backstage plugin API changes
  • React component debugging
  • TypeScript build configuration
  • Database migration scripts

The Migration As Team Investment

Reframe the migration to Roadie as investing in your team:

Before: 4 engineers, 80% maintaining framework (3.2 FTEs), 20% building capabilities (0.8 FTEs)

After: 3 engineers, 10% maintaining managed platform integrations (0.3 FTEs), 90% building capabilities (2.7 FTEs)

Value delivery increases from 0.8 FTEs to 2.7 FTEs = 3.4x more value

But also:

  • Morale improves (engineers doing meaningful work)
  • Skills develop (transferable platform engineering vs niche framework expertise)
  • Retention improves (engineers see impact and growth)
  • Recruiting improves (“build platform capabilities” vs “maintain Backstage fork”)

The $120K/year platform cost is also an investment in your team’s engagement and growth.

The Conversation With Your Team

When you announced the migration to Roadie, how did your platform engineers react?

My guess: Relief.

“Finally, we can focus on building things that matter.”
“I’m excited to work on golden paths instead of debugging React.”
“This is what I actually signed up for.”

If that’s the reaction, it validates that the team knows the DIY approach wasn’t working.

For Leaders Evaluating This

When your platform team says they want to build:

  • Ask: “What % of your time will be maintenance vs building capabilities?”
  • Ask: “What skills will you develop? Are those skills you want?”
  • Ask: “How will you measure success? Time to ship features or time to maintain framework?”

If the answers don’t align with team growth and organizational value, reconsider.

Your people are your scarcest resource. Don’t burn them out on undifferentiated maintenance.

Platform-as-product perspective: Your users (developers) were suffering too.

The 9-month delay on service onboarding automation isn’t just an internal platform team problem. It’s a poor product experience for every developer trying to spin up a new service.

User Impact Of Delayed Features

Let me translate your platform delays into user pain:

Service onboarding automation (9-month delay):

  • Every new service = 8-12 hours of manual toil
  • Inconsistent setups = bugs, downtime, security gaps
  • New engineers can’t self-serve = bottleneck on senior engineers
  • Slower feature delivery = frustrated product teams

Deployment automation (7-month delay):

  • Manual deployments = slow, error-prone, stressful
  • Can’t deploy on Friday = delayed releases, weekend work
  • Coordination overhead = “who’s deploying? can I deploy now?”
  • Fear of deployment = release anxiety, batching changes

Cost visibility dashboard (delayed):

  • Engineering teams overspend without knowing
  • Finance asks for cost breakdowns = engineering does manual analysis
  • No visibility = no optimization = wasted money

Security compliance (delayed):

  • Manual security checks = slow release cycles
  • Inconsistent checks = security gaps
  • Bottleneck on security team = frustrated developers

If This Were An External Product

Imagine you’re building a SaaS product for developers. Your roadmap says:

Q2: Ship core product
Q3: Ship automation features
Q4: Ship cost dashboard
Q1 next year: Ship deployment self-service

But instead:

Q2: Ship minimal product
Q3: Fix bugs, maintain infrastructure, no new features
Q4: Still fixing bugs, upgrade core framework, no new features
Q1: Still no features shipped. Promise “next quarter.”

How many customers would you have left?

If this were an external product, a 9-12 month delay in core features would be an existential threat. You’d be out of business.

Why do we tolerate it for internal tools?

The Adoption Problem

You mentioned developers were frustrated with portal slowness and issues. I’d bet:

  • Portal adoption was low (or declining)
  • Developer satisfaction was low
  • The platform team was blamed (even though the problem was structural)

When the product experience is poor, users don’t use it. When users don’t use it, the platform investment—$1.2M and 18 months—delivers zero ROI.

A managed platform with 90% adoption delivers infinitely more value than a DIY platform with 40% adoption.

Product Thinking Question

Did you ever survey your developers about the portal?

Questions I’d ask:

  • How often do you use the developer portal? (daily, weekly, rarely, never)
  • What’s your satisfaction with the portal? (1-10 scale)
  • What features do you wish existed? (open response)
  • What frustrates you about the portal? (open response)

My guess at your results:

  • Usage: Low or declining
  • Satisfaction: 4-6 out of 10
  • Desired features: The ones you delayed (automation, cost visibility, self-service)
  • Frustrations: Slow, buggy, missing features

That’s product-market fit failure.

The Product Lesson

Internal tools are products. They need:

  • Product management (prioritization, roadmap, user research)
  • Design quality (good UX, consistent experience)
  • Reliability (uptime, performance, polish)
  • Continuous iteration (listening to users, shipping improvements)

When you’re maintaining a framework, you’re not doing product management. You’re doing framework engineering.

Those are different jobs with different outcomes.

The Managed Platform Advantage

Roadie’s product team:

  • Ships features weekly
  • Runs user research constantly
  • Maintains design quality
  • Handles scale and performance
  • Tests across hundreds of customers

Your 4-person platform team can’t compete with that. Not because they’re not talented—because maintaining a framework leaves no capacity for product work.

For Your Migration

When you migrate to Roadie and start building golden paths, you’ll finally be doing product work:

  • Understanding developer workflows
  • Identifying pain points and bottlenecks
  • Designing and building solutions
  • Measuring adoption and impact
  • Iterating based on feedback

That’s the work that creates value. That’s the work your engineers probably want to do.

Appreciate your honesty about the expensive learning. This thread should be required reading for anyone evaluating platform build vs buy.

Question: Did you track developer satisfaction with the portal during your DIY phase? Did it improve or decline over the 18 months?