I need to write this while it’s still painful. We just completed a migration from self-hosted Backstage to Roadie. It took 18 months longer and cost $1.5M more than it should have.
Here’s the story of expensive mistakes and hard-earned lessons.
The Context
Previous company (I’ve since moved on):
- 200 engineers, Series B SaaS
- Strong “we build everything” engineering culture
- Chose to self-host Backstage in 2024
My role: VP of Engineering, responsible for platform strategy
The decision: Build our own developer portal despite managed options (Roadie, Humanitec) being available and proven
The Setup: Why We Chose Build
Our reasoning sounded solid:
Complete technical control: “We need flexibility for our unique requirements”
No vendor lock-in: “What if vendor raises prices or shuts down?”
Cultural fit: “We’re builders. This aligns with our identity.”
Cost savings: “Why pay $200K/year when we can build it ourselves?”
Strategic ownership: “Platform is core infrastructure, we should own it”
Every single one of these was either wrong or overweighted.
Mistake #1: Underestimated Maintenance Burden
What we thought: 1-2 engineers maintaining, 4-5 building features
Reality: 3 engineers maintaining, 1 engineer building features (when lucky)
The Maintenance Reality
- Monthly Backstage upgrades (breaking changes common)
- Plugin compatibility issues after every upgrade
- Developer support and troubleshooting
- Infrastructure scaling and ops
- Security patches and dependency updates
- Documentation maintenance
Estimated maintenance: 30% of team time
Actual maintenance: 75% of team time
We spent $900K on engineering capacity, but $675K went to maintenance.
Mistake #2: Overlooked TypeScript Skill Gap
Our platform team background:
- Strong in: Python, Go, Terraform, Kubernetes
- Weak in: React, TypeScript, frontend architecture
Backstage requires: Heavy frontend work
What This Meant
- 8 months to hire a frontend-focused engineer (niche skillset)
- $180/hour contractor for React work ($90K/year, not budgeted)
- 3-6 month learning curve for existing team trying to upskill
- Slow feature delivery while waiting for frontend capacity
We optimized for infrastructure skills when we needed product engineering skills.
Mistake #3: Treated It As Technical Decision, Ignored Product Aspects
What we optimized for:
- Technical architecture
- Deployment model (self-hosted vs SaaS)
- Framework control
What we ignored:
- Developer experience quality
- Adoption and satisfaction
- Time to value for users
- Product-market fit for internal tool
The Result
Our portal worked functionally, but:
- Felt homemade compared to tools devs used daily (GitHub, Linear, Vercel)
- Had UX inconsistencies and missing polish
- Broke occasionally during upgrades
- Lacked features developers actually wanted (we were too busy maintaining)
Adoption after 18 months: ~40%
Developers avoided it when possible. Created workarounds. Complained in surveys.
We built infrastructure when we needed to build a product.
Mistake #4: Sunk Cost Fallacy Kept Us Committed Too Long
After 12 months, signs were clear:
- Maintenance burden was unsustainable
- Developer satisfaction was low
- Feature velocity was too slow
- Team morale was declining
But we’d invested $900K and 12 months. Leadership felt we were “too far in to turn back.”
The Conversation I Should Have Had (But Didn’t)
"We’ve invested $900K learning that self-hosted Backstage doesn’t work for our organization. We can invest another $1.1M continuing down this path, or we can invest $100K migrating to Roadie and $200K/year going forward.
The $900K is sunk. The question is: how do we maximize return on future investment?"
I didn’t have this conversation for another 6 months. That delay cost us $550K in continued DIY costs + $390K in lost productivity = $940K of avoidable expense.
Sunk cost fallacy cost us nearly $1M.
Mistake #5: Didn’t Pilot or Test Managed Alternatives First
We debated build vs buy for months. All theoretical.
What we should have done:
- 30-day Roadie trial
- Connect to our GitHub, CI/CD, monitoring
- Import service catalog
- Get 20 developers to use it for 2 weeks
- Measure what works, what doesn’t, what’s missing
Cost: $0 (free trial period)
Time: 2 weeks
Value: Evidence-based decision instead of theoretical debate
We didn’t do this because "we already decided to build."
That decision to skip validation cost us $1.5M.
The Pivot: Finally Switching to Managed Platform
After 18 months of self-hosting:
- Total spent: ~$1.6M (team costs + infrastructure + contractors)
- Developer adoption: 40%
- Platform team morale: 4/10 (stuck in maintenance, burned out)
- Features delivered: 3 major capabilities (far below roadmap)
Leadership finally approved migration to Roadie.
The Migration
- Timeline: 2 months (catalog export, config migration, user training)
- Cost: ~$100K (engineering time + migration support)
- Result: Freed 3 engineers to focus on golden paths and automation
Current State (9 months post-migration)
- Platform cost: $120K/year (Roadie) + $450K/year (3 engineers) = $570K/year
- Developer adoption: 90% (huge jump from 40%)
- Platform team morale: 9/10 (building meaningful features, seeing impact)
- Features delivered: 8 new capabilities in 9 months (vs 3 in previous 18 months)
We should have done this from day one.
The Real Financial Impact
What we spent learning DIY doesn’t work:
- 18 months self-hosted: $1.6M
- Migration cost: $100K
- Total learning cost: $1.7M
What we should have spent:
- 18 months managed platform: $180K (platform) + $675K (team) = $855K
- Theoretical savings: $845K
But the real cost is opportunity cost:
Lost Productivity During DIY Phase
200 developers × 4 hours/week wasted × $75/hour × 48 weeks × 1.5 years = $4.32M in lost productivity
Had we started with Roadie, platform would have been production-ready in 2 months instead of 18 months. That’s 16 months of productivity gains we missed = $3.84M in lost productivity.
Total mistake cost: $845K direct + $3.84M opportunity cost = $4.69M
We spent $4.69M learning what a $0 pilot program would have taught us.
Leadership Lessons I Learned The Hard Way
1. Sometimes “Build” Is Ego, Not Strategy
"We’re builders" sounds like culture, but it’s often ego.
Test: If this were an external product, would customers buy it at current quality?
If no, then internal users shouldn’t have to tolerate it either.
2. Set Decision Checkpoints Early
"After 6 months, if maintenance >30% of platform team time, we reevaluate."
This creates psychological safety to acknowledge when something isn’t working.
We didn’t set checkpoints. Felt like failure to reconsider. Let sunk cost fallacy drive decisions.
3. Pilot Before Commitment
Every major technical decision should include:
- 30-60 day proof of concept
- Real usage by real users
- Measured outcomes
- Evidence-based decision
"We’ll figure it out" is not a plan. "We tested for 30 days and here’s what we learned" is a plan.
4. Platform Is Product, Not Infrastructure
Treating platform as infrastructure project → focus on technical architecture, deployment model
Treating platform as product project → focus on user experience, adoption, value delivery
We optimized for the wrong thing.
5. Your People Are Your Scarcest Resource
$1.5M in wasted spend is painful.
More painful: 18 months of platform team burning out on maintenance instead of building capabilities.
2 engineers left during this period (burnout + frustration). Replacement cost + lost knowledge = $400K+
Don’t waste your best people on undifferentiated maintenance work.
Advice For Teams Evaluating This Decision
The Framework I Wish I’d Used
Ask these questions (and demand honest answers):
-
Differentiation test: “Is our developer portal a competitive advantage or commodity infrastructure?”
- If commodity → Strong bias to buy
-
Skill match: “Do we have the skills needed to build this well?”
- Backstage needs frontend chops, not just infrastructure
- If skill gap → Build will be slow and painful
-
Maintenance reality: “What % of platform team time will go to maintenance?”
- Industry reality: 50-70% for DIY
- If >30% maintenance → You’re not building capabilities, you’re maintaining framework
-
Adoption risk: “Will developers choose to use what we build?”
- Be honest about product design and UX capabilities
- If adoption risk is high → Buy professional product
-
Time to value: “How fast do we need developer productivity improvements?”
- DIY: 18+ months to production-ready
- Buy: 2 months to production-ready
- If speed matters → Buy
-
Team impact: “Will platform engineers be energized or burned out by this work?”
- Maintaining frameworks → Burnout
- Building capabilities → Engagement
- If team health matters → Buy foundation, build differentiation
The Decision Heuristic
If you answer “yes” to 4+ of these, strongly consider managed platform:
- Platform is commodity infrastructure, not differentiator
- We lack strong frontend/TypeScript skills
- We need results in <12 months
- We have <5 engineers to dedicate to platform
- We care about professional UX quality
- We want platform engineers building capabilities, not maintaining framework
For us, 6/6 were true. We should have bought.
Start With Managed, Build Custom If Needed
Default approach: Start with managed platform (Roadie, Humanitec, Port)
After 6-12 months of usage, evaluate:
- Are we hitting limitations?
- Do we have genuinely unique requirements?
- Do we have the team and skills to self-host?
If yes, then consider migration. But you’ll have evidence, not assumptions.
Most teams will find: Managed platform + custom plugins solves 95% of needs.
What I’d Do Differently
Everything.
Start with Roadie on day one. Get to production in 2 months. Build golden paths and automation while DIY teams are still debugging React hooks.
Spend 18 months building 15+ platform capabilities instead of 3.
Save $4.69M in direct costs + opportunity costs.
Keep 2 engineers who left due to burnout.
Deliver 90% adoption instead of 40%.
But we can’t undo the past. We can only share the lessons.
Question For The Community
For those evaluating this decision: What decision-making frameworks help you avoid expensive learning like we went through?
For those who’ve made similar mistakes: What helped you recognize the mistake early and pivot?
For those who’ve succeeded with hybrid approaches: How did you decide where to draw the line between buy and build?
I’m sharing this vulnerable post-mortem because:
- Too many teams are making the same mistakes we did
- The “build vs buy” discourse is too theoretical
- Real numbers and real consequences should inform decisions
We spent $1.5M and 18 months learning what a 30-day pilot would have taught us.
Learn from our expensive mistakes. You don’t have to make them yourself.