Platform Engineering Build vs Buy: What I Wish I Knew Before We Spent $1.5M Learning

I need to write this while it’s still painful. We just completed a migration from self-hosted Backstage to Roadie. It took 18 months longer and cost $1.5M more than it should have.

Here’s the story of expensive mistakes and hard-earned lessons.

The Context

Previous company (I’ve since moved on):

  • 200 engineers, Series B SaaS
  • Strong “we build everything” engineering culture
  • Chose to self-host Backstage in 2024

My role: VP of Engineering, responsible for platform strategy

The decision: Build our own developer portal despite managed options (Roadie, Humanitec) being available and proven

The Setup: Why We Chose Build

Our reasoning sounded solid:

:white_check_mark: Complete technical control: “We need flexibility for our unique requirements”
:white_check_mark: No vendor lock-in: “What if vendor raises prices or shuts down?”
:white_check_mark: Cultural fit: “We’re builders. This aligns with our identity.”
:white_check_mark: Cost savings: “Why pay $200K/year when we can build it ourselves?”
:white_check_mark: Strategic ownership: “Platform is core infrastructure, we should own it”

Every single one of these was either wrong or overweighted.

Mistake #1: Underestimated Maintenance Burden

What we thought: 1-2 engineers maintaining, 4-5 building features
Reality: 3 engineers maintaining, 1 engineer building features (when lucky)

The Maintenance Reality

  • Monthly Backstage upgrades (breaking changes common)
  • Plugin compatibility issues after every upgrade
  • Developer support and troubleshooting
  • Infrastructure scaling and ops
  • Security patches and dependency updates
  • Documentation maintenance

Estimated maintenance: 30% of team time
Actual maintenance: 75% of team time

We spent $900K on engineering capacity, but $675K went to maintenance.

Mistake #2: Overlooked TypeScript Skill Gap

Our platform team background:

  • Strong in: Python, Go, Terraform, Kubernetes
  • Weak in: React, TypeScript, frontend architecture

Backstage requires: Heavy frontend work

What This Meant

  • 8 months to hire a frontend-focused engineer (niche skillset)
  • $180/hour contractor for React work ($90K/year, not budgeted)
  • 3-6 month learning curve for existing team trying to upskill
  • Slow feature delivery while waiting for frontend capacity

We optimized for infrastructure skills when we needed product engineering skills.

Mistake #3: Treated It As Technical Decision, Ignored Product Aspects

What we optimized for:

  • Technical architecture
  • Deployment model (self-hosted vs SaaS)
  • Framework control

What we ignored:

  • Developer experience quality
  • Adoption and satisfaction
  • Time to value for users
  • Product-market fit for internal tool

The Result

Our portal worked functionally, but:

  • Felt homemade compared to tools devs used daily (GitHub, Linear, Vercel)
  • Had UX inconsistencies and missing polish
  • Broke occasionally during upgrades
  • Lacked features developers actually wanted (we were too busy maintaining)

Adoption after 18 months: ~40%

Developers avoided it when possible. Created workarounds. Complained in surveys.

We built infrastructure when we needed to build a product.

Mistake #4: Sunk Cost Fallacy Kept Us Committed Too Long

After 12 months, signs were clear:

  • Maintenance burden was unsustainable
  • Developer satisfaction was low
  • Feature velocity was too slow
  • Team morale was declining

But we’d invested $900K and 12 months. Leadership felt we were “too far in to turn back.”

The Conversation I Should Have Had (But Didn’t)

"We’ve invested $900K learning that self-hosted Backstage doesn’t work for our organization. We can invest another $1.1M continuing down this path, or we can invest $100K migrating to Roadie and $200K/year going forward.

The $900K is sunk. The question is: how do we maximize return on future investment?"

I didn’t have this conversation for another 6 months. That delay cost us $550K in continued DIY costs + $390K in lost productivity = $940K of avoidable expense.

Sunk cost fallacy cost us nearly $1M.

Mistake #5: Didn’t Pilot or Test Managed Alternatives First

We debated build vs buy for months. All theoretical.

What we should have done:

  • 30-day Roadie trial
  • Connect to our GitHub, CI/CD, monitoring
  • Import service catalog
  • Get 20 developers to use it for 2 weeks
  • Measure what works, what doesn’t, what’s missing

Cost: $0 (free trial period)
Time: 2 weeks
Value: Evidence-based decision instead of theoretical debate

We didn’t do this because "we already decided to build."

That decision to skip validation cost us $1.5M.

The Pivot: Finally Switching to Managed Platform

After 18 months of self-hosting:

  • Total spent: ~$1.6M (team costs + infrastructure + contractors)
  • Developer adoption: 40%
  • Platform team morale: 4/10 (stuck in maintenance, burned out)
  • Features delivered: 3 major capabilities (far below roadmap)

Leadership finally approved migration to Roadie.

The Migration

  • Timeline: 2 months (catalog export, config migration, user training)
  • Cost: ~$100K (engineering time + migration support)
  • Result: Freed 3 engineers to focus on golden paths and automation

Current State (9 months post-migration)

  • Platform cost: $120K/year (Roadie) + $450K/year (3 engineers) = $570K/year
  • Developer adoption: 90% (huge jump from 40%)
  • Platform team morale: 9/10 (building meaningful features, seeing impact)
  • Features delivered: 8 new capabilities in 9 months (vs 3 in previous 18 months)

We should have done this from day one.

The Real Financial Impact

What we spent learning DIY doesn’t work:

  • 18 months self-hosted: $1.6M
  • Migration cost: $100K
  • Total learning cost: $1.7M

What we should have spent:

  • 18 months managed platform: $180K (platform) + $675K (team) = $855K
  • Theoretical savings: $845K

But the real cost is opportunity cost:

Lost Productivity During DIY Phase

200 developers × 4 hours/week wasted × $75/hour × 48 weeks × 1.5 years = $4.32M in lost productivity

Had we started with Roadie, platform would have been production-ready in 2 months instead of 18 months. That’s 16 months of productivity gains we missed = $3.84M in lost productivity.

Total mistake cost: $845K direct + $3.84M opportunity cost = $4.69M

We spent $4.69M learning what a $0 pilot program would have taught us.
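For anyone who wants to rerun this arithmetic with their own numbers, here is a minimal sketch. The inputs are the figures quoted above; the function and variable names are illustrative, and the 4 hours/week figure is an estimate rather than a measurement.

```python
def lost_productivity(developers, hours_per_week, hourly_rate, weeks_per_year, years):
    """Cost of developer hours lost to friction over a period."""
    return developers * hours_per_week * hourly_rate * weeks_per_year * years

# Full 18-month DIY phase.
full_diy_phase = lost_productivity(200, 4, 75, 48, 1.5)

# Only the 16 months that a 2-month managed rollout would have avoided.
avoidable = lost_productivity(200, 4, 75, 48, 16 / 12)

# Direct overspend: $1.7M total learning cost minus the $855K managed scenario.
direct_overspend = 1_700_000 - 855_000

total_mistake_cost = direct_overspend + avoidable

print(f"Full DIY phase:  ${full_diy_phase:,.0f}")
print(f"Avoidable:       ${avoidable:,.0f}")
print(f"Direct:          ${direct_overspend:,.0f}")
print(f"Total:           ${total_mistake_cost:,.0f}")
```

The total is dominated by the hours-wasted assumption, so that is the input worth validating first.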

Leadership Lessons I Learned The Hard Way

1. Sometimes “Build” Is Ego, Not Strategy

"We’re builders" sounds like culture, but it’s often ego.

Test: If this were an external product, would customers buy it at current quality?

If no, then internal users shouldn’t have to tolerate it either.

2. Set Decision Checkpoints Early

"After 6 months, if maintenance >30% of platform team time, we reevaluate."

This creates psychological safety to acknowledge when something isn’t working.

We didn’t set checkpoints. Felt like failure to reconsider. Let sunk cost fallacy drive decisions.

3. Pilot Before Commitment

Every major technical decision should include:

  • 30-60 day proof of concept
  • Real usage by real users
  • Measured outcomes
  • Evidence-based decision

"We’ll figure it out" is not a plan. "We tested for 30 days and here’s what we learned" is a plan.

4. Platform Is Product, Not Infrastructure

Treating platform as infrastructure project → focus on technical architecture, deployment model

Treating platform as product project → focus on user experience, adoption, value delivery

We optimized for the wrong thing.

5. Your People Are Your Scarcest Resource

$1.5M in wasted spend is painful.

More painful: 18 months of platform team burning out on maintenance instead of building capabilities.

2 engineers left during this period (burnout + frustration). Replacement cost + lost knowledge = $400K+

Don’t waste your best people on undifferentiated maintenance work.

Advice For Teams Evaluating This Decision

The Framework I Wish I’d Used

Ask these questions (and demand honest answers):

  1. Differentiation test: “Is our developer portal a competitive advantage or commodity infrastructure?”

    • If commodity → Strong bias to buy
  2. Skill match: “Do we have the skills needed to build this well?”

    • Backstage needs frontend chops, not just infrastructure
    • If skill gap → Build will be slow and painful
  3. Maintenance reality: “What % of platform team time will go to maintenance?”

    • Industry reality: 50-70% for DIY
    • If >30% maintenance → You’re not building capabilities, you’re maintaining framework
  4. Adoption risk: “Will developers choose to use what we build?”

    • Be honest about product design and UX capabilities
    • If adoption risk is high → Buy professional product
  5. Time to value: “How fast do we need developer productivity improvements?”

    • DIY: 18+ months to production-ready
    • Buy: 2 months to production-ready
    • If speed matters → Buy
  6. Team impact: “Will platform engineers be energized or burned out by this work?”

    • Maintaining frameworks → Burnout
    • Building capabilities → Engagement
    • If team health matters → Buy foundation, build differentiation

The Decision Heuristic

If you answer “yes” to 4+ of these, strongly consider managed platform:

  • Platform is commodity infrastructure, not differentiator
  • We lack strong frontend/TypeScript skills
  • We need results in <12 months
  • We have <5 engineers to dedicate to platform
  • We care about professional UX quality
  • We want platform engineers building capabilities, not maintaining framework

For us, 6/6 were true. We should have bought.
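The heuristic above is simple enough to write down as a checklist function. This is only a sketch: the criterion keys are my own shorthand for the six bullets, and the threshold of 4 is the one stated above.

```python
# Shorthand keys for the six statements in the heuristic (names are illustrative).
CRITERIA = (
    "commodity_not_differentiator",
    "lacks_frontend_typescript_skills",
    "needs_results_within_12_months",
    "fewer_than_5_platform_engineers",
    "cares_about_professional_ux",
    "wants_engineers_building_not_maintaining",
)

def consider_managed(answers, threshold=4):
    """Return (yes_count, recommend_managed) for a dict of criterion -> bool."""
    unknown = set(answers) - set(CRITERIA)
    if unknown:
        raise ValueError(f"Unknown criteria: {sorted(unknown)}")
    # Unanswered criteria count as "no".
    yes_count = sum(1 for c in CRITERIA if answers.get(c, False))
    return yes_count, yes_count >= threshold

# Our situation: every statement was true.
ours = {c: True for c in CRITERIA}
print(consider_managed(ours))
```

Treating unanswered criteria as "no" is a deliberate choice here; it keeps the function from recommending a purchase on incomplete answers.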

Start With Managed, Build Custom If Needed

Default approach: Start with managed platform (Roadie, Humanitec, Port)

After 6-12 months of usage, evaluate:

  • Are we hitting limitations?
  • Do we have genuinely unique requirements?
  • Do we have the team and skills to self-host?

If yes, then consider migration. But you’ll have evidence, not assumptions.

Most teams will find: Managed platform + custom plugins solves 95% of needs.

What I’d Do Differently

Everything.

Start with Roadie on day one. Get to production in 2 months. Build golden paths and automation while DIY teams are still debugging React hooks.

Spend 18 months building 15+ platform capabilities instead of 3.

Save $4.69M in direct costs + opportunity costs.

Keep 2 engineers who left due to burnout.

Deliver 90% adoption instead of 40%.

But we can’t undo the past. We can only share the lessons.

Question For The Community

For those evaluating this decision: What decision-making frameworks help you avoid expensive learning like we went through?

For those who’ve made similar mistakes: What helped you recognize the mistake early and pivot?

For those who’ve succeeded with hybrid approaches: How did you decide where to draw the line between buy and build?


I’m sharing this vulnerable post-mortem because:

  1. Too many teams are making the same mistakes we did
  2. The “build vs buy” discourse is too theoretical
  3. Real numbers and real consequences should inform decisions

We spent $1.5M and 18 months learning what a 30-day pilot would have taught us.

Learn from our expensive mistakes. You don’t have to make them yourself.

Thank you for sharing this. Vulnerability at the leadership level about expensive mistakes is rare and valuable.

The Framework You Wish You’d Used (Now My Standard)

Your 6-question framework is exactly right. I’m adapting it into what I call the Platform Strategy Canvas for all major technical decisions.

Nine Dimensions To Evaluate

Beyond your 6 questions, I add 3 more:

7. Cost structure: 3-year TCO including opportunity cost (not just platform licensing)

8. Migration path: Can we change direction if we’re wrong? (Escape hatches matter)

9. Organizational alignment: Does this match our maturity and culture realistically? (Not aspirationally)

Score each dimension for Build vs Buy vs Hybrid.

Involve: Engineering Directors, CFO, VP Product, Head of Talent

This cross-functional analysis prevents the siloed technical decisions that led to your $4.69M mistake.
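The scoring step can be sketched roughly as follows. Everything here is an assumption for illustration: the dimension keys are shorthand for the nine dimensions, the 1-5 scale is my choice, the dimensions are weighted equally, and the example scores are hypothetical rather than data from any real evaluation.

```python
# Shorthand names for the nine dimensions (six from the original post, three added).
DIMENSIONS = [
    "differentiation", "skill_match", "maintenance", "adoption_risk",
    "time_to_value", "team_impact", "cost_structure", "migration_path",
    "org_alignment",
]

def total_scores(scores):
    """Sum per-dimension scores for each option; every dimension must be scored."""
    totals = {}
    for option, by_dimension in scores.items():
        missing = [d for d in DIMENSIONS if d not in by_dimension]
        if missing:
            raise ValueError(f"{option} is missing scores for: {missing}")
        totals[option] = sum(by_dimension[d] for d in DIMENSIONS)
    return totals

# Hypothetical 1-5 scores (5 = option fits well on that dimension).
example = {
    "build":  dict(zip(DIMENSIONS, [2, 1, 1, 2, 1, 2, 2, 3, 2])),
    "buy":    dict(zip(DIMENSIONS, [4, 5, 5, 4, 5, 4, 4, 3, 4])),
    "hybrid": dict(zip(DIMENSIONS, [4, 3, 3, 4, 3, 4, 3, 4, 4])),
}

totals = total_scores(example)
best = max(totals, key=totals.get)
print(totals, best)
```

In practice you would probably weight the dimensions differently; the useful part is that each stakeholder has to commit to a number.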

The Decision Checkpoints You Mentioned

This is critical. I now mandate these for every major technical investment:

Checkpoint at 3 months: “Are we on track? What have we learned? What would we do differently?”

Checkpoint at 6 months: “Is maintenance burden <30% of team time? Is adoption trending toward target? Are we hitting milestones?”

Decision trigger: “If maintenance >30% or adoption <50% or timeline slipping >3 months, we pause and reevaluate options.”
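Writing the trigger down as code makes it harder to renegotiate later. A minimal sketch, using the three thresholds from the decision trigger above; the class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CheckpointMetrics:
    maintenance_share: float     # fraction of team time on maintenance, 0.0-1.0
    adoption_rate: float         # fraction of developers using the platform, 0.0-1.0
    timeline_slip_months: float  # months behind the original plan

def reevaluation_reasons(m):
    """Return the list of tripped thresholds; an empty list means stay the course."""
    reasons = []
    if m.maintenance_share > 0.30:
        reasons.append("maintenance above 30% of team time")
    if m.adoption_rate < 0.50:
        reasons.append("adoption below 50%")
    if m.timeline_slip_months > 3:
        reasons.append("timeline slipping more than 3 months")
    return reasons

# Figures reported in the original post (75% maintenance, 40% adoption);
# the timeline slip is left at 0 because the post does not state it.
reported = CheckpointMetrics(maintenance_share=0.75, adoption_rate=0.40,
                             timeline_slip_months=0)
print(reevaluation_reasons(reported))
```

Any non-empty result means pause and reevaluate, which is the "or" in the trigger.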

This creates psychological safety to acknowledge when something isn’t working. Leadership anti-pattern you fell into: “We’ve invested so much, can’t turn back now.”

Better pattern: “We’ve learned a lot, now we can make a better decision.”

The $1.5M Learning vs $0 Pilot

Your point about skipping the pilot is the painful part. A 30-day proof of concept costs essentially nothing but time.

Standard POC protocol I now require:

Week 1: Setup and integration (connect to GitHub, CI/CD, monitoring)
Week 2: Configuration (RBAC, service catalog import, basic workflows)
Week 3: User testing (20 developers use it, collect feedback)
Week 4: Analysis (measure what works, what doesn’t, what’s missing)

Deliverable: Evidence-based recommendation with actual usage data, not theoretical debate

Cost: ~40 hours of engineering time = $3,000
Value: Avoiding $1.5M mistakes

ROI on POC: 50,000%
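The arithmetic behind that ROI figure, as a quick sketch. It assumes the $75/hour rate from the original post's productivity estimate, which puts 40 hours at $3,000, and it measures value against cost as a simple ratio.

```python
poc_hours = 40
hourly_rate = 75            # assumption: rate used in the original post
poc_cost = poc_hours * hourly_rate

avoided_cost = 1_500_000    # the overspend cited in the original post

# Simple value-to-cost ratio, expressed as a percentage.
roi_percent = avoided_cost / poc_cost * 100

# Net ROI (subtracting the POC cost from the value first) is slightly lower.
net_roi_percent = (avoided_cost - poc_cost) / poc_cost * 100

print(poc_cost, roi_percent, net_roi_percent)
```

Either way the order of magnitude is the point: the pilot costs a rounding error compared to the decision it informs.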

The Sunk Cost Fallacy Recognition

Your 6-month delay after red flags appeared cost $940K. This resonates deeply.

How to recognize sunk cost fallacy in real-time:

Warning signs:

  • “We’ve already invested so much” (focusing on past, not future)
  • “Just one more quarter” (indefinite timeline extension)
  • “We’re almost there” (despite evidence to contrary)
  • “Switching now would mean admitting failure” (ego driving decisions)

Counter-narrative:

  • “The money already spent is sunk. What maximizes future ROI?”
  • “What would we recommend to a peer in this situation?”
  • “If we were starting today with what we know now, what would we choose?”

Reframe from “admitting failure” to “learning-informed strategy update”.

Show the data: Maintenance burden, opportunity cost, alternative costs, team morale, adoption metrics.

Emphasize: Engineers are excited to focus on meaningful platform work instead of framework maintenance.

The $4.69M Total Impact

Your breakdown (direct cost + opportunity cost) is the full picture that most analyses miss:

  • Direct overspend: $845K
  • Opportunity cost (lost productivity): $3.84M
  • Total: $4.69M

Plus: Unmeasured impacts:

  • 2 engineers who left (burnout) = $400K+ replacement cost
  • Developer frustration and workarounds = Hard to quantify but real
  • Leadership credibility damage = Political capital spent

Real total: ~$5M+ in impact from one decision.

The “Build Is Ego” Insight

This is the hardest one for engineering leaders to internalize. “We’re builders” is core identity.

Reframing that helps:

  • “We build what differentiates us” (not “we build everything”)
  • “We buy commodity infrastructure to focus on competitive advantage”
  • “Building the right things is harder than building all things”

The discipline test: Can you articulate specific business value created by owning the implementation?

“We need control” → Not a business case
“We need control to implement custom compliance workflows for FDA requirements” → Business case

If you can’t connect control to revenue, margin, or strategic capability, default to buy.

For Your Questions

Q: What frameworks help avoid expensive learning?

The 9-dimension Platform Strategy Canvas I outlined. Plus mandatory POCs for decisions >K.

Q: How to recognize mistakes early and pivot?

Decision checkpoints at 3, 6, 12 months. Preset metrics (maintenance burden, adoption, timeline). If metrics miss targets, pause and reevaluate.

Q: Hybrid approaches - where to draw the line?

Commodity = Buy. Differentiator = Build. Test: “Could another company use this exact capability?”

What I’d Add To Your Lessons

Lesson 6: Involve CFO early in technical strategy

Bring finance into roadmap reviews. Show how technical decisions impact business outcomes. Create shared language around ROI, productivity, and opportunity cost.

When CFO understands “platform enables faster hiring, faster feature delivery, better compliance,” they become allies in right-sizing investment.

Lesson 7: Platform team composition signals strategy

  • 80% infrastructure engineers + 20% product → DIY path
  • 50-50 mix → Hybrid path
  • 80% product engineers + 20% infrastructure → Buy path

Look at skills on the team. That reveals the real strategy.

Final Thought

Your post should be required reading for every VP of Engineering evaluating platform strategy.

The $4.69M lesson you learned = priceless education for the community.

Thank you for the honesty and vulnerability.

Your TypeScript skill gap story is painfully familiar. We had the exact same issue—platform team strong in infrastructure, weak in frontend.

The Skills Mismatch Nobody Talks About

Most platform teams come from:

  • Infrastructure backgrounds (Kubernetes, Terraform, cloud)
  • Backend engineering (Go, Python, distributed systems)
  • SRE/DevOps (monitoring, automation, reliability)

Backstage requires:

  • Frontend engineering (React, TypeScript, component design)
  • Web application architecture (state management, routing, build tooling)
  • Product design thinking (UX, interaction patterns)

These are different skillsets with different mindsets.

Our Experience (Similar To Yours)

Platform team composition:

  • 2 senior SREs (Python, Kubernetes, infrastructure)
  • 1 backend engineer (Go, distributed systems)
  • 1 DevOps engineer (CI/CD, automation)

Strong in: Infrastructure, APIs, systems thinking
Weak in: React, TypeScript, frontend architecture, UX design

Result: Slow feature delivery, frustration, need for external help

The Hiring Challenge

Tried to hire “platform engineer with frontend skills.” This is a niche intersection:

  • Platform engineering candidates → Usually infrastructure-focused
  • Frontend candidates → Usually app development, not internal tools
  • Backstage-specific → Very small talent pool

Time to fill: 7 months
Candidates interviewed: 23
Offers made: 3
Accepted: 1

That’s 7 months of frontend work backlog.

The Contractor Solution (And Its Limits)

Brought in React contractor at /hour.

Pros:

  • Fast ramp-up (React expertise immediately)
  • High-quality code (professional frontend work)
  • Filled critical skill gap

Cons:

  • Expensive (K/year, not budgeted)
  • Coordination overhead (remote, async communication)
  • Knowledge transfer challenges (tribal knowledge not documented)
  • Not invested in platform success (just completing tickets)

Contractors are a tactical solution, not a strategic one.

The Upskilling Attempt

Tried to teach infrastructure engineers React/TypeScript.

What we learned:

  • Infrastructure engineers can learn React, but not while shipping features
  • 3-6 month learning curve minimum
  • Reduced productivity during transition
  • Some engineers interested, others resistant (“I didn’t sign up for frontend work”)

Upskilling takes time you don’t have when platform is critical path.

The Real Question: Why Are We Solving Frontend Problems?

Here’s what finally clicked for me:

Our platform team should be solving:

  • How to onboard services faster (organizational workflow)
  • How to encode architecture patterns in templates (systems design)
  • How to automate deployments safely (process automation)
  • How to measure and improve productivity (platform effectiveness)

We were instead solving:

  • How to make catalog pagination performant (React optimization)
  • How to build accessible dropdown components (frontend engineering)
  • How to manage TypeScript build configuration (tooling setup)
  • How to debug React hook lifecycle issues (framework debugging)

First set = Platform engineering. Second set = Frontend engineering.

We hired platform engineers but needed frontend engineers. That’s a hiring mismatch, not a skill gap.

The Managed Platform Advantage (Skills Perspective)

When we switched to Roadie:

We stopped needing:

  • React/TypeScript expertise for portal UI
  • Frontend build tooling knowledge
  • Web application architecture skills
  • Component library maintenance

We could focus on:

  • Platform capabilities (what we’re good at)
  • Organizational automation (our expertise)
  • System integrations (our strength)
  • Developer experience (our mandate)

Team composition after switch:

  • Same 4 engineers (no hiring/firing)
  • All focused on backend + integrations
  • Playing to strengths, not weaknesses

Productivity increased 3x because we stopped fighting skillset mismatch.

Your Maintenance Burden Numbers

75% maintenance vs 25% features—this matches our experience exactly.

Maintenance breakdown (our reality):

  • Monthly Backstage upgrades: 40 hours
  • Plugin compatibility testing: 30 hours
  • Bug fixes and troubleshooting: 60 hours
  • Infrastructure ops (database, scaling): 25 hours
  • Security patches and dependencies: 20 hours
  • Developer support: 40 hours

Total: ~215 hours/month = ~1.35 FTEs

But that’s just keeping current system running. Add new feature requests from developers (“can we add X integration?”) and maintenance becomes 2-3 FTEs.

4-person team, 2.5 on maintenance = 62.5% capacity just to keep lights on.
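Here is the same breakdown as a small sketch, in case you want to plug in your own hours. The 160 hours per FTE-month (40 hours × 4 weeks) is my assumption for the conversion; the hour figures are the ones listed above.

```python
# Monthly maintenance hours from the breakdown above.
monthly_hours = {
    "backstage_upgrades": 40,
    "plugin_compatibility_testing": 30,
    "bug_fixes_and_troubleshooting": 60,
    "infrastructure_ops": 25,
    "security_patches_and_dependencies": 20,
    "developer_support": 40,
}

HOURS_PER_FTE_MONTH = 160  # assumption: 40 h/week x 4 weeks

total_hours = sum(monthly_hours.values())
baseline_ftes = total_hours / HOURS_PER_FTE_MONTH

# With feature requests on top, roughly 2.5 of our 4 engineers were on maintenance.
team_size = 4
maintenance_share = 2.5 / team_size

print(total_hours, round(baseline_ftes, 2), f"{maintenance_share:.1%}")
```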

That’s unsustainable and soul-crushing.

The Pilot Question: Why We Didn’t Do It Either

You explained why you didn’t run a 30-day Roadie trial. We didn’t either. Why?

Honest answer: We’d already decided to build.

The “build vs buy” debate was theater. Engineering leadership wanted to build. The decision was made emotionally (“we’re builders”), then justified rationally (“cost savings”, “control”).

Running a pilot would have risked proving us wrong.

So we skipped it, confident in our decision, and learned the hard way.

For Teams Considering DIY

Skill assessment questions (answer honestly):

  1. Does your platform team have production React/TypeScript experience?
  2. Can your team build a high-quality web application while maintaining it long-term?
  3. Do you have frontend engineers or infrastructure engineers trying to learn frontend?
  4. Will engineers be excited about React work or see it as distraction from “real” platform work?
  5. Can you hire frontend platform engineers in <3 months if needed?

If answers are mostly “no”, DIY Backstage will be painful.

Your infrastructure skills don’t translate to frontend product engineering. It’s a different job.

The Decision I Wish We’d Made

Start with Roadie. Use platform team’s actual strengths (infrastructure, automation, integration) to build golden paths and capabilities.

If after 12 months we’ve outgrown Roadie, then consider self-hosting—with evidence of real limitations and a frontend-capable team in place.

But we’d probably find Roadie + custom plugins solves 95% of needs. And we’d have shipped 10-15 capabilities instead of 3.

Advice For Your Specific Situation

Your 40% adoption after 18 months reveals the product quality problem.

Post-migration: 90% adoption in 9 months. That’s the power of professional frontend work.

Developers don’t care about hosting model. They care about tool quality. Managed platforms have professional frontend teams building polished products. Your infrastructure team can’t compete with that, and shouldn’t have to.

Takeaway: Don’t ask infrastructure engineers to build products. Let them build capabilities on professional product foundations.

Your story validates everything we learned the hard way. Thank you for sharing.

The opportunity cost calculation ($3.84M in lost productivity) is the number that would have changed our decision.

The Business Case Finance Actually Cares About

When you present “build vs buy” to CFO, they hear:

  • Build: $1.6M
  • Buy: $855K
  • Savings: K

That’s meaningful but not game-changing for a Series B company.

Your full calculation:

  • Direct cost: $845K
  • Opportunity cost: $3.84M
  • Turnover cost: $400K+
  • Total impact: $5.1M+

That’s game-changing. That’s 25+ engineering headcount for a year. That’s a full product line. That’s serious strategic capacity.

How To Calculate Opportunity Cost (For My Future Pitch)

You showed 200 devs × 4 hrs/week × $75/hr × 48 weeks × 1.5 years = $4.32M

Question: How do you measure/estimate the “4 hours/week wasted” baseline?

I need this methodology to be defensible to our CFO. Options:

1. Developer survey (before platform):
"How many hours/week do you spend on:

  • Finding documentation
  • Waiting for approvals/access
  • Manual deployment tasks
  • Environment setup/debugging
  • Other undifferentiated work"

2. Workflow timing studies:

  • Time to onboard new service: 8-12 hours (manual process)
  • Time to deploy: 30-45 minutes (manual, error-prone)
  • Time to find documentation: 10-15 minutes (scattered, stale)

3. Comparative benchmarking:

  • Industry data: Developers with good platform waste ~1 hr/week
  • Without platform: ~5-6 hrs/week (your number)
  • Delta: 4-5 hrs/week savings potential

Which approach did you use? Or combination?

I need to present this to our CFO and want the calculation to be bulletproof.

The Lost Feature Delivery Angle

You shipped 3 capabilities in 18 months (DIY), then 8 capabilities in 9 months (post-migration).

That’s a 5.3x increase in feature velocity.
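To be explicit about where the 5.3x comes from: it is a rate comparison (capabilities per month), not a comparison of raw counts. A short sketch using the figures from the original post:

```python
# Capabilities shipped per month in each phase.
diy_rate = 3 / 18        # 3 capabilities in 18 months
managed_rate = 8 / 9     # 8 capabilities in 9 months

velocity_ratio = managed_rate / diy_rate

# The raw count ratio is smaller because the managed phase was half as long.
count_ratio = 8 / 3

print(round(velocity_ratio, 1), round(count_ratio, 1))
```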

Question for you: What were those capabilities worth to the organization?

Examples from your context:

  • Service onboarding automation → Faster team scaling = faster hiring = faster growth
  • Deployment automation → 3x deploy frequency = faster feature delivery = faster revenue
  • Cost visibility → K/month optimization = K/year savings

If we estimate each platform capability enables $500K/year in productivity or savings, then:

DIY scenario: 3 capabilities × $500K = $1.5M/year value delivered
Managed scenario: 11 capabilities (projected) × $500K = $5.5M/year value delivered

Opportunity cost: $4M/year in delayed value delivery

That dwarfs the $845K direct cost difference.

The Pitch I’m Building For Our Leadership

Slide 1: The Choice

  • DIY: .9M over 3 years, 18-month timeline
  • Managed: .7M over 3 years, 4-month timeline
  • Recommend: Managed platform

Slide 2: The Direct Costs

  • 3-year savings: .2M
  • Faster time-to-value: 14 months earlier
  • Lower risk: Proven solution vs custom build

Slide 3: The Opportunity Cost (This is the key slide)

  • Lost productivity during build phase: .8M
  • Delayed platform capabilities: M+ in deferred value
  • Engineering capacity freed: 3-4 FTEs for product work
  • Total opportunity benefit: .8M+

Slide 4: The Risk Mitigation

  • Team capacity: 3 engineers building vs 6 engineers split between maintenance and building
  • Adoption risk: Professional product quality = high adoption
  • Talent risk: Hire platform engineers, not niche framework experts
  • Timeline risk: 4 months vs 18 months = 14 months less execution risk

Does this framing resonate? What would you add/change based on your experience?

The Question That Will Come Up

“If we start with managed platform and outgrow it, can we switch to self-hosted later?”

Your answer (based on your migration experience):

Migration took 2 months and cost $100K. That’s a reasonable exit cost if needed after 12-18 months of learning real requirements.

Better to:

  • Start with managed, get value in 2 months
  • Learn what you actually need over 12 months
  • Decide with evidence whether to migrate

Than to:

  • Start with DIY, spend 18 months building
  • Learn it doesn’t work
  • Migrate to managed

First path = 2 months to value + option to change
Second path = 18 months to value + no option value

The Framework For Presenting Opportunity Cost

Formula:

For your scenario:

Your direct calculation ($3.84M) was conservative. True total is even higher.
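A minimal sketch of one way to total these components, using only figures from the original post. The component breakdown is an assumption on my part, not necessarily the only sensible formula, and the deferred capability value is left at zero because the original post does not put a number on it.

```python
def total_impact(direct_overspend, lost_productivity,
                 turnover_cost=0, deferred_capability_value=0):
    """Sum the cost components of a build decision that went wrong."""
    return (direct_overspend + lost_productivity
            + turnover_cost + deferred_capability_value)

# Figures from the original post.
headline = total_impact(direct_overspend=845_000, lost_productivity=3_840_000)
with_turnover = total_impact(845_000, 3_840_000, turnover_cost=400_000)

print(f"${headline:,}")       # direct + lost productivity
print(f"${with_turnover:,}")  # adding the two engineers who left
```

Any positive estimate for deferred capability value pushes the total higher still, which is why the productivity-only number reads as conservative.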

For Other Product Leaders

When your engineering team wants to build:

Ask:

  1. “What’s the opportunity cost of the build timeline?” (Calculate lost productivity during delay)
  2. “What features won’t we build because platform team is maintaining framework?” (Quantify delayed capabilities)
  3. “What could we do with the cost savings?” (Show alternative investments)

Frame as:

  • “We can spend .9M building platform, or .7M buying platform and .2M on product features that drive revenue”
  • “We can have platform in 18 months, or in 4 months plus 14 months of productivity gains”
  • “We can have 3 platform engineers maintain + build, or 5 platform engineers building golden paths”

Make opportunity cost visible. It’s usually 3-5x larger than direct cost.

Final Question

How did you present the migration decision to leadership without it feeling like “admitting we wasted $1.5M”?

I may need to have this conversation and want to frame it constructively:

  • Acknowledge the learning
  • Present it as strategy update based on evidence
  • Focus on future value, not past sunk cost
  • Get buy-in without blame

Any advice on that conversation?

Your 40% → 90% adoption jump tells the entire product story.

Adoption Is The Only Metric That Matters

You can build the most technically sophisticated platform in the world. If developers don’t use it, ROI is zero.

Your results:

  • DIY Backstage: 18 months, $1.6M, 40% adoption → $40,000 per percentage point of adoption
  • Managed platform: 9 months, ~$428K, 90% adoption → ~$4,750 per percentage point of adoption

Managed platform was 8.4x more cost-effective at driving adoption.
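A rough sketch of how that comparison works out. The DIY figures come straight from the original post; the managed-phase cost is my assumption, taken as 9 months at the reported $570K/year run rate (Roadie plus three engineers).

```python
def cost_per_adoption_point(total_cost, adoption_percent):
    """Dollars spent per percentage point of developer adoption."""
    return total_cost / adoption_percent

# DIY phase: $1.6M over 18 months, 40% adoption.
diy = cost_per_adoption_point(1_600_000, 40)

# Managed phase: assumed to be 9 months at the $570K/year run rate.
managed_cost = 570_000 * 9 / 12
managed = cost_per_adoption_point(managed_cost, 90)

cost_effectiveness_ratio = diy / managed

print(diy, managed_cost, managed, round(cost_effectiveness_ratio, 1))
```

This ignores the $100K migration cost and the fact that the managed phase started from an existing catalog, so treat the ratio as directional.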

Why? Because product quality matters for internal tools just like external products.

The Product Quality Gap

DIY platforms typically deliver:

  • Functional but rough UX
  • Inconsistent interaction patterns
  • Breaks occasionally (upgrade issues, bugs)
  • Slow performance (not optimized)
  • Mobile unfriendly
  • Accessibility gaps
  • Feels “homemade”

Developers compare to tools they use daily:

  • GitHub (polished, fast, delightful)
  • Linear (pixel-perfect design)
  • Vercel (smooth animations, instant feedback)
  • Figma (professional quality)

When internal portal feels amateurish by comparison, adoption suffers.

Your 40% Adoption Revealed Product-Market Fit Failure

After 18 months, 60% of developers chose not to use the portal.

That’s a massive red flag. In product terms:

  • 40% adoption = failing product
  • 60% of target users are saying “no thanks, I’ll use workarounds”
  • That’s uncomfortably close to most failed startups (which usually hit 0-20% adoption)

Questions I’d ask (product autopsy):

  • Why did 60% of developers avoid the portal?
  • What workarounds were they using instead?
  • What feedback did you get in surveys?
  • What was developer satisfaction score? (probably 3-4/10)

My guess:

  • “Too slow” (performance not optimized)
  • “Breaks too often” (upgrade issues, bugs)
  • “Missing features I need” (team busy maintaining, not building)
  • “Easier to do it manually” (high friction, low value)

That’s product-market fit failure. And no amount of technical sophistication fixes it.

The 90% Adoption After Migration

Post-migration to Roadie:

  • 9 months
  • 90% adoption
  • Developer satisfaction probably jumped to 8-9/10

What changed?

Not just features. You didn’t ship 5x more features in 9 months than you built in 18 months.

Product quality changed:

  • :white_check_mark: Professional, polished UI
  • :white_check_mark: Reliable, doesn’t break
  • :white_check_mark: Fast, performant
  • :white_check_mark: Features developers actually wanted (golden paths, automation)
  • :white_check_mark: Continuous improvements (Roadie’s product team ships weekly)

Developers trusted the tool. So they used it.

The Product Thinking You Applied (Post-Migration)

Reading between the lines, post-migration you:

1. Focused on user value, not infrastructure:

  • Built service onboarding automation (developers love this)
  • Built deployment self-service (high-value, high-adoption)
  • Built cost visibility (engineering leadership loves this)

vs DIY phase priorities:

  • Build catalog system (infrastructure)
  • Build authentication (infrastructure)
  • Build RBAC (infrastructure)
  • Debug React performance issues (maintenance)

Users don’t care about infrastructure. They care about outcomes.

2. Measured adoption and satisfaction:

  • Tracked who’s using the portal
  • Surveyed developer satisfaction
  • Prioritized features based on impact

vs DIY phase (probably):

  • Assumed developers would use it once built
  • Didn’t measure adoption rigorously
  • Prioritized technical completeness over user value

3. Shipped iteratively:

  • 8 capabilities in 9 months = ~1 capability per month
  • Developers saw continuous value delivery
  • Built momentum and trust

vs DIY phase:

  • 3 capabilities in 18 months = 1 capability every 6 months
  • Long gaps between value delivery
  • Developers lost faith in platform team
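
To make the cadence comparison concrete, here is a quick back-of-the-envelope sketch using only the counts quoted above (3 capabilities in 18 months, 8 in 9 months). The function name is just illustrative.

```python
def capabilities_per_month(capabilities: int, months: int) -> float:
    """Average delivery rate over a phase."""
    return capabilities / months


# Counts taken from the two phases described above.
diy_rate = capabilities_per_month(3, 18)      # DIY phase
managed_rate = capabilities_per_month(8, 9)   # post-migration phase

print(f"DIY: one capability every {1 / diy_rate:.1f} months")
print(f"Managed: one capability every {1 / managed_rate:.1f} months")
print(f"Cadence improvement: {managed_rate / diy_rate:.1f}x")
```

The total count went up less than 3x (8 vs 3), but because it happened in half the time, developers saw new value arrive roughly 5x as often. That frequency is what builds trust.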

The “Build Trap” You Fell Into

Your DIY phase is a textbook “build trap”:

Build trap symptoms:

  • :white_check_mark: Optimize for building, not outcomes
  • :white_check_mark: Focus on technical sophistication, not user value
  • :white_check_mark: Measure effort (“we built X features”), not impact (“developers saved Y hours”)
  • :white_check_mark: Assume “if we build it, they will come” (they didn’t)
  • :white_check_mark: Blame users for low adoption (“they don’t understand the value”)

Escape from build trap:

  • :white_check_mark: Optimize for user outcomes
  • :white_check_mark: Focus on adoption and satisfaction
  • :white_check_mark: Measure impact, not output
  • :white_check_mark: Earn adoption through value delivery
  • :white_check_mark: Treat low adoption as a product problem to solve

Your migration to a managed platform forced you out of the build trap.

Because Roadie handled infrastructure, you had to focus on user value. That’s where you found product-market fit.

For Teams Evaluating This

Product lens questions:

1. Can we build a product developers will love?

  • Do we have product design skills?
  • Do we have frontend engineering skills?
  • Can we commit to continuous UX improvement?
  • Can we compete with professional SaaS products on quality?

If the answers are “no”, buy the product and build the capabilities on top.

2. How will we measure success?

  • Active usage (daily/weekly)
  • Adoption rate (% of developers using it)
  • Satisfaction score (NPS, surveys)
  • Time saved (measured outcomes)

If metrics aren’t defined, you’ll optimize for the wrong things.
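
As a rough illustration of how little it takes to start measuring, here is a minimal sketch of adoption rate and weekly active usage. Everything in it is an assumption for the example: the `PortalEvent` shape, the developer names, and the dates are made up, and in practice the events would come from whatever usage logs or analytics your portal exposes.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class PortalEvent:
    user: str  # hypothetical developer identifier
    day: date  # day the developer used the portal


def adoption_rate(events, all_developers, since):
    """Share of developers with at least one portal event on or after `since`."""
    if not all_developers:
        return 0.0
    active = {e.user for e in events if e.day >= since and e.user in all_developers}
    return len(active) / len(all_developers)


def weekly_active_users(events, week_end):
    """Distinct users seen in the 7 days ending on `week_end` (inclusive)."""
    week_start = week_end - timedelta(days=6)
    return len({e.user for e in events if week_start <= e.day <= week_end})


# Made-up sample data, for illustration only.
developers = {"ana", "bo", "chen", "dee", "eli"}
events = [
    PortalEvent("ana", date(2025, 3, 3)),
    PortalEvent("ana", date(2025, 3, 10)),
    PortalEvent("bo", date(2025, 3, 11)),
    PortalEvent("chen", date(2025, 1, 15)),
]

print(adoption_rate(events, developers, since=date(2025, 3, 1)))
print(weekly_active_users(events, week_end=date(2025, 3, 14)))
```

Satisfaction and time saved need surveys and before/after measurements, so they are left out here. The point is only that usage and adoption can be tracked from day one with very little code.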

3. What’s our product-market fit validation plan?

  • 30-day pilot with real users
  • Measure adoption and satisfaction
  • Identify gaps and missing features
  • Evidence-based decision

If you skip validation, you’ll spend $1.5M learning what a pilot would teach you for free.

The Honest Question For Engineering Teams

“If your internal platform were an external SaaS product competing with Roadie/Humanitec, would customers choose it?”

Be brutally honest:

  • Is UX as polished?
  • Is performance as good?
  • Is reliability as high?
  • Are features as complete?
  • Is improvement velocity as fast?

If the answer is “no”, then internal users shouldn’t have to settle for less.

Your developers deserve tools as good as the external tools they use daily.

Why The 40% Adoption Hurt So Much

Low adoption = wasted investment, regardless of cost:

  • $1.6M spent
  • 18 months invested
  • 3 major capabilities built

But only 40% of target users adopted it.

Effective cost per user: $1.6M / (200 devs × 40%) = $20,000 per active user

That’s worse ROI than most enterprise SaaS products (which cost up to $200/user).

Post-migration effective cost: $428K / (200 devs × 90%) = $2,378 per active user

8.4x better cost-effectiveness.
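
For anyone who wants to rerun this math with their own numbers, here is a small sketch of the cost-per-active-user calculation. The dollar inputs are assumptions on my part: roughly $1.6M for the DIY phase and roughly $428K post-migration, with the second figure back-calculated from the per-user result and the 8.4x ratio. Swap in your real spend.

```python
def cost_per_active_user(total_cost: float, headcount: int, adoption: float) -> float:
    """Total spend divided by the number of developers actually using the tool."""
    active_users = headcount * adoption
    return total_cost / active_users


# Assumed inputs (see the note above); headcount and adoption are from the post.
diy = cost_per_active_user(1_600_000, headcount=200, adoption=0.40)
managed = cost_per_active_user(428_000, headcount=200, adoption=0.90)

print(f"DIY: ${diy:,.0f} per active user")
print(f"Managed: ${managed:,.0f} per active user")
print(f"Improvement: {diy / managed:.1f}x")
```

Dividing by active users rather than headcount is the whole point: the denominator shrinks with low adoption, so an unused portal gets expensive fast.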

Final Thought

Your post is a cautionary tale about treating internal tools as infrastructure projects instead of product projects.

Infrastructure thinking → Focus on technical architecture, control, deployment model → 40% adoption, $1.6M wasted

Product thinking → Focus on user value, adoption, satisfaction → 90% adoption, 461% ROI

The $1.69M lesson you learned: Platform is a product for internal users. Treat it like one.

Thank you for sharing this story. Every product leader evaluating platform strategy should read it.