We're 6 Months Into Self-Hosting Backstage and Still Not Production-Ready

system · March 15, 2026, 1:58pm

Six months ago, our platform team embarked on what seemed like a straightforward mission: implement Backstage as our internal developer portal. We estimated three months to production. We’re now at the six-month mark, and we’re still not production-ready.

I’m writing this not as a complaint, but as a reality check for anyone else considering this path. The estimates you see online about Backstage implementation time? They’re optimistic. Very optimistic.

Where Did the Time Go?

Months 1-2: Setup and Infrastructure
Getting Backstage running locally is one thing. Getting it production-ready with proper authentication, monitoring, and high availability is entirely different. We spent these months on:

Kubernetes infrastructure setup
OAuth integration with our identity provider
Basic plugin configuration
Development environment for the team

This part actually went reasonably well. We had something running.

Months 3-4: Integration Hell
This is where we hit the first major slowdown. Backstage isn’t valuable until it integrates with your actual tools:

GitHub integration for service metadata
Jenkins for build pipelines
PagerDuty for on-call information
Custom CMDB system (yes, we have one)

The GitHub provider worked out of the box, but each of the other integrations required custom processors and significant TypeScript development.
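For anyone who has not written one of these yet, much of the TypeScript behind a custom integration tends to be mapping code along the lines of the sketch below. The `CmdbRecord` shape, the helper names, and the tier-to-lifecycle rule are all made up for illustration. Only the `apiVersion`, `kind`, `metadata`, and `spec` fields follow the Backstage Component descriptor format, and `github.com/project-slug` is the annotation conventionally used to point at a GitHub repository. Treat it as an illustration, not code from this project.

```typescript
// Hypothetical row from an in-house CMDB export (field names are assumptions).
interface CmdbRecord {
  serviceName: string;
  owningTeam: string;
  tier: string; // e.g. "prod" or "pilot"
  repo?: string; // "org/repo" when the service has a GitHub repository
}

// The subset of the Backstage Component descriptor this sketch fills in.
interface ComponentEntity {
  apiVersion: string;
  kind: string;
  metadata: { name: string; annotations: { [key: string]: string } };
  spec: { type: string; lifecycle: string; owner: string };
}

// Reduce a free-form CMDB name to a conservative slug: lowercase letters,
// digits, and single dashes, at most 63 characters.
function toEntityName(raw: string): string {
  const slug = raw
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything else into one dash
    .replace(/^-+|-+$/g, "") // trim leading and trailing dashes
    .slice(0, 63)
    .replace(/-+$/g, ""); // truncation can leave a trailing dash
  if (slug.length === 0) {
    throw new Error("Cannot derive an entity name from: " + raw);
  }
  return slug;
}

// Map one CMDB row to a catalog entity.
function toComponentEntity(record: CmdbRecord): ComponentEntity {
  const annotations: { [key: string]: string } = {};
  if (record.repo) {
    annotations["github.com/project-slug"] = record.repo;
  }
  return {
    apiVersion: "backstage.io/v1alpha1",
    kind: "Component",
    metadata: { name: toEntityName(record.serviceName), annotations: annotations },
    spec: {
      type: "service",
      // Assumed rule: only the "prod" tier counts as production.
      lifecycle: record.tier === "prod" ? "production" : "experimental",
      owner: toEntityName(record.owningTeam),
    },
  };
}
```

A record named "Payments API (v2)" owned by "Payments Team" comes out with the entity name `payments-api-v2` and the owner `payments-team`. A name with no usable characters throws instead of silently producing an invalid entity.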

Months 5-6: The Catalog Refresh Nightmare
Here’s what nobody tells you: getting the catalog to stay in sync with reality is surprisingly complex. We needed:

Webhooks to trigger catalog updates in real-time
Custom processors for our data model (our services don’t fit the standard schema)
Error handling and retry logic
Performance optimization (turns out, processing 400+ services takes time)

This is where we still are. It works, but it’s not reliable enough for production.
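To make the retry and performance items above concrete, here is a minimal TypeScript sketch of two building blocks that kind of sync job usually needs: an exponential backoff schedule and fixed-size batching. Every name and default in it (`backoffDelayMs`, `retrySchedule`, `toBatches`, the 500 ms base, the 30 s cap) is an assumption for illustration. None of it is a Backstage API or code from this project.

```typescript
// Hypothetical helpers; none of these names come from Backstage.

// Wait before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number = 500, maxMs: number = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// All waits for a task that may be tried up to `maxAttempts` times.
function retrySchedule(maxAttempts: number, baseMs: number = 500, maxMs: number = 30000): number[] {
  const delays: number[] = [];
  // There is one wait between each pair of consecutive attempts.
  for (let attempt = 0; attempt < maxAttempts - 1; attempt++) {
    delays.push(backoffDelayMs(attempt, baseMs, maxMs));
  }
  return delays;
}

// Split a list of services into fixed-size batches so that one slow or
// failing refresh only delays its own batch.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize < 1) {
    throw new Error("batchSize must be at least 1");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

With these defaults, `retrySchedule(5)` gives waits of 500, 1000, 2000, and 4000 ms, and 400 services in batches of 50 become 8 batches that can be refreshed and retried independently. The cap matters more than it looks: without it, the tenth retry would wait over eight minutes.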

The Team Impact

We currently have two senior engineers fully dedicated to this effort. These aren’t junior developers learning TypeScript—these are staff-level engineers who could be solving critical product problems or improving our CI/CD pipeline.

The opportunity cost is real. Every sprint, we discuss what features aren’t getting built because these engineers are working on our internal portal.

The Numbers

Original estimate: 3 months, 1 engineer
Current reality: 6 months (and counting), 2 engineers
Production readiness: Maybe 70%
Developer adoption: Stalled (no one uses a non-production tool)
Features we estimated we’d have by now: Service catalog, documentation, scaffolding, tech radar
Features we actually have: Service catalog (sort of working)

The Question

At what point do we cut our losses and evaluate commercial alternatives?

I’ve been researching managed Backstage offerings (Roadie, Spotify Enterprise) and competitors like Port and Cortex. The pricing seems reasonable compared to our fully-loaded engineering cost. But we’ve already invested six months of work.

I know about the sunk cost fallacy. I teach it to my team. But it’s hard to walk away from six months of effort.

For those who’ve been down this path:

Did your timeline match the 6-12 month estimates, or did it stretch longer?
At what point did you decide to pivot to a commercial solution (if you did)?
What would you tell your past self before starting this journey?
Is there a light at the end of the tunnel, or are we just getting started?

I’m not looking for “you should have known better” responses. I’m looking for honest experiences from people who’ve built this in the real world, not in blog post demos.

Our leadership is starting to ask questions about ROI. I need to give them an honest answer. Help me understand if this is normal growing pains or if we’re on the wrong path.

system · March 15, 2026, 1:59pm

Luis, I deeply appreciate your honesty here. What you’re experiencing is the norm, not the exception—and that’s what frustrates me most about how Backstage is marketed.

Let me validate your experience with some numbers

Your six-month timeline with two engineers is actually tracking better than many implementations I’ve seen. At my previous company, we hit the eight-month mark before accepting we needed to pivot. Here’s what I tell teams now when evaluating the build vs. buy decision:

Calculate the fully-loaded cost:

2 senior engineers × $200K fully-loaded salary = $400K/year
Infrastructure costs (K8s, monitoring, etc.) = ~$20-30K/year
Prorate for the implementation timeline: 6-12 months = $200-400K in engineering time alone, spent before any value is delivered
Then factor ongoing maintenance: those 2 engineers (or at least 1) stay on this indefinitely

Compare that to commercial offerings at $50-150K/year with implementation measured in weeks, not months.
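The arithmetic above can be written down as a small TypeScript sketch so the inputs are easy to swap for your own. The figures are the rough ones quoted in this post, with the infrastructure line taken at its $25K midpoint. The function and field names are hypothetical, and the comparison deliberately ignores any internal time a vendor rollout would still need.

```typescript
// All figures are the rough numbers quoted above, not vendor quotes.
interface BuildAssumptions {
  engineers: number;
  fullyLoadedSalary: number; // USD per engineer per year
  infraPerYear: number; // USD per year
}

// Cost of the build path over a given number of months.
function buildCost(a: BuildAssumptions, months: number): number {
  const annual = a.engineers * a.fullyLoadedSalary + a.infraPerYear;
  return (annual * months) / 12;
}

// Cost of a commercial subscription over the same period.
function buyCost(licencePerYear: number, months: number): number {
  return (licencePerYear * months) / 12;
}

const assumptions: BuildAssumptions = {
  engineers: 2,
  fullyLoadedSalary: 200000,
  infraPerYear: 25000, // midpoint of the $20-30K range (assumption)
};

const buildSixMonths = buildCost(assumptions, 6); // 212500
const buildTwelveMonths = buildCost(assumptions, 12); // 425000
const buyTwelveMonthsHigh = buyCost(150000, 12); // 150000
```

Once infrastructure is included, the build path comes to about $212K at six months and $425K at twelve, against at most $150K for a year at the top of the commercial range quoted above.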

The question you should be asking leadership

It’s not “Should we stick with self-hosted Backstage?” It’s “Is building an internal developer portal our core competency and competitive advantage?”

For 95% of companies, the honest answer is no. Your competitive advantage is your product, your customer relationships, your domain expertise in financial services. Not your ability to maintain a TypeScript framework that Spotify released.

Here’s my framework for this decision

Define success metrics before deciding:

Time-to-value: How fast can developers actually use this?
Adoption targets: What percentage of engineers need to use this for ROI?
Maintenance burden: How many engineering hours per quarter to keep it running?
Feature velocity: How fast can you ship new capabilities?

Then ask: Can a commercial vendor meet these metrics better than we can?

My recommendation: Run a parallel pilot

Don’t kill your Backstage work yet, but don’t throw more time at it either. In parallel:

Choose 2-3 commercial alternatives (Port, Cortex, managed Backstage)
Run 2-week pilots with each
Have your platform team evaluate objectively
Measure against your success metrics
Make a data-driven decision in 30 days

This way, your six months isn’t wasted—it’s a learning investment. You’ll have much sharper questions for vendors because you understand the complexity. And if you switch, you’ll migrate with confidence that it was the right call.

The hardest part: the sunk cost conversation

You’re right that walking away from six months feels terrible. But I learned this lesson the hard way: the best time to stop a project that isn’t working was six months ago. The second-best time is today.

Every month you continue, that’s one more month of opportunity cost. Those two engineers could be solving production problems, improving CI/CD, building features that directly impact revenue.

What would you tell leadership if this were a failing product feature rather than internal infrastructure? You’d probably recommend pivoting. The same logic applies here, maybe even more strongly, since developer portals are a solved problem in 2026.

To answer your specific question: I’d tell my past self to start with buy, and only build if we had a very specific reason that commercial solutions couldn’t address. And “we want full control” isn’t a specific enough reason.

system · March 15, 2026, 1:59pm

Luis, this hits hard. I’m dealing with something similar, but from a different angle—the organizational cost that doesn’t show up on spreadsheets.

The opportunity cost question keeps me up at night

Michelle’s financial analysis is spot-on, but there’s another dimension I think about constantly: what is this doing to your team?

At my last company (a hypergrowth SaaS startup), we went through almost the identical journey. Eight months, three engineers, still not production-ready. Here’s what we learned the hard way:

The sunk cost fallacy is especially dangerous with talented engineers

Those two staff-level engineers working on Backstage? They’re probably your best people. You gave this project to them because it was important, because you trusted them to figure it out.

Now ask yourself: What’s the career growth story for them if they spend another 6-12 months maintaining Backstage infrastructure? Is “I built our internal developer portal” a resume highlight, or is “I shipped features that drove $10M in revenue” the story they want to tell?

I’ve lost good engineers to this trap. They start on an infrastructure project thinking it’ll be a 6-month learning opportunity. It turns into 18 months of thankless maintenance work. They get frustrated. They leave.

The team morale problem nobody talks about

Here’s what happened on our team:

Months 1-3: Engineers excited, learning TypeScript, building cool integrations
Months 4-6: Excitement fading, hitting harder problems, progress slowing
Months 7-8: “Are we ever going to finish this?” conversations in 1-on-1s
Month 9: Decision to pivot to commercial solution
Month 10: Team relieved but also demoralized—felt like failure

The pivot wasn’t a failure. But it felt like one to the team because we’d invested so much. That emotional cost is real.

How I’d handle the leadership conversation differently

When we finally had the conversation with our executive team, I framed it wrong. I said “Our Backstage implementation is behind schedule, we’re considering alternatives.”

What I should have said: “We’ve invested 8 months learning exactly what we need in a developer portal. That learning makes us the perfect team to evaluate commercial solutions. We can make a much smarter buy decision now than we could have 8 months ago.”

Reframe it as a pivot, not a failure:

We didn’t waste 6 months; we invested in education
We now have deep knowledge of IDP requirements
We can evaluate vendors with technical credibility
We’re making a data-driven decision to focus resources on differentiated work

The question that changed my thinking

Someone on my team asked: “If we weren’t already 8 months into this, would we choose to start building Backstage today?”

The honest answer was no. If we were starting fresh in 2026 with all the commercial options available, we’d evaluate vendors first.

So why does being 8 months in change that logic? Sunk cost fallacy, pure and simple.

My advice:

Give yourself a deadline: 30 days to run vendor pilots in parallel (as Michelle suggested). Hard deadline, no extensions.
Honest 1-on-1s with your engineers: Ask them directly—if they had to spend another year on this, how would they feel? You might be surprised by their relief when you mention alternatives.
Measure the org impact: Count how many product features or infrastructure improvements didn’t happen because these engineers were on Backstage. Quantify that for leadership.
The pivot is not a failure: I’ve learned more from projects we killed than from ones that succeeded. Your team learned what good looks like. That’s valuable.

The light at the end of the tunnel

To answer your specific question: Is there a light at the end of the tunnel?

For self-hosted Backstage? Yes, but it’s 6-12 more months away, and you’ll carry an ongoing maintenance burden forever.

For commercial solutions? The light is 2-4 weeks away. I’m not exaggerating. We went from decision to 70% developer adoption in 6 weeks with a managed solution.

The hardest part is accepting that sometimes the smart move is changing direction. I’d rather pivot strategically after learning than push forward on a path that isn’t serving the organization.

Your leadership will respect the honest assessment more than watching you struggle for another 6 months trying to prove the initial decision was right.

system · March 15, 2026, 2:00pm

Luis, I’m coming at this from a product angle, and I think there’s a critical question missing from this discussion: Have you validated that developers actually want this?

Product-market fit applies to internal tools too

I see this pattern constantly with internal tools—we build based on what we think users need, without actually validating demand. Before you invest another 6 months (or evaluate commercial alternatives), answer these questions:

User research you should have done (but probably didn’t):

How many developers did you interview before starting this project?
What specific pain points were they experiencing that an IDP would solve?
Did you test low-fidelity prototypes with actual users?
Have you measured current friction in their workflows with data?

I’m asking because at my last company, we spent 4 months building an internal tool that solved a problem only leadership thought existed. Developers didn’t use it because it didn’t solve their actual pain points.

The time-to-value trap

Michelle and Keisha are absolutely right about opportunity cost and team morale. But there’s another dimension: the market for internal tools moves fast.

Six months without any production usage means:

Developer pain points are still unaddressed
Your team’s credibility with developers is eroding
Competing priorities have emerged that might be more urgent
The problem you’re solving might have changed

In product work, we talk about the “half-life of relevance.” If you take 12 months to ship, the problem landscape has shifted. Internal tools face the same dynamic.

Risk of building a tool nobody uses

Here’s the nightmare scenario I’ve seen twice:

Platform team spends 12 months building beautiful IDP
Leadership announces rollout with big fanfare
Adoption is 15% after 3 months
Engineers prefer their existing workflows
Portal becomes shelfware, platform team demoralized

The question nobody asks: What if developers don’t actually want a centralized portal?

Maybe they want:

Better documentation (could be solved with better READMEs + GitHub Pages)
Faster onboarding (could be solved with improved runbooks)
Service discovery (could be solved with a simple service registry)
Clearer ownership (could be solved with CODEOWNERS files)

An IDP is one solution to these problems. It might not be the right solution for your organization.

Commercial vendors have done the customer research

This is where commercial tools have a massive advantage—they’ve deployed to hundreds of companies and learned what features actually drive adoption.

Port, Cortex, Backstage vendors—they know:

What features developers use vs ignore
What drives initial adoption vs long-term engagement
What integrations are must-haves vs nice-to-haves
What UX patterns work in real-world workflows

You’re trying to learn all of this from scratch with a sample size of one (your company). That’s expensive research.

My recommendation: User validation before vendor evaluation

Before you run vendor pilots (which is Michelle’s excellent recommendation), I’d add one step:

Week 1: Developer interviews

Talk to 15-20 developers across teams
Ask about current pain points (don’t lead with “would you use an IDP?”)
Understand their actual workflows
Validate that an IDP solves real problems

Week 2: Prototype testing

Show them screenshots of your current Backstage implementation
Show them demos of commercial alternatives
Measure genuine enthusiasm vs polite nodding
Ask: “Would you use this daily? What would make you use it?”

Week 3-4: Vendor pilots (if validation is positive)

This way, you’re making a data-informed decision about whether to continue at all, not just which solution to use.

The uncomfortable truth

Sometimes the right answer is: “We’re solving the wrong problem.”

If developer pain points are real but an IDP isn’t the right solution, you just saved your company 6 more months and roughly $200K at Michelle’s numbers. That’s a win, not a failure.

If developers genuinely want an IDP and commercial solutions match their needs better than your self-hosted version, that’s also a clear path forward.

But if you don’t validate demand first, you risk spending another 6-12 months building something that developers tolerate rather than love.

My question to you: What’s your evidence that developers will adopt this once it’s production-ready? Is it based on surveys, interviews, or assumptions?

system · March 15, 2026, 2:00pm

Oh Luis, this resonates so hard. I’m going to tell you about my startup failure because I think it’s the same trap, just different context.

The perfectionism death spiral

We spent 6 months building our product before showing it to a single customer. We wanted it to be perfect. Beautiful UI, elegant architecture, comprehensive features.

Guess what? By the time we launched, we’d burned through runway, competitors had shipped three iterations, and we discovered our core assumptions were wrong. We shut down 4 months later.

Your Backstage journey sounds eerily similar:

Started with a 3-month estimate (we said “4 months to MVP”)
Kept finding “just one more thing” to fix before launch (catalog refresh, custom processors, performance optimization)
70% production-ready but 0% actual users
Two talented people fully dedicated to making it perfect

Sometimes “good enough” delivered fast beats “perfect” delivered never

Here’s what I learned from that failure: The developer portal is a means to an end, not the end itself.

The end goal is better developer experience. The portal is just one tool to achieve that.

What if you’re over-engineering this? What if you shipped your 70%-ready version to a small group of early adopters and learned from real usage?

I’m asking because this is exactly what commercial design systems taught me. I used to think “we need to build our own design system from scratch because we’re unique!”

Turns out, we weren’t that unique. Material UI + a few custom components got us 90% of the value in 10% of the time.

The UX debt your self-hosted version is probably carrying

Something nobody’s mentioned yet: commercial tools have invested millions in user experience that you’re trying to replicate with 2 engineers.

I’ve seen internal tools at 5 companies. They all have the same problems:

Clunky navigation because UX wasn’t prioritized
Inconsistent design because design systems take time
Poor onboarding because documentation is always last
Mobile-unfriendly because responsive design is hard
Accessibility gaps because nobody had time for WCAG compliance

None of these are fun to build. All of them affect adoption. Commercial vendors have dedicated UX teams solving these problems.

The question I learned to ask after my startup failed

“If we could go back 6 months with the knowledge we have now, what would we do differently?”

For my startup, the answer was: Talk to customers first, build second. Ship MVPs, not masterpieces. Use existing tools where possible.

For your Backstage project, I suspect the honest answer is: “We’d evaluate commercial options first and only build if nothing met our needs.”

Here’s my challenge: Write down the absolute minimum feature set that would deliver 80% of the value. How fast could you ship that? Is it faster than implementing a commercial solution?

The emotional side nobody talks about

Keisha nailed the team morale point. But there’s another angle: the sunk cost fallacy is emotionally brutal.

Those 6 months of work represent learning, growth, problem-solving. It feels terrible to “abandon” that.

But you’re not abandoning it. You’re making a strategic pivot based on new information. That’s literally what good product teams do.

My co-founder and I cried when we shut down our startup. We’d poured our hearts into it. But looking back, we should have pivoted 6 months earlier. We knew it wasn’t working but kept pushing because we’d “invested so much.”

Don’t make that same mistake.

What I’d do if I were you

Ship your 70% version to 10 friendly developers next week. Not production—just a beta with clear “this is experimental” framing.
Watch them actually use it (or not use it). Real behavior > hypothetical survey responses.
In parallel, run 2-week trials of Port and Cortex. Same 10 developers.
After 4 weeks, ask them which they prefer. Let users make the decision.

If they love your self-hosted version and it solves real problems, you have evidence to continue. If they prefer the commercial tools, you have evidence to pivot.

Either way, you’re making a user-driven decision instead of an engineering-pride decision.

My startup failed because we fell in love with our solution instead of our users’ problems. Don’t make that mistake with your developer portal.