Six months ago, our platform team embarked on what seemed like a straightforward mission: implement Backstage as our internal developer portal. We estimated three months to production. We’re now at the six-month mark, and we’re still not production-ready.
I’m writing this not as a complaint, but as a reality check for anyone else considering this path. The estimates you see online about Backstage implementation time? They’re optimistic. Very optimistic.
Where Did the Time Go?
Months 1-2: Setup and Infrastructure
Getting Backstage running locally is one thing. Getting it production-ready with proper authentication, monitoring, and high availability is entirely different. We spent these months on:
- Kubernetes infrastructure setup
- OAuth integration with our identity provider
- Basic plugin configuration
- Development environment for the team
This part actually went reasonably well. We had something running.
Months 3-4: Integration Hell
This is where we hit the first major slowdown. Backstage isn’t valuable until it integrates with your actual tools:
- GitHub integration for service metadata
- Jenkins for build pipelines
- PagerDuty for on-call information
- Custom CMDB system (yes, we have one)
Almost every integration required custom processors. The GitHub provider worked out of the box, but everything else needed significant TypeScript development.
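As a rough illustration of the kind of mapping such a processor has to perform, here is a minimal TypeScript sketch that turns a CMDB record into a catalog-style entity. It is deliberately self-contained and does not import any Backstage package: the local `Entity` type is a simplified stand-in for the catalog entity shape, and the `CmdbRecord` fields, the `example.com/cmdb-id` annotation key, and the default owner and lifecycle values are all hypothetical.

```typescript
// Simplified local stand-in for a catalog entity; not imported from Backstage.
interface Entity {
  apiVersion: string;
  kind: string;
  metadata: { name: string; annotations: Record<string, string> };
  spec: Record<string, string>;
}

// Hypothetical CMDB record; these field names are invented for illustration.
interface CmdbRecord {
  serviceId: string;
  displayName: string;
  ownerTeam?: string;
  lifecycle?: string;
}

// Convert a free-form display name into a lowercase, dash-separated name.
function toEntityName(displayName: string): string {
  return displayName
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric into "-"
    .replace(/^-+|-+$/g, ""); // strip leading and trailing dashes
}

// Map one CMDB record onto a Component-shaped entity.
function cmdbRecordToEntity(record: CmdbRecord): Entity {
  return {
    apiVersion: "backstage.io/v1alpha1",
    kind: "Component",
    metadata: {
      name: toEntityName(record.displayName),
      // Hypothetical annotation key linking back to the CMDB entry.
      annotations: { "example.com/cmdb-id": record.serviceId },
    },
    spec: {
      type: "service",
      // Assumed defaults for records that lack these fields.
      lifecycle: record.lifecycle ?? "experimental",
      owner: record.ownerTeam ?? "unknown",
    },
  };
}
```

The mapping itself is the easy part. The time goes into the cases where the source data does not fit, such as missing owners or names that collide after normalization.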
Months 5-6: The Catalog Refresh Nightmare
Here’s what nobody tells you: getting the catalog to stay in sync with reality is surprisingly complex. We needed:
- Webhooks to trigger catalog updates in real-time
- Custom processors for our data model (our services don’t fit the standard schema)
- Error handling and retry logic
- Performance optimization (turns out, processing 400+ services takes time)
This is where we still are. It works, but it’s not reliable enough for production.
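For readers who have not built this kind of sync, the retry and throughput pieces can be sketched generically. The TypeScript below is a minimal sketch, not Backstage code: `withRetry` retries an async operation with exponential backoff, and `refreshAll` processes a list of service IDs with a bounded number of concurrent workers. Both function names, the default attempt count, the delay, and the concurrency limit are assumptions chosen for illustration.

```typescript
// Retry an async operation, doubling the wait after each failed attempt.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        const delayMs = baseDelayMs * 2 ** (attempt - 1);
        await new Promise<void>(resolve => setTimeout(resolve, delayMs));
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}

// Refresh many services with at most `concurrency` refreshes in flight.
// Failures are collected rather than aborting the whole run.
async function refreshAll(
  serviceIds: string[],
  refreshOne: (id: string) => Promise<void>,
  concurrency = 10,
): Promise<{ succeeded: string[]; failed: string[] }> {
  const succeeded: string[] = [];
  const failed: string[] = [];
  let next = 0;

  // Each worker pulls the next unprocessed ID until the list is exhausted.
  async function worker(): Promise<void> {
    while (next < serviceIds.length) {
      const id = serviceIds[next++];
      try {
        await refreshOne(id);
        succeeded.push(id);
      } catch {
        failed.push(id);
      }
    }
  }

  const workerCount = Math.min(concurrency, serviceIds.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return { succeeded, failed };
}
```

The two compose naturally: calling `refreshAll(ids, id => withRetry(() => refreshOne(id)))` retries each service independently, and the returned `failed` list shows which services need a follow-up pass.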
The Team Impact
We currently have two senior engineers fully dedicated to this effort. These aren’t junior developers learning TypeScript. They are staff-level engineers who could be solving critical product problems or improving our CI/CD pipeline.
The opportunity cost is real. Every sprint, we discuss what features aren’t getting built because these engineers are working on our internal portal.
The Numbers
- Original estimate: 3 months, 1 engineer
- Current reality: 6 months (and counting), 2 engineers
- Production readiness: Maybe 70%
- Developer adoption: Stalled (no one uses a non-production tool)
- Features we estimated we’d have by now: Service catalog, documentation, scaffolding, tech radar
- Features we actually have: Service catalog (sort of working)
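Put in engineer-months, the gap between estimate and reality is easier to see. The short TypeScript sketch below does that arithmetic. It assumes both engineers were on the project for all six months, which the figures above do not state outright, so the result is best read as an upper bound. The `Effort` type and `engineerMonths` helper are names introduced here for illustration.

```typescript
// Effort expressed as calendar months multiplied by engineers assigned.
interface Effort {
  months: number;
  engineers: number;
}

function engineerMonths(effort: Effort): number {
  return effort.months * effort.engineers;
}

// Original estimate: 3 months, 1 engineer.
const estimated = engineerMonths({ months: 3, engineers: 1 });

// Current reality: 6 months, 2 engineers.
// Assumption: both engineers were dedicated for the full six months.
const actualUpperBound = engineerMonths({ months: 6, engineers: 2 });

const overrunFactor = actualUpperBound / estimated;

console.log(`Estimated: ${estimated} engineer-months`); // Estimated: 3 engineer-months
console.log(`Actual (upper bound): ${actualUpperBound} engineer-months`); // Actual (upper bound): 12 engineer-months
console.log(`Overrun: ${overrunFactor}x`); // Overrun: 4x
```

Under that assumption the project has consumed up to four times the estimated effort and is still not in production.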
The Question
At what point do we cut our losses and evaluate commercial alternatives?
I’ve been researching managed Backstage offerings (Roadie, Spotify Enterprise) and competitors like Port and Cortex. The pricing seems reasonable compared to our fully-loaded engineering cost. But we’ve already invested six months of work.
I know about the sunk cost fallacy. I teach it to my team. But it’s hard to walk away from six months of effort.
For those who’ve been down this path:
- Did your timeline match the 6-12 month estimates you see online, or did it stretch longer?
- At what point did you decide to pivot to a commercial solution (if you did)?
- What would you tell your past self before starting this journey?
- Is there a light at the end of the tunnel, or are we just getting started?
I’m not looking for “you should have known better” responses. I’m looking for honest experiences from people who’ve built this in the real world, not in blog post demos.
Our leadership is starting to ask questions about ROI. I need to give them an honest answer. Help me understand if this is normal growing pains or if we’re on the wrong path.