I need to write this while it’s still fresh. We just had our quarterly platform retrospective, and the conversation was… uncomfortable. In a good way, but still. It’s time to be honest about what happened with our Backstage implementation.
TL;DR: We spent 9 months building our internal developer portal. Adoption peaked at about 15% of developers in the first month, then slid steadily to roughly 3%. Here’s what we learned the hard way.
The Beginning: So Much Optimism
18 months ago, our CTO announced we’d be building an internal developer portal using Backstage. The excitement was real. We were going to solve all the problems:
- Service discovery (which team owns what?)
- Documentation sprawl (spread across Confluence, GitHub wikis, Google Docs)
- Onboarding friction (new engineers taking 3+ weeks to ship their first PR)
- Self-service infrastructure (tired of waiting 2 days for environment provisioning tickets)
We dedicated 4 platform engineers full-time. All four are incredibly skilled: they built our microservices infrastructure, manage our Kubernetes clusters, and handle our PCI compliance frameworks. We thought: “How hard could this be?”
Reality Check #1: The Skills Mismatch
Here’s what we didn’t anticipate: platform engineers and web developers have completely different skillsets.
Our team works in Go, Python, and YAML all day. We’re great at distributed systems, infrastructure as code, observability pipelines. But TypeScript? React component development? Frontend state management? That’s a different world.
We assumed web development would be easier than our day job. We were wrong. Building production-grade internal tooling requires expertise we didn’t have. After 3 months of struggling, we hired two frontend-focused engineers specifically for Backstage development.
That wasn’t in the plan or budget.
Reality Check #2: The Timeline Deception
The research said 6-12 months. We were confident we’d hit 6 months because our team is experienced.
We launched in 9 months. But “launched” is generous. What we actually shipped:
- Basic service catalog with ownership data
- Links to documentation (not integrated, just links)
- A scaffolding template for new microservices (that only worked for one language)
- Dashboard showing recent deployments
The sophisticated stuff we promised—self-service infrastructure provisioning, automated compliance checks, integrated observability, golden path templates—those features kept getting pushed to “phase 2.”
18 months later, we’re still building phase 2 features. “Later” keeps getting later when you’re maintaining what you’ve already built.
Reality Check #3: The Adoption Disaster
Initial adoption looked promising: ~15% of our 200 developers used it in the first month. We celebrated! But watch what happened:
- Month 1: 15% weekly active users
- Month 3: 12% weekly active users
- Month 6: 8% weekly active users
- Month 12: 5% weekly active users
- Today: ~3% weekly active users
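For a sense of scale, here is a minimal sketch that converts those percentages into head counts. It assumes the 200-developer figure held for every period and treats the approximate "~3%" as exactly 3; the variable and function names are purely illustrative.

```python
# Weekly active user (WAU) percentages from the list above.
# Assumption: the developer population stayed at 200 throughout.
TOTAL_DEVELOPERS = 200

wau_percent = {
    "Month 1": 15,
    "Month 3": 12,
    "Month 6": 8,
    "Month 12": 5,
    "Today": 3,  # reported as "~3%", treated as exactly 3 here
}


def to_headcount(percent: int, total: int = TOTAL_DEVELOPERS) -> int:
    """Convert a WAU percentage into an approximate number of developers."""
    return round(total * percent / 100)


headcounts = {period: to_headcount(pct) for period, pct in wau_percent.items()}

# Relative decline between the first and the latest measurement.
relative_drop = (wau_percent["Month 1"] - wau_percent["Today"]) / wau_percent["Month 1"]

for period, count in headcounts.items():
    print(f"{period}: about {count} of {TOTAL_DEVELOPERS} developers")
print(f"Relative drop since month 1: {relative_drop:.0%}")
```

Under those assumptions, that is roughly 30 developers in the first month and about 6 today, an 80% relative decline.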
We built this beautiful portal with service catalogs, scaffolding templates, documentation search, deployment dashboards. And developers… went back to their old workflows.
Why? Because the old workflows were faster and more familiar. Our portal solved problems we thought developers had, not problems they actually have.
The Critical Mistake: Building What We Thought They Needed
Here’s the painful truth: we never asked developers what they actually needed. We made assumptions:
- “Of course they want service discovery!” → Most devs work on 2-3 services, already know who owns them
- “Documentation should be centralized!” → Developers prefer docs near code (READMEs) over portals
- “Scaffolding templates will standardize services!” → Teams have specific needs, one-size-fits-all templates don’t work
- “Deployment dashboards provide visibility!” → CI/CD already shows this, portal adds no value
We built 15 plugins thinking “more features = more value.” What we created was cognitive overload. Developers looked at it, saw complexity, and stuck with simple tools they already knew.
What Managed Solutions Would Have Changed
I’ve been thinking a lot about this lately. If we’d started with a managed Backstage solution (like Roadie), what would be different?
Time to value: We’d have been live in 2-3 weeks instead of 9 months. We could have discovered adoption problems in month 2, not month 12.
Validation before investment: We could have tested whether developers actually care about service catalogs before dedicating 4 engineers for a year.
Feature prioritization: Managed solutions force you to start with basics. We might have learned earlier that our comprehensive approach was wrong.
Opportunity cost: Those 4 engineers could have been improving CI/CD pipelines, building better observability, automating compliance—things that would have directly reduced developer friction.
The sunk cost fallacy is real. We’ve invested so much in self-hosted that migrating to managed feels like admitting defeat. But continuing to throw resources at low-adoption tooling is its own form of defeat.
Lessons for Others Considering Backstage
If you’re where we were 18 months ago, here’s what I wish someone had told me:
1. Developer Buy-In Must Come First
The stat that 20% cite “lack of developer buy-in” as the top failure reason? We’re a textbook case. Build with developers, not for developers. Shadow them. Interview them. Identify friction points through observation, not assumption.
2. Maintenance Is Severely Underestimated
We thought: “Once we build it, we’ll need maybe 1 engineer for maintenance.” Reality: 2 FTE permanently, just to keep it running, upgrade Backstage versions, fix broken plugins, handle security patches.
That’s $400K/year in ongoing costs we didn’t budget for.
3. Managed Lets You Validate Faster
If your goal is to learn whether an IDP solves real problems, managed solutions let you run that experiment in weeks instead of quarters. You can always migrate to self-hosted later if you prove ROI.
Starting with self-hosted means you’re betting big before you know if developers will even use it.
4. TypeScript/React Expertise Isn’t Optional
If your platform team doesn’t have deep frontend skills, self-hosted Backstage will be painful. You’re not just configuring YAML—you’re building React applications. That requires different expertise than infrastructure engineering.
5. Measure Adoption Obsessively
We should have had clear adoption targets: “If we don’t hit 30% WAU by month 3, we pause and reassess.” Instead, we kept building features hoping adoption would magically improve. It didn’t.
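A gate like that can be written down before launch. The sketch below is hypothetical: the `AdoptionGate` class, its fields, and the decision strings are invented for illustration, and only the 30% by month 3 target and the measured percentages come from this post.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AdoptionGate:
    """A pre-agreed checkpoint: minimum WAU percentage by a given month."""
    month: int
    min_wau_percent: float


def evaluate_gate(gate: AdoptionGate, measured_wau: dict[int, float]) -> str:
    """Return 'continue', 'pause and reassess', or 'not measured yet'."""
    if gate.month not in measured_wau:
        return "not measured yet"
    if measured_wau[gate.month] >= gate.min_wau_percent:
        return "continue"
    return "pause and reassess"


# The target from the text: 30% WAU by month 3.
gate = AdoptionGate(month=3, min_wau_percent=30.0)

# Measured WAU percentages by month, taken from the numbers earlier in this post.
measured = {1: 15.0, 3: 12.0, 6: 8.0, 12: 5.0}

decision = evaluate_gate(gate, measured)
print(f"Month {gate.month}: {measured[gate.month]}% WAU -> {decision}")
```

With our actual month 3 figure of 12%, this gate would have returned "pause and reassess" nine months before we admitted the problem. The point is less the code than agreeing on the threshold up front, so the decision is not renegotiated once the team is invested.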
What We’re Doing Now
We’re seriously considering migrating to a managed solution. The ROI conversation with leadership is straightforward:
- Current cost: 2 FTE of maintenance ($400K/year) for a portal with ~3% weekly adoption
- Managed cost: ~$100K/year with better features and support
- Savings: $300K/year + ability to redeploy engineers to higher-value work
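The arithmetic behind that comparison, as a rough sketch. The per-engineer cost is simply our $400K divided by 2 FTE, the managed price is the approximate ~$100K estimate above rather than a vendor quote, and the 18-month horizon is an illustrative choice.

```python
# All figures in US dollars per year unless noted.
MAINTENANCE_FTE = 2
SELF_HOSTED_ANNUAL = 400_000       # ongoing maintenance cost from the post
MANAGED_ANNUAL_ESTIMATE = 100_000  # "~$100K/year", an estimate, not a quote

# Implied fully loaded cost per engineer.
cost_per_fte = SELF_HOSTED_ANNUAL / MAINTENANCE_FTE

annual_savings = SELF_HOSTED_ANNUAL - MANAGED_ANNUAL_ESTIMATE


def savings_over(months: int) -> float:
    """Savings from switching, assuming flat costs and ignoring migration effort."""
    return annual_savings * months / 12


print(f"Implied cost per FTE: ${cost_per_fte:,.0f}")
print(f"Annual savings: ${annual_savings:,.0f}")
print(f"Savings over 18 months: ${savings_over(18):,.0f}")
```

This leaves out one-time migration effort and any pricing tiers, so the result is better read as an upper bound on savings than as a forecast.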
The hard part is admitting we made the wrong choice 18 months ago. But that’s cheaper than making the same wrong choice for another 18 months.
Questions for This Community
For those who’ve been through IDP implementations:
- How did you drive adoption beyond 30%? What actually worked?
- Did you start managed or self-hosted? Would you make the same choice again?
- How did you validate use cases before building? What process prevented assumptions?
- What metrics proved ROI to leadership? Beyond deployment frequency—actual business impact?
I’m sharing this because I don’t think we’re alone. The 89% market share with 10% adoption rate suggests this is a common problem. Maybe if we’re honest about failures, we can help others avoid them.
Update: Reading the comments on Michelle’s thread about build vs buy really drove this home. We’re all making similar mistakes. Let’s learn from each other’s experience instead of repeating the same patterns.