A CTO's Framework for Platform Engineering Build vs Buy Decisions in 2026

cto_michelle · March 23, 2026, 8:16am

After reading Luis and Keisha’s threads, I want to share the framework I use when my team asks about build vs buy decisions - particularly for platform engineering.

This comes from 25 years in tech, most recently as CTO, and countless conversations with boards who want to understand why we’re spending engineering resources on internal infrastructure.

The Question That Starts Everything

Our board asked me last quarter: “Why are 8 engineers building a developer portal instead of building features that serve customers?”

Fair question. And it forced me to articulate something I’d been assuming was obvious (it wasn’t).

My Build vs Buy Framework: 4 Critical Questions

When evaluating whether to build or buy ANY infrastructure (not just Backstage), I ask these four questions. ALL four answers must clearly favor building to justify it.

Question 1: Does This Create Competitive Differentiation?

Translation: Would our competitors struggle to replicate this advantage, and does it matter to customers?

For Backstage/IDPs:

Developer portal itself: NO (commodity infrastructure)
Unique Golden Paths for your specific workflows: YES
Plugin architecture: NO (everyone has access to Backstage plugins)
Integration with your compliance/deployment patterns: YES

The implication: Build the differentiating parts (Golden Paths), buy the commodity parts (portal infrastructure).

Question 2: Can We Maintain This Long-Term (3-5 Year Horizon)?

Translation: Will we still have the expertise, resources, and commitment to maintain this when:

Key engineers leave
Priorities shift
Tech stack evolves
The team that built it moves to other projects

Red flags for Backstage:

“Two engineers can maintain it” (underestimation)
“We’ll figure it out as we go” (no long-term plan)
Platform team reports to Director level (not C-suite priority)

Green flags:

Dedicated platform org with career paths
Multi-year budget commitment
Executive sponsorship at CTO/VP level

Question 3: Is There a Proven Market Alternative That’s 80% Fit?

Translation: Can we buy something that meets 80% of our needs and customize the remaining 20%?

In 2026 for IDPs: YES

Roadie: proven at scale, enterprise customers, excellent support
Spotify Portal: GA since Oct 2025, obviously built by the Backstage creators
Others: Port, Cortex (different approaches, also mature)

The 80/20 insight: Most teams need the same 80% (service catalog, docs, deployment visibility). Differentiation is in the 20% (your specific Golden Paths).

Buy the 80%, build the 20%.

Question 4: What’s the Opportunity Cost of Our Best Engineers?

Translation: If these engineers weren’t building this, what would they build instead, and what’s that worth?

For our company (real numbers):

8 engineers on platform (fully loaded: $1.5M/year)
Managed Backstage alternative: $180K/year
Delta: $1.32M saved

Opportunity analysis:

Those 8 engineers could build features driving est. $4M ARR
Or reduce deployment time (saving $2M in eng productivity)
Or improve reliability (reducing $3M in downtime costs)

Board perspective: “You’re spending $1.5M to avoid spending $180K, AND missing out on $4M in revenue. Why?”

I didn’t have a good answer. We switched to managed.
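A minimal sketch of the arithmetic above, in Python. The dollar figures are the ones quoted in this post; the class and field names are illustrative, and treating the three opportunity estimates as either/or alternatives (rather than summing them) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildVsBuyCosts:
    """Annual figures in USD. Names are illustrative, not from any tool."""

    build_cost: int  # fully loaded cost of the in-house platform team
    buy_cost: int    # managed alternative subscription

    def direct_savings(self) -> int:
        """Cash delta per year between building and buying."""
        return self.build_cost - self.buy_cost


# Figures quoted in the post.
costs = BuildVsBuyCosts(build_cost=1_500_000, buy_cost=180_000)

# Alternative uses of the same 8 engineers (assumed to be either/or).
alternatives = {
    "features (ARR)": 4_000_000,
    "faster deployments": 2_000_000,
    "reliability": 3_000_000,
}
best_name, best_value = max(alternatives.items(), key=lambda kv: kv[1])

print(f"Direct savings from buying: ${costs.direct_savings():,}")
print(f"Best alternative use of the team: {best_name} (${best_value:,})")
```

Keeping the direct delta and the opportunity estimate as separate numbers mirrors how the board question was phrased: one figure for spend, one for what is being missed.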

How This Framework Applies to Backstage in 2026

Let me work through each question specifically for IDP decisions:

Q1 (Differentiation): Self-hosting the portal infrastructure does NOT create competitive differentiation. Your unique Golden Paths do.

Q2 (Long-term maintenance): Most companies underestimate Backstage maintenance. It requires a dedicated team, ongoing plugin development, React expertise, and upgrade management.

Q3 (Market alternatives): Multiple proven managed Backstage options exist in 2026. They handle the 80% commodity infrastructure.

Q4 (Opportunity cost): Platform engineers are your best engineers. What else could they build?

Conclusion: For most companies, self-hosting Backstage fails 3 of 4 questions. Buy managed, focus engineers on differentiation.
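The four questions can be sketched as a simple checklist in Python. This is one possible encoding, not a definitive one: the field names are hypothetical, Q4 is reduced to a yes/no judgment, and a "yes" on Q3 (a vendor covers about 80%) counts toward buying. The answers for the portal follow the walk-through above; the answers for Golden Paths beyond Q1 are assumed for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    """Answers to the four questions for one component (hypothetical schema)."""

    name: str
    differentiates: bool               # Q1: competitive differentiation?
    maintainable_long_term: bool       # Q2: credible 3-5 year maintenance plan?
    vendor_covers_80_percent: bool     # Q3: proven market alternative at ~80% fit?
    opportunity_cost_acceptable: bool  # Q4: best use of these engineers?


def recommend(a: Assessment) -> str:
    """Return "build" only when every question favors building, else "buy"."""
    favors_build = (
        a.differentiates
        and a.maintainable_long_term
        and not a.vendor_covers_80_percent
        and a.opportunity_cost_acceptable
    )
    return "build" if favors_build else "buy"


portal = Assessment(
    name="portal infrastructure",
    differentiates=False,
    maintainable_long_term=False,
    vendor_covers_80_percent=True,
    opportunity_cost_acceptable=False,
)
golden_paths = Assessment(
    name="Golden Paths",
    differentiates=True,
    maintainable_long_term=True,        # assumed for illustration
    vendor_covers_80_percent=False,     # assumed for illustration
    opportunity_cost_acceptable=True,   # assumed for illustration
)

for item in (portal, golden_paths):
    print(f"{item.name}: {recommend(item)}")
```

Assessing each component separately, rather than "Backstage" as a whole, is what lets the same checklist say buy for the portal and build for the Golden Paths.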

When to Still Build: The Exceptions

I’m not saying “never build.” There are legitimate exceptions:

Exception 1: Truly Unique Requirements

Example: Financial services with regulations that literally no vendor can meet
Test: “Can I explain to my board why no vendor can solve this?”
Warning: “We need customization” ≠ “no vendor can meet our needs”

Exception 2: Platform IS Your Strategic IP

Example: You’re Netflix, Spotify, Google - platform engineering at massive scale
Test: Does your platform create a defensible moat for your business?
Reality check: Most companies aren’t in this category

Exception 3: Proven Excellence in Platform Engineering

Example: 200+ engineers, dedicated platform org, >60% voluntary adoption
Test: Can you demonstrate platform engineering as a core competency?
Data point: If adoption is <50%, you haven’t earned the right to build

The AI Factor: Why This Matters Even More in 2026

Platform engineering is converging with AI Ops. Platforms must support:

AI agent authentication and RBAC
Resource quotas and cost controls for LLM calls
Prompt management and model routing
Audit trails for AI-generated changes

DIY approach: Build all of this from scratch
Managed approach: Vendors are already building it because their customers demand it at scale

Which would you rather have: your platform team building AI agent RBAC, or building the unique Golden Paths that differentiate your developer experience?

Executive Communication: How to Talk to Boards

Here’s how I frame platform engineering decisions to our board:

Bad framing: “We need Backstage for developer productivity”
Good framing: “We’re investing $180K to free 8 engineers ($1.5M) to build features driving $4M ARR”

Bad framing: “We need custom plugins for our workflows”
Good framing: “We’ll buy commodity infrastructure and focus engineers on Golden Paths that differentiate our developer experience”

Bad framing: “Managed solutions don’t have all our features”
Good framing: “Managed solutions provide 80% baseline, we’ll build the 20% that creates competitive advantage”

Boards care about ROI in business terms: revenue enabled, costs avoided, profit contribution.

My Advice to Engineering Leaders

Apply this framework to every build vs buy decision, not just Backstage
Be honest about your capabilities - maintaining commodity infrastructure isn’t a competitive advantage
Measure obsessively - if adoption is low, question whether you’ve built the right thing
Think in opportunity cost - what aren’t you building because you’re maintaining infrastructure?
Communicate in business terms - boards want to see ROI, not technical arguments

Questions I Can Answer

How to build business cases for managed solutions
Frameworks for other build vs buy decisions
Managing board expectations around platform investments
Measuring platform engineering ROI in business terms

This framework has served me well across multiple companies and board conversations. Hope it helps others navigate these decisions.

eng_director_luis · March 23, 2026, 8:18am

Michelle, thank you for this framework. I’m literally going to bring this to our next platform team planning session.

The Compliance Question Again

Your Exception 1 really resonates: “Financial services with regulations that literally no vendor can meet.”

But here’s what I’m wrestling with: Is this actually true, or is it an excuse we tell ourselves?

Your test question cuts through the BS: “Can I explain to my board why no vendor can solve this?”

When I try to articulate it:

“We need SOC2 compliance” → But Roadie/Spotify Portal have SOC2
“We need audit trails” → Managed platforms have detailed logging
“We need data residency controls” → Most enterprise vendors offer regional hosting
“We need custom compliance reporting” → API integrations can handle this

The honest answer: Most of our “unique compliance needs” could be met by managed platforms + API integrations.

The real reason we were leaning toward self-hosting? “We’re technical people, we can build this, and we like building things.”

That’s ego, not strategy. Your framework calls this out directly.

The Opportunity Cost Revelation

Your Q4 (opportunity cost) is the most powerful question.

If our 6 platform engineers weren’t maintaining Backstage infrastructure:

Option A: Build Golden Paths for our unique financial services deployment patterns
Option B: Improve CI/CD pipeline (currently taking 45 min for full deploy)
Option C: Create compliance-as-code frameworks that reduce audit prep time

All three of these would have direct, measurable business impact. Maintaining React plugins and fighting Backstage upgrades? Not so much.

Question for You

How did you handle the conversation with engineers who were emotionally invested in the self-hosting decision?

In my experience, platform engineers often have strong opinions about building vs buying. They’re smart people who can build anything, so suggesting “let’s buy this” can feel like questioning their abilities.

How did you navigate that dynamic while maintaining team morale and psychological safety?

Applying This to Other Decisions

You mentioned “apply this framework to every build vs buy decision.” I’m thinking about other areas where we might be building when we should buy:

CI/CD pipelines: We’ve custom-built most of our Jenkins infrastructure
Observability stack: Cobbled together Prometheus, Grafana, custom dashboards
Secret management: Custom-built vault system

Running them through your 4 questions:

Differentiation? NO for all three
Long-term maintenance? Unclear (these systems are fragile)
Market alternatives? YES (GitHub Actions, Datadog, 1Password)
Opportunity cost? HIGH (engineers maintaining infrastructure)

This framework is revealing how much technical debt we’ve accumulated by defaulting to “build.”

Thanks for articulating this so clearly. It’s exactly what I needed to make the business case for change.

maya_builds · March 23, 2026, 8:18am

Speaking as someone whose startup failed partly because we built too much, this framework hits hard.

The Startup Lesson I Learned the Hard Way

We were a 15-person engineering team. We built:

Custom CMS when WordPress would’ve worked
Custom analytics when Mixpanel existed
Custom design system when Material UI was right there
Custom deployment pipeline when Heroku existed

Why did we build all of this? Because we were engineers, and engineers love building.

What we should have done: Buy/use everything, focus 100% on our actual product differentiation.

By the time we realized this, we’d spent 18 months building infrastructure instead of finding product-market fit. We ran out of runway.

Michelle’s Framework Applied to Startups

Let me run your 4 questions for a startup context:

Q1 (Differentiation): Does this create competitive advantage?

For startups: 95% of infrastructure decisions are NO
Build only what makes your product unique

Q2 (Long-term maintenance): Can we maintain it?

For startups: Almost always NO
You’ll pivot, people will leave, priorities will shift

Q3 (Market alternatives): 80% fit available?

For startups: Usually YES, and 80% is plenty
Perfect is the enemy of shipped

Q4 (Opportunity cost): What else could we build?

For startups: THE ACTUAL PRODUCT CUSTOMERS PAY FOR
This is the killer question we didn’t ask

The Size Threshold Question

Your framework has different answers at different company sizes. Here’s my mental model:

0-50 engineers: Buy everything. Seriously, everything. Your job is to find product-market fit, not maintain infrastructure.

50-200 engineers: Buy infrastructure layer (Backstage, observability, CI/CD). Build Golden Paths and unique integrations on top.

200+ engineers: Maybe consider DIY, but only if you pass all 4 questions AND have demonstrated platform engineering excellence (>60% adoption).
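The three tiers above can be sketched as a small Python function. The tier boundaries overlap at exactly 50 and 200 engineers in the description, so this sketch assumes 50 falls in the middle tier and 200 in the top tier. The function and parameter names are hypothetical.

```python
BUY_EVERYTHING = "buy everything"
HYBRID = "buy infrastructure, build Golden Paths"
CONSIDER_DIY = "consider DIY"


def default_strategy(
    engineers: int,
    passes_all_four_questions: bool = False,
    voluntary_adoption: float = 0.0,
) -> str:
    """Map engineering headcount to a default build-vs-buy stance.

    voluntary_adoption is a fraction between 0.0 and 1.0.
    """
    if engineers < 0:
        raise ValueError("engineers must be non-negative")
    if engineers < 50:
        return BUY_EVERYTHING
    if engineers < 200:
        return HYBRID
    # 200+ engineers: DIY only with all four questions passed AND >60% adoption.
    if passes_all_four_questions and voluntary_adoption > 0.60:
        return CONSIDER_DIY
    return HYBRID


print(default_strategy(15))   # a 15-person team
print(default_strategy(80))   # a mid-size org
print(default_strategy(500, passes_all_four_questions=True, voluntary_adoption=0.7))
```

Falling back to the hybrid stance for large orgs that miss either gate is an assumption; the tiers only say when DIY may be considered, not what to do otherwise.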

Does this align with your experience across different company sizes?

The Design Parallel

This framework applies to design systems too:

Most companies should use:

Tailwind CSS or Material UI (buy the foundation)
Custom components only for truly unique patterns (build the differentiation)

But I see teams spending months building custom button components, custom form libraries, custom everything. It’s the same ego-driven “we can build this” mentality.

Your insight: “Maintaining commodity infrastructure isn’t a competitive advantage.”

Design version: “Maintaining custom button components isn’t a competitive advantage.”

One Question

Your Exception 3: “Proven Excellence in Platform Engineering” - you set >60% voluntary adoption as the bar.

Where does that number come from? Is that based on benchmarks, or your experience, or industry data?

I’m curious because I want to apply similar thinking to design system adoption, and I’m wondering what “good” looks like.

Thanks for this framework - wish I’d had it 3 years ago when we were making build vs buy decisions at my startup.

product_david · March 23, 2026, 8:18am

As a product person, I’m fascinated by how this framework exposes the “build trap” that engineering orgs fall into.

The Product Management Parallel

In product management, we talk about the “build trap” - when teams focus on building features instead of creating customer value.

Your framework is identifying the engineering equivalent: Building infrastructure instead of building differentiation.

The Buy = Build Faster Insight

Here’s what I think many engineering leaders miss: Buying isn’t avoiding building. It’s building faster by focusing on what matters.

Your framework makes this explicit:

Buy: Commodity infrastructure (Backstage portal)
Build: Differentiation (Golden Paths unique to your org)
Result: You’ve built more differentiated capability, faster

This isn’t “we can’t build this.” It’s “we can build more value by not building this.”

The Build vs Buy Framing Problem

Your executive communication section is gold:

Bad framing: “Managed solutions don’t have all our features”
Good framing: “Managed solutions provide 80% baseline, we’ll build the 20% that creates competitive advantage”

This reframes from loss (“don’t have”) to gain (“frees us to build”).

As a product person, I’d add another reframing:

Bad framing: “We need to build this ourselves”
Good framing: “We need to own the user experience - managed platforms let us focus on that”

The Opportunity Cost as Product Strategy

Your Q4 (opportunity cost) is secretly a product prioritization framework.

Every engineering hour spent on commodity infrastructure is an hour NOT spent on:

Features customers will pay for
Performance improvements that reduce churn
Integrations that expand market reach
UX improvements that increase adoption

Product lens: Infrastructure investment must compete with feature investment. If infrastructure doesn’t create customer value, it’s not competing well.

The Metrics That Matter

From a product perspective, here’s how I’d measure build vs buy decisions:

Leading indicators (did we choose right?):

Time to value: How fast did we get utility?
Adoption rate: Are people using it?
Engineering velocity: Can we ship faster now?

Lagging indicators (was it worth it?):

Customer value created: New features enabled?
Revenue impact: ARR from freed engineering capacity?
Cost avoidance: What didn’t we have to build?

Your framework essentially asks: Does building this enable more product value than buying it?

For infrastructure, the answer is usually no.

One Question About the AI Factor

You mention platforms must support AI agents, and managed platforms are building this because customers demand it.

This raises an interesting product question: When market demand forces vendors to build capabilities, should you still build them yourself?

My product intuition: NO. When the market demands something (AI agent governance), vendors will build it because they have to. You should free-ride on that market pressure.

Framework addition: Ask “Is the market forcing vendors to solve this?” If yes, buying becomes even more attractive.

Applying This to Product Decisions

Your framework applies to product build vs buy too:

Should we build our own:

Payment processing? (No - Stripe exists)
Authentication? (No - Auth0/Okta exist)
Email delivery? (No - SendGrid exists)
Analytics? (No - Mixpanel/Amplitude exist)

All four questions point to BUY. Focus product eng on actual product differentiation.

Thanks for articulating this framework. I’m going to share it with our eng leadership - it’s exactly the conversation we need to have about technical strategy.

vp_eng_keisha · March 23, 2026, 8:18am

Michelle, this framework perfectly captures what I learned through our Backstage migration experience.

The Organizational Maturity Lens

Your 4 questions map beautifully to organizational maturity stages:

Stage 1: Startup (0-50 eng)

Q1 Differentiation: Almost nothing creates competitive advantage yet
Q2 Maintenance: Can’t maintain anything long-term (pivots, churn)
Q3 Alternatives: Everything has alternatives
Q4 Opportunity: Must focus on product-market fit
Conclusion: Buy everything

Stage 2: Scale-up (50-200 eng)

Q1 Differentiation: Starting to understand what makes you unique
Q2 Maintenance: Can maintain some things, but must choose carefully
Q3 Alternatives: Buy commodity, build unique integrations
Q4 Opportunity: High - still need to ship features
Conclusion: Hybrid - buy foundation, build differentiation

Stage 3: Enterprise (200+ eng)

Q1 Differentiation: Clear competitive advantages identified
Q2 Maintenance: Dedicated teams with career paths
Q3 Alternatives: Some truly unique requirements possible
Q4 Opportunity: Still high, but more capacity
Conclusion: Selective build, but measure obsessively

We’re in Stage 2 (80 engineers), which is why managed Backstage + custom Golden Paths made sense.

The Adoption Metric Is Key

Your Exception 3 mentions >60% voluntary adoption as the bar for “proven excellence in platform engineering.”

This is brilliant because it adds an empirical gate: You must prove you can execute well before you earn the right to build more.

In our case:

Self-hosted Backstage: 15% adoption → FAILED the execution test
Post-migration: 60% adoption → PASSED, but with managed foundation

The lesson: We could build excellent Golden Paths (proven by 60% adoption), but we couldn’t maintain excellent infrastructure (proven by 15% adoption with self-hosted).

Framework addition: Separate “can we build unique experiences?” from “can we maintain commodity infrastructure?” Different capabilities require different decisions.

The Engineering Leader Challenge

Luis asked about managing engineers emotionally invested in building. This is real.

Here’s how I framed it with my team:

Bad framing: “We’re not capable of maintaining Backstage”
Good framing: “You’re too valuable to spend on infrastructure. I need you building Golden Paths that developers love.”

This repositions the decision from capability (implying they can’t do it) to priority (they’re too important to waste on this).

The team responded positively because:

It acknowledged their skills (“you’re too valuable”)
It clarified priority (“building experiences > maintaining infrastructure”)
It aligned with what they actually enjoy (building new capabilities vs firefighting plugins)

The Metrics That Changed My Mind

Before migration, I tracked:

Backstage uptime (technical metric)
Feature count (vanity metric)
Team satisfaction (lagging indicator)

After reading about product thinking, I changed to:

Voluntary adoption rate (leading indicator)
Time to first value for new users (efficiency metric)
Golden Path usage (value metric)

The shift: From “is the platform working?” to “are developers choosing it?”

Michelle’s Q4 (opportunity cost) forced this mental shift: What aren’t we building because we’re maintaining infrastructure?

Answer: Golden Paths that developers actually want.

One Addition to the Framework

I’d add a 5th question: Can we measure success, and are we willing to kill this if metrics don’t improve?

Too many platform teams build and never sunset. Your Exception 3 (>60% adoption) implies this, but I’d make it explicit:

Q5: Have we defined success metrics and committed to sunset if we don’t hit them?

For our self-hosted Backstage:

Target: 60% adoption within 12 months
Reality: 15% after 18 months
Decision: Migrate to managed because we failed our own success criteria

Setting exit criteria prevents sunk cost fallacy from keeping us locked into failed decisions.
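A minimal Python sketch of an explicit exit criterion like Q5. The target and deadline are the ones from our self-hosted Backstage; the names and the rule itself (sunset once the deadline has passed with adoption still below target) are one assumed way to encode it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExitCriteria:
    """Success bar agreed up front (hypothetical schema)."""

    target_adoption: float  # fraction of developers, e.g. 0.60
    deadline_months: int    # months allowed to reach the target


def should_sunset(criteria: ExitCriteria, adoption: float, months_elapsed: int) -> bool:
    """True once the deadline has passed and adoption is still below target."""
    deadline_passed = months_elapsed >= criteria.deadline_months
    return deadline_passed and adoption < criteria.target_adoption


# Target from the post: 60% adoption within 12 months.
criteria = ExitCriteria(target_adoption=0.60, deadline_months=12)

# Reality from the post: 15% after 18 months.
print(should_sunset(criteria, adoption=0.15, months_elapsed=18))

# Still inside the window: too early to decide.
print(should_sunset(criteria, adoption=0.15, months_elapsed=6))
```

Writing the rule down before the project starts is the point: the check takes no input about how much has already been spent, so sunk cost cannot influence the result.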

Thank You

This framework would have saved us 12 months and $600K if I’d had it in 2024. Sharing it widely with my network.