I’ve been wrestling with a decision that keeps me up at night, and I’d love to get perspectives from folks who’ve been through this.
We’re a high-growth EdTech startup with about 80 engineers across 8 product teams. We’ve grown from 25 engineers to 80 in the past 18 months, and the cracks are starting to show. Different teams have built their own deployment pipelines, observability is fragmented, onboarding new engineers takes 2-3 weeks, and our incident response feels like we’re reinventing the wheel every time.
Everyone’s telling me we need a platform team. The VPE Slack groups say it. The conference talks say it. That Gartner report about 80% adoption says it. Even our board members are asking about it.
But here’s my problem: Every platform engineering case study I read talks about 6-12 month timelines for simple implementations, 12-18 months for complex ones. In a startup moving as fast as we are, 18 months might as well be a lifetime.
The Timing Paradox
Last quarter, we pivoted our entire product strategy based on customer feedback. The quarter before that, we entered a completely new market segment. In 18 months, we might be a totally different company. How do you justify building infrastructure for a future that’s fundamentally unknowable?
But the counterargument keeps nagging at me: Without platform investment, how do we scale from 80 to 150 engineers? How do we maintain velocity when every team is spending 30% of their time on undifferentiated infrastructure work?
What I’m Seeing Right Now
The pain points are real:
- New engineers spend their first two weeks just understanding our deployment processes (which differ by team)
- We had three separate incidents last month that could have been prevented with better observability
- Two of our best senior engineers left because they were “tired of fighting infrastructure fires instead of building product”
- Product teams are asking for features we can’t deliver because our infrastructure can’t support them
But the opportunity cost is also real:
- Standing up a platform team means 6-8 senior engineers not building product features
- That’s probably $2M in comp, plus another $1M in tooling and infrastructure
- Our runway is 24 months. Do we bet $3M on infrastructure that won’t deliver value for 12-18 months?
- Meanwhile, competitors are shipping features we’re not
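One rough way to frame the two lists above is in engineer-equivalents rather than dollars. The sketch below takes the figures already in this post (80 engineers, roughly 30% of team time on undifferentiated infrastructure work, a platform team of 6-8) and asks how far that 30% would have to fall for the remaining product engineers to match today's product capacity. The function names are made up for illustration, and the model is deliberately crude: it ignores the 12-18 month ramp-up, the tooling budget, and any hiring during that period.

```python
def product_capacity(engineers: int, infra_share: float) -> float:
    """Engineer-equivalents left for product work after infrastructure overhead."""
    return engineers * (1.0 - infra_share)


def break_even_infra_share(engineers: int, infra_share: float,
                           platform_team_size: int) -> float:
    """Infra share the remaining product engineers must reach so that
    product capacity equals what it was before the platform team existed.

    A negative result means break-even is impossible at this team size.
    """
    if not 0 < platform_team_size < engineers:
        raise ValueError("platform_team_size must be between 1 and engineers - 1")
    baseline = product_capacity(engineers, infra_share)
    remaining = engineers - platform_team_size
    return 1.0 - baseline / remaining


if __name__ == "__main__":
    # Figures from the post: 80 engineers, ~30% infra overhead, 6-8 person team.
    for size in (6, 7, 8):
        target = break_even_infra_share(80, 0.30, size)
        print(f"platform team of {size}: infra share must drop to {target:.1%}")
```

Under these assumptions, the overhead would need to fall from 30% to somewhere around 22-24% just to break even on capacity. Anything below that is net gain, but only once the platform is actually usable, which is the timing problem this whole post is about.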
The Question I Can’t Answer
At what point does investing in platform engineering shift from “smart infrastructure bet” to “premature optimization that slows us down at exactly the wrong time”?
I’ve seen the DORA reports showing that 35% of platform teams deliver measurable value within 6 months. That’s encouraging. But it also means 65% either take longer than 6 months or never get there, and 40% can’t demonstrate value in the first year at all.
Those aren’t odds I’d accept for a product bet. Why should infrastructure be different?
What Keeps Me Up at Night
I worry that we’re already late. That we should have started this 6 months ago when we hit 50 engineers. That by the time we acknowledge we need this, it’s already too late to do it right.
But I also worry that we’re following a trend because everyone else is doing it, not because it’s the right move for our specific situation at our specific stage.
What I Need From This Community
For those who’ve built platform teams:
- What signals told you it was the right time?
- How did you handle the opportunity cost during the build phase?
- What would you do differently about timing?
- How did you deliver incremental value before the “big platform” was done?
For those who waited:
- How did you know when to pull the trigger?
- What was the cost of waiting?
- Would you have moved earlier in hindsight?
For those who moved too early:
- How did you know you jumped the gun?
- What was the impact on product velocity?
- How did you course-correct?
I know this isn’t a simple “yes build it” or “no don’t” question. But I’m hoping some of you have frameworks, heuristics, or hard-won lessons that can help me think through this more clearly.
Because right now, I feel like I’m choosing between two bad options: Move too fast and accumulate crushing technical debt, or slow down to build infrastructure and lose our competitive window.
Is there a third option I’m not seeing?