Over the past 18 months, we embarked on what seemed like a straightforward mission: unify our deployment pipelines. We had three separate systems: one for application deployments, one for ML model deployments, and one for data pipeline orchestration. The exec team saw the Gartner report predicting 80% adoption of unified platforms by 2026 and said “let’s be in that 80%.”
We allocated $2M in budget. Brought in consultants. Kicked off a six-month initiative to build what everyone was calling “the unified platform.” And honestly? We delivered exactly what we promised. A beautiful Kubernetes-based internal developer platform with a Backstage frontend, GPU node pools for ML workloads, integration with our model registry, the works.
App developers loved it. Deployment times dropped from days to hours. They were thrilled. We declared victory at the all-hands. Showed impressive metrics. Got executive buy-in for the next phase.
Then we noticed something: our ML engineering team hadn’t migrated. Six months after launch, they were still deploying models directly to SageMaker, bypassing our shiny new platform entirely.
The Brutal Truth
We built an app-dev platform with ML features bolted on. Not a unified platform. We approached it from an infrastructure perspective—“let’s make Kubernetes support both workloads”—instead of a workflow perspective. We never asked the fundamental question: what does “deployment” actually mean to a data scientist versus a backend engineer?
For our app developers, deployment means: push code, run tests, build container, deploy to cluster, done. Linear workflow, clear success criteria.
For our ML engineers, deployment means: validate model performance, ensure feature parity with training data, configure inference endpoints, set up A/B testing infrastructure, establish monitoring for model drift, plan rollback strategy. It’s not linear—it’s iterative and experimental.
We built the first workflow and expected ML teams to adapt. They didn’t. They couldn’t. Their jobs are fundamentally different.
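To make the contrast concrete, here is a minimal sketch of the two shapes of "deployment" described above. It is an illustration only, not our platform code: the step and gate names come from the descriptions above, while the thresholds, the `revise` stand-in, and all function names are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable

# Step and gate names come from the prose above; everything else is hypothetical.
APP_STEPS = ["push code", "run tests", "build container", "deploy to cluster"]


def run_linear(steps, run_step):
    """App-style deploy: run each step once, in order, and stop at the first failure."""
    for step in steps:
        if not run_step(step):
            return f"failed at: {step}"
    return "done"


@dataclass
class Gate:
    name: str
    check: Callable[[dict], bool]  # inspects the current model candidate


def run_iterative(gates, candidate, revise, max_rounds=5):
    """ML-style deploy: re-check every gate after each revision.

    Returns (round_number, candidate) once all gates pass, or
    (None, candidate) if max_rounds is exhausted.
    """
    for round_no in range(1, max_rounds + 1):
        failing = [g.name for g in gates if not g.check(candidate)]
        if not failing:
            return round_no, candidate
        candidate = revise(candidate, failing)
    return None, candidate


# Illustrative gates and thresholds (assumed values, not from a real system).
GATES = [
    Gate("model performance", lambda c: c["accuracy_pct"] >= 90),
    Gate("feature parity", lambda c: c["missing_features"] == 0),
]


def revise(candidate, failing):
    """Stand-in for retraining or fixing features between attempts."""
    updated = dict(candidate)
    if "model performance" in failing:
        updated["accuracy_pct"] += 3
    if "feature parity" in failing:
        updated["missing_features"] -= 1
    return updated
```

The linear runner has one exit condition per step. The iterative runner loops back through every gate after each change, which is roughly why a pipeline built around the first shape felt like a poor fit for the second.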
What We’re Doing Differently
Six months ago, we started over. This time with a co-design approach:
- Embedded platform engineers in the ML team for a month. Just observed. Watched how they actually work.
- Workflow mapping sessions where ML engineers walked us through their ideal deployment process, not the one our platform imposed.
- Prototype testing with real ML workloads before building infrastructure. We validated workflows before infrastructure.
- Separate abstractions for different personas that compile down to the same underlying platform. App devs see containers and deployments. ML engineers see model versions and inference endpoints. Same Kubernetes, different interfaces.
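One way to picture "separate abstractions that compile down to the same platform" is the sketch below. It is a simplified illustration under stated assumptions, not our implementation: `WorkloadSpec`, `AppDeployment`, `ModelEndpoint`, the label keys, and the image naming scheme are all hypothetical names invented for the example, and a real target would be Kubernetes manifests rather than a Python dataclass.

```python
from dataclasses import dataclass, field


@dataclass
class WorkloadSpec:
    """Hypothetical common target that every persona-facing spec compiles to."""
    name: str
    image: str
    replicas: int
    gpu: bool = False
    labels: dict = field(default_factory=dict)


@dataclass
class AppDeployment:
    """What an app developer fills in: containers and deployments."""
    service: str
    image: str
    replicas: int = 2

    def compile(self) -> WorkloadSpec:
        return WorkloadSpec(
            name=self.service,
            image=self.image,
            replicas=self.replicas,
            labels={"persona": "app"},
        )


@dataclass
class ModelEndpoint:
    """What an ML engineer fills in: model versions and inference endpoints."""
    model: str
    version: str
    traffic_percent: int = 100  # share of traffic, e.g. for an A/B test
    needs_gpu: bool = True

    def compile(self) -> WorkloadSpec:
        return WorkloadSpec(
            name=f"{self.model}-v{self.version}",
            # Assumed image naming convention, purely illustrative.
            image=f"inference-server:{self.model}-{self.version}",
            replicas=1,
            gpu=self.needs_gpu,
            labels={
                "persona": "ml",
                "model-version": self.version,
                "traffic-percent": str(self.traffic_percent),
            },
        )


app_spec = AppDeployment(service="checkout", image="checkout:1.4.2").compile()
ml_spec = ModelEndpoint(model="churn", version="3", traffic_percent=10).compile()
```

Both `app_spec` and `ml_spec` are the same type, so everything downstream (scheduling, rollout, monitoring) can stay shared while each persona keeps its own vocabulary.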
It’s slower. Much slower. We’re eight months in and still not at feature parity with what we “delivered” in the first attempt. But adoption is real this time. ML teams are actually using it because we built for their workflows, not ours.
The Lessons
If I could go back and advise the team that started this journey:
- Start with workflows, not infrastructure. Your platform is only “unified” if it serves all workflows, not just one.
- Co-design isn’t optional. Every persona needs representation from day one, not “we’ll add ML support in v2.”
- Measure adoption, not features. We built everything on the roadmap but failed at the only metric that matters: are people using it?
- Different doesn’t mean worse. ML deployment workflows aren’t broken app workflows. They’re legitimately different, and that’s okay.
- Budget for iteration. The $2M we spent taught us what NOT to build. That has value, but only if you get budget to apply those lessons.
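On "measure adoption, not features": a rough sketch of what tracking adoption per persona could look like is below. The function names and the sample records are invented for illustration (the numbers are not our data); the only assumption is that each deployment can be tagged with a persona and with whether it went through the platform.

```python
from collections import Counter


def adoption_by_persona(deploys):
    """deploys: iterable of (persona, via_platform) pairs.

    Returns {persona: fraction of that persona's deploys that used the platform}.
    """
    totals, on_platform = Counter(), Counter()
    for persona, via_platform in deploys:
        totals[persona] += 1
        if via_platform:
            on_platform[persona] += 1
    return {persona: on_platform[persona] / totals[persona] for persona in totals}


def overall_adoption(deploys):
    """Single blended number; returns 0.0 when there are no deploys."""
    deploys = list(deploys)
    if not deploys:
        return 0.0
    return sum(1 for _, via_platform in deploys if via_platform) / len(deploys)


# Made-up sample records, for illustration only.
SAMPLE = [
    ("app", True), ("app", True), ("app", True), ("app", False),
    ("ml", True), ("ml", False), ("ml", False), ("ml", False),
]

per_persona = adoption_by_persona(SAMPLE)
blended = overall_adoption(SAMPLE)
lowest = min(per_persona, key=per_persona.get)
```

With this sample, the blended figure sits halfway between the two personas and hides the gap. Reporting the lowest persona alongside the blended number is one way to keep the least-represented group visible.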
Right now we’re at 60% adoption across all personas. Not the 100% we optimistically projected, but it’s real adoption. ML engineers are choosing the platform because it works for them, not because we mandated it.
The unified delivery pipeline is coming. But unification at the infrastructure level without workflow-level empathy is just infrastructure consolidation with better PR.
For teams starting this journey: What’s your definition of “unified”? Because if it doesn’t include the workflows of your least-represented persona, you’re paying for the same expensive lesson we did.