The Centralized AI Platform Trap: Why Shared ML Teams Kill Product Velocity
Most engineering organizations discover the problem the same way: AI demos go well, leadership pushes for broader adoption, and someone decides the right answer is a dedicated team to own "AI infrastructure." The team gets headcount, a roadmap, and a mandate to accelerate AI across the organization.
Eighteen months later, product teams are filing tickets to get their prompts deployed. The platform team is overwhelmed. Features that took days to demo are taking quarters to ship. And the team originally created to speed up AI adoption has become its primary bottleneck.
This is the centralized AI platform trap — and it's surprisingly easy to fall into.
Why Centralization Feels Right (and Isn't)
The intuition behind centralizing AI infrastructure is sound at first glance. Shared infrastructure prevents duplication. Standardized tooling ensures consistency. A dedicated team can build the hard parts so product teams don't have to.
This is how platform engineering works for compute, CI/CD, and observability — and it works well there because those problems are genuinely infrastructure problems. A Kubernetes cluster is a Kubernetes cluster. A deployment pipeline is a deployment pipeline. The domain-specific knowledge lives in the application, not the platform.
Machine learning inverts this. The domain-specific knowledge — which features matter, which failure modes are acceptable, what latency budget the product can tolerate, which eval metric maps to business value — lives in the model and its operating context. Separating that knowledge from the team responsible for building the system creates a gap that tooling cannot close.
When you centralize AI development, you're not abstracting infrastructure. You're abstracting expertise. And that's where the trouble starts.
The Three Failure Modes
The ticket queue problem. Every product team's AI needs eventually flow through the central team's backlog. Need to fine-tune a model? Ticket. Need to change the retrieval strategy? Ticket. Need to add a tool to an agent? Ticket. The central team prioritizes across requests from every business unit, which means teams with urgent, domain-specific needs wait behind teams with different priorities. What feels like coordination becomes contention.
The organizational irony is that teams with the least ML expertise — the ones who most need help — are also the ones most harmed by this queue. Teams that understand ML well enough to self-serve route around the bottleneck. Teams that don't know enough to build without help are the ones filing the most tickets, and the ones waiting the longest.
The abstraction mismatch problem. Central platform teams build abstractions intended to work across all use cases. These abstractions solve the median problem reasonably well and the edge cases poorly. A product team building a real-time pricing agent has different latency requirements than a team running nightly document classification. A team doing medical information retrieval has different quality standards than a team generating marketing copy. When the platform decides on one model provider, one evaluation framework, one deployment pattern, teams with non-median requirements spend their time working around the abstraction rather than building their product.
What's worse: the workarounds accumulate into an unofficial shadow infrastructure that the platform team doesn't know about and can't support. The gap between what the platform claims to provide and what teams actually use becomes a maintenance burden for both sides.
The ownership gap problem. When a centralized team owns the model infrastructure and a product team owns the user experience, the seam between them becomes a responsibility void. Latency degrades at the integration boundary. Eval metrics don't map to product outcomes. Production incidents expose unclear escalation paths. Neither team has end-to-end visibility into what's happening, and root-cause analysis requires coordinating across team boundaries under pressure.
ML systems in particular need rapid iteration based on production feedback — watching error patterns, adjusting prompts, tuning retrieval — and that iteration is only fast when the team doing it owns the full stack. Cross-team coordination adds latency to every improvement cycle.
What Conway's Law Tells You
Conway's Law states that organizations produce systems that mirror their communication structure. For AI systems, this plays out precisely: a monolithic centralized ML team produces a monolithic, inflexible ML platform. Product-embedded ML teams produce systems that fit their product's actual requirements.
The practical implication is that the right way to produce fast, domain-aware AI systems is to organize teams around business domains. If you want an AI pricing system that responds quickly to product feedback, put an ML engineer in the pricing team. If you want an AI search system that understands your content's semantics, put an ML engineer in the search team.
The inverse is also true. If you want a consistent, auditable, cost-controlled AI infrastructure, you do need some centralization — but not of model development. You need it at the infrastructure layer, not the application layer.
What Actually Needs to Be Centralized
The mistake isn't the idea of a platform team. The mistake is defining the platform too broadly.
There are real problems that benefit from centralization, and they're almost entirely infrastructure concerns:
Rate limiting and cost attribution. LLM usage is token-based, variable, and expensive. Per-second request limits don't capture the actual cost distribution. A central AI gateway that enforces per-team token budgets, attributes costs to teams and features, and surfaces burn rate data is genuinely valuable infrastructure — not because it controls what teams build, but because it gives everyone visibility into the true cost of their decisions.
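The budget-enforcement half of such a gateway can be sketched in a few lines. This is a minimal illustration, not any particular product's API; the `TokenBudget` class, the sliding 24-hour window, and the per-team limits are all assumptions made for the example:

```python
import time
from collections import defaultdict

class TokenBudget:
    """Illustrative per-team token budget over a sliding 24-hour window."""

    def __init__(self, daily_limit_tokens: int):
        self.daily_limit = daily_limit_tokens
        self.usage = defaultdict(list)  # team -> [(timestamp, tokens), ...]

    def _current_usage(self, team: str) -> int:
        cutoff = time.time() - 86_400  # keep only the last 24 hours
        self.usage[team] = [(t, n) for t, n in self.usage[team] if t > cutoff]
        return sum(n for _, n in self.usage[team])

    def try_spend(self, team: str, tokens: int) -> bool:
        """Record the spend only if it fits within the team's daily budget."""
        if self._current_usage(team) + tokens > self.daily_limit:
            return False  # gateway rejects or queues the request
        self.usage[team].append((time.time(), tokens))
        return True

budget = TokenBudget(daily_limit_tokens=1_000_000)
assert budget.try_spend("pricing", 900_000)        # within budget
assert not budget.try_spend("pricing", 200_000)    # would exceed the budget
```

The point of the sketch is the shape of the interface: the gateway answers "can this team spend these tokens?" without knowing or caring what the team is building.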
Model access control and audit trails. Authentication, authorization, and audit logging for model API access are cross-cutting concerns that don't belong in every product team's codebase. A centralized gateway handles compliance requirements once rather than requiring every team to independently implement (and potentially get wrong) access controls.
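A gateway-side audit wrapper might look like the following sketch. The record fields are assumptions about what a compliance team could need, and `model_client` stands in for any callable that takes a prompt and returns text:

```python
import json
import time
import uuid

def audited_call(model_client, team: str, user: str, prompt: str, audit_log: list):
    """Wrap a model call with an audit record (illustrative, not a real gateway API)."""
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "team": team,
        "user": user,
        "prompt_chars": len(prompt),  # log size, not content, if prompts are sensitive
    }
    try:
        response = model_client(prompt)
        record["status"] = "ok"
        return response
    except Exception as exc:
        record["status"] = f"error: {exc}"
        raise
    finally:
        audit_log.append(json.dumps(record))  # written whether the call succeeds or fails

log = []
fake_model = lambda prompt: prompt.upper()  # stand-in for a real model client
result = audited_call(fake_model, team="search", user="alice", prompt="hello", audit_log=log)
```

Doing this once at the gateway means every team's calls are logged consistently, rather than each team implementing (or skipping) its own audit trail.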
Observability and telemetry. A single point that captures latency, cost, quality metrics, and error rates across all model calls gives you system-wide insight that distributed implementations can't provide. Debugging a latency spike or cost anomaly is much easier when the data is in one place.
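A minimal version of that single collection point, with illustrative field names and a placeholder model name ("gpt-x" is not a real identifier here):

```python
from dataclasses import dataclass, field

@dataclass
class CallTelemetry:
    """Illustrative central store of per-(team, model) call metrics."""
    latencies_ms: dict = field(default_factory=dict)  # (team, model) -> [ms, ...]
    costs_usd: dict = field(default_factory=dict)     # (team, model) -> running total
    errors: dict = field(default_factory=dict)        # (team, model) -> error count

    def record(self, team, model, latency_ms, cost_usd, ok=True):
        key = (team, model)
        self.latencies_ms.setdefault(key, []).append(latency_ms)
        self.costs_usd[key] = self.costs_usd.get(key, 0.0) + cost_usd
        if not ok:
            self.errors[key] = self.errors.get(key, 0) + 1

    def p95_latency(self, team, model):
        samples = sorted(self.latencies_ms.get((team, model), []))
        return samples[int(0.95 * (len(samples) - 1))] if samples else None

telemetry = CallTelemetry()
for ms in [120, 130, 110, 900, 125]:  # one outlier, as in a real latency spike
    telemetry.record("pricing", "gpt-x", latency_ms=ms, cost_usd=0.002)
```

Because every call flows through one collector, a question like "which team's p95 latency regressed this week?" is a lookup rather than a cross-team archaeology project.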
Shared infrastructure primitives. GPU cluster access, CI/CD templates for model evaluation, golden-path deployment patterns — these are legitimate platform concerns. The platform's job is to make it fast for product teams to bootstrap a new AI feature, not to build the feature for them.
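A golden-path primitive can be as small as a reusable eval gate that product teams drop into their own CI. Everything here is an illustrative assumption: the record format, the threshold, and the idea that the team (not the platform) supplies the eval cases:

```python
def eval_gate(results, min_pass_rate=0.9):
    """Block a deploy if the team's own eval suite passes below a threshold.

    `results` is a list of {"case": ..., "passed": bool} records; the format
    and default threshold are placeholders a platform team would standardize.
    """
    passed = sum(1 for r in results if r["passed"])
    rate = passed / len(results) if results else 0.0
    return rate >= min_pass_rate

# Usage: a product team runs its own evals and feeds the records to the gate.
records = [{"case": "refund policy", "passed": True},
           {"case": "pricing edge", "passed": True},
           {"case": "ambiguous query", "passed": False}]
assert not eval_gate(records, min_pass_rate=0.9)  # 2/3 < 90%, block the deploy
```

The division of labor is the point: the platform ships the gate as a template, while the product team owns the cases and the threshold that make sense for its domain.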
What should not be centralized: which model to use, how to design prompts, what retrieval strategy to deploy, how to evaluate outputs for a specific product context. These decisions require domain knowledge that lives in product teams.
The Patterns That Work
The organizations that ship AI features quickly share a common structure: embedded ML expertise with centralized infrastructure.
Netflix, Uber, Airbnb, and similar organizations don't have centralized AI development teams. They have platform teams that provide shared infrastructure and tooling, while product teams own their models and the decisions that shape them. The platform team measures success by adoption and developer velocity, not by how many models they've built. Product teams measure success by model quality and business outcomes, using infrastructure they don't have to build themselves.
This distinction matters. A platform team that exists to control AI development becomes a bottleneck. A platform team that exists to remove friction from distributed AI development enables velocity. Same headcount, completely different organizational outcome.
For smaller organizations, the right answer is often simpler: full-stack engineers who own both the model and the product surface, with thin shared infrastructure for the cross-cutting concerns. The overhead of maintaining clean team boundaries at small scale often exceeds the benefits.
The staffing pattern that consistently fails is the one that separates model development entirely from product development — where ML engineers sit in a central team while product engineers have no ML capability at all. This creates permanent dependency in both directions and degrades over time as the central team's backlog grows and product teams become increasingly frustrated with their inability to ship.
The Thin Platform Surface
If you're designing or redesigning your AI team structure, the question to ask is: what are the cross-cutting concerns that every team needs and that provide genuine value when done once rather than independently?
The list is shorter than it seems. Central governance of model access, cost attribution, and audit logging. Shared infrastructure for compute and deployment. Reference implementations and golden paths that teams can adopt or ignore based on their constraints.
The list does not include: choosing which models product teams use, building abstractions that prevent teams from calling model APIs directly, owning the eval framework for every product, or requiring central review before any model change ships to production.
The irony of most centralized AI platform efforts is that the teams building them know what good looks like — they've seen fast-moving organizations ship AI features — but the organizational pressure to "govern AI" and "prevent proliferation" expands the platform's scope until it collapses under its own weight.
Build the thin surface. Let product teams own their models. Measure the platform by how quickly teams can ship their first feature, not by how standardized the organization's AI stack looks on a slide.
The bottleneck you're trying to prevent is usually the one you're creating.
