
The Two-PM Problem: When Prompt Ownership and Product Ownership Drift Apart

· 11 min read
Tian Pan
Software Engineer

A support ticket lands on Tuesday morning: a customer was given a confidently wrong answer about their refund window. Engineering pulls the trace and finds the model picked the wrong intent. The product PM looks at the dashboard and sees that the new "express refund" affordance, shipped last sprint, surfaced an intent the prompt was never tuned to handle. The platform PM points at the eval suite, which is green. Both are technically right. The customer still got the wrong answer.

This is the two-PM problem, and most AI teams have it without naming it. The product PM owns the user-facing surface — intents, success metrics, the support escalation path. The platform or ML PM owns the prompt, the model choice, the eval suite, and the cost ceiling. The roadmaps are coordinated at the quarterly-planning level and drift at the weekly-shipping level, because the two PMs are optimizing for different metrics on different dashboards with different change-control processes.

The interesting failure mode isn't that the two PMs disagree. It's that they ship correctly relative to their own scope and still produce a regression nobody owns.

Why this split exists in the first place

Splitting prompt ownership from product ownership isn't an accident or a sign of immaturity. It usually reflects a real org constraint: the product PM came from the existing SaaS surface and knows the customer; the platform PM was hired (or grew) to own the model bill, the eval suite, and the integration with the inference provider. Neither role is wrong. The split solves the staffing problem of "we don't have one person who can do both."

It also matches how the rest of the industry talks about these roles. The market has converged on a separation between an AI product manager — who owns vision, customer fit, go-to-market — and an AI product owner or ML PM — who owns execution against models, prompts, and evals. Job ladders, hiring panels, and compensation bands have all calcified around this split.

The problem is that the artifact at the boundary — the prompt, plus its eval suite — isn't a backend implementation detail. It is the product surface. And the change-control process that the org has for backend implementation details is the wrong one for a user-experience artifact.

The four artifacts that get edited under both names

The boundary collapses around four specific artifacts. Each one has a product-shaped meaning and a platform-shaped meaning, and most teams have no convention for which side wins when the two read it differently.

The system prompt is platform code from the platform PM's view (a string in a config file, gated by an eval suite) and is brand voice from the product PM's view (the AI's tone, what it refuses, how it handles ambiguity). When the platform PM tunes "be more concise" to cut output tokens, the product PM finds out from a customer who says the AI used to be friendlier.

The eval set is a regression detector from the platform side and a definition of "what good looks like" from the product side. The product PM rarely sees the eval set as something they can edit; the platform PM rarely sees it as something they have to socialize. So the eval set encodes one team's assumptions about quality and gates the other team's releases against them.

The tool description — the docstring or schema the model reads to decide whether to call a function — is API documentation from the platform side and a behavioral contract from the product side. Editing "creates a draft" to "creates a draft (only when the user has explicitly confirmed intent)" looks like a clarification; it is actually a behavior change in how often the tool fires.
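One way to make that visible is to treat any change to a model-readable field as a behavior change that needs both reviewers. The sketch below is illustrative only: the tool schema shape, the field list, and the reviewer names are assumptions, not any particular framework's API.

```python
# Hypothetical tool schemas: names and fields are illustrative, not a real API.
OLD_TOOL = {
    "name": "create_draft",
    "description": "creates a draft",
    "parameters": {"title": "string"},
}
NEW_TOOL = {
    "name": "create_draft",
    "description": "creates a draft (only when the user has explicitly confirmed intent)",
    "parameters": {"title": "string"},
}

# Assumption: these are the fields the model reads when deciding to call a tool.
MODEL_VISIBLE_FIELDS = ("name", "description", "parameters")


def changed_model_visible_fields(old: dict, new: dict) -> list[str]:
    """Return the model-visible fields whose values differ between versions."""
    return [f for f in MODEL_VISIBLE_FIELDS if old.get(f) != new.get(f)]


def required_reviewers(old: dict, new: dict) -> set[str]:
    """Require both owners whenever the model would see something different."""
    if changed_model_visible_fields(old, new):
        return {"platform_pm", "product_pm"}
    return set()


print(changed_model_visible_fields(OLD_TOOL, NEW_TOOL))
print(sorted(required_reviewers(OLD_TOOL, NEW_TOOL)))
```

The point of the design is that the check does not try to judge whether an edit is "just a clarification." Any difference the model can read is routed to both sides.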

The help-center copy and in-product onboarding are product copy that the product PM owns directly, and they set the user's mental model of what the AI can do. When that copy promises capabilities the prompt was never tuned for, the support team eats the gap.

These four artifacts get edited under both names, on different cadences, without a shared review process. That's the coordination tax.

How the regressions actually show up

The pattern that recurs in retrospectives looks roughly like this:

The product PM ships a UX change that surfaces a new intent — a button, a suggested prompt, an empty-state nudge. The eval suite has no coverage for that intent because it didn't exist last quarter. Production traffic shifts within hours. The model handles the new intent badly. The dashboard the platform PM watches is green because the eval suite is unchanged. The dashboard the product PM watches shows a CSAT dip the team attributes to a UX issue, then a copy issue, then — three weeks later — to the prompt.
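The first gap in that sequence, a surfaced intent with no eval coverage, is mechanical enough to check before release. Here is a minimal sketch, assuming the product side can export a list of intents the UI surfaces and that each eval case is tagged with an intent. The intent names, the case format, and the `min_cases` threshold are all hypothetical.

```python
from collections import Counter


def uncovered_intents(surfaced_intents, eval_cases, min_cases=1):
    """Return surfaced intents with fewer than min_cases eval cases, sorted."""
    counts = Counter(case["intent"] for case in eval_cases)
    return sorted(i for i in set(surfaced_intents) if counts[i] < min_cases)


# Hypothetical export from the product surface (buttons, suggested prompts, nudges).
surfaced = ["refund_status", "express_refund", "cancel_order"]

# Hypothetical eval set; each case is tagged with the intent it exercises.
eval_cases = [
    {"intent": "refund_status", "input": "Where is my refund?"},
    {"intent": "refund_status", "input": "Has my refund been issued?"},
    {"intent": "cancel_order", "input": "Cancel my last order."},
]

missing = uncovered_intents(surfaced, eval_cases)
if missing:
    print("Intents surfaced in the UI with no eval coverage:", missing)
```

A check like this does not say the model handles the intent well. It only says that a green eval suite has nothing to report about it yet.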

The mirror failure also happens. The platform PM swaps the model for a new, cheaper variant, runs the eval suite, and ships on green. The new model is technically more concise but loses a stylistic register the product PM had quietly trained users to expect. The product PM hears about it from a customer escalation that comes through marketing because the customer used the word "tone" and not the word "regression."
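A correctness-only eval will not see this kind of shift, but even a crude style proxy can flag it for a human. The sketch below compares mean response length between the current and candidate model on the same prompts. Word count is an assumed stand-in for "register," not a real measure of tone, and the sample responses and the 20% tolerance are made up for illustration.

```python
from statistics import mean


def mean_word_count(responses):
    """Average number of whitespace-separated words per response."""
    return mean(len(r.split()) for r in responses)


def length_drift(baseline, candidate):
    """Relative change in mean response length (negative means shorter)."""
    base = mean_word_count(baseline)
    return (mean_word_count(candidate) - base) / base


def needs_product_review(baseline, candidate, tolerance=0.2):
    """Flag the swap for product review when length shifts beyond the tolerance."""
    return abs(length_drift(baseline, candidate)) > tolerance


# Made-up outputs from the current and candidate model on the same two prompts.
baseline = [
    "Happy to help! Your refund window is 30 days from delivery.",
    "Sure thing. I have started the return for you.",
]
candidate = [
    "Refund window: 30 days.",
    "Return started.",
]

if needs_product_review(baseline, candidate):
    print("Style drift beyond tolerance; route the model swap to the product PM.")
```

Both candidate answers here could pass a factual eval. The check exists only to make sure someone who owns the voice looks at the change before customers do.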

Both regressions are invisible to the side of the org that doesn't own the artifact. Both are obvious in hindsight. The question is what discipline catches them in foresight.

A shared release calendar, not a shared roadmap
