Skip to main content

16 posts tagged with "product-management"

View all tags

The Prompt as Documentation: When the System Prompt Becomes the Only Artifact Anyone Trusts

· 10 min read
Tian Pan
Software Engineer

A product manager pings you in Slack asking what happens when a customer asks the assistant to cancel their subscription. You start typing the answer from memory, then second-guess yourself, then open the system prompt and read it for thirty seconds. You paste back a summary. They thank you and move on. Three hours later, support asks the same question. By Thursday, the head of partnerships pastes a screenshot of the prompt into a deal review.

This is the prompt-as-documentation anti-pattern, and the first time you notice it happening, it feels great. The artifact you spent six weeks tuning is now the canonical source of truth for what the product does. PMs are reading it. Support is reading it. Sales is reading it. Somewhere a designer is reading it. Your work is load-bearing in a way the old service-layer code never was, and you can prove it by counting the number of unrelated people who can pull the file from memory.

Your PRD Is an Untested Prompt — Until You Eval It

· 9 min read
Tian Pan
Software Engineer

Open the system prompt of any AI feature that shipped in the last six months and read it side by side with the PRD that authorized it. You will find two documents arguing with each other. The PRD says "the assistant should be helpful but professional, avoid making things up, and gracefully decline if it can't answer." The system prompt says "You are an AI assistant. Be concise. If you are unsure, say 'I don't know.' Never invent facts." The PRD takes a page. The prompt takes nine lines. The gap between them is where every behavioral bug you shipped this quarter lives.

The convenient fiction is that the prompt is an "implementation detail" of the PRD. The actual relationship is the opposite. The prompt is the contract the model executes; the PRD is a draft of that contract written in a language the model does not speak, by an author who never compiled it. Every PRD for an AI feature is an untested prompt. The team that admits this and runs the PRD through an eval before sign-off ships a feature with one fewer source of post-launch surprise.

The Agentic Stamp: When Marketing Names It and Engineering Pays the Operational Bill

· 10 min read
Tian Pan
Software Engineer

A product marketing manager writes "AI agent" in a launch brief. The press release goes out describing autonomous decision-making. Six weeks later, engineering is staring at a Jira board full of "agent observability" tickets they never scoped for a system that is, in fact, a single prompt followed by a hardcoded tool dispatch. Nobody lied. Nobody made a technical error. The team just learned that the word "agent" is not a description — it is a stamp, and the stamp carries operational implications that engineering inherits whether or not the implementation justifies them.

This is the internal version of what Gartner now calls "agent washing." The external version — vendors rebranding chatbots as agents to ride the hype cycle — gets the press coverage. The internal version is quieter and more expensive, because the bill falls on people who can't push back at the moment the term gets approved.

Your Eval Suite Is the Product Spec You Refused to Write

· 10 min read
Tian Pan
Software Engineer

Open the PRD for any AI feature shipping this quarter. Notice the adjectives. The assistant should be helpful. Responses should feel natural. The agent should understand the user's intent. The summary should be accurate and concise. Every one of these words is a place the team gave up. They did not decide what the feature does. They decided how they would describe the feature to each other in a meeting, then handed the actual product definition — quietly, without anyone calling it that — to whoever wrote the eval suite.

This is not a documentation problem. The eval is the spec. The PRD is a press release written before the product exists. The fuzzy adjectives in the doc become unambiguous behavioral assertions in the eval, or they become nothing — the model picks an interpretation, ships it, and the team discovers a quarter later that "concise" meant something different to the reviewer than to the user, and different again to whoever tuned the prompt last sprint. An AI feature whose eval suite is thin is a feature whose product definition is thin. The model didn't fail. The team never decided what success meant.

Why Rolling Back an AI Feature Is Harder Than Rolling Back Code

· 9 min read
Tian Pan
Software Engineer

When a personality update made a popular AI assistant noticeably more flattering and complimentary, the engineering team quickly identified the problem and issued a rollback within days. The code change was clean. The model swap was straightforward. And users were furious anyway — not because the rollback was broken, but because some of them had already built workflows around the sycophantic version. Their prompt strategies, their review loops, their interpretation of the model's confidence signals — all of it had been tuned to an AI they no longer had access to.

Rolling back the code had taken hours. Rolling back the users was impossible.

This asymmetry is the central challenge of AI feature management that most engineering teams underestimate until they've been burned by it. Conventional rollback thinking treats "undo" as a purely technical operation. For AI features, that's only half the story.

The PRD for an AI Feature: Why Your Old Template Misses the Cliff

· 10 min read
Tian Pan
Software Engineer

The deterministic-software PRD template has aged into a kind of muscle memory. Problem statement, user stories, acceptance criteria, edge cases, success metrics, scope cuts. Engineers know how to read it. PMs know how to fill it in. Designers know which sections to lift wireframes from. It is a well-worn artifact that has shipped a generation of CRUD apps, dashboards, and SaaS workflows.

It also has no field for "what the model gets wrong five percent of the time." No field for "what we accept as a passing eval score." No field for "what the user sees when the model refuses to answer." No field for "which prompt version this PRD locks down, and who is allowed to change it after ship." Every AI feature shipped against that template is shipping with a hidden contract that nobody wrote down. Postmortems keep finding it the hard way.

Why Your AI Roadmap Shouldn't Have a 12-Month Plan

· 9 min read
Tian Pan
Software Engineer

A team I worked with last quarter spent six weeks building a "smart document classifier" — fine-tuned model, eval harness, custom UI, the whole production pipeline. It shipped on a Tuesday. The following Monday, a new general-purpose model dropped that beat their fine-tune on the same eval, zero-shot, with no infrastructure investment. Their entire Q2 OKR became a wrapper around a one-line API call. The roadmap had committed twelve months earlier to "owning the classification stack." That commitment was wrong before the ink dried.

This is not an isolated story. Industry trackers logged 255 model releases from major labs in Q1 2026 alone, with roughly three meaningful frontier launches per week through March. Costs have collapsed: API pricing is down 97% since GPT-3, and the gap between top providers has narrowed to within statistical noise on most benchmarks. When the underlying substrate changes this fast, a twelve-month feature roadmap is not a plan — it is a list of bets you cannot revisit, made with information that will be stale before you ship the second item.

Retiring an AI Feature Is a Trust Event, Not a Deprecation

· 13 min read
Tian Pan
Software Engineer

The metrics tell you to kill it. Three percent of monthly actives. The eval refresh has slipped two cycles. The prompt has a // TODO: revisit when we move off the legacy ticket schema from a year ago. Your senior AI engineer spends a full week per month babysitting the thing — model upgrades, label drift, the one tool integration that flakes whenever the upstream API changes its date format. Every quarterly review, somebody asks why this assistant still exists, and every quarter the answer is "we haven't gotten to it yet."

So you write the deprecation memo. You copy the structure from the API sunset playbook your platform team perfected: T-minus-six-months announcement, a migration guide, a banner in the product, a webhook for partners, the usual Sunset: HTTP header. You ship it on a Tuesday. By Thursday afternoon, your CSMs are forwarding emails that don't sound like API deprecation complaints. They sound like breakup letters.

That's the moment most teams realize they took a category error to production. The thing you're retiring isn't an API. It's a relationship the user formed with something that talked back.

The AI Feature OKR Mismatch: Why Quarterly Cadence Breaks AI Roadmaps

· 10 min read
Tian Pan
Software Engineer

The team commits to "ship the AI summarizer this quarter," gets it past the technical bar by week ten, takes a victory lap at the all-hands, and ships. Six weeks later the telemetry curve starts bending the wrong way — quietly, slowly, in a way nobody dashboards because nobody owns the shape. The OKR is already marked green. The next quarter's OKRs are already drafted around new launches. The summarizer is now somebody's second-priority maintenance job, and by quarter-end review the team is wondering why customer satisfaction on the feature dropped fifteen points without anything obvious changing.

This is not a bug in the team. It's a bug in the operating model. Quarterly OKRs were calibrated for software where a feature can be scoped, built, shipped, and then largely left alone until the next major rev. AI features don't have that shape. They have a launch curve and a sustain curve, and the sustain curve is where most of the value — and most of the risk — actually lives. The OKR template that treats them as deliverables with launch dates quietly produces a portfolio of demos that decay before the next planning cycle.

Your Eval Rubric Is the Real Product Spec — and No PM Signed Off on It

· 11 min read
Tian Pan
Software Engineer

A product manager writes a paragraph: "The assistant should be helpful, accurate, and concise, and should never make the customer feel rushed." An engineer reads that paragraph, opens a YAML file, and writes 47 weighted criteria so the LLM-as-judge can produce a number on every trace. Six months later, that YAML file is the actual specification of the product. Every release is gated on it. Every regression alert fires on it. Every "this is shipping quality" decision routes through it. The PM has never read it.

This is the most common form of unintentional product ownership transfer in AI engineering today. The rubric is not a measurement of the spec — it is the spec, in the same way that a compiler is not a description of your language but the operational truth of it. And like compilers, rubrics have implementation details that silently determine semantics. Which failure mode gets a 0 versus a 0.5? Which criteria is weighted 0.3 versus 0.05? Which behavior is absent from the rubric and therefore goes uncounted entirely? Each of these is a product decision. None of them lived in the original brief.

Found Capabilities: When Users Ship Features Your Team Never Roadmapped

· 10 min read
Tian Pan
Software Engineer

A customer emails support to ask why your CRM agent stopped drafting their NDAs. You did not know your CRM agent drafted NDAs. A power user complains that your support bot's Tagalog translations have gotten worse since last week. You did not know your support bot did Tagalog. A forum thread spreads a prompt that turns your code-review assistant into a passable security scanner, and within a quarter you are getting CVE reports filed against findings the assistant produced. Each of these is a feature with adoption, business impact, and zero institutional ownership — no eval, no SLA, no surface in the UX, no roadmap entry, and a quiet bus factor of one: the customer who figured it out.

This is what happens once your product is wrapped around a model whose capability surface is wider than the surface you scoped. Users explore the wider surface, find behaviors that solve their problems, build workflows on top of those behaviors, and then experience your next model upgrade as a regression even though nothing on your roadmap moved. The contract between you and your users is no longer the one you wrote down. It includes everything the model happened to do for them that you happened not to break.

Treating this as an engineering surprise — "we will harden the prompt, we will add a guardrail, we will catch it next time" — is a category error. Found capabilities are a product-management problem. The discipline is not preventing them; it is detecting them, deciding what to do with them, and remembering that you decided.

The AI Feature Sunset Playbook: How to Retire Underperforming AI Without Burning Trust

· 10 min read
Tian Pan
Software Engineer

Engineering teams have built more AI features in the past three years than in the prior decade. They have retired almost none of them. Deloitte found that 42% of companies abandoned at least one AI initiative in 2025 — up from 17% the year before — with average sunk costs of $7.2 million per abandoned project. Yet the features that stay in production often cause more damage than the ones that get cut: they erode user trust slowly, accumulate technical debt that compounds monthly, and consume engineering capacity that could go toward things that work.

The asymmetry is structural. AI feature launches generate announcements, stakeholder excitement, and team recognition. Retirements are treated as admissions of failure. So bad features accumulate. The fix is not willpower — it is a decision framework that makes retirement a normal, predictable engineering outcome rather than an organizational crisis.