
11 posts tagged with "ai-product"


The 70% Reliability Uncanny Valley: Where AI Features Go to Lose User Trust

· 12 min read
Tian Pan
Software Engineer

A feature that fails 70% of the time is harmless. The user learns within a week that they have to verify every output, treats the system as an unreliable assistant, and adjusts. A feature that succeeds 70% of the time is worse than that. It is right often enough that the user stops verifying, and wrong often enough that the failures are concentrated, visible, and personal. The user's mental model collapses into "I cannot tell when to trust this" — which, as a product experience, is strictly worse than "I know not to trust this."

This is the 70% uncanny valley, and it is where most AI features built in the last two years live. The team measures aggregate accuracy, watches the number cross some "good enough" threshold, and ships. The realized user experience does not improve monotonically with that number. Between roughly 60% and 85% accuracy, the product gets worse as it gets more accurate, because the cost of a wrong answer the user did not think to check exceeds the value of a right answer they no longer have to verify.

The team that ships at 70% without designing for the predictability problem is not shipping a worse version of a 95% product. They are shipping a different product entirely: one whose primary failure mode is silent.

The Two-PM Problem: When Prompt Ownership and Product Ownership Drift Apart

· 10 min read
Tian Pan
Software Engineer

A support ticket lands on Tuesday morning: a customer was given a confidently wrong answer about their refund window. Engineering pulls the trace and finds the model picked the wrong intent. The product PM looks at the dashboard and sees the new "express refund" affordance — shipped last sprint — surfaced an intent the prompt was never tuned to handle. The platform PM points at the eval suite, which is green. Both are technically right. The customer is still wrong.

This is the two-PM problem, and most AI teams have it without naming it. The product PM owns the user-facing surface — intents, success metrics, the support escalation path. The platform or ML PM owns the prompt, the model choice, the eval suite, and the cost ceiling. The roadmaps are coordinated at the quarterly-planning level and drift at the weekly-shipping level, because the two PMs are optimizing for different metrics on different dashboards with different change-control processes.

The interesting failure mode isn't that the two PMs disagree. It's that they ship correctly relative to their own scope and still produce a regression nobody owns.

The AI Feature You Should Not Have Shipped: A Task-Shape Checklist

· 10 min read
Tian Pan
Software Engineer

The demo always works. That is the most expensive sentence in AI product development. The product manager sees the model handle the happy path, the engineer ships the obvious version of the feature, and six weeks later the support queue is full of complaints that the metric did not predict. Nothing in the model regressed. Nothing in the prompt got worse. The feature was simply not the shape the model could do well, and the team did not have a way to say so before the work began.

A meaningful fraction of shipped AI features fail this way — not because the model is bad, but because the task is wrong. The output the product needs is deterministic and the engine is stochastic. The user's tolerance for the tail is one bad answer per thousand and the model's failure distribution is heavier than that. The latency budget the unit economics require is half of what the model can deliver at any tier you can afford. The ground truth required to evaluate quality does not exist and cannot be cheaply created. None of these are model problems. They are task-shape problems, and they should have been screened before the first prompt was written.

The AI Off-Switch That Doesn't Exist: Retiring Features After Users Co-Author the Archive

· 11 min read
Tian Pan
Software Engineer

Six months after you launched the AI writing assistant, you open the analytics dashboard and find the metric you wanted: 40% of user-generated documents on the platform now contain AI-authored prose. The board meeting calls this engagement lift. Three weeks later, the model provider raises prices, the unit economics flip, and someone asks the obvious question: can we turn it off? You go looking for the toggle and discover that it isn't a toggle. It's a migration with product, legal, and UX surfaces attached, and pulling it cleanly will take two quarters and burn political capital with three teams who didn't know they were stakeholders.

This is the part of the AI product lifecycle that nobody planned for. The launch playbook covered prompt engineering, rate limits, eval harnesses, and a kill switch for runaway costs. It did not cover what happens when users have spent half a year producing artifacts that only exist because the generator existed, and now the read path through your archive depends on a feature you want to retire. The "off switch" was conceptual: a flag in a config file. The actual decommissioning is a coordinated set of decisions about grandfathering, versioning, content provenance, and the uncomfortable conversation about whether the engagement lift was ever value or just dependency.

The Missing Arm: Your AI Experiment Has No 'AI-Off' Control

· 9 min read
Tian Pan
Software Engineer

Look at the last six experiment readouts your team shipped on an AI feature. What were the arms? Odds are good you tested "new prompt vs. old prompt," or "GPT-5 router vs. GPT-4 fallback," or "reasoning model vs. fast model," or "with retrieval vs. without retrieval." You reported lift on engagement, task completion, or session length. You called it product impact. A quarter rolled by. Inference spend climbed. Nobody paused to ask the question the CFO eventually will: what would have happened if the feature simply weren't there?

That question is the missing arm. The lift your experiments keep measuring is "better AI vs. worse AI," but the one your business runs on is "AI vs. nothing" — or more uncomfortably, "AI vs. the three-line heuristic we never wrote down." These are different experiments with different conclusions, and most AI product programs in 2026 have only ever run the first one. The second is the one that tells you whether the feature is earning its inference bill.
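To make the missing arm concrete, here is a minimal sketch, assuming a hash-based bucketing scheme and task completion as the metric, of what adding an "AI-off" control might look like: one arm serves no AI at all, and lift is computed against that baseline rather than against another prompt. The arm names, experiment key, and numbers are illustrative, not taken from any particular experimentation framework.

```python
# Sketch: an experiment design that includes the missing "AI-off" arm.
# Arm labels, experiment key, and metrics below are hypothetical.
import hashlib

ARMS = ["ai_off", "ai_current", "ai_new_prompt"]  # "ai_off" is the no-AI control


def assign_arm(user_id: str, experiment: str = "assistant_value_check") -> str:
    """Deterministically bucket a user into an arm via a stable hash."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return ARMS[int(digest, 16) % len(ARMS)]


def lift_vs_ai_off(completion_rates: dict[str, float]) -> dict[str, float]:
    """Relative lift of each AI arm against the no-AI baseline."""
    baseline = completion_rates["ai_off"]
    return {
        arm: (rate - baseline) / baseline
        for arm, rate in completion_rates.items()
        if arm != "ai_off"
    }


if __name__ == "__main__":
    print(assign_arm("user-123"))
    # Hypothetical readout: task-completion rate per arm.
    print(lift_vs_ai_off({"ai_off": 0.61, "ai_current": 0.66, "ai_new_prompt": 0.64}))
```

The point of the sketch is the denominator: "ai_current vs. ai_new_prompt" answers which AI is better, while only the comparison against "ai_off" answers whether the feature covers its inference bill.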

AI User Research: What Users Actually Need Before You Write the First Prompt

· 10 min read
Tian Pan
Software Engineer

Most teams decide they're building an AI feature, then ask users: "Would you want this?" Users say yes. The feature ships. Three months later, weekly active usage is at 12% and plateauing. The postmortem blames implementation or adoption, but the real failure happened before a single line of code was written — in the user research phase that felt thorough but was methodologically broken.

The core problem: users cannot accurately predict their preferences for capabilities they have never experienced. This isn't a minor wrinkle. A study on AI writing assistance found that systems designed from users' stated preferences achieved only 57.7% accuracy — actually underperforming naive baselines that ignored user-stated preferences entirely. You can do a user research sprint that runs for weeks, collect extensive qualitative feedback, and end up with a product nobody uses — not despite the research, but partly because of how it was conducted.

The AI Capability Ratchet: How One Smart Feature Breaks Your Entire Product

· 10 min read
Tian Pan
Software Engineer

Your AI-powered search just shipped. It's fast, conversational, and handles nuanced queries in ways your old keyword search never could. The feature review was glowing. The launch post got shared. And then, two weeks later, the support tickets start — not about search, but about the customer support widget, the help documentation, and the notification center. Nobody changed any of those things. But users are suddenly furious.

Welcome to the AI capability ratchet. The moment you ship one demonstrably intelligent feature, you have permanently recalibrated what users consider acceptable across your entire product. The ratchet clicks up. It does not click back down.

This pattern is one of the least-discussed failure modes in AI product development. Teams celebrate individual feature launches without accounting for the expectation debt they are distributing to every team that didn't ship anything.

AI Feature Decommissioning Forensics: What Dead Features Teach That Successful Ones Cannot

· 10 min read
Tian Pan
Software Engineer

Here's an uncomfortable pattern: the AI feature your team is about to launch next quarter already died at your company two years ago. It shipped under a different name, with a different prompt, solving a vaguely different problem, and it got quietly decommissioned after six months of flat adoption. Nobody wrote it up. Nobody connected the dots. The leading indicators that would have saved this cycle were sitting in dashboards that got archived along with the feature.

Most engineering orgs are elaborate machines for remembering successes. Launches get retrospectives, blog posts, internal celebrations. The features that got killed — the ones with 12% weekly active users despite a polished demo, the ones whose unit economics inverted when token costs compounded across a longer-than-expected tool chain, the ones users learned to trust, lost trust in, and then routed around — generate almost no institutional memory. And the failure patterns embedded in those deaths are exactly the ones your planning process has no way to price in.

Trust Transfer in AI Products: Why the Same Feature Ships at One Company and Dies at Another

· 9 min read
Tian Pan
Software Engineer

Two product teams at two different companies build the same AI writing assistant. Same model. Similar feature surface. Comparable accuracy numbers. One team celebrates record activation at launch. The other quietly disables the feature after three months of ignored adoption and one scathing internal all-hands question.

The engineering debrief at the struggling company focuses on the obvious variables: latency, accuracy, UX polish. None of them fully explain the gap. The real variable was trust — specifically, whether the AI feature could borrow enough existing trust to earn the right to make mistakes while it proved itself.

Trust transfer is the invisible force that determines whether an AI feature lands or dies. And most teams shipping AI products have never explicitly designed for it.

The AI Feature Kill Decision: When Metrics Say Yes but Users Say No

· 10 min read
Tian Pan
Software Engineer

Forty-two percent of companies abandoned most of their AI initiatives in 2025, up from 17% a year earlier. The striking part isn't the abandonment rate — it's the delay. Most of those projects had been in various stages of "almost ready" for six to twelve months before someone finally pulled the plug. The demo worked. The metrics looked plausible. The team was invested. And so the feature lingered, burning budget and credibility, long after the evidence pointed toward shutdown.

The hardest product decision in AI isn't what to build. It's when to stop building something that technically works but practically doesn't.

The AI Feature Kill Decision: When to Shut Down What Metrics Say Is Working

· 10 min read
Tian Pan
Software Engineer

Your AI feature has 12,000 monthly active users. Engagement charts slope upward. The demo still impresses stakeholders every quarter. And your users are quietly routing around it.

This is the kill decision that product teams avoid for months — sometimes years — because every surface-level metric says the feature is working. The dashboard shows adoption. What it doesn't show is the support engineer who manually corrects every third AI-generated summary before forwarding it to the customer, or the power user who learned that clicking "regenerate" three times produces acceptable output and has silently accepted that tax on their workflow.