
44 posts tagged with "product"


When Accuracy Becomes a Liability: How Users Build Workflows Around Your AI's Failure Modes

Tian Pan · Software Engineer · 10 min read

A team ships an AI feature at 70% accuracy. Eighteen months pass. Users complain at first, then adapt and settle in. They learn which prompt phrases avoid the edge cases. They know to double-check outputs involving dates. They build a verification step into their workflow because the AI sometimes hallucinates specific field names. Then the team ships a new model. Accuracy jumps to 85%. Support tickets spike. The most frustrated users are the ones who were using the feature the most.

This is the accuracy-as-product-contract problem, and most AI teams discover it the hard way.

AI Feature Payback: The ROI Model Your Finance Team Won't Fight You On

Tian Pan · Software Engineer · 10 min read

Every engineering team shipping AI features eventually hits the same wall: finance wants a spreadsheet that justifies the spend, and the spreadsheet you built doesn't actually work.

The problem isn't that AI features lack ROI. The problem is that AI economics break every assumption the standard ROI model was built on — fixed capital, linear cost curves, predictable timelines. Teams that treat AI spending like SaaS licensing get numbers that either look deceptively good before launch or collapse six months into production. The nearly ten-fold gap between measured AI initiatives (55% ROI) and ad-hoc deployments (5.9% ROI) comes almost entirely from whether teams got the measurement model right before they shipped.
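To make the divergence concrete, here is a minimal sketch of a flat-cost payback model next to a usage-aware one. Every number in it is an illustrative assumption, not a figure from the post: the point is only that inference cost scales with adoption, so a model that treats spend as a flat line flips sign as usage grows.

```python
# Minimal sketch: why a flat-cost ROI model and a usage-based one diverge.
# Every number below is an illustrative assumption, not data from the post.

MONTHS = 12
build_cost = 150_000          # one-time engineering spend
value_per_active_user = 8.0   # monthly value attributed to each active user
cost_per_request = 0.02       # inference cost per request
requests_per_user = 120       # monthly requests per active user

def payback(adoption_curve, fixed_monthly_cost=None):
    """Cumulative net value, month by month.

    With fixed_monthly_cost set, costs are modeled the SaaS way: flat.
    Without it, cost scales with usage, the way inference billing does.
    """
    cumulative = -build_cost
    for users in adoption_curve:
        value = users * value_per_active_user
        if fixed_monthly_cost is not None:
            cost = fixed_monthly_cost
        else:
            cost = users * requests_per_user * cost_per_request
        cumulative += value - cost
    return round(cumulative)

# Adoption ramps from 500 users toward a plateau of 8,000 over a year.
adoption = [min(8_000, int(500 * 1.5 ** m)) for m in range(MONTHS)]

print("flat-cost model: ", payback(adoption, fixed_monthly_cost=2_000))
print("usage-cost model:", payback(adoption))
```

The two prints disagree by design: the flat model's error grows with every new user, which is exactly the "looks deceptively good before launch, collapses in production" failure mode.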

The Data Flywheel Assumption: When AI Features Compound and When They Just Accumulate Noise

Tian Pan · Software Engineer · 9 min read

Every AI pitch deck includes a slide about the data flywheel. The story is appealing: users interact with your AI feature, that interaction generates data, the data trains a better model, the better model attracts more users, and the cycle repeats. Scale long enough and you have an insurmountable competitive moat.

The problem is that most teams shipping AI features don't have a flywheel. They have a log file. A very large, expensive-to-store log file that has never improved their model and never will—because the three preconditions for a real flywheel are missing and nobody has asked whether they're present.
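A toy simulation makes the distinction visible. Nothing below comes from the post, and the couplings are invented knobs, but the shape is the argument: if interaction data does not actually move model quality, or quality does not actually move acquisition, the loop degenerates into storage costs.

```python
# Toy model, not from the post: the flywheel compounds only when both
# couplings are nonzero. Data must actually improve quality, and quality
# must actually attract users; zero either knob and data just accumulates.

def simulate(months, data_to_quality, quality_to_growth):
    users, quality, data = 1_000.0, 0.70, 0.0
    for _ in range(months):
        data += users * 30                    # interactions logged per month
        quality = min(0.99, 0.70 + data_to_quality * (data ** 0.5) / 1e4)
        users *= 1 + quality_to_growth * (quality - 0.70)
    return round(users), round(quality, 3), round(data)

print(simulate(24, data_to_quality=0.5, quality_to_growth=0.8))  # flywheel
print(simulate(24, data_to_quality=0.0, quality_to_growth=0.8))  # log file
```

In the second run, data grows every month and nothing else moves. That is the log file.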

The First-Mover Disadvantage in AI: A Framework for Timing Your AI Feature Launch

Tian Pan · Software Engineer · 10 min read

The conventional wisdom in tech—move fast, ship early, establish moats—turns lethal in AI at a particular moment in the model improvement curve. In 2023, dozens of teams built viable businesses around a single capability: let users upload a PDF and ask questions about it. Then OpenAI added native file upload to ChatGPT. The businesses didn't die because they were slow. They died because they were early.

This isn't an isolated incident. It's a structural feature of building on top of rapidly improving base models, and most launch timing frameworks were designed for slower-moving technology curves. The framework you used to decide when to ship a SaaS feature doesn't translate to AI—the inputs are different and the failure modes are entirely distinct.
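One way to see why the inputs differ, sketched here with assumed numbers: a feature built on a base-model capability earns revenue only until a model release subsumes it, so its expected value has to be discounted by a monthly subsumption probability that SaaS launch math never needed.

```python
# Sketch with assumed numbers: expected value of a capability that earns
# monthly revenue only until a base-model release subsumes it.

monthly_revenue = 40_000
build_cost = 200_000
p_subsumed_per_month = 0.06   # assumed odds a model release absorbs the feature

def expected_net(months):
    surviving, total = 1.0, -build_cost
    for _ in range(months):
        total += surviving * monthly_revenue
        surviving *= 1 - p_subsumed_per_month
    return round(total)

for horizon in (6, 12, 24):
    print(f"{horizon:2d}-month horizon: {expected_net(horizon):>8,}")
```

In the SaaS version of this model, the subsumption term is zero and long horizons dominate, which is why "ship early, earn longer" stops being automatically true in AI.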

The Accountability Transfer Problem: Why AI Gets Blamed for Decisions It Was Never Designed to Make Alone

Tian Pan · Software Engineer · 10 min read

A major health insurer deployed an AI tool to evaluate post-acute care claims. The system had an error rate above 90% — meaning nine of every ten appealed denials were eventually overturned by human reviewers. Yet those denials weren't proactively corrected. Patients had to appeal, one by one. When the lawsuits came, the company's response was to point at the AI.

The AI denied nothing. Humans approved those denials at scale, embedded in a workflow they designed, in a system they chose to deploy. But "the AI decided" is a sentence that distributes blame in a direction that conveniently absolves the organization, the executives who approved the rollout, and the reviewers who signed off on each case.

This is the accountability transfer problem — and it's not a future risk. It's already endemic in production AI systems.

The First AI Feature Problem: Why What You Ship First Determines What Users Accept Next

Tian Pan · Software Engineer · 9 min read

Most teams ship their boldest AI feature first. It's the one they've been working on for six months, the one that makes a good demo, the one that leadership is excited about. It fails in production — not catastrophically, just enough to make users uncomfortable — and suddenly every AI feature that follows inherits that skepticism. The team spends the next year wondering why adoption is flat even after they fixed the original problems.

This is the first AI feature problem. What you ship first establishes a precedent that persists long after the technical issues are resolved. User trust in AI is formed on the first failure, not the first success. The sequence of your launches matters more than the quality of any individual feature.

The Org Chart Problem: Why AI Features Die Between Teams

Tian Pan · Software Engineer · 10 min read

The model works. The pipeline runs. The demo looks great. And then the feature dies somewhere between the data team's Slack channel and the product engineer's JIRA board.

This is the pattern behind most AI project failures—not a technical failure, but an organizational one. A 2025 survey found that 42% of companies abandoned most of their AI initiatives that year, up from 17% the year prior. The average sunk cost per abandoned initiative was $7.2 million. When the post-mortems get written, the causes listed are "poor data readiness," "unclear ownership," and "lack of governance"—which are three different ways of saying the same thing: nobody was actually responsible for shipping the feature.

Per-User AI Quotas: The UX Layer Your Cost Dashboard Can't See

Tian Pan · Software Engineer · 10 min read

A user opens your AI feature at 3pm on a Tuesday. They've been using it lightly for three weeks. This time the request hangs for eight seconds and returns a red banner: "Something went wrong. Please try again later." They try again. Same banner. They close the tab and go back to whatever they were doing before — and they tell their teammates at standup the next morning that "the AI thing is broken."

What actually happened: they crossed an invisible per-user quota that your cost team set six months ago to keep a single power user from blowing through the GPU budget. The quota worked. Spend stayed flat. The dashboard is green. The feature is, by every metric your engineering org tracks, healthy. It's also dead, because the user who got that banner is never coming back, and the three teammates they told at standup will never try it.

This is the gap your cost dashboard cannot see. Per-user AI quotas are a product surface. The team that hides them inside an HTTP 429 is letting their cost-control system silently shape user perception of the product, and they will not find out until churn shows up in a quarterly review with no obvious cause.
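Mechanically, the fix is small. Here is a sketch of the difference, where the endpoint, limits, and field names are all hypothetical: the quota check still returns a 429, but with enough structure that the UI can say what happened and when it resets instead of rendering a generic failure banner.

```python
# Sketch (hypothetical limits and field names): the difference between a
# quota the UI can explain and one that surfaces as "something went wrong".
from dataclasses import dataclass

@dataclass
class QuotaStatus:
    used: int
    limit: int
    resets_at: str  # ISO timestamp for when the window rolls over

def check_quota(user_id: str) -> QuotaStatus:
    # Stand-in for a real lookup against your rate-limit store.
    return QuotaStatus(used=200, limit=200, resets_at="2025-01-01T00:00:00Z")

def handle_request(user_id: str) -> tuple[int, dict]:
    q = check_quota(user_id)
    if q.used >= q.limit:
        # Still a 429, but with enough structure for the client to render
        # "You've used this month's AI quota; it resets Jan 1" rather than
        # a generic red banner.
        return 429, {
            "error": "quota_exhausted",
            "used": q.used,
            "limit": q.limit,
            "resets_at": q.resets_at,
            "upgrade_url": "/settings/plan",  # hypothetical upsell path
        }
    return 200, {"result": "..."}
```

The design choice is that the quota becomes a message the product owns, not an error the infrastructure leaks.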

Cold-Start Evaluation: How to Ship an AI Feature With Zero Production Traces

Tian Pan · Software Engineer · 10 min read

Every AI feature launch has the same quiet moment before the first user sees it: someone on the team asks "how do we know this is good?" and the honest answer is "we don't, yet." You have no traces because you have no users. You have no users because you haven't shipped. The loop is real, and the two failure modes it produces are both fatal — ship blind and let the first week of escalations be your eval dataset, or wait for "real data" and watch the roadmap slide for a quarter while a competitor publishes a demo.

The way out is not to pretend cold-start evaluation is the same problem as post-launch evaluation with a smaller sample size. It isn't. You are not sampling a distribution; you are constructing a prior. Every day-1 signal is an artifact of a choice you made about what to measure, whose behavior to simulate, and which failures to care about. Teams that ship AI features well treat the pre-launch eval stack as a first-class deliverable — not a spreadsheet hacked together the night before the gate review, but a layered system of dogfooding, simulation, expert annotation, and adversarial probes, each contributing a different kind of signal and each weighted with an explicit story about what it can and cannot tell you.
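As a sketch of what "an explicit story" can look like in practice: the layer names below follow the post, but the weights, pass rates, and caveats are illustrative placeholders you would replace with your own.

```python
# Sketch: a layered pre-launch eval where every signal source carries an
# explicit weight and a note on what it can and cannot tell you. The layer
# names follow the post; the weights, rates, and caveats are illustrative.

layers = [
    # (name, pass_rate, weight, what the signal actually measures)
    ("dogfooding",         0.92, 0.2, "expert users on friendly inputs"),
    ("simulated traffic",  0.81, 0.3, "breadth, but synthetic phrasing"),
    ("expert annotation",  0.77, 0.3, "correctness on curated hard cases"),
    ("adversarial probes", 0.60, 0.2, "worst case, not the typical case"),
]

prior = sum(rate * weight for _, rate, weight, _ in layers)
print(f"weighted prior on launch quality: {prior:.2f}")
for name, rate, weight, caveat in layers:
    print(f"  {name:<18} {rate:.2f} (w={weight}): {caveat}")
```

The number that falls out matters less than the fact that every input has a recorded caveat: that is the difference between constructing a prior and pretending you sampled a distribution.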

The Demo Loop Bias: How Your Dev Process Quietly Optimizes for Impressive Failures

Tian Pan · Software Engineer · 10 min read

There is a particular kind of meeting that happens on every AI-product team, usually on Thursdays. Someone shares their screen, drops a prompt into a notebook, and runs three or four examples. The room reacts. People say "wow." Someone takes a screenshot for Slack. A decision gets made — ship it, swap models, change the temperature. No one writes down the failure rate, because no one measured it.

This is the demo loop, and it has a structural bias that almost no team accounts for: it does not select for the best output. It selects for the most legible output. Over weeks and months, your prompt evolves to produce answers that land in a meeting — confident, fluent, well-formatted, on-topic. Whether they are correct is a separate variable, and it is one your process is not measuring.

The result is what I call charismatic failure: outputs that are wrong in ways your demo loop has been trained, by selection pressure, to ignore.
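The counterweight is unglamorous: a fixed, versioned example set and a recorded failure rate after every prompt change. A minimal sketch, assuming a hypothetical `generate` call and a grading rule of your own:

```python
# Minimal counterweight to the demo loop: run the prompt against a fixed,
# versioned example set and record the failure rate, instead of eyeballing
# three cherry-picked outputs. `generate` and the grading rule below are
# placeholders for your own model call and checks.
import json

def generate(prompt: str, example_input: str) -> str:
    raise NotImplementedError  # your model call goes here

def passes(output: str, expected: dict) -> bool:
    # Grade on correctness, not fluency: required facts present, no
    # hallucinated fields. Legibility is what the demo already selects for.
    return all(fact in output for fact in expected["required_facts"])

def failure_rate(prompt: str, path: str = "examples.jsonl") -> float:
    examples = [json.loads(line) for line in open(path)]
    failures = sum(
        not passes(generate(prompt, ex["input"]), ex) for ex in examples
    )
    return failures / len(examples)

# Written down after every prompt change, not screenshotted in a meeting:
# print(f"failure rate: {failure_rate(CURRENT_PROMPT):.1%}")
```

It will never get a "wow" in the Thursday meeting, which is precisely why it catches what the meeting cannot.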

The AI Feature Nobody Uses: How Teams Ship Capabilities That Never Get Adopted

Tian Pan · Software Engineer · 9 min read

A VP of Product at a mid-market project management company spent three quarters of her engineering team's roadmap building an AI assistant. Six months after launch, weekly active usage sat at 4%. When asked why they built it: "Our competitor announced one. Our board asked when we'd have ours." That's a panic decision dressed up as a product strategy — and it's endemic right now.

The 4% isn't an outlier. A customer success platform shipped AI-generated call summaries to 6% adoption after four months. A logistics SaaS added AI route optimization suggestions and got 11% click-through with a 2% action rate. An HR platform launched an AI policy Q&A bot that spiked for two weeks and flatlined at 3%. The pattern is consistent enough to name: ship an AI feature, watch it get ignored, quietly sunset it eighteen months later.

The default explanation is that the AI wasn't good enough. Sometimes that's true. More often, the model was fine — users just never found the feature at all.

The AI Feature Retirement Playbook: How to Sunset What Users Barely Adopted

Tian Pan · Software Engineer · 11 min read

Your team shipped an AI-powered summarization feature six months ago. Adoption plateaued at 8% of users. The model calls cost $4,000 a month. The one engineer who built it has moved to a different team. And now the model provider is raising prices.

Every instinct says: kill it. But killing an AI feature turns out to be significantly harder than killing any other kind of feature — and most teams find this out the hard way, mid-retirement, when the compliance questions start arriving and the power users revolt.

This is the playbook that should exist before you ship the feature, but is most useful right now, when you're staring at usage graphs that point unmistakably toward the exit.