34 posts tagged with "product"

The AI Feature Nobody Uses: How Teams Ship Capabilities That Never Get Adopted

· 9 min read
Tian Pan
Software Engineer

A VP of Product at a mid-market project management company spent three quarters of her engineering team's roadmap building an AI assistant. Six months after launch, weekly active usage sat at 4%. When asked why they built it: "Our competitor announced one. Our board asked when we'd have ours." That's a panic decision dressed up as a product strategy — and it's endemic right now.

The 4% isn't an outlier. A customer success platform shipped AI-generated call summaries to 6% adoption after four months. A logistics SaaS added AI route optimization suggestions and got 11% click-through with a 2% action rate. An HR platform launched an AI policy Q&A bot that spiked for two weeks and flatlined at 3%. The pattern is consistent enough to name: ship an AI feature, watch it get ignored, quietly sunset it eighteen months later.

The default explanation is that the AI wasn't good enough. Sometimes that's true. More often, the model was fine — users just never found the feature at all.

The AI Feature Retirement Playbook: How to Sunset What Users Barely Adopted

· 11 min read
Tian Pan
Software Engineer

Your team shipped an AI-powered summarization feature six months ago. Adoption plateaued at 8% of users. The model calls cost $4,000 a month. The one engineer who built it has moved to a different team. And now the model provider is raising prices.

Every instinct says: kill it. But killing an AI feature turns out to be significantly harder than killing any other kind of feature — and most teams find this out the hard way, mid-retirement, when the compliance questions start arriving and the power users revolt.

This is the playbook that should exist before you ship the feature, but is most useful right now, when you're staring at usage graphs that point unmistakably toward the exit.

The AI Taste Problem: Measuring Quality When There's No Ground Truth

· 11 min read
Tian Pan
Software Engineer

Here's a scenario that plays out on most AI product teams: someone on leadership asks whether the new copywriting model is better than the old one. The team runs their eval suite, accuracy numbers look good, and they ship. Three weeks later, the marketing team quietly goes back to using the old model because the new one "sounds off." The accuracy metrics were real. They just measured the wrong thing.

This is the AI taste problem. It shows up wherever your outputs are subjective — copywriting, design suggestions, creative content, tone adjustments, style recommendations. When there's no objective ground truth, traditional ML evaluation frameworks give you a false sense of confidence. And most teams don't have a systematic answer for what to do instead.
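When there's no ground truth, one workable substitute is pairwise human preference: show judges outputs from the old and new model side by side and track the win rate, with a confidence bound so a handful of judgments can't declare victory. A minimal sketch in Python; the function name, the 55% bar, and the normal-approximation bound are illustrative choices, not a prescribed method:

```python
from math import sqrt

def win_rate_verdict(wins_b: int, wins_a: int, ties: int = 0,
                     min_lift: float = 0.55) -> str:
    """Summarize pairwise human preferences between model A and model B.

    wins_b / wins_a: judgments preferring each model; ties split evenly.
    min_lift: B must clear this win rate before we call it better.
    """
    n = wins_a + wins_b + ties
    if n == 0:
        return "no data"
    p = (wins_b + ties / 2) / n
    # 95% normal-approximation interval on B's true win rate
    margin = 1.96 * sqrt(p * (1 - p) / n)
    if p - margin > min_lift:
        return "B better"
    if p + margin < 1 - min_lift:
        return "A better"
    return "inconclusive"
```

The point is the third verdict: a taste eval that can only say "better" or "worse" will manufacture confidence from noise, which is exactly how the "sounds off" model ships.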

The Conversation Designer's Hidden Role in AI Product Quality

· 10 min read
Tian Pan
Software Engineer

Most engineering teams treat system prompts as configuration files — technical strings to be iterated on quickly, stored in environment variables, and deployed with the same ceremony as changing a timeout value. The system prompt gets an inline comment. The error messages get none. The capability disclosure is whatever the PM typed into the Notion doc on launch day.

This is the root cause of an entire class of AI product failures that don't show up in your eval suite. The model answers the question. The latency is fine. The JSON validates. But users stop trusting the product after three sessions, and the weekly active usage curve never recovers.

The missing discipline is conversation design. And it shapes output quality in ways that most engineering instrumentation is architecturally blind to.

The Cold Start Problem in AI Personalization: Being Useful Before You Have Data

· 11 min read
Tian Pan
Software Engineer

Most personalization systems are built around a flywheel: users interact, you learn their preferences, you show better recommendations, they interact more. The flywheel spins faster as data accumulates. The problem is that a flywheel needs momentum before it does anything useful, and a new user supplies none.

This is the cold start problem. And it's more dangerous than most teams recognize when they first ship personalization. A new user arrives with no history, no signal, and often a skeptical prior: "AI doesn't know me." You have roughly 5–15 minutes to prove otherwise before they form an opinion that determines whether they'll stay long enough to generate the data that would let you actually help them. Up to 75% of new users abandon products in the first week if that window goes badly.

The cold start problem isn't a data problem. It's an initialization problem. The engineering question is: what do you inject in place of history?
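One common answer to that question is a segment-level prior: initialize the new user with their cohort's aggregate preferences and fade the prior out as their own signal accumulates. A hypothetical sketch, where the blending rule, the 20-event ramp, and all the names are invented for illustration:

```python
def personalization_weights(user_events: int,
                            segment_prior: dict[str, float],
                            user_profile: dict[str, float],
                            ramp_events: int = 20) -> dict[str, float]:
    """Blend a segment-level prior with the user's own learned profile.

    With zero history the prior dominates entirely; by `ramp_events`
    interactions the user's own signal has fully taken over.
    """
    alpha = min(user_events / ramp_events, 1.0)  # trust in the user's own signal
    keys = set(segment_prior) | set(user_profile)
    return {k: (1 - alpha) * segment_prior.get(k, 0.0)
               + alpha * user_profile.get(k, 0.0)
            for k in keys}
```

On day zero the system behaves like "people who look like you," which is a far better opening move than behaving like nobody at all.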

Why '92% Accurate' Is Almost Always a Lie

· 8 min read
Tian Pan
Software Engineer

You launch an AI feature. The model gets 92% accuracy on your holdout set. You present this to the VP of Product, the legal team, and the head of customer success. Everyone nods. The feature ships.

Three months later, a customer segment you didn't specifically test is experiencing a 40% error rate. Legal is asking questions. Customer success is fielding escalations. The VP of Product wants to know why no one flagged this.

The 92% figure was technically correct. It was also nearly useless as a decision-making input — because headline accuracy collapses exactly the information that matters most.
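The mechanical fix for that collapse is to report accuracy per customer segment alongside the headline number, so a 40% error rate in one segment can't hide inside a 92% average. A minimal illustrative sketch:

```python
from collections import defaultdict

def accuracy_by_segment(records):
    """Break headline accuracy down by customer segment.

    records: iterable of (segment, correct: bool) pairs.
    Returns (overall_accuracy, {segment: accuracy}).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for segment, correct in records:
        totals[segment] += 1
        hits[segment] += int(correct)
    overall = sum(hits.values()) / sum(totals.values())
    return overall, {s: hits[s] / totals[s] for s in totals}
```

The per-segment table, not the headline number, is what the VP, legal, and customer success actually needed in the room.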

Writing Acceptance Criteria for Non-Deterministic AI Features

· 12 min read
Tian Pan
Software Engineer

Your engineering team has been building a document summarizer for three months. The spec says: "The summarizer should return accurate summaries." You ship it. Users complain the summaries are wrong half the time. A postmortem reveals no one could define what "accurate" meant in a way that was testable before launch.

This is the standard arc for AI feature development, and it happens because teams apply acceptance criteria patterns built for deterministic software to systems that are fundamentally probabilistic. An LLM-powered summarizer doesn't have a single "correct" output — it has a distribution of outputs, some acceptable and some not. Binary pass/fail specs don't map onto distributions.

The problem isn't just philosophical. It causes real pain: features launch with vague quality bars, regressions go undetected until users notice, and product and engineering can't agree on whether a feature is "done" because nobody specified what "done" means for a stochastic system. This post walks through the patterns that actually work.
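One pattern that does map onto distributions is the sampled acceptance test: generate many outputs, grade each against a rubric, and accept only if the pass rate clears the bar with statistical confidence rather than on a single lucky run. A sketch using a Wilson score lower bound; the 90% bar and the grading step it assumes are placeholder choices:

```python
from math import sqrt

def meets_acceptance_bar(passes: int, trials: int,
                         required_rate: float = 0.90,
                         z: float = 1.96) -> bool:
    """Statistical acceptance check for a stochastic feature.

    Instead of pass/fail on one output, sample `trials` outputs, grade
    each against a rubric, and require the Wilson score lower bound on
    the pass rate to clear `required_rate`.
    """
    if trials == 0:
        return False
    p = passes / trials
    denom = 1 + z * z / trials
    center = p + z * z / (2 * trials)
    margin = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (center - margin) / denom >= required_rate
```

Note that 90 passes out of 100 fails a 90% bar here: with that sample size you can't rule out a true rate well below the requirement, which is precisely the distinction binary specs erase.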

The Silent Regression: How to Communicate AI Behavioral Changes Without Losing User Trust

· 9 min read
Tian Pan
Software Engineer

Your power users are your canaries. When you ship a new model version or update a system prompt, aggregate evaluation metrics tick upward — task completion rates improve, hallucination scores drop, A/B tests declare victory. Then your most sophisticated users start filing bug reports. "It used to just do X. Now it lectures me first." "The formatting changed and broke my downstream parser." "I can't get it to stay in character anymore." They aren't imagining things. You shipped a regression, you just didn't see it in your dashboards.

This is the central paradox of AI product development: the users most harmed by behavioral drift are the ones who invested most in understanding the system's quirks. They built workflows around specific output patterns. They learned which prompts reliably triggered which behaviors. When you change the model, you don't just ship updates — you silently invalidate months of their calibration work.
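One way to catch this class of regression before the power users do is to replay a golden prompt set against the candidate model and check structural contracts, the things downstream parsers actually depend on, rather than only aggregate quality. A hypothetical sketch assuming JSON outputs; the helper names and the key-presence contract are illustrative:

```python
import json

def check_output_contract(output: str, required_keys: set[str]) -> list[str]:
    """Check one model output against a structural contract.

    Returns a list of violations; an empty list means the contract holds.
    """
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    return [f"missing key: {key}" for key in required_keys - data.keys()]

def regression_report(golden_outputs: dict[str, str],
                      candidate_outputs: dict[str, str],
                      required_keys: set[str]) -> dict[str, list[str]]:
    """Report prompts where the candidate model newly violates a
    contract the current model satisfies on the same golden prompt."""
    report = {}
    for prompt_id, old in golden_outputs.items():
        new = candidate_outputs.get(prompt_id, "")
        if not check_output_contract(old, required_keys) and \
           (violations := check_output_contract(new, required_keys)):
            report[prompt_id] = violations
    return report
```

An A/B test averages this breakage away; a contract check surfaces the exact prompt whose new output would break someone's parser.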

What 'Done' Means for AI-Powered Features: Engineering the Perpetual Beta

· 10 min read
Tian Pan
Software Engineer

Shipping a feature in traditional software ends with a merge. The unit tests pass. The integration tests pass. QA signs off. You flip the flag, and unless a bug surfaces in production, you move on. The feature is done. For AI-powered features, that moment doesn't exist — and if you're pretending it does, you're accumulating a stability debt that will eventually show up as a user trust problem.

The reason is straightforward but rarely designed around: deterministic software produces the same output from the same input every time. AI features do not. Not because of a bug, but because the behavior is defined by a model that lives outside your codebase, trained on data that reflects a world that keeps changing, consumed by users whose expectations evolve as they see what's possible.

The Cold Start Trap in AI Products

· 12 min read
Tian Pan
Software Engineer

There's a specific kind of failure that kills AI features before they ever get a chance to prove themselves. It doesn't look like a technical failure — the model architecture is sound, the eval scores are decent, and the feature ships. But adoption is flat, users bounce, and six months later the team quietly deprioritizes the feature. The diagnosis, delivered in a retrospective: "not enough data."

This is the cold start trap. AI features improve with engagement data, but users won't engage until the feature is good enough to be useful. The circular dependency is not a solvable math problem — it's a product design challenge disguised as an engineering problem. And most teams walk into it with the same wrong plan: collect data first, ship ML second.
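The inverted plan is to ship a heuristic first: it delivers value on day one and generates the very engagement data the model needs, with a per-user handoff once that user has enough history. A toy sketch, where the popularity heuristic, the 30-event threshold, and every name are illustrative:

```python
from collections import Counter

def recommend_items(user_history: list[str],
                    global_events: list[str],
                    learned_ranker=None,
                    min_events: int = 30,
                    k: int = 3) -> list[str]:
    """Route between a heuristic and a learned model, per user.

    Until this user has `min_events` interactions (or until a model
    exists at all), fall back to global popularity minus items the
    user has already seen.
    """
    if learned_ranker is not None and len(user_history) >= min_events:
        return learned_ranker(user_history)[:k]
    seen = set(user_history)
    ranked = [item for item, _ in Counter(global_events).most_common()
              if item not in seen]
    return ranked[:k]
```

The routing detail matters: the handoff happens one user at a time as each accumulates history, not as a single global switchover on launch day.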

Cultural Calibration for Global AI Products: Why Translation Is 10% of the Problem

· 9 min read
Tian Pan
Software Engineer

There is a quiet failure mode baked into almost every globally deployed AI product. An engineer localizes the UI strings, runs the model outputs through a translation API, has a native speaker spot-check a handful of responses, and ships. The product is technically multilingual. It is not culturally competent. Users in Tokyo, Riyadh, and Chengdu receive outputs that are grammatically correct and culturally wrong — responses that signal disrespect, confusion, or distrust in ways the team will never see in aggregate metrics.

The research is unambiguous: every major LLM tested reflects the worldview of English-speaking, Protestant European societies. Studies testing models against representative data from 107 countries found not a single model that aligned with how people in Africa, Latin America, or the Middle East build trust, show respect, or resolve conflict. Translation patches the surface. The underlying calibration remains Western.

The Magic Moment Problem: Why AI Feature Onboarding Fails and How to Fix It

· 10 min read
Tian Pan
Software Engineer

Slack discovered that teams exchanging 2,000 messages converted to paid at a 93% rate. The insight sounds obvious in retrospect — engaged teams stay — but what's less obvious is the engineering consequence: Slack built their entire onboarding flow around getting teams to that message count, not around feature tours or capability explanations. They taught users about Slack by using Slack.

AI features have the same problem, but harder. There's no equivalent of "send your first message" because the capability surface is invisible. A user staring at a blank prompt box has no intuition about what's possible. This is the magic moment problem: your product has a transformative capability, but users can't imagine it until they've seen it, and they won't see it unless you engineer the path.

The data makes this urgent. In 2024, 17% of companies abandoned most of their AI initiatives. In 2025, that number jumped to 42% — a 147% increase in a single year. The technology improved; the onboarding didn't.