12 posts tagged with "product-design"

Trust Ceilings: The Autonomy Variable Your Product Team Can't See

· 10 min read
Tian Pan
Software Engineer

Every agentic feature has a maximum autonomy level above which users start checking work, intervening, or abandoning the feature entirely. That maximum is not a property of your model. It is a property of your users, your domain, and the cost of being wrong, and it does not move because a launch deck says it should. Most teams discover their ceiling the hard way: a feature designed for full autonomy ships, adoption stalls at "agent suggests, human approves," the metrics blame the model, and the next quarter is spent tuning a knob that was never the bottleneck.

The shape of the ceiling is consistent enough across products that it deserves a name. Anthropic's own usage data on Claude Code shows new users using full auto-approve about 20% of the time, climbing past 40% only after roughly 750 sessions. PwC's 2025 survey of 300 senior executives found 79% of companies are using AI agents, but most production deployments operate at "collaborator" or "consultant" levels — the model proposes, the human disposes — not at the fully autonomous tier the marketing implied. The story underneath those numbers is not that users are timid. It is that trust is calibrated to the cost of a recoverable mistake, and your product almost certainly does not let users see, undo, or bound that cost the way they need to.
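
One way to make the ceiling concrete is to treat autonomy as something computed per action from reversibility, blast radius, and the user's own track record, rather than set globally at launch. The sketch below is a minimal illustration of that idea; the names and thresholds are assumptions for the example, not anything the post prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class Autonomy(Enum):
    SUGGEST = 1   # agent proposes, human applies the change
    APPROVE = 2   # agent prepares the change, human confirms before it runs
    AUTO = 3      # agent acts on its own, human can undo afterwards


@dataclass
class ActionContext:
    reversible: bool        # can the user undo this in one step?
    blast_radius: int       # rough count of records or users affected
    acceptance_rate: float  # share of this user's recent suggestions accepted


def autonomy_ceiling(ctx: ActionContext) -> Autonomy:
    """Bound autonomy by the cost of a mistake, not by model quality alone."""
    if not ctx.reversible or ctx.blast_radius > 100:
        # Irreversible or wide-impact actions never run unattended.
        return Autonomy.APPROVE if ctx.acceptance_rate > 0.8 else Autonomy.SUGGEST
    return Autonomy.AUTO if ctx.acceptance_rate > 0.6 else Autonomy.APPROVE
```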

Async Agents Need an Inbox, Not a Chat

· 11 min read
Tian Pan
Software Engineer

The chat metaphor has a fuse, and it burns out around thirty seconds. Past that, the spinner stops being a progress indicator and becomes a commitment device — the one making the commitment is your user, and most of them bail. You can watch it in session replays: the typing indicator appears, the user waits, tabs away at about twelve seconds, half never come back. The product team sees a completed agent run with no human on the other end and files it as a success. It is not a success. It is an abandoned artifact that happened to finish.

This is the first contact with a structural problem that most agent products paper over with spinners and streaming text: the chat interface was designed for turn-taking humans and fast models, and it fails silently when either assumption breaks. If your agent takes minutes, you are not shipping a chat feature with a longer wait. You are shipping a different product, and it needs a different UI primitive.
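
One way to read "a different UI primitive" is a persistent run record that outlives the request: submitted work goes into a queue, finishes in the background, and lands in an inbox for review. The sketch below uses hypothetical names (AgentRun, RunStatus) just to show the shape of that record.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from uuid import uuid4


class RunStatus(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    NEEDS_REVIEW = "needs_review"   # finished, waiting for a human to look
    ACCEPTED = "accepted"
    DISMISSED = "dismissed"


@dataclass
class AgentRun:
    """An inbox item: the unit of work is a reviewable artifact, not a chat turn."""
    prompt: str
    status: RunStatus = RunStatus.QUEUED
    run_id: str = field(default_factory=lambda: str(uuid4()))
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    completed_at: datetime | None = None
    result_summary: str | None = None   # the one-line preview the inbox row shows


def mark_complete(run: AgentRun, summary: str) -> AgentRun:
    # Completion updates the inbox and fires a notification; it does not assume
    # the user is still staring at a spinner.
    run.status = RunStatus.NEEDS_REVIEW
    run.completed_at = datetime.now(timezone.utc)
    run.result_summary = summary
    return run
```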

The Output Commitment Problem: Why Streaming Self-Correction Destroys User Trust More Than the Original Error

· 10 min read
Tian Pan
Software Engineer

A user asks your agent a question. Tokens start flowing. Three sentences in, the model writes "Actually, let me reconsider — " and pivots to a different answer. The revised answer is better. The user closes the tab.

This is the output commitment problem, and it is one of the most consistently underestimated UX failures in shipped AI products. The engineering mindset treats self-correction as a feature — the model noticed its own error, that is the system working as intended. The user-perception mindset treats it as a disaster — the product demonstrated, live, that its first confident claim was wrong. Those two readings are both correct, and they do not reconcile on their own.

The core asymmetry is that streaming makes thinking legible, and legible thinking is auditable thinking. A model that went down the wrong path silently and recovered before producing a clean final answer would look competent. The same model, streaming every half-thought, looks like it is flailing. The answer quality is identical. The perception is not.

The Enterprise AI Capability Discovery Problem

· 10 min read
Tian Pan
Software Engineer

You shipped the AI feature. You put it in the product. You wrote the help doc. And still, six months later, your most sophisticated enterprise users are copy-pasting text into ChatGPT to do the same thing your feature already does natively. This is not a training problem. It is a discoverability problem, and it is one of the most consistent sources of wasted AI investment in enterprise software today.

The pattern is well-documented: 49% of workers report they never use AI in their role, and 74% of companies struggle to scale value from AI deployments. But the interesting failure mode is not the late-adopters who explicitly resist. It is the engaged users who open your product every day, never knowing that the AI capability they would have paid for is sitting one click away from where their cursor already is.

Feedback Surfaces That Actually Train Your Model

· 10 min read
Tian Pan
Software Engineer

Most AI products ship with a thumbs-up/thumbs-down widget and call it feedback infrastructure. It isn't. What it is, in practice, is a survey that only dissatisfied or unusually conscientious users bother completing — and a survey that tells you nothing about what the correct output would have looked like.

The result is a dataset shaped not by what your users want, but by which users felt like clicking a button. That selection bias propagates into fine-tuning runs, reward models, and DPO pipelines, quietly steering your model toward the preferences of a tiny and unrepresentative minority. Implicit signals — edit rate, retry rate, session abandonment — cover every user who touches the product. They don't require a click. They're generated by the act of using the software.

Here's how to design feedback surfaces that produce high-fidelity training signal as a natural side effect of product use, and how to route those signals into your training pipeline.
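
As a taste of what that looks like in practice, a user's edit to a generated draft can be captured as a contrastive training pair without asking for a single click. The sketch below is an illustration under assumed names (PreferencePair, pair_from_edit) and an arbitrary 5% edit threshold, not the post's exact pipeline.

```python
import difflib
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str     # what the user actually kept or shipped
    rejected: str   # what the model originally produced


def pair_from_edit(prompt: str, draft: str, final_text: str,
                   min_edit_ratio: float = 0.05) -> PreferencePair | None:
    """Turn a user's edit into a DPO-style preference pair.

    If the user changed enough of the draft, the edited version counts as
    'chosen' and the original draft as 'rejected'; near-identical text is
    treated as acceptance and skipped as a preference signal.
    """
    similarity = difflib.SequenceMatcher(None, draft, final_text).ratio()
    if (1.0 - similarity) < min_edit_ratio:
        return None  # draft shipped almost as-is: no contrastive pair here
    return PreferencePair(prompt=prompt, chosen=final_text, rejected=draft)
```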

The Jagged Frontier: Why AI Fails at Easy Things and What It Means for Your Product

· 10 min read
Tian Pan
Software Engineer

A common assumption in AI product development goes something like this: if a model can handle a hard task, it can definitely handle an easier one nearby. This assumption is wrong, and it's responsible for a category of production failures that no amount of benchmark reading prepares you for.

The research term for the underlying phenomenon is the "jagged frontier" — AI's capability boundary isn't a smooth line that hard tasks sit outside of and easy tasks sit inside. It's a ragged, unpredictable shape. AI systems can write production-grade database query optimizers and still miscalculate whether two line segments on a diagram intersect. They can pass PhD-level science exams and fail children's riddles that hinge on spatial relationships. They can synthesize 50-page documents and then confidently hallucinate a summary of a paragraph they just read.

The Agent Loading State Problem: Designing for the 45-Second UX Abyss

· 11 min read
Tian Pan
Software Engineer

There is a hole in your product between second ten and second forty-five where nothing you designed still works. Users abandon a silent UI around the ten-second mark — Jakob Nielsen pinned that threshold back in the nineties, and modern eye-tracking studies have not moved it by more than a second or two. Modern agent work routinely takes thirty to one hundred twenty seconds. Multi-step planning, retrieval, a couple of tool calls, maybe a reflection pass before the final write — the latency budget is not a budget anymore, it is a crater.

Most teams discover this the first time they ship an agent feature and watch session recordings. Users hammer the submit button. They paste the query into a second tab. They close the window and retry from scratch, convinced it is broken. The feature works; the waiting does not. The gap between "spinner appeared" and "answer arrived" is the most neglected surface in AI product design, and it is the one that decides whether users perceive your agent as intelligent or stuck.
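
One common way to fill that gap is to structure the run as named steps and stream a progress event before each one, so the UI always has something truthful to say about what the agent is doing. The sketch below is a generic illustration; the event names and transport are assumptions, not a specific framework's API.

```python
import json
import time
from typing import Callable


def run_with_progress(steps: list[tuple[str, Callable[[], object]]],
                      emit: Callable[[str], None]) -> object:
    """Run agent steps in order, narrating the wait with step-level events.

    `steps` pairs a user-facing label with the work to do; `emit` pushes a
    JSON line to the client (e.g. over server-sent events) so the UI can show
    "Searching your workspace (2/4)" instead of a bare spinner.
    """
    result: object = None
    for index, (label, work) in enumerate(steps, start=1):
        emit(json.dumps({"event": "step_started", "step": index,
                         "total": len(steps), "label": label, "ts": time.time()}))
        result = work()
    emit(json.dumps({"event": "done", "ts": time.time()}))
    return result
```

A call might look like `run_with_progress([("Searching your workspace", search), ("Drafting the summary", draft)], emit=send_sse)`, where `search`, `draft`, and `send_sse` stand in for your own functions.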

Ambient AI Design: When the Chat Interface Is the Wrong Abstraction

· 8 min read
Tian Pan
Software Engineer

Most engineering teams default to building AI features as chat interfaces. A user types something; the model responds. The pattern feels natural because it maps to human conversation, and the tooling makes it easy. But when you watch those chat-based AI features in production, you often see the same dysfunction: the UI sits idle, waiting for a user who is too busy, too distracted, or simply unaware that they should be asking something.

Chat is a pull model. The user initiates. The AI reacts. For a meaningful subset of the valuable AI work in any product—monitoring, anomaly detection, workflow automation, proactive notification—pull is the wrong shape. The work needs to happen whether or not the user remembered to open the chat window.
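
Inverting that shape means the trigger is a schedule or an event stream, and the output is a notification the user can act on rather than a reply to a question nobody asked. The sketch below shows the skeleton of such a loop; the names (Finding, ambient_check) and severity levels are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Finding:
    title: str
    severity: str           # "info", "warn", or "critical"
    suggested_action: str


def ambient_check(fetch_events: Callable[[], Iterable[dict]],
                  detect: Callable[[Iterable[dict]], list[Finding]],
                  notify: Callable[[Finding], None],
                  min_severity: str = "warn") -> None:
    """Push-model loop: triggered by a schedule or an event, never by a prompt.

    The model (inside `detect`) decides whether anything is worth surfacing;
    the user only hears about it when a finding clears the severity bar, and
    the result lands in a notification surface rather than a chat reply.
    """
    rank = {"info": 0, "warn": 1, "critical": 2}
    for finding in detect(fetch_events()):
        if rank[finding.severity] >= rank[min_severity]:
            notify(finding)
```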

The Overclaiming Trap: When Being Right for the Wrong Reasons Destroys AI Product Trust

· 10 min read
Tian Pan
Software Engineer

Most AI product post-mortems focus on the same story: the model was wrong, users noticed, trust eroded. The fix is obvious — improve accuracy. But there is a more insidious failure mode that post-mortems rarely capture because standard accuracy metrics don't surface it: the model was right, but for the wrong reasons, and the power users who checked the reasoning never came back.

Call it the overclaiming trap. It is the failure mode where correct final answers are backed by fabricated, retrofitted, or structurally unsound reasoning chains. It is more dangerous than ordinary wrongness because it looks like success until your most sophisticated users start quietly leaving.

The Trust Calibration Gap: Why AI Features Get Ignored or Blindly Followed

· 9 min read
Tian Pan
Software Engineer

You shipped an AI feature. The model is good — you measured it. Precision is 91%, recall is solid, the P99 latency is under 400ms. Three months later, product analytics tell a grim story: power users have turned it off entirely, while a different cohort is accepting every suggestion without changing a word, including the ones that are clearly wrong.

This is the trust calibration gap. It's not a model problem. It's a design problem — and it's more common than most AI product teams admit.

Trust Transfer in AI Products: Why the Same Feature Ships at One Company and Dies at Another

· 9 min read
Tian Pan
Software Engineer

Two product teams at two different companies build the same AI writing assistant. Same model. Similar feature surface. Comparable accuracy numbers. One team celebrates record activation at launch. The other quietly disables the feature after three months of flat adoption and one scathing question at an internal all-hands.

The engineering debrief at the struggling company focuses on the obvious variables: latency, accuracy, UX polish. None of them fully explain the gap. The real variable was trust — specifically, whether the AI feature could borrow enough existing trust to earn the right to make mistakes while it proved itself.

Trust transfer is the invisible force that determines whether an AI feature lands or dies. And most teams shipping AI products have never explicitly designed for it.

The Accuracy Threshold Problem: When Your AI Feature Is Too Good to Ignore and Too Bad to Trust

· 10 min read
Tian Pan
Software Engineer

McDonald's deployed its AI voice ordering system to over 100 locations. In testing, it hit accuracy numbers that seemed workable — low-to-mid 80s percent. Customers started posting videos of the system adding nine sweet teas to their order unprompted, placing bacon on ice cream, and confidently mishearing simple requests. Within two years, the partnership was dissolved and the technology removed from every location. The lab accuracy was real. The real-world distribution was not what the lab tested.

This is the accuracy threshold problem. There is a zone — roughly 70 to 85 percent accuracy — where an AI feature is accurate enough to look like it works, but not reliable enough to actually work without continuous human intervention. Teams ship into this zone because the numbers feel close enough. Users get burned because the feature is just good enough to lure them into reliance and just bad enough to fail when it matters.