Skip to main content

42 posts tagged with "ux"

View all tags

The AI Accessibility Audit Nobody Runs

· 11 min read
Tian Pan
Software Engineer

Open your agent product, turn on VoiceOver, and hit send on any prompt. If you have a typical streaming UI with an inline reasoning trace, what you will hear in the next thirty seconds is not your product. It is a torrent of partial tokens, mid-word reflows, status changes nobody announced, and a reasoning monologue your sighted users opted into but your blind users cannot escape. The interface that demoed beautifully on stage is, to a screen reader, a denial-of-service attack delivered as speech.

This is the audit nobody on the AI team runs. The design review approved the streaming animation. The eval suite measured answer quality. The latency dashboard tracked time-to-first-token. None of those instruments noticed that the affordance making the product feel fast and thoughtful for one cohort makes it unusable for another. And that omission is starting to show up in pro-se lawsuit filings — the same federal courts that have been processing accessibility complaints against e-commerce sites for a decade are now seeing AI-interface complaints rise sharply, with one tracker reporting a 40% year-over-year increase in 2025 alone.

Background Agents and the Notification Budget: Why Proactive AI Hits a Hard Ceiling at User Attention

· 10 min read
Tian Pan
Software Engineer

The first generation of AI assistants waited politely. You typed, they answered. The second generation does not wait. It watches your calendar, scans your inbox, reads your repo activity, and surfaces "you should know about this" interruptions before you have asked for anything. The pitch is compelling and the demos are mesmerizing. The retention curves, once these features ship, are not.

There is a number nobody puts on the launch slide: the user has a daily ceiling on unsolicited AI updates, and it is roughly three to five across all sources combined. The proactive agent that ships its tenth notification of the week is the same agent the user mutes by Friday and uninstalls the following month. This is not a UX polish problem. It is the architectural blind spot of the entire proactive-AI category, and it deserves a name: the notification budget.

User Trust Half-Life: Why One Bad Session Erases Weeks of Calibration

· 10 min read
Tian Pan
Software Engineer

A user's calibration of an AI feature is one of the most expensive things you ship. It costs them weeks of attention: learning which prompts work, where the model's reliable, when to double-check, what to ignore entirely. Then a single visible failure — a wrong number in a generated report, a hallucinated citation the user pasted into a deck, a confidently-incorrect recommendation they acted on — can vaporize all of it in one session. The recovery curve isn't symmetric. The user's prior was "this is reliable," and the update doesn't land as a data point. It lands as a betrayal.

The team measuring DAU sees nothing for weeks. The user keeps opening the app out of habit, runs a few queries, doesn't act on the output, and then quietly stops. By the time engagement metrics flinch, the trust event that caused it is two months old and nobody on the team remembers shipping it.

The AI Wallet: Why Token Budgets Belong in the UI, Not the Engineering Dashboard

· 10 min read
Tian Pan
Software Engineer

Pull up the per-user cost dashboard for any AI product on a flat subscription. The shape is always the same. A long, flat tail of users who barely move the needle, and a thin spike at the top where five percent of accounts burn eighty percent of the inference budget. The spike is hidden from users on both ends. The power users don't know they're subsidizing nothing — they assume the price is the price. The casual users don't know they could ask for more — they assume the limit is the limit.

The dashboard stays engineering-internal because product is afraid that exposing it will scare users. It does the opposite. The team that hides cost ends up shipping silent throttling, hidden model downgrades, and answer truncation that the user reads as "this product is broken." The team that exposes cost — as a deliberate UI surface, not an admin page — turns the same cost ceiling from a churn driver into a monetization lever.

This is the AI wallet. Not a billing page. A product primitive.

Conversational REST: When Your Chat UI Needs Pagination, Filters, and Sort

· 11 min read
Tian Pan
Software Engineer

A user asks your shopping agent for "running shoes under $150 with good arch support." The model dutifully returns twelve options as a wall of bulleted text inside a single chat bubble that overflows the viewport. The user scrolls, loses their place, and types "show me only Asics" — at which point your agent re-runs the entire search instead of filtering the result set it already has. Three turns later, the user is inventing a query language one prompt at a time, and your product feels like a command line wearing a chat-bubble costume.

This is the failure mode I keep watching teams ship. They built a chat product on top of what users actually wanted to be a faceted-search product. The model is fine. The retrieval is fine. The UI is the problem, and it's the wrong shape for the task.

The shortest way I can put it: chat is an input modality, not an output one. The agent's job is to translate user intent into a structured query. The moment the result set is more than three items, the right answer is to render UI, not to keep talking.

The Show Your Work UX Trap: When the Reasoning Trace Is Debug Output Wearing a Product Costume

· 11 min read
Tian Pan
Software Engineer

A reasoning model emits a chain-of-thought trace because that is how it computes. A product team renders that trace in the UI because hiding it feels like throwing away tokens the user paid for. Those are two different decisions, and almost nobody on the product side notices they made the second one. The trace becomes a panel, the panel becomes a feature, the feature gets a docs page, and six months later someone in a quarterly review asks why the support queue is full of users arguing with the reasoning instead of the answer.

The trace is debug output. It exists for engineers who need to know why the model picked one tool, hedged on a date, or quietly switched personas mid-paragraph. Pushing it to the end user without a design pass is the AI-product equivalent of leaving console.log calls in production and calling them "transparency." It looks like a feature, it costs almost nothing to render, and it quietly degrades trust in ways that don't show up in any of the dashboards the team built.

AI Clarification Dialogues That Actually Converge: Designing for One-Turn Resolution

· 11 min read
Tian Pan
Software Engineer

AI systems that ask before acting are demonstrably more reliable. They avoid irreversible mistakes, surface misunderstandings before they propagate, and generate higher-quality outputs on the first real attempt.

The problem is that most implementations of this principle are a UX disaster. Instead of asking one good question, they ask three mediocre ones. Users who needed to clarify a ten-word instruction end up in a five-turn interrogation that takes longer than just doing the task wrong and fixing it afterward. The reliability win evaporates, replaced by abandonment.

This is a design problem, not a model capability problem. The models are capable of asking precise, high-value questions. What's missing is an architectural constraint that forces convergence: a rule that treats multi-turn clarification as a failure mode to engineer around, not a feature to rely on.

AI Co-Pilot vs. AI Pilot: The Evidence-Based Product Decision Framework

· 9 min read
Tian Pan
Software Engineer

Every product team building with AI faces the same fork in the road: should the AI advise humans, or should it act on its own? The framing sounds philosophical, but the answer is actually measurable — and getting it wrong is expensive in ways that don't show up until six months after launch, when your override metrics look fine and your user trust scores are quietly collapsing.

Klarna replaced 700 customer service agents with an autonomous AI system in early 2024. By 2025, the CEO admitted they had "gone too far" and began quietly rehiring humans for complex cases. The AI handled 2.3 million conversations in a month and resolved issues in under 2 minutes instead of 11. The numbers looked great. The underlying problem — that customer service for financial products requires empathy and judgment, not just resolution speed — showed up later, in declining satisfaction on anything outside the happy path.

Explanation Debt: Why Users Deserve to Know What Your AI Did

· 8 min read
Tian Pan
Software Engineer

A loan application gets rejected. A candidate gets filtered out of a hiring pipeline. A medical imaging tool flags a scan as abnormal. In each case, an AI system made a decision that matters—and the user has no idea why.

Teams building these systems often spent months tuning precision, recall, and output quality. They ran A/B tests, iterated on prompts, and shipped a model that gets the right answer 94% of the time. But they never built the layer that tells users what happened. This is explanation debt: the accumulated cost of shipping AI decisions without the attribution, confidence signals, and recourse affordances that make those decisions interpretable.

Building Trust Recovery Flows: What Happens After Your AI Makes a Visible Mistake

· 9 min read
Tian Pan
Software Engineer

When Google's AI Overview told users to add glue to pizza sauce and eat rocks for digestive health, it didn't just embarrass a product team — it exposed a systemic gap in how we think about AI reliability. The failure wasn't just that the model was wrong. The failure was that the model was confidently wrong, in a high-visibility context, with no recovery path for the users it misled.

Trust in AI systems doesn't erode gradually. Research shows it follows a cliff-like collapse pattern: a single noticeable error can produce a disproportionate trust decline with measurable effect sizes. Only 29% of developers say they trust AI tools — an 11-point drop from the previous year, even as adoption climbs to 84%. We're building systems that people use but don't trust. That gap matters when your product ships agentic features that act on behalf of users.

This post is about what engineers and product builders should do after the mistake happens — not just how to prevent it.

The Context Limit Is a UX Problem: Why Silent Truncation Erodes User Trust

· 8 min read
Tian Pan
Software Engineer

A user spends an hour in a long coding session with an AI assistant. They've established conventions, shared codebase context, described a multi-file refactor in detail. Then, about 40 messages in, the AI starts giving advice that ignores everything it "knows." It recommends an approach they already rejected twenty minutes ago. When pressed, it seems confused.

No error was shown. No warning appeared. The model just quietly dropped earlier messages to make room for newer ones — and the user concluded the AI was unreliable.

This is not a model failure. It is a product design failure.

Non-Blocking AI: Async UX Patterns That Keep Applications Responsive While Agents Work

· 11 min read
Tian Pan
Software Engineer

Most teams discover the synchronous UI problem the same way: a user clicks "Generate report" and the browser tab goes silent for forty seconds. No spinner, no progress, just a frozen button. Half the users hit refresh and submit twice. The other half assume the product is broken and close the tab.

The root issue is not agent latency — it's that LLM-backed agents operate on timescales that break every assumption baked into synchronous request-response UX. A single GPT-4o call averages 8–15 seconds. A multi-step agent that searches the web, reads three documents, writes a draft, then formats the output can take two to four minutes. You cannot make that feel fast by optimizing the agent. You have to redesign the contract between your backend and your UI.