56 posts tagged with "ux"

The Context Limit Is a UX Problem: Why Silent Truncation Erodes User Trust

May 5, 2026 · 8 min read

Software Engineer

A user spends an hour in a long coding session with an AI assistant. They've established conventions, shared codebase context, described a multi-file refactor in detail. Then, about 40 messages in, the AI starts giving advice that ignores everything it "knows." It recommends an approach they already rejected twenty minutes ago. When pressed, it seems confused.

No error was shown. No warning appeared. The model just quietly dropped earlier messages to make room for newer ones — and the user concluded the AI was unreliable.

This is not a model failure. It is a product design failure.
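One way to make the truncation visible is to have the trimming step report what it dropped, so the UI can say so. The sketch below is a minimal illustration, not a production approach: `trim_to_budget`, `truncation_notice`, and the word-count token estimate are all hypothetical stand-ins, and a real product would use the model's own tokenizer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrimResult:
    kept: list          # messages that still fit in the context window
    dropped_count: int  # how many older messages were left out


def trim_to_budget(messages: list, max_tokens: int) -> TrimResult:
    """Keep the newest messages that fit and report how many were dropped."""
    kept = []
    used = 0
    # Walk from newest to oldest so the most recent turns survive.
    for msg in reversed(messages):
        cost = len(msg.split())  # crude stand-in for a real tokenizer (assumption)
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    return TrimResult(kept=kept, dropped_count=len(messages) - len(kept))


def truncation_notice(result: TrimResult) -> Optional[str]:
    """Return a user-facing notice when anything was dropped, else None."""
    if result.dropped_count == 0:
        return None
    return (
        f"{result.dropped_count} earlier message(s) are no longer "
        "in the assistant's context."
    )
```

The point of the design is that the same function that drops messages also produces the count, so the notice cannot silently drift out of sync with the truncation.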

Non-Blocking AI: Async UX Patterns That Keep Applications Responsive While Agents Work

May 5, 2026 · 11 min read

Tian Pan

Software Engineer

Most teams discover the synchronous UI problem the same way: a user clicks "Generate report" and the browser tab goes silent for forty seconds. No spinner, no progress, just a frozen button. Half the users hit refresh and submit twice. The other half assume the product is broken and close the tab.

The root issue is not agent latency itself. It is that LLM-backed agents operate on timescales that break every assumption baked into synchronous request-response UX. A single GPT-4o call can take 8–15 seconds. A multi-step agent that searches the web, reads three documents, writes a draft, then formats the output can take two to four minutes. You cannot make that feel fast by optimizing the agent. You have to redesign the contract between your backend and your UI.
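A redesigned contract usually means the backend acknowledges the request immediately and the UI asks for status separately. The following is a minimal in-process sketch under that assumption. `JobStore` and its `submit` and `status` methods are hypothetical names, and a real deployment would likely sit behind HTTP endpoints with a durable queue rather than a thread pool.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class JobStore:
    """Hypothetical job registry: submit returns at once, status is polled."""

    def __init__(self) -> None:
        self._pool = ThreadPoolExecutor(max_workers=4)
        self._jobs: Dict[str, Future] = {}

    def submit(self, task: Callable[[], str]) -> str:
        # Return a job id right away instead of blocking on the agent.
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(task)
        return job_id

    def status(self, job_id: str) -> dict:
        future = self._jobs[job_id]
        if not future.done():
            return {"state": "running"}
        error = future.exception()
        if error is not None:
            return {"state": "failed", "error": str(error)}
        return {"state": "done", "result": future.result()}
```

With this shape the "Generate report" button can switch to a progress state as soon as `submit` returns, and a refresh can look up the existing job instead of starting a duplicate.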

The Persona Lock Problem: How Long-Lived AI Sessions Trap Users in Their Own Patterns

May 4, 2026 · 8 min read

Tian Pan

Software Engineer

There's a failure mode in long-lived AI systems that nobody talks about in product reviews but shows up constantly in user behavior data: people start routing around their own AI assistants. They rephrase prompts in uncharacteristic ways, abandon features the system has learned to surface for them, or quietly switch to a different tool for a task they've done hundreds of times before. The system worked — it learned — and that's exactly why it stopped working.

This is the persona lock problem. When an AI adapts to your past behavior, it's building a model of the you that existed at training time. That model gets more confident with every interaction. And eventually it becomes a prison.

The Autonomy Toggle: When Agent Mode Should Be a User Setting, Not a Model Setting

May 2, 2026 · 10 min read

Tian Pan

Software Engineer

The most expensive product decision in an agent product is invisible in the UI: somebody on the engineering team picked a single autonomy level and shipped it as a global default. The cautious user types three messages of clarifying questions for a task they wanted done. The power user closes the tab because every single step needs approval. Both look like product-market-fit problems. They are actually one design decision.

Autonomy is not a model property. It is a UX dimension — like notification frequency, display density, or default sort order — that different users want set differently for different tasks. Treating it as a hardcoded engineering choice forces a single point on a spectrum onto a user base that lives all along it. The fix is not a better default; the fix is exposing the dial.
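Exposing the dial can be as small as a per-user, per-task preference that the agent loop consults before each step. This is a sketch under assumed names: the three `Autonomy` levels, `resolve_level`, and `needs_approval` are illustrative and not taken from any particular product.

```python
from enum import IntEnum
from typing import Dict


class Autonomy(IntEnum):
    ASK_EVERY_STEP = 0     # approve each action
    ASK_ON_RISKY = 1       # approve only risky actions
    RUN_TO_COMPLETION = 2  # no approvals


def resolve_level(
    user_prefs: Dict[str, Autonomy],
    task_type: str,
    default: Autonomy = Autonomy.ASK_ON_RISKY,
) -> Autonomy:
    """Prefer the user's setting for this task type; fall back to the default."""
    return user_prefs.get(task_type, default)


def needs_approval(level: Autonomy, step_is_risky: bool) -> bool:
    """Decide whether the agent must pause for the user before a step."""
    if level is Autonomy.ASK_EVERY_STEP:
        return True
    if level is Autonomy.ASK_ON_RISKY:
        return step_is_risky
    return False
```

The global default still exists here, but it is only the fallback for users who never touch the setting.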

Per-User AI Quotas: The UX Layer Your Cost Dashboard Can't See

May 2, 2026 · 10 min read

Tian Pan

Software Engineer

A user opens your AI feature at 3pm on a Tuesday. They've been using it lightly for three weeks. This time the request hangs for eight seconds and returns a red banner: "Something went wrong. Please try again later." They try again. Same banner. They close the tab and go back to whatever they were doing before — and they tell their teammate at standup the next morning that "the AI thing is broken."

What actually happened: they crossed an invisible per-user quota that your cost team set six months ago to keep a single power user from blowing through the GPU budget. The quota worked. Spend stayed flat. The dashboard is green. The feature is, by every metric your engineering org tracks, healthy. It's also dead, because the user who got that banner is never coming back, and the three teammates they told at standup will never try it.

This is the gap your cost dashboard cannot see. Per-user AI quotas are a product surface. The team that hides them inside an HTTP 429 is letting their cost-control system silently shape user perception of the product, and they will not find out until churn shows up in a quarterly review with no obvious cause.
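Treating the quota as a product surface starts with returning something the UI can explain, rather than a bare 429. Below is a minimal sketch. The `Quota` fields, the `quota_response` function, and the message wording are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Quota:
    limit: int           # requests allowed in the current window
    used: int            # requests consumed so far
    resets_at: datetime  # when the window rolls over


def quota_response(quota: Quota, now: datetime) -> dict:
    """Build a payload the UI can render, whether or not the request is allowed."""
    remaining = max(quota.limit - quota.used, 0)
    if remaining > 0:
        return {"allowed": True, "remaining": remaining}
    wait_minutes = max(int((quota.resets_at - now).total_seconds() // 60), 0)
    return {
        "allowed": False,
        "remaining": 0,
        "message": (
            "You've reached this feature's usage limit. "
            f"It resets in about {wait_minutes} minutes."
        ),
    }
```

Because `remaining` is returned on every successful call, the interface can also warn users before they hit the limit instead of only after.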

Clarification Budgets: When Your Agent Should Ask Instead of Guess

May 1, 2026 · 10 min read

Tian Pan

Software Engineer

The two worst agent failure modes feel like opposites, but they originate from the same broken policy. The first agent asks four follow-up questions before doing anything and trains its users to abandon it. The second agent never asks, confidently produces output the user has to redo, and trains its users to mistrust it. Same policy, different settings of one missing parameter: the cost of a question relative to the cost of a wrong answer.

Most agents do not have a policy at all. The model is asked to "be helpful" and is left to negotiate ambiguity on its own. Because next-token prediction rewards committing to an answer, the agent leans toward guessing. Because RLHF rewards politeness, the agent occasionally over-corrects and asks a question for safety. The result is unprincipled behavior that varies from session to session, with no team-level intuition about when the agent will pause and when it will charge ahead.

A clarification budget is the missing parameter. It is a per-task allowance for how much friction the agent is permitted to impose, paired with a decision rule for when a question is worth spending that budget on. Think of it as the conversational analog of a latency budget — every product has one, even if no one wrote it down, and the team that writes it down stops shipping confused agents.
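One possible form of that decision rule is an expected-cost comparison: ask only while budget remains and the expected cost of guessing wrong exceeds the cost of interrupting the user. The sketch below assumes the team can estimate `p_wrong` and both costs on a shared scale. `ClarificationBudget` and its fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ClarificationBudget:
    max_questions: int    # per-task allowance of questions
    question_cost: float  # friction cost of one question (assumed units)
    asked: int = 0

    def should_ask(self, p_wrong: float, wrong_answer_cost: float) -> bool:
        """Ask only if budget remains and a wrong guess is costlier in expectation."""
        if self.asked >= self.max_questions:
            return False
        return p_wrong * wrong_answer_cost > self.question_cost

    def record_question(self) -> None:
        self.asked += 1
```

A usage example: with `question_cost=1.0`, a 50% chance of a redo that costs 10 units justifies a question, while a 5% chance does not.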

The Agent Permission Prompt Has a Habituation Curve, and Your Safety Story Lives on Its Slope

April 28, 2026 · 10 min read

Tian Pan

Software Engineer

There is a number that should be on every agent product's safety dashboard, and almost nobody tracks it: the per-user approval rate over time. Ship a permission prompt for "may I send this email" or "may I run this query against production," and the curve goes the same way every time. Day one, users hesitate, read, sometimes click no. By week two, the prompt is the fifth one this hour, the cost of saying no is doing the work yourself, and the click-through rate converges to something north of 95%. The team's safety story still claims that the user approved every action. The user, in any meaningful cognitive sense, did not.

This is not a UX problem that better copy can fix. It is the same habituation phenomenon that flattened cookie banners, browser SSL warnings, and Windows UAC dialogs, applied to a substrate that operates orders of magnitude faster than any of those. A consent gate is a security control with a half-life. Ship it without measuring how fast it decays, and you ship a checkbox the user is trained to ignore by week two — and a compliance narrative that depends on a click that no longer means anything.
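Measuring the decay can start from a plain log of prompt outcomes. The sketch below computes one user's approval rate per week, assuming events arrive as `(day_index, approved)` pairs counted from the day the prompt shipped. The function name and event shape are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def weekly_approval_rate(events: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Map week number to the share of permission prompts the user approved."""
    totals = defaultdict(lambda: [0, 0])  # week -> [approved, shown]
    for day, approved in events:
        week = day // 7
        if approved:
            totals[week][0] += 1
        totals[week][1] += 1
    return {week: a / n for week, (a, n) in sorted(totals.items())}
```

Plotting these rates per user over successive weeks gives the curve whose slope the safety dashboard would track.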

The Agent Finished Into an Empty Room: Stale-Context Delivery for Async Background Tasks

April 27, 2026 · 10 min read

Tian Pan

Software Engineer

A background agent that takes ninety seconds to finish a task is operating on a snapshot of the world from ninety seconds ago. By the time it returns, the user may have navigated to a different view, started a new conversation, archived the original request, or closed the tab entirely. Most agent frameworks ship the result anyway, mutate state to reflect it, and treat the round trip as a success. It is not a success. It is the agent finishing into an empty room.

The failure mode is uglier than dropping the result. A dropped result is a missed delivery — annoying but recoverable. An applied stale result is an answer to a question the user is no longer asking, written against state that no longer matches, often overwriting the work the user moved on to. The user notices that something they did not ask for has happened, cannot reconstruct why, and loses trust in the system in a way that a simple timeout never would.

The fix is not faster agents. It is a delivery-time relevance gate that treats the moment of return as a fresh decision, not the foregone conclusion of the moment of dispatch.
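A relevance gate of this kind can be sketched as a comparison between the snapshot taken at dispatch and the state observed at return. The names below (`Snapshot`, `Delivery`, `relevance_gate`) and the three outcomes are assumptions chosen for illustration, and a real gate would likely check more signals.

```python
from dataclasses import dataclass
from enum import Enum


class Delivery(Enum):
    APPLY = "apply"      # state still matches, safe to mutate
    NOTIFY = "notify"    # surface the result, but do not overwrite anything
    DISCARD = "discard"  # the request no longer exists


@dataclass(frozen=True)
class Snapshot:
    conversation_id: str
    state_version: int


def relevance_gate(
    dispatched: Snapshot, current: Snapshot, request_archived: bool
) -> Delivery:
    """Decide at return time what to do with a background agent's result."""
    if request_archived:
        return Delivery.DISCARD
    if current.conversation_id != dispatched.conversation_id:
        return Delivery.NOTIFY  # the user moved to another conversation
    if current.state_version != dispatched.state_version:
        return Delivery.NOTIFY  # the underlying state changed while the agent ran
    return Delivery.APPLY
```

The key design choice is that `APPLY` is the outcome that has to be earned, so a stale result degrades to a notification instead of overwriting newer work.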

The Inverted Agent: When the User Is the Planner and the Model Is the Step-Executor

April 27, 2026 · 12 min read

Tian Pan

Software Engineer

Most agent products today implement a simple bargain: the model decides what to do, the user clicks "approve." This is the right shape for low-stakes consumer chat — booking a restaurant, summarizing an inbox, drafting a casual reply. It is catastrophically wrong for legal drafting, financial advisory, medical triage, and incident response, where the user holds the accountability the model never can, and where the cost of the wrong plan dwarfs the cost of any individual step.

The inverted agent flips the polarity. The user composes the plan as a sequence of named, reorderable steps. The model executes each step on demand — with full context, with tool access, with reasoning — but never decides what step comes next. The model can suggest, but suggestions are advisory, not autonomous. This is not a worse autonomous agent; it is a different product, with a strictly worse cost-and-latency profile and a strictly better trust profile, aimed at users who would otherwise decline to adopt the autonomous version at all.

The mistake teams keep making is treating "autonomy" as a default to push toward. It is a UX axis you choose per-surface. Get the polarity wrong and you ship a feature your highest-stakes users will quietly refuse to touch.

Speculative Decoding Is a Streaming Protocol Decision, Not an Inference Optimization

April 27, 2026 · 12 min read

Tian Pan

Software Engineer

The "identical output" guarantee that ships with every speculative decoding paper is a guarantee about token distributions, not about what your user sees. Read the proofs carefully and you find a clean mathematical equivalence: the rejection-sampling acceptance criterion is designed so that the output distribution after speculation is exactly the distribution the target model would have produced on its own. That guarantee binds the bytes that leave the inference engine. It says nothing about the bytes that arrived on the user's screen five hundred milliseconds ago and have to be taken back.

If you stream draft tokens to the client the moment the small model emits them, you are running an A/B test on your own users every time the verifier rejects a suffix. Half a paragraph rewrites itself. A function name changes after the IDE has already syntax-highlighted it. A TTS voice has already pronounced "the answer is likely no" before the verifier swaps in "the answer is yes, with caveats." The math says the final distribution is the same as the slow path. The user's experience says they watched the model change its mind in public.

This is the part of speculative decoding that doesn't make it into the speedup numbers. It is also the part that turns "free 3× throughput" into a half-quarter of streaming-protocol work that nobody scoped.
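One conservative protocol choice is to hold draft tokens back and send only what the verifier has accepted, which trades some latency for never retracting text. The sketch below assumes the verifier reports how many pending draft tokens it kept, plus an optional replacement token. `CommittedStream` and its method names are hypothetical and not part of any inference engine's API.

```python
from typing import List, Optional


class CommittedStream:
    """Buffer draft tokens; emit to the client only after verification."""

    def __init__(self) -> None:
        self.pending: List[str] = []  # draft tokens not yet verified
        self.sent: List[str] = []     # tokens already delivered to the client

    def on_draft(self, tokens: List[str]) -> None:
        self.pending.extend(tokens)

    def on_verify(self, accepted: int, correction: Optional[str] = None) -> List[str]:
        """Commit the accepted prefix plus the verifier's token, drop the rest."""
        emitted = self.pending[:accepted]
        if correction is not None:
            emitted.append(correction)
        self.pending = []  # the rejected suffix never reaches the client
        self.sent.extend(emitted)
        return emitted
```

Under this scheme the user-visible stream advances in verified chunks, so the speedup shows up as larger steps rather than as text that rewrites itself.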

Interview Mode vs. Task Mode: The Unspoken Contract Your Agent Keeps Breaking

April 23, 2026 · 11 min read

Tian Pan

Software Engineer

Open any agent's customer feedback channel and you will find two complaints, both loud, both common, and both blamed on the model. The first sounds like "it asks too many questions before doing anything." The second sounds like "it just runs off and does the wrong thing without checking with me." Product teams hear those as opposite problems and ship opposite fixes — tighten the system prompt to ask fewer questions, then loosen it again next quarter when the other complaint gets louder. Neither change works for long, because the two complaints are not really about questions or actions. They are about a contract the user picked silently and the agent failed to honor.

Every conversation with an agent operates in one of two implicit modes. Interview mode is the contract where the user expects the agent to extract requirements before doing anything substantive — clarifying questions are welcome, premature execution is the failure. Task mode is the contract where the user has already done the thinking, has a specific plan in mind, and expects the agent to execute on the available context, asking only when truly blocked — questions are friction, half-baked execution is the failure.

Users do not announce which mode they are in. They expect the agent to read it from the message, the conversation history, and the situation, and they punish the agent harshly when it gets it wrong. The fix for "asks too many questions" and "didn't ask enough questions" is the same fix: make mode a first-class concept in your agent, detect it from signals you can actually see, and surface it to the user when you are unsure.

Mid-Flight Steering: Redirecting a Long-Running Agent Without Killing the Run

April 23, 2026 · 10 min read

Tian Pan

Software Engineer

Watch a developer use an agentic IDE for twenty minutes and you will see the same micro-drama play out three times. The agent starts a long task. Two tool calls in, the user realizes they want a functional component instead of a class, or a v2 endpoint instead of v1, or tests written in Vitest instead of Jest. They have exactly one lever: the red stop button. They press it. The agent dies mid-edit. They copy-paste the last prompt, append the correction, and pay for the first eight minutes of work twice.

The abort button is the wrong affordance. It treats "I want to adjust the plan" and "I want to throw away the run" as the same gesture. In practice they are as different as a steering wheel and an ejector seat, and conflating them is why so many agent products feel brittle the moment a task takes longer than a single screen of output.
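Separating the two gestures can be sketched as a run loop that checks a steering queue between steps and keeps abort as a distinct flag. Everything here is illustrative: `SteerableRun`, `steer`, `abort`, and the `execute` callback are assumed names, and the sketch only picks up corrections at step boundaries.

```python
from collections import deque
from typing import Callable, List


class SteerableRun:
    """Run steps in order, folding in user corrections between steps."""

    def __init__(self, steps: List[str]) -> None:
        self.steps = deque(steps)
        self.notes = deque()  # corrections queued by the user mid-run
        self.cancelled = False
        self.log: list = []

    def steer(self, note: str) -> None:
        # Adjust the plan without discarding completed work.
        self.notes.append(note)

    def abort(self) -> None:
        # Throw away the rest of the run.
        self.cancelled = True

    def run(self, execute: Callable[[str, List[str]], object]) -> list:
        context: List[str] = []
        while self.steps and not self.cancelled:
            # Drain corrections before starting the next step.
            while self.notes:
                context.append(self.notes.popleft())
            step = self.steps.popleft()
            self.log.append(execute(step, list(context)))
        return self.log
```

A correction such as "use Vitest instead of Jest" then reaches the next step as added context, and the first eight minutes of work are not paid for twice.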
