
26 posts tagged with "ux"


The Agent Permission Prompt Has a Habituation Curve, and Your Safety Story Lives on Its Slope

Tian Pan · Software Engineer · 10 min read

There is a number that should be on every agent product's safety dashboard, and almost nobody tracks it: the per-user approval rate over time. Ship a permission prompt for "may I send this email" or "may I run this query against production," and the curve goes the same way every time. Day one, users hesitate, read, sometimes click no. By week two, the prompt is the fifth one this hour, the cost of saying no is doing the work yourself, and the click-through rate converges to something north of 95%. The team's safety story still claims that the user approved every action. The user, in any meaningful cognitive sense, did not.

This is not a UX problem that better copy can fix. It is the same habituation phenomenon that flattened cookie banners, browser SSL warnings, and Windows UAC dialogs, applied to a substrate that operates orders of magnitude faster than any of those. A consent gate is a security control with a half-life. Ship it without measuring how fast it decays, and you ship a checkbox the user is trained to ignore by week two — and a compliance narrative that depends on a click that no longer means anything.
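If you want that number on the dashboard, the computation is not complicated. Here is a minimal sketch of it, assuming an event log with one row per permission prompt shown; the `ApprovalEvent` shape and field names are illustrative, not any particular product's schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

# Hypothetical event shape: one row per permission prompt shown to a user.
@dataclass
class ApprovalEvent:
    user_id: str
    shown_at: datetime
    approved: bool

def approval_curve(events: list[ApprovalEvent]) -> dict[int, float]:
    """Approval rate per week, measured from each user's first prompt.

    A curve that converges toward 1.0 by week two is the habituation
    signal: the click is still happening, the decision is not.
    """
    first_seen: dict[str, datetime] = {}
    for e in sorted(events, key=lambda ev: ev.shown_at):
        first_seen.setdefault(e.user_id, e.shown_at)

    shown: defaultdict[int, int] = defaultdict(int)
    approved: defaultdict[int, int] = defaultdict(int)
    for e in events:
        week = (e.shown_at - first_seen[e.user_id]).days // 7
        shown[week] += 1
        approved[week] += int(e.approved)

    return {week: approved[week] / shown[week] for week in sorted(shown)}
```

The week-over-week slope of that dict is the slope the title is talking about.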

The Agent Finished Into an Empty Room: Stale-Context Delivery for Async Background Tasks

Tian Pan · Software Engineer · 10 min read

A background agent that takes ninety seconds to finish a task is operating on a snapshot of the world from ninety seconds ago. By the time it returns, the user may have navigated to a different view, started a new conversation, archived the original request, or closed the tab entirely. Most agent frameworks ship the result anyway, mutate state to reflect it, and treat the round trip as a success. It is not a success. It is the agent finishing into an empty room.

The failure mode is uglier than dropping the result. A dropped result is a missed delivery — annoying but recoverable. An applied stale result is an answer to a question the user is no longer asking, written against state that no longer matches, often overwriting the work the user moved on to. The user notices that something they did not ask for has happened, cannot reconstruct why, and loses trust in the system in a way that a simple timeout never would.

The fix is not faster agents. It is a delivery-time relevance gate that treats the moment of return as a fresh decision, not the foregone conclusion of the moment of dispatch.
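One possible shape for that gate, sketched with hypothetical names (`DispatchSnapshot`, `relevance_gate`) rather than any specific framework's API:

```python
from dataclasses import dataclass
from enum import Enum, auto

class Delivery(Enum):
    APPLY = auto()    # context unchanged: deliver the result and mutate as planned
    NOTIFY = auto()   # context drifted: surface the result, do not auto-apply
    DISCARD = auto()  # the request is gone: drop the result quietly

@dataclass
class DispatchSnapshot:
    conversation_id: str
    view: str             # what the user was looking at when the task started
    target_revision: int  # version of the state the agent planned to mutate

@dataclass
class CurrentContext:
    conversation_id: str | None  # None if the conversation was archived or closed
    view: str
    revision: int

def relevance_gate(at_dispatch: DispatchSnapshot, now: CurrentContext) -> Delivery:
    """Decide at delivery time whether a finished background task still applies."""
    if now.conversation_id != at_dispatch.conversation_id:
        return Delivery.DISCARD   # the room is empty
    if now.revision != at_dispatch.target_revision:
        return Delivery.NOTIFY    # state moved on; ask before overwriting anything
    if now.view != at_dispatch.view:
        return Delivery.NOTIFY    # user navigated away; don't apply silently
    return Delivery.APPLY
```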

The Inverted Agent: When the User Is the Planner and the Model Is the Step-Executor

Tian Pan · Software Engineer · 12 min read

Most agent products today implement a simple bargain: the model decides what to do, the user clicks "approve." This is the right shape for low-stakes consumer chat — booking a restaurant, summarizing an inbox, drafting a casual reply. It is catastrophically wrong for legal drafting, financial advisory, medical triage, and incident response, where the user holds the accountability the model never can, and where the cost of the wrong plan dwarfs the cost of any individual step.

The inverted agent flips the polarity. The user composes the plan as a sequence of named, reorderable steps. The model executes each step on demand — with full context, with tool access, with reasoning — but never decides what step comes next. The model can suggest, but suggestions are advisory, not autonomous. This is not a worse autonomous agent; it is a different product, with a strictly worse cost-and-latency profile and a strictly better trust profile, aimed at users who would otherwise decline to adopt the autonomous version at all.
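A minimal sketch of the structure this implies, with illustrative names and `execute` standing in for whatever model call your stack actually makes:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Step:
    name: str                   # user-authored, human-readable label
    instruction: str            # what the model should do for this step
    output: str | None = None   # filled in only when the user runs the step

@dataclass
class Plan:
    steps: list[Step] = field(default_factory=list)

    def reorder(self, order: list[int]) -> None:
        """The user, not the model, owns the sequence."""
        self.steps = [self.steps[i] for i in order]

    def run_step(self, index: int, execute: Callable[..., str]) -> str:
        """Execute exactly one step on demand; never advance on its own."""
        step = self.steps[index]
        prior = [s.output for s in self.steps[:index] if s.output is not None]
        step.output = execute(step.instruction, context=prior)
        return step.output
```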

The mistake teams keep making is treating "autonomy" as a default to push toward. It is a UX axis you choose per-surface. Get the polarity wrong and you ship a feature your highest-stakes users will quietly refuse to touch.

Speculative Decoding Is a Streaming Protocol Decision, Not an Inference Optimization

Tian Pan · Software Engineer · 12 min read

The "identical output" guarantee that ships with every speculative decoding paper is a guarantee about token distributions, not about what your user sees. Read the proofs carefully and you find a clean mathematical equivalence: the rejection-sampling acceptance criterion is designed so that the output distribution after speculation is exactly the distribution the target model would have produced on its own. That guarantee binds the bytes that leave the inference engine. It says nothing about the bytes that arrived on the user's screen five hundred milliseconds ago and have to be taken back.

If you stream draft tokens to the client the moment the small model emits them, you are running an A/B test on your own users every time the verifier rejects a suffix. Half a paragraph rewrites itself. A function name changes after the IDE has already syntax-highlighted it. A TTS voice has already pronounced "the answer is likely no" before the verifier swaps in "the answer is yes, with caveats." The math says the final distribution is the same as the slow path. The user's experience says they watched the model change its mind in public.

This is the part of speculative decoding that doesn't make it into the speedup numbers. It is also the part that turns "free 3× throughput" into half a quarter of streaming-protocol work that nobody scoped.
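One way the protocol decision could look in code, heavily simplified: hold draft tokens back until the verifier accepts them, so nothing the user has already seen ever needs to be taken back. The `verify` callable below is a stand-in for the rejection-sampling step, not a real inference API, and the trade it makes is explicit: you give back some time-to-first-token to avoid ever emitting a retraction event.

```python
from typing import Callable, Iterator

def verified_stream(
    draft_chunks: Iterator[list[str]],    # token chunks proposed by the draft model
    verify: Callable[[list[str]], int],   # returns how many tokens of the chunk survive
) -> Iterator[str]:
    """Release tokens to the client only after the target model has accepted them.

    The alternative, emitting draft tokens immediately and sending a
    "retract the last N tokens" event on rejection, is a protocol that the
    client renderer, the IDE highlighter, and the TTS engine all have to
    understand.
    """
    for chunk in draft_chunks:
        accepted = verify(chunk)          # rejection sampling keeps a prefix of the chunk
        for token in chunk[:accepted]:
            yield token                   # everything emitted here is final; nothing is taken back
```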

Interview Mode vs. Task Mode: The Unspoken Contract Your Agent Keeps Breaking

Tian Pan · Software Engineer · 11 min read

Open any agent's customer feedback channel and you will find two complaints, both loud, both common, and both blamed on the model. The first sounds like "it asks too many questions before doing anything." The second sounds like "it just runs off and does the wrong thing without checking with me." Product teams hear those as opposite problems and ship opposite fixes — tighten the system prompt to ask fewer questions, then loosen it again next quarter when the other complaint gets louder. Neither change works for long, because the two complaints are not really about questions or actions. They are about a contract the user picked silently and the agent failed to honor.

Every conversation with an agent operates in one of two implicit modes. Interview mode is the contract where the user expects the agent to extract requirements before doing anything substantive — clarifying questions are welcome, premature execution is the failure. Task mode is the contract where the user has already done the thinking, has a specific plan in mind, and expects the agent to execute on the available context, asking only when truly blocked — questions are friction, half-baked execution is the failure.

Users do not announce which mode they are in. They expect the agent to read it from the message, the conversation history, and the situation, and they punish the agent harshly when it gets it wrong. The fix for "asks too many questions" and "didn't ask enough questions" is the same fix: make mode a first-class concept in your agent, detect it from signals you can actually see, and surface it to the user when you are unsure.
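A sketch of what "mode as a first-class concept" could mean in code. The keyword signals and thresholds below are assumptions for illustration, not validated heuristics; the point is that UNSURE is a legitimate output the UI can act on.

```python
from enum import Enum, auto

class Mode(Enum):
    INTERVIEW = auto()  # extract requirements first; questions are welcome
    TASK = auto()       # execute on available context; questions are friction
    UNSURE = auto()     # surface the ambiguity instead of guessing

def detect_mode(message: str, has_attached_spec: bool, prior_turns: int) -> Mode:
    """Classify from cheap, visible signals; the keywords here are placeholders."""
    text = message.lower()
    task_signals = any(k in text for k in ("exactly", "as specified", "step 1", "don't ask", "just do"))
    interview_signals = any(k in text for k in ("not sure", "help me figure out", "what do you think", "options"))

    if has_attached_spec or (task_signals and not interview_signals):
        return Mode.TASK
    if interview_signals and not task_signals:
        return Mode.INTERVIEW
    if prior_turns == 0 and len(message.split()) < 15:
        return Mode.INTERVIEW   # short cold-start asks usually need requirements first
    return Mode.UNSURE          # make the contract explicit rather than betting on it
```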

Mid-Flight Steering: Redirecting a Long-Running Agent Without Killing the Run

Tian Pan · Software Engineer · 10 min read

Watch a developer use an agentic IDE for twenty minutes and you will see the same micro-drama play out three times. The agent starts a long task. Two tool calls in, the user realizes they want a functional component instead of a class, or a v2 endpoint instead of v1, or tests written in Vitest instead of Jest. They have exactly one lever: the red stop button. They press it. The agent dies mid-edit. They copy-paste the last prompt, append the correction, and pay for the first eight minutes of work twice.

The abort button is the wrong affordance. It treats "I want to adjust the plan" and "I want to throw away the run" as the same gesture. In practice they are as different as a steering wheel and an ejector seat, and conflating them is why so many agent products feel brittle the moment a task takes longer than a single screen of output.
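A rough sketch of the missing affordance, with hypothetical names: a steering channel the UI can write to, drained by the agent loop at the only safe boundary it has, between tool calls. `run_step` stands in for the model-plus-tools call.

```python
import queue
from typing import Callable

class SteeringChannel:
    """A correction lane that is not the ejector seat."""

    def __init__(self) -> None:
        self._corrections: "queue.Queue[str]" = queue.Queue()

    def steer(self, correction: str) -> None:
        self._corrections.put(correction)       # e.g. "write the tests in Vitest, not Jest"

    def drain(self) -> list[str]:
        items: list[str] = []
        while True:
            try:
                items.append(self._corrections.get_nowait())
            except queue.Empty:
                return items

def agent_loop(steps: list[str], run_step: Callable[[str, list[str]], str],
               channel: SteeringChannel) -> list[str]:
    context: list[str] = []
    for step in steps:
        # Fold in steering input at the only safe boundary: between tool calls.
        for correction in channel.drain():
            context.append(f"USER CORRECTION (apply from here on): {correction}")
        context.append(run_step(step, context))
    return context
```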

The AI Audit Trail Is a Product Feature, Not a Compliance Checkbox

Tian Pan · Software Engineer · 8 min read

McKinsey's 2025 survey found that 75% of business leaders were using generative AI in some form — but nearly half had already experienced a significant negative consequence. That gap is not a model quality problem. It's a trust problem. And the fastest path to closing it is not more evals, better prompts, or a new frontier model. It's showing users exactly what the agent did.

Most engineering teams treat the audit trail as an afterthought — something you wire up for GDPR compliance or SOC 2, then lock in an internal dashboard that only ops reads. That's the wrong frame. When users can see which tool the agent called, what data it retrieved, and which reasoning branch produced the answer, three things happen: adoption goes up, support escalations go down, and model errors surface days earlier than they would from any backend alert.
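A sketch of the difference in practice: a user-facing trail entry with illustrative field names, not any particular schema, rendered as a sentence per action rather than a raw backend log.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class AuditEvent:
    """One user-visible entry: what the agent did, with what, and based on what."""
    at: datetime
    tool: str                 # which tool was called
    arguments: dict           # what it was called with (redact before display as needed)
    data_sources: list[str]   # what the answer was grounded in
    summary: str              # one sentence a non-engineer can read

def render_trail(events: list[AuditEvent]) -> str:
    """A user-facing view of the run, not an ops dashboard dump."""
    lines = []
    for e in events:
        sources = ", ".join(e.data_sources) or "no external data"
        lines.append(f"{e.at:%H:%M:%S}  {e.tool}: {e.summary} (sources: {sources})")
    return "\n".join(lines)
```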

The AI Feature Nobody Uses: How Teams Ship Capabilities That Never Get Adopted

Tian Pan · Software Engineer · 9 min read

A VP of Product at a mid-market project management company spent three quarters of her engineering team's roadmap building an AI assistant. Six months after launch, weekly active usage sat at 4%. When asked why they built it: "Our competitor announced one. Our board asked when we'd have ours." That's a panic decision dressed up as a product strategy — and it's endemic right now.

The 4% isn't an outlier. A customer success platform shipped AI-generated call summaries to 6% adoption after four months. A logistics SaaS added AI route optimization suggestions and got 11% click-through with a 2% action rate. An HR platform launched an AI policy Q&A bot that spiked for two weeks and flatlined at 3%. The pattern is consistent enough to name: ship an AI feature, watch it get ignored, quietly sunset it eighteen months later.

The default explanation is that the AI wasn't good enough. Sometimes that's true. More often, the model was fine — users just never found the feature at all.

Graceful Tool-Call Failure: The Error Contract Your Agent UI Is Missing

Tian Pan · Software Engineer · 11 min read

Every agent demo you've ever seen ended with a clean result. The tool call returned exactly the data the model expected, the response arrived in well under two seconds, and the final answer was crisp and correct. That's the demo. Production is something else.

In production, tools time out. APIs return 403s because a service account was rotated last Tuesday. Third-party enrichment endpoints return a 200 with a body that says {"status": "degraded", "data": null}. OAuth tokens expire at 3 AM on a Saturday. These aren't edge cases — they're the normal operating conditions of any agent that talks to the real world. The failure modes are predictable. The problem is that most agent architectures treat them as afterthoughts, and most agent UIs have no vocabulary for communicating them to users at all.
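A minimal sketch of what such an error contract might look like, using hypothetical names and only the failure classes mentioned above. The point is that the model and the UI both receive a classified result with an honest user-facing message, never a raw traceback.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Any

class FailureKind(Enum):
    TIMEOUT = auto()         # the tool never answered
    AUTH = auto()            # 401/403: a credential was rotated or expired
    DEGRADED = auto()        # 200 with an unusable body
    UPSTREAM_ERROR = auto()  # 5xx or a malformed response

@dataclass
class ToolResult:
    ok: bool
    data: Any = None
    failure: FailureKind | None = None
    user_message: str = ""    # what the UI shows; never a raw traceback
    retryable: bool = False   # whether "try again" is honest advice

def classify_http(status: int, body: dict) -> ToolResult:
    """Map a raw tool response onto the contract before the model or the UI sees it."""
    if status in (401, 403):
        return ToolResult(False, failure=FailureKind.AUTH, retryable=False,
                          user_message="The connection to this service needs to be re-authorized.")
    if status >= 500:
        return ToolResult(False, failure=FailureKind.UPSTREAM_ERROR, retryable=True,
                          user_message="An upstream service is having trouble right now.")
    if status == 200 and body.get("data") is None:
        return ToolResult(False, failure=FailureKind.DEGRADED, retryable=True,
                          user_message="The data source responded but returned nothing usable.")
    return ToolResult(True, data=body)
```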

The Latency Perception Gap: Why a 3-Second Stream Feels Faster Than a 1-Second Batch

Tian Pan · Software Engineer · 11 min read

Your users don't have a stopwatch. They have feelings. And those feelings diverge from wall-clock reality in ways that matter enormously for how you build AI interfaces. A response that appears character-by-character over three seconds will consistently feel faster to users than a response that materializes all at once after one second — even though the batch system is objectively faster. This isn't irrational or a bug in human cognition. It's a well-documented perceptual phenomenon, and if you're building AI products without accounting for it, you're optimizing for the wrong metric.

This post breaks down the psychology behind latency perception, the metrics that actually predict user satisfaction, the frontend patterns that exploit these perceptual quirks, and when streaming adds more complexity than it's worth.
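As a rough sketch of the measurement gap, assuming nothing more than a token iterator: the two numbers below are the ones that diverge, and most latency dashboards report only the second.

```python
import time
from typing import Iterator

def perceived_latency(stream: Iterator[str]) -> dict[str, float]:
    """Measure the two numbers that diverge: time to first token and total time."""
    start = time.monotonic()
    first_token_at: float | None = None
    for _ in stream:
        if first_token_at is None:
            first_token_at = time.monotonic()   # the moment the wait starts feeling shorter
    end = time.monotonic()
    ttft = first_token_at if first_token_at is not None else end
    return {
        "time_to_first_token_s": ttft - start,
        "total_s": end - start,
    }
```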

Why Users Ignore the AI Feature You Spent Three Months Building

Tian Pan · Software Engineer · 10 min read

Your team spent three months integrating an LLM into your product. The model works. The latency is acceptable. The demo looks great. You ship. And then you watch the usage metrics flatline at 4%.

This is the typical arc. Most AI features fail not at the model level but at the adoption level. The underlying cause isn't technical — it's a cluster of product decisions that were made (or not made) around discoverability, trust, and habit formation. Understanding why adoption fails, and what to actually measure and change, separates teams that ship useful AI from teams that ship impressive demos.

The Cognitive Load Inversion: Why AI Suggestions Feel Helpful but Exhaust You

Tian Pan · Software Engineer · 9 min read

There's a number in the AI productivity research that almost nobody talks about: 39 percentage points. In a study of experienced developers, participants predicted AI tools would make them 24% faster. After completing the tasks, they still believed they'd been 20% faster. The measured reality: they were 19% slower. The perception gap is 39 points—and it compounds with every sprint, every code review, every feature shipped.

This is the cognitive load inversion. AI tools are excellent at offloading the cheap cognitive work—writing syntactically correct code, drafting boilerplate, suggesting function names—while generating a harder class of cognitive work: continuous evaluation of uncertain outputs. You didn't eliminate cognitive effort. You automated the easy half and handed yourself the hard half.