33 posts tagged with "ux"

Mid-Flight Steering: Redirecting a Long-Running Agent Without Killing the Run

· 10 min read
Tian Pan
Software Engineer

Watch a developer use an agentic IDE for twenty minutes and you will see the same micro-drama play out three times. The agent starts a long task. Two tool calls in, the user realizes they want a functional component instead of a class, or a v2 endpoint instead of v1, or tests written in Vitest instead of Jest. They have exactly one lever: the red stop button. They press it. The agent dies mid-edit. They copy-paste the last prompt, append the correction, and pay for the first eight minutes of work twice.

The abort button is the wrong affordance. It treats "I want to adjust the plan" and "I want to throw away the run" as the same gesture. In practice they are as different as a steering wheel and an ejector seat, and conflating them is why so many agent products feel brittle the moment a task takes longer than a single screen of output.
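To make the distinction concrete, here is a minimal sketch of one way to build the steering wheel: a correction queue the agent drains between tool calls, kept separate from the abort path. The loop shape and the names `callModel` and `executeTool` are illustrative assumptions, not the post's implementation.

```ts
type Message = { role: "user" | "assistant" | "tool"; content: string };
type Step =
  | { type: "final"; content: string }
  | { type: "tool"; name: string; args: unknown };

// Assumed stand-ins for the model call and tool executor in your stack.
declare function callModel(messages: Message[]): Promise<Step>;
declare function executeTool(step: { name: string; args: unknown }): Promise<string>;

class SteerableAgent {
  private steeringQueue: string[] = [];
  private aborted = false;

  // "Steering wheel": queue a correction; the run keeps its progress.
  steer(correction: string) {
    this.steeringQueue.push(correction);
  }

  // "Ejector seat": a separate affordance, only for discarding the run.
  abort() {
    this.aborted = true;
  }

  async run(messages: Message[]): Promise<Message[]> {
    while (!this.aborted) {
      // Drain corrections before the next model call so the model
      // re-plans with the adjustment in context instead of starting over.
      while (this.steeringQueue.length > 0) {
        messages.push({
          role: "user",
          content: `Course correction: ${this.steeringQueue.shift()!}`,
        });
      }
      const step = await callModel(messages);
      if (step.type === "final") {
        messages.push({ role: "assistant", content: step.content });
        return messages;
      }
      messages.push({ role: "tool", content: await executeTool(step) });
    }
    return messages; // aborted mid-run; prior work is preserved, not replayed
  }
}
```

The design choice is that `steer` and `abort` are different methods backed by different state, which is exactly the separation the red stop button collapses.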

The AI Audit Trail Is a Product Feature, Not a Compliance Checkbox

· 8 min read
Tian Pan
Software Engineer

McKinsey's 2025 survey found that 75% of business leaders were using generative AI in some form — but nearly half had already experienced a significant negative consequence. That gap is not a model quality problem. It's a trust problem. And the fastest path to closing it is not more evals, better prompts, or a new frontier model. It's showing users exactly what the agent did.

Most engineering teams treat the audit trail as an afterthought — something you wire up for GDPR compliance or SOC 2, then lock in an internal dashboard that only ops reads. That's the wrong frame. When users can see which tool the agent called, what data it retrieved, and which reasoning branch produced the answer, three things happen: adoption goes up, support escalations go down, and model errors surface days earlier than they would from any backend alert.
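As a sketch of what "showing users exactly what the agent did" might capture, here is one plausible event shape. The field names are assumptions for illustration, not a schema from the post.

```ts
// Illustrative shape for a user-facing audit event: enough to answer
// "which tool ran, on what input, and what came back."
interface AgentAuditEvent {
  runId: string;                    // groups events belonging to one agent run
  step: number;                     // position in the run, for replaying the trace
  timestamp: string;                // ISO 8601
  tool: string;                     // e.g. "search_tickets"
  input: Record<string, unknown>;   // the arguments the model chose
  output: unknown;                  // what the tool returned (redact PII before display)
  latencyMs: number;
  outcome: "ok" | "error" | "degraded";
}

// The same events that feed the compliance sink can power an expandable
// "what the agent did" panel in the product UI.
function summarize(events: AgentAuditEvent[]): string[] {
  return events.map(
    (e) => `#${e.step} ${e.tool} (${e.latencyMs}ms, ${e.outcome})`
  );
}
```

The point of `summarize` is the reframe: one event stream, two consumers, and the user-facing one is where the trust gets built.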

The AI Feature Nobody Uses: How Teams Ship Capabilities That Never Get Adopted

· 9 min read
Tian Pan
Software Engineer

A VP of Product at a mid-market project management company spent three quarters of her engineering team's roadmap building an AI assistant. Six months after launch, weekly active usage sat at 4%. When asked why they built it: "Our competitor announced one. Our board asked when we'd have ours." That's a panic decision dressed up as a product strategy — and it's endemic right now.

The 4% isn't an outlier. A customer success platform shipped AI-generated call summaries to 6% adoption after four months. A logistics SaaS added AI route optimization suggestions and got 11% click-through with a 2% action rate. An HR platform launched an AI policy Q&A bot that spiked for two weeks and flatlined at 3%. The pattern is consistent enough to name: ship an AI feature, watch it get ignored, quietly sunset it eighteen months later.

The default explanation is that the AI wasn't good enough. Sometimes that's true. More often, the model was fine — users just never found the feature at all.

Graceful Tool-Call Failure: The Error Contract Your Agent UI Is Missing

· 11 min read
Tian Pan
Software Engineer

Every agent demo you've ever seen ended with a clean result. The tool call returned exactly the data the model expected, the response arrived in well under two seconds, and the final answer was crisp and correct. That's the demo. Production is something else.

In production, tools time out. APIs return 403s because a service account was rotated last Tuesday. Third-party enrichment endpoints return a 200 with a body that says {"status": "degraded", "data": null}. OAuth tokens expire at 3 AM on a Saturday. These aren't edge cases — they're the normal operating conditions of any agent that talks to the real world. The failure modes are predictable. The problem is that most agent architectures treat them as afterthoughts, and most agent UIs have no vocabulary for communicating them to users at all.
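A sketch of what such an error contract could look like, as a discriminated union over the failure classes above. The taxonomy and the wording of the messages are illustrative assumptions, not the post's canonical contract.

```ts
// Each failure class carries enough structure for the UI to say
// something honest and actionable instead of "Something went wrong."
type ToolCallFailure =
  | { kind: "timeout"; tool: string; elapsedMs: number; retryable: true }
  | { kind: "auth"; tool: string; status: 401 | 403; retryable: false }  // e.g. rotated service account
  | { kind: "degraded"; tool: string; body: unknown; retryable: true }   // 200 with status: "degraded", data: null
  | { kind: "malformed"; tool: string; raw: string; retryable: false };  // response the model can't parse

function userMessage(f: ToolCallFailure): string {
  switch (f.kind) {
    case "timeout":
      return `${f.tool} didn't respond within ${f.elapsedMs}ms. Retrying may help.`;
    case "auth":
      return `${f.tool} refused access (${f.status}). An admin may need to reconnect it.`;
    case "degraded":
      return `${f.tool} is up but returned no data. The answer below may be incomplete.`;
    case "malformed":
      return `${f.tool} returned something unexpected, so it was left out of this answer.`;
  }
}
```

The `retryable` flag is what lets the UI offer a retry button only when retrying could actually work.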

The Latency Perception Gap: Why a 3-Second Stream Feels Faster Than a 1-Second Batch

· 11 min read
Tian Pan
Software Engineer

Your users don't have a stopwatch. They have feelings. And those feelings diverge from wall-clock reality in ways that matter enormously for how you build AI interfaces. A response that appears character-by-character over three seconds will consistently feel faster to users than a response that materializes all at once after one second — even though the batch system is objectively faster. This isn't irrational or a bug in human cognition. It's a well-documented perceptual phenomenon, and if you're building AI products without accounting for it, you're optimizing for the wrong metric.

This post breaks down the psychology behind latency perception, the metrics that actually predict user satisfaction, the frontend patterns that exploit these perceptual quirks, and when streaming adds more complexity than it's worth.
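A small sketch of the instrumentation that follows from this: measure time-to-first-token separately from total time, since the former tracks perceived speed. The stream type is an assumed stand-in for however your backend delivers tokens.

```ts
// Track time to first token separately from total time; the first
// predicts perceived speed far better than the second.
async function renderStream(
  tokens: AsyncIterable<string>,
  sink: (text: string) => void
): Promise<{ firstTokenMs: number; totalMs: number }> {
  const start = performance.now();
  let firstTokenMs = -1;
  for await (const token of tokens) {
    if (firstTokenMs < 0) firstTokenMs = performance.now() - start; // the "it's working" moment
    sink(token); // append to the UI immediately instead of buffering
  }
  return { firstTokenMs, totalMs: performance.now() - start };
}
```

Under this lens, a 3-second stream with a 200 ms first token can legitimately beat a 1-second batch, because the batch's time-to-first-token is the full second.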

Why Users Ignore the AI Feature You Spent Three Months Building

· 10 min read
Tian Pan
Software Engineer

Your team spent three months integrating an LLM into your product. The model works. The latency is acceptable. The demo looks great. You ship. And then you watch the usage metrics flatline at 4%.

This is the typical arc. Most AI features fail not at the model level but at the adoption level. The underlying cause isn't technical — it's a cluster of product decisions that were made (or not made) around discoverability, trust, and habit formation. Understanding why adoption fails, and what to actually measure and change, separates teams that ship useful AI from teams that ship impressive demos.

The Cognitive Load Inversion: Why AI Suggestions Feel Helpful but Exhaust You

· 9 min read
Tian Pan
Software Engineer

There's a number in the AI productivity research that almost nobody talks about: 39 percentage points. In a study of experienced developers, participants predicted AI tools would make them 24% faster. After completing the tasks, they still believed they'd been 20% faster. The measured reality: they were 19% slower. The perception gap is 39 points — and it compounds with every sprint, every code review, every feature shipped.

This is the cognitive load inversion. AI tools are excellent at offloading the cheap cognitive work — writing syntactically correct code, drafting boilerplate, suggesting function names — while generating a harder class of cognitive work: continuous evaluation of uncertain outputs. You didn't eliminate cognitive effort. You automated the easy half and handed yourself the hard half.

The Conversation Designer's Hidden Role in AI Product Quality

· 10 min read
Tian Pan
Software Engineer

Most engineering teams treat system prompts as configuration files — technical strings to be iterated on quickly, stored in environment variables, and deployed with the same ceremony as changing a timeout value. The system prompt gets an inline comment. The error messages get none. The capability disclosure is whatever the PM typed into the Notion doc on launch day.

This is the root cause of an entire class of AI product failures that don't show up in your eval suite. The model answers the question. The latency is fine. The JSON validates. But users stop trusting the product after three sessions, and the weekly active usage curve never recovers.

The missing discipline is conversation design. And it shapes output quality in ways that most engineering instrumentation is architecturally blind to.

The Accessibility Gap in AI Interfaces Nobody Is Shipping Around

· 8 min read
Tian Pan
Software Engineer

Most AI teams run accessibility audits on their landing pages. Almost none run them on the chat interface itself. The gap isn't laziness — it's that the tools don't exist. WCAG 2.2 has no success criterion for streaming content, no standard for non-deterministic outputs, and no guidance for token-by-token delivery. Which means every AI product streaming responses into a <div> right now is operating in a compliance grey zone while breaking the experience for a significant portion of its users.

This isn't a minor edge case. Blind and low-vision users report information-seeking as their top AI use case. Users with dyslexia, ADHD, and cognitive disabilities are actively trying to use AI tools to reduce reading load — and the default implementation pattern actively makes things worse for them.
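One mitigation pattern worth sketching (an assumption on my part, not the post's prescription): buffer streamed tokens and commit them to an aria-live region only at sentence boundaries, so screen readers announce coherent units instead of fragments.

```ts
// Flush streamed tokens to an aria-live region at sentence boundaries
// rather than token-by-token.
function createAccessibleStreamTarget(container: HTMLElement) {
  const live = document.createElement("div");
  live.setAttribute("aria-live", "polite"); // announce when the user is idle
  live.setAttribute("role", "log");         // additions-only semantics
  container.appendChild(live);

  let buffer = "";
  const commit = (text: string) => {
    const span = document.createElement("span");
    span.textContent = text;
    live.appendChild(span);
  };

  return {
    push(token: string) {
      buffer += token;
      const boundary = buffer.search(/[.!?]\s/); // flush at sentence ends
      if (boundary >= 0) {
        commit(buffer.slice(0, boundary + 1) + " ");
        buffer = buffer.slice(boundary + 2);
      }
    },
    end() {
      if (buffer) commit(buffer); // flush whatever remains at stream end
    },
  };
}
```

Sighted users still see text arrive promptly, but assistive tech gets complete sentences rather than a stutter of fragments.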

The Silent Regression: How to Communicate AI Behavioral Changes Without Losing User Trust

· 9 min read
Tian Pan
Software Engineer

Your power users are your canaries. When you ship a new model version or update a system prompt, aggregate evaluation metrics tick upward — task completion rates improve, hallucination scores drop, A/B tests declare victory. Then your most sophisticated users start filing bug reports. "It used to just do X. Now it lectures me first." "The formatting changed and broke my downstream parser." "I can't get it to stay in character anymore." They aren't imagining things. You shipped a regression, you just didn't see it in your dashboards.

This is the central paradox of AI product development: the users most harmed by behavioral drift are the ones who invested most in understanding the system's quirks. They built workflows around specific output patterns. They learned which prompts reliably triggered which behaviors. When you change the model, you don't just ship updates — you silently invalidate months of their calibration work.
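One way to catch this class of regression before power users do (an illustrative sketch, not the post's method): pin the output contracts they depend on as explicit checks run against a fixed prompt set on every model or prompt change. `complete` below is an assumed stand-in for calling the candidate version.

```ts
// Pin behavioral contracts (format, preamble behavior, structure) that
// aggregate eval metrics won't notice breaking.
interface BehavioralPin {
  name: string;
  prompt: string;
  check: (output: string) => boolean;
}

const pins: BehavioralPin[] = [
  {
    name: "no chatty preamble",            // "it used to just do X"
    prompt: "Write a function that reverses a string.",
    check: (out) => !/^(sure|certainly|here('|’)s)/i.test(out.trimStart()),
  },
  {
    name: "stable JSON shape for parsers", // the downstream-parser contract
    prompt: "List three fruits as a JSON array.",
    check: (out) => {
      try { return Array.isArray(JSON.parse(out)); } catch { return false; }
    },
  },
];

// Assumed stand-in for invoking the candidate model/prompt version.
declare function complete(prompt: string): Promise<string>;

async function runPins(): Promise<string[]> {
  const failures: string[] = [];
  for (const pin of pins) {
    if (!pin.check(await complete(pin.prompt))) failures.push(pin.name);
  }
  return failures; // non-empty means a silent regression is about to ship
}
```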

AI User Research: What Users Actually Need Before You Write the First Prompt

· 10 min read
Tian Pan
Software Engineer

Most teams decide they're building an AI feature, then ask users: "Would you want this?" Users say yes. The feature ships. Three months later, weekly active usage is at 12% and plateauing. The postmortem blames implementation or adoption, but the real failure happened before a single line of code was written — in the user research phase that felt thorough but was methodologically broken.

The core problem: users cannot accurately predict their preferences for capabilities they have never experienced. This isn't a minor wrinkle. A study on AI writing assistance found that systems designed from users' stated preferences achieved only 57.7% accuracy — actually underperforming naive baselines that ignored user-stated preferences entirely. You can do a user research sprint that runs for weeks, collect extensive qualitative feedback, and end up with a product nobody uses — not despite the research, but partly because of how it was conducted.

The Enterprise AI Capability Discovery Problem

· 10 min read
Tian Pan
Software Engineer

You shipped the AI feature. You put it in the product. You wrote the help doc. And still, six months later, your most sophisticated enterprise users are copy-pasting text into ChatGPT to do the same thing your feature already does natively. This is not a training problem. It is a discoverability problem, and it is one of the most consistent sources of wasted AI investment in enterprise software today.

The pattern is well-documented: 49% of workers report they never use AI in their role, and 74% of companies struggle to scale value from AI deployments. But the interesting failure mode is not the late-adopters who explicitly resist. It is the engaged users who open your product every day, never knowing that the AI capability they would have paid for is sitting one click away from where their cursor already is.