
42 posts tagged with "ux"


TTFT Is the Only Latency Metric Your Users Actually Feel

· 9 min read
Tian Pan
Software Engineer

Your model generates a 500-word response in 8 seconds. A competing model generates the same response in 12 seconds. Intuitively, yours should feel faster. But if your first token arrives at 2.5 seconds and theirs arrives at 400 milliseconds, your users will describe your product as slow — regardless of total generation time. This is the central paradox of LLM latency: the metric your infrastructure team optimizes for (end-to-end generation time, tokens per second) is not the metric your users experience. Time-to-first-token is.

TTFT is not a detail. It is the primary signal users use to judge whether your AI feature is responsive. Getting it wrong means building fast systems that feel slow.
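
A minimal sketch of what this means in practice: measuring TTFT and total generation time as two separate numbers on a streaming response. The `stream` below is any iterator that yields tokens as the model produces them; no specific provider API is assumed.

```python
import time

def measure_streaming_latency(stream):
    """Measure time-to-first-token and total generation time for a token stream.

    `stream` is any iterable that yields tokens as they are generated
    (e.g. a streaming API response). TTFT is what the user perceives as
    responsiveness; total time is what dashboards usually report.
    """
    start = time.monotonic()
    ttft = None
    token_count = 0

    for token in stream:
        if ttft is None:
            ttft = time.monotonic() - start  # first token arrived
        token_count += 1

    total = time.monotonic() - start
    return {
        "ttft_s": ttft,
        "total_s": total,
        "tokens_per_s": token_count / total if total > 0 else 0.0,
    }
```

Tracking both numbers side by side is what surfaces the paradox above: a response with the better `total_s` can still have the worse `ttft_s`, and it is the latter that users describe when they call a product slow.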

Token Budget as a Product Constraint: Designing Around Context Limits Instead of Pretending They Don't Exist

· 10 min read
Tian Pan
Software Engineer

Most AI products treat the context limit as an implementation detail to hide from users. That decision looks clean in demos and catastrophic in production. When a user hits the limit mid-task, one of three things happens: the request throws a hard error, the model silently starts hallucinating because critical earlier context was dropped, or the product resets the session and destroys all accumulated state. None of these are acceptable outcomes for a product you're asking people to trust with real work.

The token budget isn't a quirk to paper over. It's a first-class product constraint that belongs in your design process the same way memory limits belong in systems programming. The teams that ship reliable AI features have stopped pretending the ceiling doesn't exist.
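
A rough illustration of treating the limit as a budget rather than a surprise: a pre-flight check that reports headroom before the request is sent. The `count_tokens` callable, the message shape, and the limit value are placeholders for whatever tokenizer and model your stack actually uses.

```python
def check_token_budget(messages, count_tokens, context_limit, reserved_for_output=1024):
    """Pre-flight check: will this request fit, and how much headroom is left?

    `count_tokens` is a stand-in for whatever tokenizer-backed counter your
    stack provides; `messages` is assumed to be a list of dicts with a
    "content" field. Returning the headroom lets the product warn, summarize,
    or trim *before* the request fails or silently drops context.
    """
    used = sum(count_tokens(m["content"]) for m in messages)
    budget = context_limit - reserved_for_output
    return {
        "used": used,
        "budget": budget,
        "headroom": budget - used,
        "fits": used <= budget,
    }

# Example: surface the budget to the user instead of hiding it.
# status = check_token_budget(history, count_tokens=my_counter, context_limit=128_000)
# if not status["fits"]:
#     # ask the user to summarize or scope a sub-task, rather than erroring
#     ...
```

The point is not the arithmetic; it is that the check runs before the request, so the product can choose a graceful path instead of one of the three failure modes above.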

Ambient AI Design: When the Chat Interface Is the Wrong Abstraction

· 8 min read
Tian Pan
Software Engineer

Most engineering teams default to building AI features as chat interfaces. A user types something; the model responds. The pattern feels natural because it maps to human conversation, and the tooling makes it easy. But when you watch those chat-based AI features in production, you often see the same dysfunction: the UI sits idle, waiting for a user who is too busy, too distracted, or simply unaware that they should be asking something.

Chat is a pull model. The user initiates. The AI reacts. For a meaningful subset of the valuable AI work in any product—monitoring, anomaly detection, workflow automation, proactive notification—pull is the wrong shape. The work needs to happen whether or not the user remembered to open the chat window.
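
A skeleton of the push shape, with `fetch_signal`, `detect_anomaly`, and `notify` as illustrative stand-ins rather than any particular API: the work runs on a schedule and only surfaces output when there is something worth saying.

```python
import time

def ambient_loop(fetch_signal, detect_anomaly, notify, interval_s=60):
    """Push-model skeleton: the AI does its work on a schedule, not on request.

    The three callables are placeholders for whatever your product actually
    monitors and however it reaches the user. The shape is what matters:
    no chat window, no user prompt, output only when there is something
    worth interrupting for.
    """
    while True:
        signal = fetch_signal()
        finding = detect_anomaly(signal)
        if finding is not None:
            notify(finding)  # a notification, a dashboard annotation, a draft change
        time.sleep(interval_s)
```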

The Trust Calibration Gap: Why AI Features Get Ignored or Blindly Followed

· 9 min read
Tian Pan
Software Engineer

You shipped an AI feature. The model is good — you measured it. Precision is 91%, recall is solid, the P99 latency is under 400ms. Three months later, product analytics tell a grim story: power users have turned it off entirely, while a different cohort is accepting every suggestion without changing a word, including the ones that are clearly wrong.

This is the trust calibration gap. It's not a model problem. It's a design problem — and it's more common than most AI product teams admit.

The Trust Calibration Curve: How Users Learn to (Mis)Trust AI

· 9 min read
Tian Pan
Software Engineer

Most AI products die the same way. The demo works. The beta users rave. You ship. And then, about three months in, session length drops, the feature sits idle, and your most engaged early users start routing around the AI to use the underlying tool directly.

It's not a model quality problem. It's a trust calibration problem.

The over-trust → failure → over-correction lifecycle is the most reliable killer of AI product adoption, and it's almost entirely preventable if you understand what's actually happening. The research is clear, the failure modes are predictable, and the design patterns exist. Most teams ignore all of it until they're looking at the retention curve and wondering what went wrong.

The Accuracy Threshold Problem: When Your AI Feature Is Too Good to Ignore and Too Bad to Trust

· 10 min read
Tian Pan
Software Engineer

McDonald's deployed its AI voice ordering system to over 100 locations. In testing, it hit accuracy numbers that seemed workable — low-to-mid 80s percent. Customers started posting videos of the system adding nine sweet teas to their order unprompted, placing bacon on ice cream, and confidently mishearing simple requests. Within two years, the partnership was dissolved and the technology removed from every location. The lab accuracy was real. The real-world distribution was not what the lab tested.

This is the accuracy threshold problem. There is a zone — roughly 70 to 85 percent accuracy — where an AI feature is accurate enough to look like it works, but not reliable enough to actually work without continuous human intervention. Teams ship into this zone because the numbers feel close enough. Users get confused because the feature is just good enough to lure them into reliance and just bad enough to fail when it matters.
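
A back-of-the-envelope way to see why that zone is so treacherous: per-step accuracy compounds across a multi-item order or multi-step task. The calculation below assumes, for illustration only, that errors are independent.

```python
def task_success_rate(per_step_accuracy: float, steps: int) -> float:
    """Probability an entire multi-step task completes with no errors,
    assuming (for illustration) each step fails independently."""
    return per_step_accuracy ** steps

# An ordering flow with five items or turns:
for acc in (0.80, 0.85, 0.95, 0.99):
    print(f"{acc:.0%} per step -> {task_success_rate(acc, 5):.0%} whole-order success")
# 80% per step -> 33% whole-order success
# 85% per step -> 44% whole-order success
# 95% per step -> 77% whole-order success
# 99% per step -> 95% whole-order success
```

At 80 percent per step, fewer than half of five-step interactions finish clean, which is roughly the gap between a lab accuracy number that looks workable and a drive-through that puts bacon on ice cream.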