
4 posts tagged with "retention"


Your Eval Set Only Has Problems You Already Solved

· 9 min read
Tian Pan
Software Engineer

Your eval score went from 0.81 to 0.87 over the last quarter. The team shipped a router, swapped in a stronger model on the hard intents, tuned the system prompt, and added forty new test cases harvested from "tickets that took more than a day to close." The dashboard says you got better. NPS is flat. Active users are down two percent.

There is a clean story that explains both numbers, and you don't want to hear it: your eval set only contains problems you already solved. The queries that failed so badly that the user never filed a ticket, never came back, and never showed up in any log you grep are not in your suite. They are not in anyone's suite. A rising eval score is consistent with real, across-the-board improvement. It is equally consistent with getting better at the things you can see while staying exactly as bad at the things you cannot.
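To make the argument concrete, here is a toy simulation. The counts are invented, chosen only so that the harvested eval set moves from 0.81 to 0.87 as in the scenario above. `Case`, `pass_rate`, and the split into visible and hidden traffic are hypothetical illustrations, not a real eval harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    visible: bool        # did this query ever surface in a ticket or log you harvest?
    passed_before: bool  # outcome under last quarter's system
    passed_after: bool   # outcome under this quarter's system


def pass_rate(cases, after):
    """Fraction of cases that pass under the 'after' (or 'before') system."""
    if not cases:
        return float("nan")
    return sum(c.passed_after if after else c.passed_before for c in cases) / len(cases)


# Illustrative traffic with made-up counts: some visible failures get fixed,
# while silent failures were never harvested and stay broken.
traffic = (
    [Case(True, True, True)] * 81      # visible, already solved
    + [Case(True, False, True)] * 6    # visible, fixed this quarter
    + [Case(True, False, False)] * 13  # visible, still failing
    + [Case(False, False, False)] * 40 # silent failures: no ticket, no log
)

eval_set = [c for c in traffic if c.visible]  # the only cases you can harvest
hidden = [c for c in traffic if not c.visible]

print(f"eval score:   {pass_rate(eval_set, after=False):.2f} -> {pass_rate(eval_set, after=True):.2f}")
print(f"hidden cases: {pass_rate(hidden, after=False):.2f} -> {pass_rate(hidden, after=True):.2f}")
```

The eval score climbs while the hidden segment sits at zero before and after. No amount of work on the harvested set moves that second line, because those cases never reached it.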

The First-Time User Cliff Your Aggregate Metrics Are Hiding

· 10 min read
Tian Pan
Software Engineer

Your AI feature looks healthy. Weekly active usage is flat-to-up, satisfaction scores are positive, and the dashboard says ship more of this. The PM cites the metric in the next planning round. The engineering lead nods. The roadmap gets another adjacent feature.

Then someone segments the chart by user tenure and the picture inverts. Long-time users — the ones who were already there when the feature shipped — go deep on it daily. First-time users bounce within two interactions. The "flat" line is two cohorts cancelling each other out: a power curve sloping up, and a churn curve sloping down, summed into a lie.
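The cancellation is plain arithmetic. A minimal sketch with invented weekly-active counts (not real data): the tenured cohort grows by 30 users a week, first-time users shrink by 30, and the sum stays flat. `slope` is a hypothetical helper that computes an ordinary least-squares trend in users per week.

```python
def slope(ys):
    """Least-squares slope of ys against weeks 0, 1, 2, ... (change per week)."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


# Hypothetical weekly-active counts for the AI feature, split by tenure.
tenured = [400, 430, 460, 490, 520]    # existing users going deeper
new_users = [300, 270, 240, 210, 180]  # first-time users bouncing

# What the unsegmented dashboard shows.
total = [t + n for t, n in zip(tenured, new_users)]

for name, series in [("tenured", tenured), ("first-time", new_users), ("aggregate", total)]:
    print(f"{name:>10}: {series} slope={slope(series):+.1f}/week")
```

The aggregate series has a slope of zero while the two segments move at +30 and -30 users per week; only the segmented view reveals either trend.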

The 14-Month Half-Life of Your Prompt Expert

· 9 min read
Tian Pan
Software Engineer

Every company shipping AI features in production has one or two engineers it cannot afford to lose, and most of them do not know who those engineers are until the resignation email arrives.

The person in question is rarely the loudest in the room. They are the one who remembers that the customer-support summarizer's tone got fixed by a three-line system-prompt edit after the Q2 escalation, that the eval suite added six cases the week the model provider quietly changed its default sampling, and that the judge calibration drifted the last time someone "cleaned up" the rubric. None of this is written down anywhere a successor would find it. It lives in one head, and roughly every two weeks that head gets a message from a recruiter with a 25% raise attached.

AI Feature PMF Signals: Why Your Metrics Are Lying to You

· 9 min read
Tian Pan
Software Engineer

When your AI feature ships and the metrics light up — DAU spikes, NPS climbs, thumbs-up feedback floods in — you could be looking at genuine product-market fit. Or you could be watching the first act of a two-part story where the second act ends with a retention cliff nobody saw coming.

The problem is that these signals are structurally broken for probabilistic AI features. They were designed for deterministic software, where "activated" means something, where a five-star rating predicts future use, and where novelty fades in days instead of masking a six-month churn wave. AI features behave differently, and the standard PMF toolkit is calibrated for the wrong inputs.