
The Cold Start Problem in AI Personalization: Being Useful Before You Have Data

11 min read
Tian Pan
Software Engineer

Most personalization systems are built around a flywheel: users interact, you learn their preferences, you show better recommendations, they interact more. The flywheel spins faster as data accumulates. The problem is that a flywheel needs initial momentum to spin at all, and a new user supplies none.

This is the cold start problem. And it's more dangerous than most teams recognize when they first ship personalization. A new user arrives with no history, no signal, and often a skeptical prior: "AI doesn't know me." You have roughly 5–15 minutes to prove otherwise before they form an opinion that determines whether they'll stay long enough to generate the data that would let you actually help them. Up to 75% of new users abandon products in the first week if that window goes badly.

The cold start problem isn't a data problem. It's an initialization problem. The engineering question is: what do you inject in place of history?

Why Standard Collaborative Filtering Fails Here

Collaborative filtering — the bedrock of most recommendation systems — works by finding users whose past behavior resembles yours and assuming you'll like what they liked. It's a powerful technique once you have behavioral data. The problem is structural: it requires an existing interaction matrix to find similar users. A new user is an empty row.
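
To make the structural failure concrete, here is a toy sketch of user-based collaborative filtering hitting an empty row. The interaction matrix is made up; the point is that cosine similarity against an all-zero vector is undefined, so every neighbor similarity collapses to zero and there is nothing to rank with.

```python
import numpy as np

# Toy user-item interaction matrix: rows are users, columns are items.
# 1 = interacted, 0 = not. The last row is a brand-new user with no history.
interactions = np.array([
    [1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],  # new user: an empty row
])

def cosine_similarities(matrix: np.ndarray, user_idx: int) -> np.ndarray:
    """Cosine similarity of one user's row against every row."""
    target = matrix[user_idx]
    dots = matrix @ target
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(target)
    # Guard the division: an empty row has norm 0, so similarity is undefined.
    return np.divide(dots, norms, out=np.zeros(len(matrix)), where=norms > 0)

print(cosine_similarities(interactions, user_idx=3))
# [0. 0. 0. 0.] -- no usable neighbors, nothing to recommend from
```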

Popularity-based fallbacks exist for this case: show the new user whatever most users engage with. This works as a baseline, but it destroys the perception of personalization entirely. A new user who sees "trending" recommendations doesn't feel understood — they feel handled. And for LLM-powered products, the gap is worse. An LLM assistant that defaults to generic responses because it has no context about the user reads as dumb, not neutral.

The failure compounds because cold start is also when user expectations are highest. Users just signed up; they're open to the product. The first session is when you either earn their trust or validate the suspicion that AI products are just fancy pattern matching.

Onboarding Signal Capture: Implicit Versus Explicit

The fastest way to bootstrap personalization is to ask directly. But ask wrong and you lose users before they see your product.

Explicit signals — preference surveys, onboarding questionnaires, pairwise comparisons — give you direct information. Research on preference elicitation design has converged on a few concrete findings:

  • Ask about attributes, not items. "What cuisines do you prefer?" outperforms "Rate these five restaurants." Attribute questions generalize; item-specific questions just tell you the user likes a specific thing.
  • Use pairwise comparisons over numeric scales. Asking "Which of these two do you prefer?" extracts more signal per question than asking users to rate items individually. Numeric rating scales also correlate with low enjoyment — they feel like homework.
  • 5–8 questions maximum. Above that threshold, completion rates drop sharply. Design each question to maximize information gain, not coverage.
  • Progressive disclosure beats front-loading. Showing all onboarding questions at once causes decision fatigue. Branching question trees — where each answer informs which question comes next — reduce cognitive load while actually capturing more signal from the users who complete them (see the sketch after this list).
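
A minimal sketch of a branching question tree, with hypothetical cooking-onboarding questions: each answer routes to a different follow-up, so a user only ever sees the questions their earlier answers made relevant.

```python
from dataclasses import dataclass, field

@dataclass
class Question:
    text: str
    # Maps each answer option to the follow-up question (None = done).
    branches: dict = field(default_factory=dict)

leaf_time = Question("How much time do you usually have to cook?",
                     {"under 30 min": None, "an hour or more": None})
leaf_spice = Question("How spicy do you like your food?",
                      {"mild": None, "hot": None})

root = Question(
    "Do you cook at home or mostly order out?",
    {"cook at home": leaf_time, "order out": leaf_spice},
)

def run_tree(question, answer_fn):
    """Walk the tree, collecting (question, answer) pairs."""
    answers = []
    while question is not None:
        answer = answer_fn(question)
        answers.append((question.text, answer))
        question = question.branches[answer]
    return answers

# Simulated user who always picks the first option.
profile = run_tree(root, lambda q: next(iter(q.branches)))
print(profile)  # two questions asked, two answers captured
```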

Implicit signals — clicks, dwell time, scroll depth, navigation patterns — cost the user nothing but require initial interactions to generate. The first session is almost entirely implicit, so your onboarding design needs to create structured opportunities to observe behavior: showing a small grid of options and watching what gets explored, presenting alternatives and tracking which gets expanded.
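
One way to make those observations structured is to score implicit events against item attributes as they arrive. A minimal sketch, where the event names and weights are illustrative rather than any standard schema:

```python
from collections import defaultdict

# Illustrative event weights: stronger signals of interest score higher.
SIGNAL_WEIGHTS = {"impression": 0.0, "click": 1.0, "expand": 2.0, "dwell_30s": 3.0}

class SessionObserver:
    """Accumulates implicit-interest scores per item attribute."""

    def __init__(self):
        self.attribute_scores = defaultdict(float)

    def record(self, event: str, attributes: list[str]) -> None:
        weight = SIGNAL_WEIGHTS.get(event, 0.0)
        for attr in attributes:
            self.attribute_scores[attr] += weight

    def top_interests(self, k: int = 3):
        return sorted(self.attribute_scores.items(),
                      key=lambda kv: -kv[1])[:k]

obs = SessionObserver()
obs.record("click", ["thai", "noodles"])
obs.record("expand", ["thai"])
obs.record("dwell_30s", ["italian"])
print(obs.top_interests())
# [('thai', 3.0), ('italian', 3.0), ('noodles', 1.0)]
```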

The best onboarding systems combine both. Explicit questions during signup seed an initial model; implicit signals from the first session refine it immediately. By the time the user finishes their first interaction, the system knows meaningfully more than when they arrived.

Cohort-Prior Injection: Borrowing from Population History

You don't have data on this specific user, but you have data on thousands of users who behaved similarly in the first few minutes. Cohort-prior injection is the technique of using that population data to initialize a new user's personalization model.

The Bayesian framing is useful here: you're building a prior for the new user based on the population distribution, then updating that prior as the user generates observations. Start with a generic prior, but parameterize it by the signals you have available: geographic location, referral source, device type, signup time, explicit attribute preferences from onboarding. Each signal narrows the relevant cohort.
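
As a concrete, deliberately simplified instance of this framing, model engagement with each content category as a Bernoulli rate: seed a Beta prior per category from the matched cohort's historical engagement rate, then update it with each observation. The categories and numbers below are made up.

```python
# Beta-Bernoulli cohort prior (illustrative numbers). Engagement with each
# category is modeled as a Bernoulli rate; the Beta prior is seeded from the
# matched cohort's historical rate, then updated per observation.

PRIOR_STRENGTH = 10  # pseudo-observations; higher = stronger prior, slower updates
cohort_rates = {"thai": 0.40, "italian": 0.25, "bbq": 0.10}

# [alpha, beta] pseudo-counts per category.
priors = {
    cat: [rate * PRIOR_STRENGTH, (1 - rate) * PRIOR_STRENGTH]
    for cat, rate in cohort_rates.items()
}

def update(category: str, engaged: bool) -> None:
    """One Bernoulli observation: bump alpha on engagement, beta otherwise."""
    alpha, beta = priors[category]
    priors[category] = [alpha + engaged, beta + (not engaged)]

def expected_rate(category: str) -> float:
    alpha, beta = priors[category]
    return alpha / (alpha + beta)

# New user ignores two thai recommendations, engages with bbq twice.
update("thai", False)
update("thai", False)
update("bbq", True)
update("bbq", True)

for cat in priors:
    print(f"{cat}: {expected_rate(cat):.2f}")
# thai drifts down from 0.40 to 0.33; bbq climbs from 0.10 to 0.25;
# italian stays at its prior of 0.25.
```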

A few implementation patterns:

Segment-conditioned defaults. Rather than one "cold start" default experience, maintain 10–20 segment profiles built from historical user clusters. Route new users to the most relevant segment based on available signals and use that segment's behavior distribution as the initialization. The specific cohort matters — a new user whose referral source is a cooking subreddit should start with different priors than one who arrived from a productivity newsletter.
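
A minimal sketch of the routing step, with hypothetical segment profiles; simple match counting stands in for a real distance-to-cluster computation.

```python
# Hypothetical segment profiles built from historical user clusters.
SEGMENTS = {
    "home-cook": {"referral": "cooking_subreddit", "device": "mobile"},
    "office-lunch": {"referral": "productivity_newsletter", "device": "desktop"},
    "late-night": {"referral": "organic", "device": "mobile"},
}

def route_to_segment(signals: dict) -> str:
    """Pick the segment whose profile matches the most signup signals."""
    def match_count(profile):
        return sum(signals.get(k) == v for k, v in profile.items())
    return max(SEGMENTS, key=lambda name: match_count(SEGMENTS[name]))

new_user = {"referral": "cooking_subreddit", "device": "mobile", "hour": 19}
print(route_to_segment(new_user))  # home-cook
```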

Trending-within-segment vs. global trending. The cold start fallback of "show popular content" can be significantly improved by scoping "popular" to the right cohort. Content that's trending among users who look like this new user is far more useful than global trending, even with incomplete similarity information.
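
The scoping itself is a small change to the popularity query. An illustrative sketch with made-up engagement data:

```python
from collections import Counter

# Made-up (segment, item) engagement events.
engagement_log = [
    ("home-cook", "sourdough-guide"), ("home-cook", "knife-skills"),
    ("home-cook", "sourdough-guide"), ("office-lunch", "meal-prep"),
    ("office-lunch", "meal-prep"), ("office-lunch", "salad-bowls"),
]

def trending(segment: str | None, log, k: int = 2) -> list[str]:
    """Top-k items, optionally restricted to one segment's users."""
    counts = Counter(item for seg, item in log
                     if segment is None or seg == segment)
    return [item for item, _ in counts.most_common(k)]

print(trending(None, engagement_log))        # global popularity
print(trending("home-cook", engagement_log)) # popularity within the cohort
```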

Exploration-exploitation balance (bandits). Contextual bandit models are well-suited for early personalization because they explicitly balance exploration — learning what this user responds to — with exploitation — showing things the user is likely to engage with. DoorDash used this for cuisine ranking for new users: the bandit model explores the user's preferences while still showing cuisines that similar users have historically engaged with, preventing the blank-slate paralysis of pure exploration.
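
A minimal Thompson-sampling sketch of that idea (not DoorDash's actual system): each cuisine's engagement rate gets a Beta posterior seeded from cohort pseudo-counts, and each ranking round samples from the posteriors, so well-liked cuisines usually surface (exploitation) while uncertain ones occasionally win the draw (exploration).

```python
import random

# Beta pseudo-counts [alpha, beta] per cuisine, seeded from cohort history.
posteriors = {"thai": [4, 6], "italian": [3, 9], "bbq": [1, 9]}

def rank_cuisines() -> list[str]:
    """Sample a plausible engagement rate per cuisine; rank by the samples.
    Strong posteriors usually rank high, but uncertain ones sometimes win."""
    draws = {c: random.betavariate(a, b) for c, (a, b) in posteriors.items()}
    return sorted(draws, key=draws.get, reverse=True)

def observe(cuisine: str, engaged: bool) -> None:
    """Update the posterior with one observed outcome."""
    posteriors[cuisine][0 if engaged else 1] += 1

random.seed(7)
for _ in range(3):
    ranking = rank_cuisines()
    shown = ranking[0]
    observe(shown, engaged=(shown == "thai"))  # simulate a user who likes thai
    print(ranking)
```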

The limitation of cohort priors is coverage. If your population doesn't include users similar to the new one, the prior adds noise rather than signal. Cohort priors are most reliable in the dense center of your user distribution; they degrade for genuinely novel user types.
