
The Cold Start Problem in AI Personalization: Being Useful Before You Have Data

11 min read
Tian Pan
Software Engineer

Most personalization systems are built around a flywheel: users interact, you learn their preferences, you show better recommendations, they interact more. The flywheel spins faster as data accumulates. The problem is the flywheel needs velocity to generate lift — and a new user has none.

This is the cold start problem. And it's more dangerous than most teams recognize when they first ship personalization. A new user arrives with no history, no signal, and often a skeptical prior: "AI doesn't know me." You have roughly 5–15 minutes to prove otherwise before they form an opinion that determines whether they'll stay long enough to generate the data that would let you actually help them. Up to 75% of new users abandon products in the first week if that window goes badly.

The cold start problem isn't a data problem. It's an initialization problem. The engineering question is: what do you inject in place of history?

Why Standard Collaborative Filtering Fails Here

Collaborative filtering — the bedrock of most recommendation systems — works by finding users whose past behavior resembles yours and assuming you'll like what they liked. It's a powerful technique once you have behavioral data. The problem is structural: it requires an existing interaction matrix to find similar users. A new user is an empty row.

Popularity-based fallbacks exist for this case: show the new user whatever most users engage with. This works as a baseline, but it destroys the perception of personalization entirely. A new user who sees "trending" recommendations doesn't feel understood — they feel handled. And for LLM-powered products, the gap is worse. An LLM assistant that defaults to generic responses because it has no context about the user reads as dumb, not neutral.

The failure compounds because cold start is also when user expectations are highest. Users just signed up; they're open to the product. The first session is when you either earn their trust or validate the suspicion that AI products are just fancy pattern matching.

Onboarding Signal Capture: Implicit Versus Explicit

The fastest way to bootstrap personalization is to ask directly. But ask wrong and you lose users before they see your product.

Explicit signals — preference surveys, onboarding questionnaires, pairwise comparisons — give you direct information. Research on preference elicitation design has converged on a few concrete findings:

  • Ask about attributes, not items. "What cuisines do you prefer?" outperforms "Rate these five restaurants." Attribute questions generalize; item-specific questions just tell you the user likes a specific thing.
  • Use pairwise comparisons over numeric scales. Asking "Which of these two do you prefer?" extracts more signal per question than asking users to rate items individually. Numeric rating scales also correlate with low enjoyment — they feel like homework.
  • 5–8 questions maximum. Above that threshold, completion rates drop sharply. Design each question to maximize information gain, not coverage.
  • Progressive disclosure beats front-loading. Showing all onboarding questions at once causes decision fatigue. Branching question trees — where each answer informs which question comes next — reduce cognitive load while actually capturing more signal from the users who complete them.
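
A branching question tree like the one described above can be sketched in a few lines. This is a minimal illustration with made-up question IDs and attributes, not a real onboarding schema: each answer selects the next question, so a user sees only the short path relevant to them.

```python
# Minimal sketch of a branching onboarding tree: each answer selects the
# next question, so users see a handful of targeted questions rather than
# the whole survey. Question IDs and options are illustrative.

QUESTION_TREE = {
    "cuisine": {
        "prompt": "Which do you prefer?",
        "options": {
            "italian": "pasta_style",     # answer -> follow-up question ID
            "japanese": "sushi_vs_ramen",
        },
    },
    "pasta_style": {
        "prompt": "Fresh pasta or baked dishes?",
        "options": {"fresh": None, "baked": None},  # None = leaf, tree done
    },
    "sushi_vs_ramen": {
        "prompt": "Sushi or ramen?",
        "options": {"sushi": None, "ramen": None},
    },
}

def run_onboarding(answers, root="cuisine"):
    """Walk the tree with a sequence of answers; return captured signals."""
    signals = {}
    node = root
    for answer in answers:
        question = QUESTION_TREE[node]
        signals[node] = answer
        node = question["options"][answer]
        if node is None:
            break
    return signals
```

A user who answers "italian" then "fresh" generates two attribute signals while only ever seeing questions relevant to their earlier answers.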

Implicit signals — clicks, dwell time, scroll depth, navigation patterns — cost the user nothing but require initial interactions to generate. The first session is almost entirely implicit, so your onboarding design needs to create structured opportunities to observe behavior: showing a small grid of options and watching what gets explored, presenting alternatives and tracking which gets expanded.

The best onboarding systems combine both. Explicit questions during signup seed an initial model; implicit signals from the first session refine it immediately. By the time the user finishes their first interaction, the system knows meaningfully more than when they arrived.

Cohort-Prior Injection: Borrowing from Population History

You don't have data on this specific user, but you have data on thousands of users who behaved similarly in the first few minutes. Cohort-prior injection is the technique of using that population data to initialize a new user's personalization model.

The Bayesian framing is useful here: you're building a prior for the new user based on the population distribution, then updating that prior as the user generates observations. Start with a generic prior, but parameterize it by the signals you have available: geographic location, referral source, device type, signup time, explicit attribute preferences from onboarding. Each signal narrows the relevant cohort.
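
The Bayesian framing can be made concrete with a conjugate Beta-Binomial sketch. The cohort numbers and the prior "strength" below are illustrative, not recommendations: the point is that the cohort rate seeds the estimate, and a handful of the user's own interactions quickly move it.

```python
# A minimal Bayesian sketch: initialize a new user's per-category
# engagement estimate from a cohort's Beta prior, then update it as the
# user's own clicks/skips arrive. Cohort numbers are illustrative.

def cohort_prior(cohort_clicks, cohort_impressions, strength=10):
    """Shrink the cohort rate into a Beta(alpha, beta) prior with a chosen
    equivalent sample size ('strength'), so one user's data can move it."""
    rate = cohort_clicks / cohort_impressions
    return rate * strength, (1 - rate) * strength  # (alpha, beta)

def update(alpha, beta, clicks, skips):
    """Conjugate update: each click adds to alpha, each skip to beta."""
    return alpha + clicks, beta + skips

def expected_rate(alpha, beta):
    return alpha / (alpha + beta)

# A cohort of similar users clicked "italian" 300 times in 1000 impressions,
# giving a prior mean of 0.30; this user's first five interactions
# (4 clicks, 1 skip) already pull the estimate toward them.
a, b = cohort_prior(300, 1000)
a, b = update(a, b, clicks=4, skips=1)
```

The `strength` parameter controls how quickly the user's own behavior overrides the cohort: a small equivalent sample size means a few observations dominate, which is exactly what you want in the first session.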

A few implementation patterns:

Segment-conditioned defaults. Rather than one "cold start" default experience, maintain 10–20 segment profiles built from historical user clusters. Route new users to the most relevant segment based on available signals and use that segment's behavior distribution as the initialization. The specific cohort matters — a new user whose referral source is a cooking subreddit should start with different priors than one who arrived from a productivity newsletter.
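
A minimal routing sketch, assuming segment profiles are stored as key-value signal patterns (the segment names and signal keys below are invented for illustration):

```python
# Sketch of segment-conditioned defaults: score each stored segment
# profile against the signals known at signup and route the new user to
# the best match. Segment names and signal keys are made up.

SEGMENTS = {
    "home_cooks":   {"referral": "cooking_subreddit", "device": "mobile"},
    "productivity": {"referral": "newsletter",        "device": "desktop"},
}

def route_user(signals, segments=SEGMENTS):
    """Return the segment whose profile matches the most known signals."""
    def score(profile):
        return sum(1 for k, v in profile.items() if signals.get(k) == v)
    return max(segments, key=lambda name: score(segments[name]))

new_user = {"referral": "newsletter", "device": "desktop"}
```

In production this matching would typically be a nearest-centroid lookup over learned cluster embeddings rather than exact key matches, but the routing structure is the same.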

Trending-within-segment vs. global trending. The cold start fallback of "show popular content" can be significantly improved by scoping "popular" to the right cohort. Content that's trending among users who look like this new user is far more useful than global trending, even with incomplete similarity information.
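
Scoping popularity to a cohort is a one-line filter over the event stream. The event data below is mocked to show the difference between global and segment-scoped trending:

```python
# Scoping "popular" to a cohort: rank items by engagement counted only
# over events from users in the new user's segment. Events are mocked.
from collections import Counter

def trending(events, segment=None):
    """events: (user_segment, item) pairs; segment=None -> global trending."""
    counts = Counter(item for seg, item in events
                     if segment is None or seg == segment)
    return [item for item, _ in counts.most_common()]

events = [("cooks", "pasta"), ("cooks", "pasta"), ("office", "planner"),
          ("office", "planner"), ("office", "planner"), ("cooks", "knife")]
```

Global trending here ranks `planner` first, but a new user routed to the `cooks` segment sees `pasta` first: same fallback mechanism, much more relevant output.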

Exploration-exploitation balance (bandits). Contextual bandit models are well-suited for early personalization because they explicitly balance exploration — learning what this user responds to — with exploitation — showing things the user is likely to engage with. DoorDash used this for cuisine ranking for new users: the bandit model explores the user's preferences while still showing cuisines that similar users have historically engaged with, preventing the blank-slate paralysis of pure exploration.
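
A Thompson-sampling sketch shows how cohort priors plug into a bandit. Each cuisine is a Beta arm seeded from cohort engagement, so early picks lean toward what similar users liked while still exploring; the prior values and random seed below are illustrative, and this is not DoorDash's actual model.

```python
# Thompson-sampling sketch: each cuisine is a Beta arm seeded with a
# cohort prior. Sampling from the posteriors naturally balances
# exploration and exploitation. Priors and seed are illustrative.
import random

def pick_cuisine(arms, rng):
    """arms: {name: (alpha, beta)}. Sample each posterior; show the argmax."""
    return max(arms, key=lambda name: rng.betavariate(*arms[name]))

def observe(arms, name, clicked):
    """Update the shown arm's posterior with the user's response."""
    a, b = arms[name]
    arms[name] = (a + clicked, b + (1 - clicked))

arms = {"italian": (3, 7), "japanese": (5, 5), "mexican": (2, 8)}
rng = random.Random(0)
shown = pick_cuisine(arms, rng)
observe(arms, shown, clicked=1)
```

Arms with uncertain posteriors (low alpha + beta) get sampled widely and are picked occasionally even when their mean is lower, which is the exploration half; arms with strong evidence dominate, which is the exploitation half.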

The limitation of cohort priors is coverage. If your population doesn't include users similar to the new one, the prior adds noise rather than signal. Cohort priors are most reliable in the dense center of your user distribution; they degrade for genuinely novel user types.

Building the Onboarding Loop: Interaction Design Before the Flywheel Spins

The interaction design goal during cold start is to manufacture the first few turns of the flywheel manually. Every design decision in the first session should be evaluated against: "does this generate a signal, or does it consume attention without creating data?"

A few patterns that work:

Micro-commitment chains. Instead of one large preference survey at signup, spread signal capture across the first few interactions as part of normal product use. When the user first searches, capture the query and result selection. When they first browse a category, capture dwell time by item. Each micro-commitment generates data while feeling like product use rather than setup.

Forced exposure sets. Present a small, curated set of representative items — designed to cover the latent preference space — and observe responses. Netflix does a version of this with genre thumbnails at signup. The design principle is to choose the set to maximize information gain: items that are distinctive enough that choosing among them actually reveals something, not items that happen to be popular.
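
One way to "choose the set to maximize information gain" is a greedy max-min diversity pick over item embeddings: repeatedly add the item farthest from everything already chosen, so the set spans the preference space instead of clustering around popular items. The toy 2-D embeddings below are illustrative.

```python
# Greedy sketch of building a forced-exposure set: repeatedly add the
# item farthest from the items already chosen, so the set covers the
# latent preference space. Embeddings here are toy 2-D coordinates.

def exposure_set(items, k):
    """items: {name: (x, y)} toy embeddings. Return k diverse item names."""
    def dist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    chosen = [next(iter(items))]              # seed with an arbitrary item
    while len(chosen) < k:
        best = max((n for n in items if n not in chosen),
                   key=lambda n: min(dist(items[n], items[c]) for c in chosen))
        chosen.append(best)
    return chosen

items = {"thriller": (0, 0), "romcom": (0, 1), "docu": (5, 5), "anime": (5, 0)}
```

With `k=3` this picks `thriller`, `docu`, and `anime` but skips `romcom`, which sits so close to `thriller` that choosing between them would reveal little.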

Graceful degradation with visible improvement. Make the cold start state visible and frame it honestly. "We're learning your preferences — here's what we've figured out so far" is more engaging than pretending the system knows you when it doesn't. As the user generates more signals, explicitly acknowledge the improvement. This creates a meta-game where the user understands that more interaction produces better recommendations, which increases interaction.

LLM onboarding conversations. For LLM-powered products, the onboarding questionnaire can be replaced by a short structured conversation. An LLM can extract preference signals from natural language that a form can't, and users tend to find conversational onboarding lower-friction than forms. The output is the same — a structured preference representation — but the input feels less like homework. Extract attributes explicitly from the conversation and store them in a structured user profile rather than relying on the LLM to remember a previous conversation.
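
The "extract attributes explicitly and store them" step can be sketched as prompt-plus-defensive-parse plumbing. The `call_llm` parameter below is a placeholder for whatever model client you use, and the profile keys are an assumed schema; only the prompt shape and the JSON handling are the point.

```python
# Sketch of turning a freeform onboarding answer into a structured
# profile. `call_llm` is a stand-in for a real model client; the profile
# keys ("likes"/"dislikes"/"constraints") are an assumed schema.
import json

EXTRACTION_PROMPT = """Extract the user's preferences as JSON with keys
"likes", "dislikes", "constraints" (each a list of short strings).
User said: {answer}"""

def extract_profile(answer, call_llm):
    reply = call_llm(EXTRACTION_PROMPT.format(answer=answer))
    try:
        profile = json.loads(reply)
    except json.JSONDecodeError:
        profile = {}
    # Keep only the expected fields; ignore anything else the model adds.
    return {k: profile.get(k, []) for k in ("likes", "dislikes", "constraints")}

# A stubbed model reply stands in for the real call:
fake_llm = lambda prompt: (
    '{"likes": ["home workouts"], "dislikes": ["gym"], "constraints": []}'
)
```

Persisting this structured output is what makes the signal durable: the profile survives across sessions regardless of what the LLM does or doesn't retain in context.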

LLM-Specific Cold Start Patterns

LLMs reopen several previously closed doors for cold start personalization.

Zero-shot ranking. Standard collaborative filtering can't operate without interaction history, but an LLM can rank items from pure textual descriptions without domain-specific training. Given a set of candidate items and a textual description of the user's stated preferences (from onboarding), an LLM can produce a reasonable initial ranking that's substantially better than global popularity. Research in 2024–2025 shows that zero-shot LLM rankers rival specialized recommendation models in data-sparse conditions, and they degrade gracefully rather than failing completely.
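
The plumbing around a zero-shot ranker is mostly prompt construction and parsing the ordering back into item IDs. The model call below is stubbed and the item descriptions invented; the defensive parse (drop unknown IDs, append anything the model forgot) is the part worth copying, since LLM output is not guaranteed to be a complete permutation.

```python
# Sketch of zero-shot ranking: describe the user and candidates in text,
# ask the model for an ordering, and parse it back into item IDs. The
# model reply is stubbed; only the prompt/parse plumbing is shown.

def build_prompt(user_prefs, candidates):
    lines = [f"User preferences: {user_prefs}",
             "Rank these items best-first, one ID per line:"]
    lines += [f"- {item_id}: {desc}" for item_id, desc in candidates.items()]
    return "\n".join(lines)

def parse_ranking(reply, candidates):
    """Keep only known IDs in the order the model listed them; append
    anything the model dropped so the ranking stays complete."""
    order = [line.strip("- ").strip() for line in reply.splitlines()]
    ranked = [i for i in order if i in candidates]
    return ranked + [i for i in candidates if i not in ranked]

candidates = {"r1": "quiet ramen bar", "r2": "loud sports bar",
              "r3": "vegan cafe"}
reply = "r3\nr1"   # stubbed model output that forgot r2
```

This is also where the graceful degradation shows up: a malformed or partial model reply still yields a full ranking, just one that falls back toward catalog order.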

Simulated interaction data. LLMs can generate synthetic user interactions for new items — predicting how a described user would engage with a described item based on general world knowledge. This is particularly useful for solving the item cold start problem (new items in the catalog that have no interaction history) alongside the user cold start problem. The synthetic data bootstraps the embedding space enough to make initial collaborative filtering tractable.

Preference extraction from conversation. LLMs can infer implicit preferences from freeform text that structured elicitation misses. A user who mentions "I hate the gym but want to get into shape" reveals preferences about format, motivation, and pain points that no checkbox survey captures. Building a preference extraction step early in the user journey — a short freeform question rather than a structured form — can generate richer initialization data than explicit elicitation, and it doubles as a signal about the user's communication style.

The constraints to engineer around: LLM-based cold start is more expensive at inference time than lookup-based recommendations, context windows limit how much preference history you can inject, and LLM outputs are stochastic, which creates consistency challenges when you want deterministic initial rankings.

What Teams Get Wrong

The most common failure mode is treating cold start as a temporary condition that will self-resolve once users interact more. The logic seems right — more interaction generates more data, which solves cold start — but it ignores that users who have a poor first session may never generate the second and third sessions needed to escape cold start. The initialization state persists until it's actively resolved.

The second failure mode is optimizing the wrong metric during cold start. Teams often measure click-through rate on cold-start recommendations, but CTR in the first session measures whether the user found the content interesting, not whether it was personalized to them. The right early-session metric is exploration coverage: did the user interact with multiple categories, or did they bounce after seeing the first result that happened to match a popular segment?

Third: front-loading the data collection at the expense of the product experience. An onboarding flow that asks seven preference questions before showing any product value treats data collection as the product. Users tolerate this for a few questions, then they abandon. The best onboarding interleaves signal capture with immediate value delivery — show something useful, then ask a question about it.

The Actual Flywheel Mechanism

The personalization flywheel isn't automatic. It's a designed system with four deliberate stages:

  1. Entice: the initial cold-start experience delivers enough value to create a second session
  2. Enable: the product creates structured opportunities for the user to express preferences (explicit and implicit)
  3. Capture: those expressions are converted into a user model that persists and accumulates
  4. Personalize: the model drives recommendations that are noticeably better than the cold-start baseline

The gap that kills most personalization launches is between stages 1 and 2. Teams build a personalization engine that works well in stage 4 but invest too little in stage 1. The cold start experience is an afterthought, the second session rate is low, and there's never enough data to make the later stages work.

Getting stage 1 right is fundamentally a user experience problem, not a machine learning problem. The ML system can only optimize over the users who stuck around long enough to generate data. The decision about who those users are is made entirely in the first session.

The teams that solve cold start well treat it as a first-class product problem: dedicated instrumentation to measure first-session quality, explicit A/B testing of onboarding flows, and separate metrics tracking the transition from cold to warm state (typically defined as the point where the user-specific model outperforms the cohort prior by a measurable margin). Cold start isn't the beginning of personalization — it's the foundation everything else is built on.
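
The cold-to-warm transition check can be made operational with a simple rule. The margin, window size, and metric values below are illustrative thresholds, not recommendations: compare the user-specific model against the cohort prior on the user's recent interactions, and flip to "warm" only once the gap is both measured on enough events and large enough to matter.

```python
# Sketch of a cold-to-warm transition check: declare a user "warm" once
# their personal model beats the cohort prior by a margin on recent
# interactions. Thresholds and hit counts are illustrative.

def is_warm(user_model_hits, prior_hits, n_events, margin=0.05, min_events=20):
    """Compare hit rates over the user's last n_events interactions.
    Returns False until there is enough evidence to trust the comparison."""
    if n_events < min_events:
        return False
    return (user_model_hits - prior_hits) / n_events > margin

# Over 25 recent events, the personal model hit 14 times vs. the cohort
# prior's 9 -- a 0.2 lift, comfortably past a 0.05 margin.
warm = is_warm(user_model_hits=14, prior_hits=9, n_events=25)
```

Gating on a minimum event count matters: with only a handful of interactions, an apparent lift over the prior is mostly noise, and switching models too early just reintroduces cold-start behavior under a different name.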
