Skip to main content

3 posts tagged with "rollout"

View all tags

The Canary Cohort Your Rollout Hashed by ID That Clustered Power Users Into One Arm

· 10 min read
Tian Pan
Software Engineer

A rollout team ships a new model behind a percentage flag. The flag bucket is computed as hash(user_id) % 100, the canary is buckets 0–4, the lift on per-user engagement is large and stable for two weeks, and the team ramps to 20%, then 50%, then global. The lift evaporates somewhere between 50% and global, and the post-mortem traces it back to the canary cohort. The treatment didn't move the metric. The canary arm was a different population.

The team thought it had been sampling users. It had been sampling IDs.

The Agent Rollout Cadence Your Customer Success Team Could Not Absorb

· 11 min read
Tian Pan
Software Engineer

The customer pasted the agent's answer into a support chat and asked the human rep to confirm it. The rep, looking at the same product, said the opposite. The customer did not lose trust in the agent that day. They lost trust in the company, because two parts of it told them two different things in the same hour.

Nothing was broken. The AI team had shipped a prompt change on Tuesday behind a feature flag, ramped it to 100% by Thursday, and moved on. The customer success team's enablement cycle is monthly — that is how every other product feature has always landed, and nobody re-negotiated the contract for AI. The macro in the CS rep's queue and the FAQ doc on the public site still described the previous behavior. The agent was correct. The rep was correct against the documentation they had. The company was incoherent.

Your AI Feature Ramp Is Rolling Out on the Wrong Axis

· 11 min read
Tian Pan
Software Engineer

A team I talked to last month ramped a new agentic feature from 1% to 50% of users over four weeks. Aggregate quality metrics held within noise. Latency stayed within SLA. They were preparing the 100% memo when the support queue caught fire — a customer with a six-tool research workflow had been getting silently corrupted outputs since the 10% step. The hard queries had been there the whole time, evenly sprinkled across every cohort, averaging into the noise floor. Nobody saw them until a single high-volume user happened to hit them at scale.

This is not a monitoring failure. It is a ramp-axis failure. Feature flag tooling — the entire LaunchDarkly / Flagsmith / Unleash / Cloudflare-Flagship category — assumes blast radius scales with the number of humans exposed. For deterministic software that is mostly true: a NullPointerException hits everyone or nobody, and showing it to 1% of users limits the user-visible blast to 1%. For AI features, blast radius does not scale on the human axis. It scales on the input axis. And the input axis is where almost no one is ramping.