The feature flag and progressive delivery market is projected to hit $5 billion by 2028, up from roughly $1.5 billion in 2024. If that growth trajectory doesn’t get your attention, the vendor landscape should: LaunchDarkly (which raised at a $3B valuation), Split.io, Flagsmith, Unleash, and DevCycle are all fighting for a market that barely existed five years ago. This isn’t a niche tool category anymore — it’s core infrastructure.
I’ve watched progressive delivery evolve through four distinct generations, and each one made the previous look primitive:
- Simple on/off flags — the toggle switch era. Ship the code, hide it behind a flag, flip it on when you’re ready.
- Percentage rollouts — roll out to 5% of users, then 25%, then 100%. Basic but effective.
- Targeted rollouts — target by user segment, geography, plan tier, or custom attributes. This is where most teams are today.
- AI-powered rollouts — ML models monitor key metrics (error rates, latency, conversion, engagement) during rollout and automatically halt or roll back if anomalies are detected. No human watching dashboards at 2 AM.
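The core mechanic behind generations two and three is deterministic bucketing: hash the user into a stable position on a 0–100 scale, so the same user gets the same decision as the percentage ramps up. Here's a minimal vendor-neutral sketch (the function name and hashing scheme are my own illustration, not any particular SDK's implementation):

```python
import hashlib

def in_rollout(flag_key: str, user_id: str, percentage: float) -> bool:
    """Deterministically bucket a user into a percentage rollout.

    Hashing flag_key + user_id maps each user to a stable bucket in
    [0, 100), so a user included at 5% stays included at 25% and 100% --
    the rollout only ever expands, it never churns users in and out.
    """
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 2**32 * 100  # stable value in [0, 100)
    return bucket < percentage
```

Keying the hash on flag *and* user is deliberate: it decorrelates rollouts, so the same 5% of users aren't the guinea pigs for every new feature.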
Here’s the uncomfortable stat: 96% of high-growth companies have invested in experimentation platforms, according to recent industry surveys. But most engineering teams still do binary deployments — ship it to everyone, or ship it to nobody. The investment is there but the adoption is shallow.
What AI-Powered Flags Actually Look Like
The latest generation of flag platforms integrates ML-based anomaly detection directly into the rollout pipeline. When you start a progressive rollout, the system establishes baseline metrics during the initial cohort (say, 1% of traffic). As the rollout expands, the model continuously compares the treatment group against that baseline. If error rates spike, if p99 latency degrades beyond a threshold, if conversion drops outside the expected confidence interval — the system automatically halts the rollout and alerts the team.
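The simplest version of that comparison is a two-proportion z-test on error rates: halt when the treatment group's error rate sits significantly above the baseline's. This is a sketch of the statistical guardrail, not any vendor's actual model (production systems layer in sequential testing, seasonality correction, and more metrics):

```python
def should_halt(baseline_errors: int, baseline_total: int,
                treatment_errors: int, treatment_total: int,
                z_threshold: float = 3.0) -> bool:
    """Halt the rollout if the treatment error rate is significantly
    higher than baseline (one-sided two-proportion z-test)."""
    p_base = baseline_errors / baseline_total
    p_treat = treatment_errors / treatment_total
    pooled = (baseline_errors + treatment_errors) / (baseline_total + treatment_total)
    se = (pooled * (1 - pooled) * (1 / baseline_total + 1 / treatment_total)) ** 0.5
    if se == 0:
        return False  # no errors anywhere: nothing to compare
    z = (p_treat - p_base) / se
    return z > z_threshold
```

A z-threshold of 3 keeps false halts rare; the trade-off is needing more traffic before a real regression clears the bar.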
No one needs to be watching a Grafana dashboard. The system watches for you.
Edge Evaluation Changes Everything
Modern flag SDKs have moved to edge evaluation — flag decisions happen at the CDN edge in under 1 millisecond without calling a central server. This means you can do per-request targeting at massive scale without adding latency. LaunchDarkly’s edge SDK, DevCycle’s EdgeDB, and Unleash’s edge proxy all enable this. The performance concern that used to hold teams back (“won’t all these flag checks slow down my app?”) is essentially eliminated.
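What makes edge evaluation fast is that the decision needs no network hop: the ruleset is pushed to the edge node ahead of time, and evaluation is pure local computation. A toy sketch of the idea, with a hypothetical ruleset shape (real SDKs stream a much richer config from the vendor):

```python
import hashlib

# Hypothetical ruleset, synced to the edge whenever a flag changes.
RULESET = {
    "new-checkout": {"segments": {"plan": "enterprise"}, "percentage": 25},
}

def evaluate(flag_key: str, user: dict, ruleset: dict = RULESET) -> bool:
    """Evaluate a flag entirely from local state -- no server call,
    which is what keeps per-request decisions sub-millisecond."""
    rule = ruleset.get(flag_key)
    if rule is None:
        return False  # unknown flag: fail closed
    # Segment targeting: matching users are always in.
    for attr, value in rule.get("segments", {}).items():
        if user.get(attr) == value:
            return True
    # Everyone else falls into a deterministic percentage bucket.
    digest = hashlib.sha256(f"{flag_key}:{user['key']}".encode()).hexdigest()
    return int(digest[:8], 16) / 2**32 * 100 < rule["percentage"]
```

The design consequence is that targeting data (plan tier, geography) must travel with the request context, since the edge can't look it up from a central database without reintroducing the latency it was built to remove.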
Our Experience
My team implemented progressive delivery with LaunchDarkly about 18 months ago. Results:
- 65% reduction in production incidents from new feature releases
- New features roll out: 1% → 5% → 25% → 100% over 48 hours
- Automatic anomaly monitoring on error rate, latency, and three business metrics per feature
- Average time to detect a bad rollout dropped from 4 hours (human-detected) to 8 minutes (ML-detected)
The Hard Part Isn’t Technical
The biggest challenge with progressive delivery is organizational, not technical. It requires product teams to define “success metrics” BEFORE shipping. You can’t set up anomaly detection if you haven’t defined what normal looks like. Most teams define metrics after launch — “let’s see how it does and then figure out what to measure.” Progressive delivery forces the conversation upfront.
This cultural shift is genuinely harder than the technical implementation. Getting a PM to articulate “this feature is successful if conversion on the checkout flow doesn’t drop by more than 0.5% and error rate stays below 0.1%” before writing code? That’s a change management problem.
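One thing that helps force the conversation: make the PM's success criteria a machine-checkable artifact that the rollout consumes, not a sentence in a doc. A hypothetical guardrail config encoding exactly that checkout example (names and schema are my own illustration):

```python
# Guardrails written down BEFORE the feature ships -- the rollout
# pipeline reads these, so vague criteria simply don't compile.
GUARDRAILS = {
    "new-checkout": [
        {"metric": "checkout_conversion", "kind": "max_drop", "limit": 0.005},
        {"metric": "error_rate", "kind": "ceiling", "limit": 0.001},
    ],
}

def violated(flag_key: str, observed: dict, guardrails: dict = GUARDRAILS):
    """Return the first violated guardrail for a flag, or None."""
    for g in guardrails.get(flag_key, []):
        if g["kind"] == "max_drop":
            baseline = observed[g["metric"] + "_baseline"]
            if baseline - observed[g["metric"]] > g["limit"]:
                return g
        elif g["kind"] == "ceiling":
            if observed[g["metric"]] > g["limit"]:
                return g
    return None
```

Once the guardrail is data, "what does success mean?" stops being a philosophical question and becomes a required field.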
The Flag Debt Problem
I’ll be honest about the dark side: feature flags that are never cleaned up become technical debt. My team currently has 340 flags in our system. I’d estimate 200 of them are stale — features that fully rolled out months ago but nobody removed the flag. The code is littered with conditional branches that will never evaluate to false. It’s a real problem and I don’t have a great solution beyond discipline.
Question for the community: How mature is your team’s progressive delivery practice? Are you still doing binary deploys, or have you moved to staged rollouts? And if you’re using AI-powered anomaly detection, how’s it working in practice?