I spent the last quarter analyzing our engineering metrics across three product teams, and one pattern jumped out immediately:
AI tool adoption for coding: 89%
AI tool adoption for deployment: 4%
Four percent.
We trust AI to write our code. We don’t trust it to ship that code to production.
And I think that tells us something critical about where we are in the AI productivity journey - and what’s broken in how we’re measuring it.
The Trust Paradox
Here’s the paradox that should concern every engineering leader:
If AI is making us more productive at writing code, why aren’t we using it for the most time-consuming, error-prone part of the delivery cycle - deployment and release management?
The industry data is stark:
- 76% of developers don’t use AI for deployment
- 69% skip it for planning
- 59% report deployment problems at least half the time when using AI tools
(source)
We’re using AI for the creative part (writing code) but not for the operational part (delivering code). Why?
My Hypothesis: Deployment Requires Understanding of State
After talking to dozens of engineers on our teams, here’s what I believe:
Code generation is stateless. Deployment is stateful.
AI can write a function because:
- The inputs are clear
- The outputs are defined
- The logic is self-contained
- The context is in the prompt
AI struggles with deployment because:
- System state is complex and distributed
- Dependencies are implicit and historical
- Timing matters (can’t deploy during peak traffic)
- Rollback requires understanding of data migration state
- Impact requires domain knowledge (is this breaking? will customers notice?)
You can prompt AI to write code. You can’t prompt it to understand your production environment.
The Incident Rate That Explains Everything
Here’s the number that crystallizes why we don’t trust AI for deployment:
22% of deployments from developers who heavily use AI tools result in a rollback, hotfix, or customer incident.
(source)
That’s more than one in five deployments failing.
Now imagine if we let AI decide when, how, and what to deploy. That 22% failure rate might be closer to 50%.
Because the failures aren’t random - they’re failures of context understanding. And that’s exactly what AI lacks.
The Visibility Problem We’re Not Solving
The strategic question for CTOs and VPs of Engineering isn’t “Should we use AI for deployment?” - it’s:
“How do we build systems that can track AI-generated code through the entire delivery cycle and understand its production impact?”
Most engineering tools give you fragmented visibility:
- GitHub shows what code was written (but not if AI helped)
- CircleCI shows what builds passed (but not why they failed)
- Datadog shows production errors (but not which originated from AI code)
- LaunchDarkly shows feature flags (but not the risk profile of what’s behind them)
What’s missing: The connective tissue that links:
- Code origin (AI vs. human)
- Review quality (thorough vs. rubber-stamped)
- Test coverage (comprehensive vs. basic)
- Deployment success (clean vs. rolled back)
- Production health (stable vs. incidents)
Without that visibility, we’re flying blind.
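To make the idea concrete, here's what one linked record might look like. This is a sketch with hypothetical field names, not the schema of any existing tool:

```python
from dataclasses import dataclass

# Hypothetical record tying the five dimensions above to a single change.
# Field names are illustrative assumptions, not an existing product's schema.
@dataclass
class ChangeLineage:
    commit_sha: str
    ai_generated: bool        # code origin: AI vs. human
    review_comments: int      # proxy for thorough vs. rubber-stamped review
    test_coverage_pct: float  # coverage of the changed lines
    deployed_cleanly: bool    # False if rolled back or hotfixed
    incident_count: int       # production incidents traced to this change

    def is_healthy(self) -> bool:
        """A change is 'healthy' if it shipped cleanly and caused no incidents."""
        return self.deployed_cleanly and self.incident_count == 0
```

With records like this accumulating per commit, you could finally answer questions like "does AI-generated code roll back more often than human code?" with data instead of anecdotes.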
Main Branch Success Rate: The Early Warning Signal
There’s one metric that’s emerged as the clearest predictor of AI code problems: main branch success rate.
Industry benchmark: 90%
Average for teams with high AI adoption: 70.8%
(source)
That roughly 20-point gap represents:
- Failed builds that need diagnosis
- Reverted commits that waste CI/CD cycles
- Hotfixes that bypass your quality gates
- Merge conflicts that slow down the team
When your main branch success rate drops below 80%, it doesn’t matter how fast developers can write code - your delivery pipeline is the bottleneck.
And AI code is making it worse, not better.
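The metric itself is simple to compute. A minimal sketch - the sample numbers below are chosen to mirror the 70.8% average cited above:

```python
def main_branch_success_rate(build_results: list[bool]) -> float:
    """Fraction of main-branch pipeline runs that passed.

    build_results is a list of pass/fail outcomes for CI runs against main.
    An empty window counts as 100% (nothing has failed yet).
    """
    if not build_results:
        return 1.0
    return sum(build_results) / len(build_results)

# 17 passing runs out of 24 ≈ 70.8%, the high-AI-adoption average cited above.
runs = [True] * 17 + [False] * 7
rate = main_branch_success_rate(runs)
```

The value of tracking it continuously is the trend line: a drop from 90% toward 70% shows up weeks before the team starts complaining that "CI is always broken."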
What Would Make Us Trust AI for Deployment?
I’ve been asking our teams: “What would it take for you to let AI handle deployment?”
The answers cluster around explainability and audit trails:
For AI to deploy, engineers want:
- Explainable deployment plans - “I’m deploying X because Y, and here’s my rollback plan if Z”
- Risk assessment - “This deployment touches payment processing (high risk) vs. UI copy (low risk)”
- State awareness - “Migration #47 already ran in prod, so skip it”
- Timing intelligence - “It’s 3pm PST, peak traffic time, wait until 8pm”
- Audit trails - “I can explain to an auditor why this deployed when it did”
AI can’t do any of this today.
Not because the AI isn’t smart enough, but because our deployment systems don’t capture the context it would need.
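Capturing that context could start small. Here's a sketch of a structured deployment plan covering the five asks above - every field name is an illustrative assumption:

```python
from dataclasses import dataclass, field

# Hypothetical structured deployment plan capturing the explainability items
# engineers asked for. Not an existing tool's format.
@dataclass
class DeploymentPlan:
    what: str                   # "I'm deploying X..."
    why: str                    # "...because Y..."
    rollback_plan: str          # "...and here's my rollback plan if Z"
    risk_level: str             # "high" (payment processing) vs. "low" (UI copy)
    skipped_migrations: list[int] = field(default_factory=list)  # already ran in prod
    earliest_window: str = ""   # e.g. "20:00 PST", after peak traffic

    def audit_record(self) -> str:
        """One human-readable line an auditor could review later."""
        return f"Deploying {self.what} because {self.why}; rollback: {self.rollback_plan}"
```

The point is less the data structure than the discipline: if every deploy had to produce a record like this, both humans and AI would have the context trail the wish list demands.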
The Infrastructure We Need to Build
If we want AI to help with deployment (and we should - deployment is tedious, error-prone, and burns out on-call engineers), we need:
1. Deployment Intelligence Platforms
- Track every deployment: who, what, when, why
- Correlate deployments with production metrics
- Learn patterns: “Deployments on Fridays have 2× rollback rate”
2. Risk Scoring Systems
- Automatically assess: What’s being deployed?
- Code complexity, test coverage, review thoroughness
- Production blast radius, customer impact
- Score 0-100: “This is a safe deploy” vs. “This is risky”
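A scorer like that could be sketched as follows. The inputs and weights are made-up assumptions for illustration - a real system would learn them from historical rollback data:

```python
def deploy_risk_score(
    complexity: float,    # 0..1, e.g. normalized change complexity
    coverage: float,      # 0..1, test coverage of the changed lines
    review_depth: float,  # 0..1, how thorough the review was
    blast_radius: float,  # 0..1, share of customers the change can reach
) -> int:
    """Combine the signals into a 0-100 risk score (higher = riskier).

    Weights are illustrative assumptions, not tuned values.
    """
    risk = (
        0.3 * complexity
        + 0.3 * (1 - coverage)
        + 0.2 * (1 - review_depth)
        + 0.2 * blast_radius
    )
    return round(risk * 100)

# A well-tested, well-reviewed UI copy change scores low...
safe = deploy_risk_score(complexity=0.1, coverage=0.9, review_depth=0.8, blast_radius=0.05)
# ...while an under-tested change to payment processing scores high.
risky = deploy_risk_score(complexity=0.7, coverage=0.3, review_depth=0.2, blast_radius=0.9)
```

Even a crude linear score like this is enough to route deploys: auto-approve below one threshold, require a human above another.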
3. Context-Aware CI/CD
- Not just “can this deploy?” but “should this deploy now?”
- Time of day, system load, recent incidents, team availability
- Block deployments that are technically valid but operationally unwise
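The "should this deploy now?" check is essentially a policy function. A minimal sketch, where every threshold and rule is an assumption rather than an existing tool's behavior:

```python
from datetime import datetime

# Illustrative policy gate: block deploys that are technically valid but
# operationally unwise. Rules and thresholds here are assumptions.
def should_deploy_now(
    now: datetime,
    peak_hours: range = range(12, 20),  # e.g. noon-8pm local, peak traffic
    recent_incidents_24h: int = 0,
    oncall_available: bool = True,
) -> tuple[bool, str]:
    if now.weekday() == 4:  # Friday
        return False, "Friday deploys have an elevated rollback rate"
    if now.hour in peak_hours:
        return False, "peak traffic window, wait until off-peak"
    if recent_incidents_24h > 0:
        return False, "system still recovering from a recent incident"
    if not oncall_available:
        return False, "no on-call engineer available to respond"
    return True, "ok to deploy"

ok, reason = should_deploy_now(datetime(2024, 1, 2, 9))        # a Tuesday morning
blocked, why = should_deploy_now(datetime(2024, 1, 5, 9))      # a Friday
```

Crucially, the gate returns a reason, not just a boolean - that's what makes the decision explainable to the engineer whose deploy it just blocked.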
4. AI Code Lineage Tracking
- Tag AI-generated code at commit time
- Track it through review, testing, deployment
- Measure: AI code success rate vs. human code success rate in production
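One lightweight way to do the tagging step is a git trailer appended to the commit message, so every later pipeline stage can read it. The trailer name below is an assumption, not a standard:

```python
# Hypothetical git trailer marking AI-assisted commits; not a standard.
AI_TRAILER = "AI-Assisted: true"

def tag_commit_message(message: str, ai_assisted: bool) -> str:
    """Append the AI-assistance trailer so review, CI, deploy, and
    monitoring stages can all attribute the change to its origin."""
    if not ai_assisted:
        return message
    return f"{message.rstrip()}\n\n{AI_TRAILER}\n"

def is_ai_assisted(message: str) -> bool:
    """Check a commit message for the trailer downstream in the pipeline."""
    return AI_TRAILER in message
```

Trailers survive rebases and are readable with plain `git log`, which is why they're a common home for this kind of per-commit metadata.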
With that infrastructure in place, we could consider AI-assisted deployment.
Without it, we’re just asking for more incidents.
The Cultural Shift Required
Here’s the uncomfortable truth:
If AI deployment fails, someone gets paged at 2am.
That’s why engineers don’t use it. The downside isn’t “this code doesn’t work” - it’s “our customers are down and I have to fix it.”
When AI writes code that breaks in development, it’s an inconvenience.
When AI deploys code that breaks in production, it’s a career-limiting incident.
The trust gap isn’t about AI capabilities. It’s about accountability.
Until we solve “who’s responsible when AI-deployed code fails?”, adoption will stay at 4%.
The Path Forward
I’m not arguing we should use AI for deployment today. I’m arguing we should build the infrastructure that makes AI deployment possible tomorrow.
Because the deployment process is broken even without AI:
- Too manual
- Too error-prone
- Too dependent on tribal knowledge
- Too stressful for on-call engineers
AI could help - but only if we redesign deployment for the AI era.
That means:
- Capturing context that’s currently in people’s heads
- Making deployment decisions explainable and auditable
- Building risk assessment into the pipeline
- Creating feedback loops from production back to code review
The 76% who don’t use AI for deployment aren’t wrong. They’re being rational.
The question is: What do we need to build to make AI deployment rational?
What would it take for you to trust AI with deployment? Or is this a human-only domain forever?