Three years ago, I sat in a conference room watching what was supposed to be our first “blameless” postmortem. On paper, we had all the right policies. Our incident response documentation literally said “blameless culture” in bold. But as the meeting progressed, I watched our VP of Engineering ask increasingly pointed questions: “Why didn’t you check that before deploying?” “Shouldn’t you have known that would cause problems?”
The engineer who caused the incident - a brilliant senior developer - became quieter and quieter. By the end, everyone knew whose fault it was, even though no one said it explicitly. That engineer left the company six months later.
The Blameless Policy-Practice Gap
Since then, I’ve led engineering teams at three different companies. I’ve seen this pattern repeat: organizations adopt “blameless postmortem” policies, create templates, maybe even get training. But the actual culture? Still blame-focused.
Here’s what I’ve learned about why this gap exists:
1. Language Undermines Blamelessness
We say “blameless” but our language betrays us. Terms like “root cause” implicitly point to a singular failure point - usually a person. When we ask “Who deployed this?” instead of “What deployment process allowed this?”, we’re centering blame on individuals rather than systems.
At my current company, we’ve shifted to “contributing factors” instead of “root causes.” It sounds subtle, but it changes how people think. Instead of finding the one thing that broke, we explore the multiple system conditions that enabled the failure.
2. Leadership Behavior Sets The Tone
The most important factor isn’t your policy document - it’s how leadership behaves during and after incidents. I’ve seen CTOs who preach blamelessness but then ask “How did this get through code review?” in a tone that clearly assigns fault.
Leaders need to model the behavior. When I’m in a postmortem now, I deliberately share times I’ve made similar mistakes. I talk about architectural decisions I made that caused incidents. It signals: we’re all learning here, including me.
3. Career Impact Creates Fear
Here’s the uncomfortable truth: even in “blameless” cultures, being associated with a major incident impacts your reputation. It might not show up in your performance review, but engineers remember. Peers make comments. You become “the person who took down production.”
This fear prevents honest sharing. Engineers minimize their role, focus on external factors, or, in the worst case, hide information. The postmortem might be blameless, but the social dynamics aren’t.
4. Metrics Can Create Perverse Incentives
Some companies track incident metrics in ways that inadvertently encourage blame. “Incidents per team” or “time to resolution by person” might seem like good accountability measures, but they make incidents feel like individual failures rather than learning opportunities.
I’ve started tracking different metrics: “System improvements per postmortem” and “Repeat incident rate.” These focus on learning outcomes, not individual performance.
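Here is a minimal sketch of how those two metrics could be computed, in Python. The `Postmortem` record, its field names, and the definition of a "repeat" (an incident whose failure mode already appeared earlier in the history) are assumptions for illustration, not a standard or any particular tool's schema. Note that nothing in the record identifies a person.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Postmortem:
    """Hypothetical record; field names are illustrative only."""
    incident_id: str
    failure_mode: str          # system-level label, e.g. "config-rollout"
    improvements_shipped: int  # follow-up actions actually completed


def improvements_per_postmortem(postmortems):
    """Average number of completed system improvements per postmortem."""
    if not postmortems:
        return 0.0
    return sum(p.improvements_shipped for p in postmortems) / len(postmortems)


def repeat_incident_rate(postmortems):
    """Share of incidents whose failure mode was already seen earlier.

    Assumes the list is in chronological order.
    """
    if not postmortems:
        return 0.0
    seen = set()
    repeats = 0
    for p in postmortems:
        if p.failure_mode in seen:
            repeats += 1
        seen.add(p.failure_mode)
    return repeats / len(postmortems)


# Made-up sample data, purely for demonstration.
history = [
    Postmortem("INC-1", "config-rollout", 3),
    Postmortem("INC-2", "db-failover", 1),
    Postmortem("INC-3", "config-rollout", 2),
    Postmortem("INC-4", "cert-expiry", 2),
]

print(improvements_per_postmortem(history))  # 2.0
print(repeat_incident_rate(history))         # 0.25
```

How you label a failure mode is a judgment call. The point of the sketch is only that both numbers describe the system's learning, and neither can be broken down by individual.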
Making Blamelessness Real
So what actually works? Here’s what I’ve seen succeed:
- Remove “who” from templates entirely. Our postmortem template doesn’t have a field for “person responsible.” It has “system conditions that enabled this failure.”
- Celebrate vulnerability. We have a monthly “lessons learned” session where people share their mistakes. When senior engineers and leaders participate authentically, it normalizes failure as learning.
- Separate incident response from performance reviews. Our engineers know that being involved in an incident - even causing one - is explicitly excluded from performance evaluation. This is written policy, communicated clearly.
- Focus on enabling vs preventing. Instead of asking “How do we prevent people from making this mistake?”, we ask “How do we make it impossible for this mistake to cause an outage?” Better guardrails, better testing, better observability.
Questions For The Community
I’m curious how others have tackled this:
- How do you ensure blamelessness is real, not just rhetoric?
- What signals indicate your incident culture is actually psychologically safe?
- Have you seen organizations successfully shift from blame to learning? What enabled that change?
The gap between policy and practice is real. But I believe we can close it - it just takes intention, leadership modeling, and system-level thinking about culture, not just technology.
What’s your experience with this? Are your “blameless” postmortems actually blameless?