I’m wrestling with something that’s probably familiar to anyone who’s scaled an engineering organization: how do you know if your incident culture is actually psychologically safe?
Context: Scaling Reveals Culture Gaps
When we were a team of 25 engineers, I felt like I had a good sense of our incident culture. I was in most incident channels, attended most postmortems, talked to people regularly. It felt healthy.
Now we’re 80+ engineers across multiple teams. And I’m noticing something: some teams are incredibly open in their postmortems, sharing detailed insights, acknowledging mistakes, proposing ambitious improvements. Other teams produce postmortems that feel sanitized, surface-level, and defensive.
Same company. Same policies. Same postmortem template. Different cultures.
The Measurement Question
My instinct is: if I can’t measure it, I can’t improve it. So I started thinking about metrics for psychological safety during incidents:
Potential Metrics I’ve Considered:
- Participation rates in postmortem discussions
  - How many people contribute to the discussion?
  - Are ICs speaking up or just managers?
  - Theory: Higher participation = more safety
- Number of contributing factors identified
  - Are we finding 2-3 factors or 10-15?
  - Theory: More factors = deeper system thinking, less fear
- Anonymous feedback scores
  - Post-incident survey: “I felt safe sharing my perspective” (1-5)
  - “I believe we’ll learn from this incident” (1-5)
  - Theory: Direct measurement of safety perception
- Retention rates of incident participants
  - Do engineers who lead incident response stay or leave?
  - Theory: If people leave after incidents, safety is low
- System improvements per incident
  - How many actual system changes result from postmortems?
  - Theory: Real improvements indicate the process is valuable, not just theater
- Time to postmortem completion
  - How quickly do teams complete postmortems?
  - Theory: Faster = more eager to learn (vs dragging feet to avoid shame)
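For what it’s worth, the bookkeeping for most of these metrics is trivial; the hard part is what they mean. A minimal sketch of the first three, assuming a hypothetical postmortem record (the `Postmortem` fields and the example numbers are made up for illustration, not a real schema):

```python
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical postmortem record; field names are illustrative only.
@dataclass
class Postmortem:
    team: str
    attendees: int             # people present in the discussion
    contributors: int          # people who actually spoke or commented
    contributing_factors: int  # factors identified in the writeup
    safety_scores: list = field(default_factory=list)  # 1-5 survey answers

def participation_rate(pm):
    """Fraction of attendees who contributed to the discussion."""
    return pm.contributors / pm.attendees if pm.attendees else 0.0

def team_summary(postmortems):
    """Aggregate the three easiest-to-compute metrics across postmortems."""
    return {
        "avg_participation": mean(participation_rate(p) for p in postmortems),
        "avg_factors": mean(p.contributing_factors for p in postmortems),
        "avg_safety_score": mean(s for p in postmortems for s in p.safety_scores),
    }

pms = [
    Postmortem("payments", attendees=10, contributors=3,
               contributing_factors=4, safety_scores=[4, 5, 3]),
    Postmortem("infra", attendees=8, contributors=6,
               contributing_factors=12, safety_scores=[5, 5, 4]),
]
summary = team_summary(pms)
print(summary)  # avg_participation=0.525, avg_factors=8, avg_safety_score≈4.33
```

Ten lines of arithmetic, which is exactly the trap: the numbers come out looking authoritative regardless of whether anyone felt safe.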
The Fundamental Tension
But here’s what I’m struggling with: the act of measuring psychological safety might undermine the safety you’re trying to create.
If engineers know their “participation rate” is being tracked, does that change how they participate? Does it create performance anxiety around vulnerability?
If we survey people about safety, does that make them more conscious of being evaluated, which decreases safety?
There’s an observer effect here. Measurement changes the thing being measured, especially when it comes to psychological dynamics.
What I Actually Care About
Stepping back, what I really want to know is:
- Are people being honest in postmortems?
- Are we learning effectively from incidents?
- Do engineers feel supported when they’re involved in incidents?
- Is incident culture strengthening our teams or damaging them?
Maybe those questions aren’t directly measurable. Maybe they require qualitative assessment, not metrics.
What We’re Doing Now (Imperfectly)
Currently, I’m trying a few approaches:
- Manager Check-ins: I ask my directs (engineering managers) to do 1-on-1s with anyone involved in major incidents. Not about the incident itself, but about how the process felt.
- Postmortem Review: I randomly sample postmortems from different teams and look for patterns. Are they rich with insights or sparse? Do they show curiosity or defensiveness?
- Exit Interview Analysis: When engineers leave, we specifically ask about incident culture. If multiple people mention it, that’s a signal.
- Informal Observation: I still try to be in incident channels and postmortem meetings. Not to evaluate, but to feel the temperature of the culture.
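The one place I do lean on code is the sampling step, to keep my own reading habits from biasing which teams get reviewed. A sketch, assuming postmortems are indexed by team somewhere queryable (the function, document IDs, and team names below are all hypothetical):

```python
import random

# Hypothetical helper: stratified random sampling of postmortem docs,
# so every team gets reviewed rather than whichever channels I happen to read.
def sample_for_review(postmortems_by_team, k=2, seed=None):
    """Pick up to k postmortems per team for a manual quality read."""
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    return {
        team: rng.sample(docs, min(k, len(docs)))
        for team, docs in postmortems_by_team.items()
    }

docs = {
    "payments": ["PM-101", "PM-107", "PM-112"],
    "infra": ["PM-103"],
}
picked = sample_for_review(docs, k=2, seed=7)
```

Sampling per team (rather than from one global pool) is deliberate: a global draw would mostly return postmortems from whichever teams have the most incidents, which is itself a skewed view of the culture.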
None of these are proper metrics. They’re more like… qualitative sensing? Which feels unsatisfying to my data-driven brain, but might be more appropriate for this kind of human dynamic.
Questions for the Community
I’m curious how others approach this:
- Do you measure psychological safety in incident response? How?
- Have you found metrics that actually work without creating perverse incentives?
- Is measurement the wrong approach entirely?
- How do you know if your incident culture is healthy as you scale?
I want to get better at this. Our engineering culture is one of our competitive advantages, and incident response is where culture shows up most visibly. But I’m not sure if measuring it is the right move or if it’s something that requires a different kind of attention.
What’s worked for you?