How Do You Measure Psychological Safety During Incidents? (Or Should We?)

I’m wrestling with something that’s probably familiar to anyone who’s scaled an engineering organization: how do you know if your incident culture is actually psychologically safe?

Context: Scaling Reveals Culture Gaps

When we were a team of 25 engineers, I felt like I had a good sense of our incident culture. I was in most incident channels, attended most postmortems, talked to people regularly. It felt healthy.

Now we’re 80+ engineers across multiple teams. And I’m noticing something: some teams are incredibly open in their postmortems - sharing detailed insights, acknowledging mistakes, proposing ambitious improvements. Other teams produce postmortems that feel sanitized, surface-level, defensive.

Same company. Same policies. Same postmortem template. Different cultures.

The Measurement Question

My instinct is: if I can’t measure it, I can’t improve it. So I started thinking about metrics for psychological safety during incidents:

Potential Metrics I’ve Considered:

  1. Participation rates in postmortem discussions

    • How many people contribute to the discussion?
    • Are ICs speaking up or just managers?
    • Theory: Higher participation = more safety
  2. Number of contributing factors identified

    • Are we finding 2-3 factors or 10-15?
    • Theory: More factors = deeper system thinking, less fear
  3. Anonymous feedback scores

    • Post-incident survey: “I felt safe sharing my perspective” (1-5)
    • “I believe we’ll learn from this incident” (1-5)
    • Theory: Direct measurement of safety perception
  4. Retention rates of incident participants

    • Do engineers who lead incident response stay or leave?
    • Theory: If people leave after incidents, safety is low
  5. System improvements per incident

    • How many actual system changes result from postmortems?
    • Theory: Real improvements indicate the process is valuable, not just theater
  6. Time to postmortem completion

    • How quickly do teams complete postmortems?
    • Theory: Faster = more eager to learn (vs dragging feet to avoid shame)
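Several of these candidate metrics reduce to simple aggregations over postmortem records. As a rough illustration (the record fields here are hypothetical, not an existing schema), something like:

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean

@dataclass
class Postmortem:
    """Hypothetical record of one completed postmortem."""
    discussion_participants: int       # metric 1: who contributed to discussion
    contributing_factors: int          # metric 2: factors identified
    system_changes_shipped: int        # metric 5: improvements that resulted
    incident_resolved: datetime
    postmortem_completed: datetime     # metric 6: time to completion

def summarize(postmortems: list[Postmortem]) -> dict:
    """Aggregate the candidate metrics across a set of postmortems."""
    return {
        "avg_participants": mean(p.discussion_participants for p in postmortems),
        "avg_contributing_factors": mean(p.contributing_factors for p in postmortems),
        "avg_system_changes": mean(p.system_changes_shipped for p in postmortems),
        "avg_days_to_completion": mean(
            (p.postmortem_completed - p.incident_resolved).days for p in postmortems
        ),
    }
```

The mechanics are trivial; the hard part, as the rest of this post argues, is whether these numbers mean what the theories claim.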

The Fundamental Tension

But here’s what I’m struggling with: the act of measuring psychological safety might undermine the safety you’re trying to create.

If engineers know their “participation rate” is being tracked, does that change how they participate? Does it create performance anxiety around vulnerability?

If we survey people about safety, does that make them more conscious of being evaluated, which decreases safety?

There’s an observer effect here. Measurement changes the thing being measured, especially when it comes to psychological dynamics.

What I Actually Care About

Stepping back, what I really want to know is:

  • Are people being honest in postmortems?
  • Are we learning effectively from incidents?
  • Do engineers feel supported when they’re involved in incidents?
  • Is incident culture strengthening our teams or damaging them?

Maybe those questions aren’t directly measurable. Maybe they require qualitative assessment, not metrics.

What We’re Doing Now (Imperfectly)

Currently, I’m trying a few approaches:

  1. Manager Check-ins: I ask my directs (engineering managers) to do 1-on-1s with anyone involved in major incidents. Not about the incident itself, but about how the process felt.

  2. Postmortem Review: I randomly sample postmortems from different teams and look for patterns. Are they rich with insights or sparse? Do they show curiosity or defensiveness?

  3. Exit Interview Analysis: When engineers leave, we specifically ask about incident culture. If multiple people mention it, that’s a signal.

  4. Informal Observation: I still try to be in incident channels and postmortem meetings. Not to evaluate, but to feel the temperature of the culture.

None of these are proper metrics. They’re more like… qualitative sensing? Which feels unsatisfying to my data-driven brain, but might be more appropriate for this kind of human dynamic.

Questions for the Community

I’m curious how others approach this:

  • Do you measure psychological safety in incident response? How?
  • Have you found metrics that actually work without creating perverse incentives?
  • Is measurement the wrong approach entirely?
  • How do you know if your incident culture is healthy as you scale?

I want to get better at this. Our engineering culture is one of our competitive advantages, and incident response is where culture shows up most visibly. But I’m not sure if measuring it is the right move or if it’s something that requires a different kind of attention.

What’s worked for you?

Keisha, I’m firmly in the “you should measure it” camp, but I agree the measurement approach needs to be carefully designed.

Why Measurement Matters

In my experience, what gets measured gets attention from leadership. If psychological safety isn’t measurable, it becomes one of those “soft” things that gets deprioritized when the company is under pressure. Having data makes it concrete and actionable.

At my current company (Fortune 500 financial services), we’ve been measuring incident culture for about 18 months. It hasn’t been perfect, but it’s given us valuable insights and, importantly, kept the conversation alive at the executive level.

Our Approach: Quarterly Pulse Surveys

We run incident-specific questions in our quarterly engineering pulse survey. The key is: we don’t measure individual incidents. We measure overall perception of incident culture.

Questions we ask (1-5 scale):

  • “I feel comfortable sharing mistakes that led to incidents”
  • “My team learns effectively from incidents”
  • “Being involved in an incident does not negatively impact how I’m perceived”
  • “Our postmortem process helps improve our systems”
  • “Leadership responds constructively when incidents occur”

We also have two open-ended questions:

  • “Describe a time when our incident process worked well”
  • “What would make you feel safer during incident response?”

Why This Works (Mostly)

The survey is:

  • Anonymous: No one knows who said what
  • Aggregate: Results are only shown at team level (minimum 5 respondents)
  • Trend-focused: We care about direction, not absolute scores
  • Quarterly: Not constant surveillance, just periodic check-ins
  • Action-oriented: We commit to addressing the top 2-3 themes each quarter
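The "minimum 5 respondents" rule is the load-bearing part of the anonymity guarantee, so it's worth enforcing in code rather than by convention. A minimal sketch of the aggregation (the data shape and `MIN_RESPONDENTS` value are assumptions for illustration, not our actual tooling):

```python
from statistics import mean

MIN_RESPONDENTS = 5  # suppress team-level results below this threshold

def team_scores(responses: dict[str, list[dict[str, int]]]) -> dict[str, dict[str, float]]:
    """Aggregate 1-5 survey responses per team, suppressing small samples.

    `responses` maps team name -> list of per-respondent answers,
    where each answer maps a question key to a 1-5 score.
    """
    report = {}
    for team, answers in responses.items():
        if len(answers) < MIN_RESPONDENTS:
            continue  # too few respondents to preserve anonymity; omit team
        questions = answers[0].keys()
        report[team] = {q: round(mean(a[q] for a in answers), 2) for q in questions}
    return report
```

Suppressing small teams entirely (rather than reporting "n too low") avoids even signaling which under-threshold team had which score.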

What We’ve Learned

Some concrete insights from 18 months of data:

  1. New team members perceive incidents differently: Engineers with <6 months tenure consistently rate psychological safety 0.5-1.0 points lower. This led us to improve our incident onboarding.

  2. Distributed teams need different support: Our remote-first teams scored lower on “I feel comfortable sharing mistakes.” We realized async postmortems lose the human connection. Now we do video debriefs.

  3. Manager behavior is everything: Teams with managers who scored high on “responds constructively” had 2x better psychological safety scores. This became a manager training focus.

  4. Action follow-through matters: When we tracked action item completion from postmortems, teams with >80% completion rated incident process 1.5 points higher. Engineers lose trust when postmortems don’t lead to change.

The Observer Effect You Mentioned

You’re right that measurement can change behavior. But I’d argue: we WANT it to change behavior. Making psychological safety measurable signals it matters.

The key is measuring outcomes (“do people feel safe?”) not behaviors (“participation rate”). Measuring participation creates performance pressure. Measuring perception creates awareness.

Complementary Qualitative Measures

We pair quantitative surveys with qualitative approaches:

  • Skip-level 1-on-1s: I do skip-levels with ICs and always ask about incident culture
  • Postmortem sampling: Like you, I review random postmortems for depth and honesty
  • Incident retrospectives: Quarterly, we review our incident process itself with the team

The combination of quantitative + qualitative gives a fuller picture.

Where Measurement Falls Short

You asked if measurement is the wrong approach. I think it’s incomplete, not wrong.

Metrics tell you IF there’s a problem and WHERE it might be (which team, which time period). But they don’t tell you WHY or HOW to fix it. That requires conversations, observation, and cultural work.

Also, you can’t measure everything. Trust, vulnerability, authentic learning - these are emergent properties of culture, not directly quantifiable. But you can measure proxies and indicators.

Making Measurement Actionable

The key is: what do you do with the data?

We share results transparently with the entire engineering org. Every quarter, I present:

  • Our current scores
  • How they’ve changed
  • What themes emerged from open-ended responses
  • What actions we’re taking

This transparency signals: we’re paying attention, we care, we’re acting. That builds trust in the process.

Recommendation

Start with quarterly pulse surveys (not per-incident). Keep it simple - 5-7 questions maximum. Focus on perception and outcomes. Share results transparently. Commit to acting on what you learn.

It’s not perfect, but it’s better than trying to “feel” the culture across 80+ engineers. And having data gives you leverage to invest in culture improvement when leadership asks “why does this matter?”

This is a really interesting discussion, and as a product person, I want to add a perspective that might be uncomfortable: psychological safety in incident response shouldn’t be just an engineering concern.

The Cross-Functional Gap

In every company I’ve worked at - Google, Airbnb, now at a fintech startup - there’s this invisible wall around incidents. Engineering owns them. Product, design, and other teams hear about them after the fact, if at all.

But here’s the reality: we’re all affected by incidents, and we all have valuable context.

A Recent Example

Three weeks ago, we had a payment processing outage that lasted 90 minutes. Engineering handled it well technically - found the issue, fixed it, wrote a postmortem.

But the postmortem missed something crucial: this was the third payment incident in two months. Engineering saw three unrelated issues (different root causes, different systems). But from a customer perspective, it was “your payment system is unreliable.”

The product team has context that engineering doesn’t:

  • Customer feedback and complaints
  • Competitive landscape (our competitors DON’T have payment issues)
  • Business impact beyond immediate revenue loss
  • User trust erosion over time

We’re Creating an “Us vs Them” Dynamic

When incidents are engineering-only territory, it creates problematic dynamics:

  • Product feels excluded: We’re responsible for customer experience, but we’re not included in understanding system failures
  • Engineering feels blamed: When product asks questions about incidents, it can feel like finger-pointing
  • Learning is siloed: Engineering learns about systems, but doesn’t learn about customer impact. Product learns about customer impact, but doesn’t understand technical constraints.

What Changed My Mind

I used to think incidents were purely technical. Then our CTO invited me to a major incident postmortem last year. It was eye-opening.

I learned:

  • Why certain features are architecturally risky (helps me prioritize differently)
  • What technical debt is actually costing us (makes the case for refactoring)
  • How much effort goes into reliability (appreciate engineering work more)

And engineering learned from me:

  • Which customer segments were most affected (helped prioritize monitoring)
  • What customers were trying to do when things broke (revealed usage patterns)
  • Business context for why this incident mattered more than others (aligned priorities)

Psychological Safety Extends Beyond Engineering

Keisha, to your original question about measurement - I’d argue psychological safety in incidents needs to extend to non-engineering teams.

When product people feel safe to ask questions during incidents without seeming like we’re blaming engineering, the learning gets better for everyone.

When design can contribute to postmortems (“this error state is confusing, that’s why users kept retrying”), we build better products.

Practical Suggestion

Include product in major incident postmortems (maybe P0/P1 only). Not as observers, but as participants. Create explicit space for cross-functional perspectives:

  • Engineering shares technical details
  • Product shares customer impact and context
  • Design shares UX observations
  • Customer success shares support patterns

Surprising Benefit: Engineers Feel Less Blamed

Here’s what I didn’t expect: when product is in the postmortem, engineers actually seem less defensive.

Why? Because the conversation shifts from “engineering broke something” to “we all learned something about our system and our customers.” The shared ownership of learning reduces the implicit blame.

Measurement Idea

If you’re measuring psychological safety, include cross-functional questions:

  • “I feel comfortable attending incident postmortems as a non-engineer”
  • “Cross-functional perspectives improve our incident learning”
  • “Product and engineering collaborate well during incidents”

This would surface whether psychological safety exists across the organization, not just within engineering.

The Strategic Value

From a product strategy perspective, incidents are incredibly valuable learning moments. They reveal what customers actually do, what features actually matter, and where our systems don’t match our assumptions.

When we make incidents a shared learning experience across functions, we turn every outage into competitive advantage. We learn faster than companies that silo incident response in engineering.

Keisha, I think your instinct to measure is right. But expand the scope beyond engineering psychological safety to cross-functional psychological safety. The best incident cultures include everyone.

I’m going to take a slightly different angle here and share something we built that indirectly measures psychological safety through system behavior rather than surveys.

The Challenge with Surveys

I worked at Google Cloud AI before my current startup, and we tried various survey approaches to measure incident culture. They had value, but they also had problems:

  • Response rates dropped over time (survey fatigue)
  • Engineers gamed positive responses (“look, we’re doing well!”)
  • Timing mattered too much (post-incident emotions vs reflection)
  • Hard to separate incident culture from general engineering culture

An Alternative: Passive Measurement Through Behavior

Instead of asking people how they feel, we started tracking observable behaviors that correlate with psychological safety. I built an “incident insights dashboard” that tracks:

1. Time to First Postmortem Draft

  • How quickly does the team start the postmortem after incident resolution?
  • Theory: Teams eager to learn start fast. Teams avoiding shame drag their feet.
  • Data: Our psychologically safe teams draft postmortems within 24 hours. Less safe teams take 5-7 days.

2. Postmortem Edit Activity

  • How many people contribute edits to the postmortem doc?
  • How much iteration happens before it’s “final”?
  • Theory: More collaboration = more psychological safety
  • Data: Safe teams have 5-8 contributors. Less safe teams have 1-2 (usually the person who caused the incident).

3. Action Item Patterns

  • What types of action items emerge from postmortems?
  • Theory: Safe teams focus on system improvements. Less safe teams focus on process compliance.
  • We categorize action items:
    • System improvements (“add monitoring for X”)
    • Process changes (“require approval for Y”)
    • Individual training (“person should learn Z”)
  • Data: Safe teams: 70% system, 25% process, 5% individual. Less safe teams: inverse.

4. Contributing Factors Count

  • How many contributing factors does each postmortem identify?
  • Theory: Safe teams do deeper analysis, identify more systemic factors
  • Data: Safe teams average 8-12 factors. Less safe teams average 2-4.

5. Observability Improvements Per Quarter

  • How many monitoring/alerting/logging improvements result from incidents?
  • Theory: Learning culture turns incidents into system improvements
  • Data: Our best teams ship 10-15 observability improvements per quarter driven by incidents.

6. Repeat Incident Rate

  • Do similar incidents keep happening?
  • Theory: If learning is real, repeat incidents should decrease
  • Data: Safe teams: repeat rate <5%. Less safe teams: 15-20%.
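To make the "passive" framing concrete, here is a minimal sketch of how a few of these indicators could be computed from per-incident artifacts. Everything here is hypothetical (field names, the `fingerprint` idea for spotting repeats, the type labels), a simplified stand-in for a real pipeline over docs and tickets:

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

@dataclass
class IncidentRecord:
    """Hypothetical per-incident data gathered from doc history and tickets."""
    resolved_at: datetime
    first_draft_at: datetime           # indicator 1: time to first draft
    doc_contributors: set[str]         # indicator 2: edit activity
    action_item_types: list[str]       # indicator 3: "system" | "process" | "individual"
    fingerprint: str                   # indicator 6: coarse signature to spot repeats

def team_indicators(incidents: list[IncidentRecord]) -> dict:
    """Compute per-team passive indicators from a list of incident records."""
    hours_to_draft = [
        (i.first_draft_at - i.resolved_at).total_seconds() / 3600 for i in incidents
    ]
    item_mix = Counter(t for i in incidents for t in i.action_item_types)
    total_items = sum(item_mix.values()) or 1
    # Any incident sharing a fingerprint with an earlier one counts as a repeat.
    seen = Counter(i.fingerprint for i in incidents)
    repeats = sum(count - 1 for count in seen.values())
    return {
        "avg_hours_to_first_draft": sum(hours_to_draft) / len(hours_to_draft),
        "avg_contributors": sum(len(i.doc_contributors) for i in incidents) / len(incidents),
        "action_item_mix": {t: item_mix[t] / total_items for t in item_mix},
        "repeat_incident_rate": repeats / len(incidents),
    }
```

In practice the hard work is upstream: classifying action items consistently and defining a fingerprint that catches genuine repeats without lumping unrelated incidents together.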

Why This Approach Works

These metrics are:

  • Passive: No surveys, no self-reporting, just observable behavior
  • Objective: Based on artifacts (docs, commits, tickets), not perception
  • Continuous: Generated from every incident, not quarterly
  • Actionable: Easy to see which teams need support

The Dashboard

I built a simple dashboard that shows these metrics per team. Engineering leadership can see:

  • Which teams are learning effectively from incidents
  • Which teams might have psychological safety issues
  • Trends over time (are we improving?)

The dashboard doesn’t call out “bad” teams. It highlights “high learning” teams and makes their practices visible to others.

Example Insight

Six months ago, we noticed one team consistently had:

  • Low edit activity on postmortems
  • High individual training action items
  • Long time to draft

This signaled a potential culture issue. The engineering manager did some investigation and found: the team had a new senior engineer who was very critical in code reviews and postmortem discussions. People felt judged, so they minimized their participation.

Manager addressed it (moved that engineer to a different team, coached them on feedback style). Within two months, the team’s metrics improved significantly.

Without the dashboard, we might not have noticed the pattern until someone left and mentioned it in an exit interview.

Limitations

This approach doesn’t capture everything:

  • Can’t measure feelings directly
  • Some teams might game the metrics
  • Correlation doesn’t prove causation
  • Misses qualitative nuance

But combined with occasional surveys and manager conversations, it gives a pretty good picture of incident culture health.

Tools We Use

The dashboard pulls data from:

  • Confluence (postmortem docs, edit history)
  • Jira (action items, categorized with labels)
  • GitHub (commits tagged with incident numbers)
  • Our incident tracking system (timing, severity, repeat incidents)

It’s not perfect, but it’s way better than trying to sense culture through periodic check-ins.

Recommendation

Keisha, if you want to measure psychological safety at scale, consider combining:

  • Quarterly surveys (Luis’s approach) for perception
  • Passive behavioral metrics (my approach) for observable patterns
  • Qualitative assessment (your current approach) for context

Together, they give a robust picture without over-surveying or creating measurement anxiety.

Reading all these measurement approaches is interesting, but honestly? I think the simplest answer might be the best.

Just Ask People

After going through that incident I mentioned in Luis’s thread, my manager did something that helped more than any survey or dashboard would have: she asked me how I was doing.

Not in a performance review. Not in a structured postmortem. Just a casual 1-on-1 where she said: “Hey, I know that incident was tough. How are you feeling about it? What would have made the process better?”

That conversation helped me process the experience. And her genuine interest in my well-being (not the incident details, not the lessons learned, but how I was doing) built trust.

Why Measurement Might Miss The Point

Here’s my concern with all this measurement: psychological safety isn’t about having the right metrics. It’s about having the right relationships.

I don’t feel safe because there’s a survey that says the culture is good. I feel safe because:

  • My manager has my back
  • My teammates don’t judge me for mistakes
  • Leadership genuinely cares about people, not just systems
  • I’ve seen others make mistakes and be supported, not punished

Those things come from repeated interactions over time. Not from metrics.

The Risk of Measurement Theater

I’ve seen companies optimize for metrics while completely missing the human reality. Like:

“Look, our postmortem participation rate is high!” → Yeah, because people feel pressured to contribute, not because they feel safe.

“Our action item completion rate is 90%!” → Yeah, because people check boxes, not because we’re actually learning.

“Survey says psychological safety is 4.2/5!” → Yeah, because people know their manager sees aggregated results and they don’t want to be the “problem team.”

You can game metrics. You can’t game genuine human connection.

What Actually Helps

In my experience, what creates psychological safety is:

  1. Consistent, small gestures over time: Manager checking in after tough weeks. Senior engineers sharing their own mistakes. Leadership thanking people for honest postmortems.

  2. Seeing how incidents affect careers (or don’t): When someone causes an incident and then gets promoted anyway. When leaders explicitly say “this incident doesn’t change how we see you.”

  3. Normalizing failure through repetition: When everyone has been incident commander, everyone has been paged, everyone has caused a problem - it stops being a stigma.

  4. Leaders going first with vulnerability: When the VP shares their incident stories. When the CTO says “I don’t know” in a postmortem. When managers admit mistakes.

None of those are measurable. They’re cultural practices that accumulate into trust.

Simple Approach

If I were a VP trying to understand incident culture across 80 engineers, I’d do this:

  • Have managers check in with anyone involved in a significant incident, within a few days
  • Simple questions: “How did that feel?” “What would have helped?” “Are you okay?”
  • Aggregate themes informally across those conversations
  • Share what you’re hearing with leadership
  • Act on patterns

No dashboards. No surveys. Just human conversations.

When Metrics Might Help

That said, I can see value in Luis’s quarterly surveys and Alex’s passive metrics as signals that point you to where conversations are needed.

If a team’s metrics look off, that’s a flag to go talk to them. The metrics don’t tell you what’s wrong, but they tell you where to look.

But the actual work of building psychological safety? That’s relational, not analytical.

Trust Is Built In Conversations

Keisha, you said you’re doing manager check-ins, postmortem review, and informal observation. Honestly, that sounds right to me.

Maybe the question isn’t “how do I measure this better?” but “how do I scale these human practices as we grow?”

Can you train managers to do better check-ins? Can you create space for more informal observation? Can you build more opportunities for leadership to show vulnerability?

Those feel harder than implementing a survey, but they might be more effective.

Final Thought

I appreciate everyone’s thoughtful approaches here. I’m not anti-measurement. I just think measurement works best as a supplement to relational practices, not a replacement for them.

And for what it’s worth, the fact that you’re asking this question at all, Keisha - that you care enough to want to measure and improve psychological safety - that’s probably the most important signal of all.