Controversial take: Production incidents are some of the most valuable learning opportunities for product teams - not just engineering teams.
I know that sounds counterintuitive. Incidents are expensive and stressful, and they harm customers. We should definitely try to prevent them.
But when they DO happen? They’re gold mines of product insight that we often ignore.
Context: VP of Product Perspective
I’m VP of Product at a Series B fintech startup. I’ve been in product for 12+ years at Google, Airbnb, and now here. And one pattern I’ve noticed across every company:
Engineering treats incidents as engineering problems. Product barely pays attention. We’re missing massive learning opportunities.
The Incident That Changed My Perspective
Six months ago, we had a payment processing incident. Our payment service was down for 90 minutes during a weekday afternoon. Classic infrastructure problem, right?
Engineering wrote a great postmortem: database connection pool exhaustion, root cause identified, preventive measures implemented. Case closed.
But I dug deeper and found something fascinating:
The incident revealed that customers were using our product in ways we never intended.
What caused the connection pool exhaustion? A spike in a specific API endpoint that we thought was rarely used. Turns out, one of our enterprise customers had built an integration that was hitting that endpoint hundreds of times per minute.
We had no idea customers needed this level of API usage. It never came up in user research. But the incident revealed it.
That integration became a product feature. We built proper support for high-frequency API access, turned it into a paid tier, and it’s now generating $200K+ ARR.
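For teams building this kind of tiered, high-frequency API access, a per-client token bucket is one common way to allow short bursts while capping sustained rate. This is a minimal illustrative sketch, not our actual implementation; the tier names and limits are hypothetical:

```python
import time

class TokenBucket:
    """Per-client token bucket: permits bursts up to `capacity`,
    caps sustained throughput at `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec       # tokens refilled per second
        self.capacity = capacity       # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Hypothetical tiers: paid customers get higher sustained rates and bigger bursts
TIER_LIMITS = {
    "free": TokenBucket(rate_per_sec=1, capacity=10),
    "enterprise": TokenBucket(rate_per_sec=50, capacity=200),
}
```

The point of the bucket (vs. a flat per-minute counter) is that it tolerates the bursty traffic an integration like that enterprise customer's actually generates, while still protecting shared resources like the connection pool.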
What Incidents Reveal About Product
Since that realization, I’ve started reviewing every major incident through a product lens. Here’s what incidents teach us:
1. Real Usage Patterns (Not Assumed Ones)
User research tells us what customers THINK they do. Incidents reveal what customers ACTUALLY do.
Examples from our incidents:
- Incident showed customers exporting 50K+ row reports (we designed for 1K)
- Incident revealed customers using our app at 2am (we assumed business hours only)
- Incident exposed that customers were screen-scraping our web app because we lacked a proper API
Each of these became product opportunities.
2. Which Features Actually Matter
When a service goes down, you see what customers complain about. And often, it’s surprising.
We had an incident that took down our analytics dashboard (internal tool). I expected minimal customer impact. Wrong. Support got flooded with complaints.
Turns out, customers used this “internal” dashboard daily for reporting to their executives. It wasn’t a nice-to-have; it was critical to their workflow.
We completely reprioritized analytics work based on this learning.
3. System Architecture Constraints on Product
Incidents reveal where our technical architecture limits product possibilities.
Example: We had repeated incidents related to a monolithic service that powered multiple features. During incidents, unrelated features would fail together.
This taught product: we need to architect for feature independence. It influenced our multi-year platform strategy.
4. Gaps in Product Documentation
Many incidents are caused by customers misunderstanding how to use the product.
When a customer misconfigures something and causes an incident, that’s not their fault - it’s our documentation and UX failing them.
We’ve improved error prevention, clearer docs, and better onboarding based on incident patterns.
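“Error prevention” here often starts as simple, up-front validation with actionable messages, so a misconfiguration is rejected at save time instead of surfacing later as an incident. A toy sketch; the config fields and rules are hypothetical, not our product’s actual schema:

```python
def validate_webhook_config(cfg):
    """Return a list of human-readable problems with a webhook config.
    Empty list means the config is safe to accept."""
    errors = []

    url = cfg.get("url", "")
    if not url.startswith("https://"):
        # Tell the customer what to do, not just that they're wrong
        errors.append("'url' must start with https:// so callbacks are encrypted")

    retries = cfg.get("max_retries", 3)
    if not (0 <= retries <= 10):
        errors.append("'max_retries' must be between 0 and 10")

    return errors
```

The key product decision is the error copy: each message names the field and says how to fix it, which is where the documentation/UX failures behind these incidents usually live.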
5. Integration and Partner Dependencies
Incidents expose which third-party integrations customers actually rely on (vs which ones are optional).
We learned that our Salesforce integration was far more critical than our HubSpot integration, based on customer impact during their respective outages.
This informed partnership prioritization and reliability investment.
Cross-Functional Incident Reviews
After this realization, I changed how we handle major incidents:
Engineering postmortem (technical):
- What broke, why, how to prevent recurrence
- System improvements, monitoring, architecture
Product review (business):
- What does this tell us about customer behavior?
- What product gaps did this reveal?
- What opportunities emerged?
- Should this change our roadmap?
Both are valuable. Both should happen.
Example: API Rate Limiting Incident
Engineering perspective:
- We hit rate limits on a third-party API
- Root cause: Didn’t anticipate usage growth
- Fix: Implement caching, negotiate higher limits
- Prevent: Better usage monitoring
Product perspective:
- Customers are using this feature 10x more than we predicted
- This feature is more valuable than we thought
- We should build more capabilities around it
- Opportunity: Premium tier with higher limits
Same incident. Different, complementary learnings.
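The caching fix from the engineering side of that example is often just a small TTL cache in front of the third-party call: serve a recent answer instead of re-hitting the rate-limited API. A minimal illustrative sketch, assuming the decorator shape and TTL value (this is not our production code):

```python
import time

def ttl_cached(ttl_seconds):
    """Cache a function's results for ttl_seconds to cut third-party API call volume."""
    def decorator(fn):
        cache = {}  # args tuple -> (value, timestamp)

        def wrapper(*args):
            now = time.monotonic()
            hit = cache.get(args)
            if hit is not None and now - hit[1] < ttl_seconds:
                return hit[0]          # fresh enough: skip the upstream call
            value = fn(*args)
            cache[args] = (value, now)
            return value

        return wrapper
    return decorator
```

Choosing the TTL is itself a product question: how stale can this data be before customers notice? That is exactly the kind of judgment the product review adds to the engineering fix.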
Making Incidents a Cross-Functional Learning Moment
Here’s what we changed:
1. Product attends major incident postmortems (P0/P1)
- Not as observers, but as active participants
- We ask: “What does this tell us about customers?”
2. Monthly incident themes review
- Engineering + Product + Design + Customer Success
- Look for patterns across incidents
- Ask: What should we build/change/improve?
3. Incident insights feed into roadmap planning
- Quarterly, we review incident-driven insights
- Some of our best features came from incident learnings
4. Customer success joins postmortems
- They have context on customer reactions
- They understand business impact beyond metrics
Real Examples of Product Changes from Incidents
Let me share concrete examples:
Incident: Database slowdown during data exports
→ Product learned: Customers export way more data than we designed for
→ Action: Built async export system, made it a product feature
→ Result: Enterprise customers love it, became a differentiator
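The async export pattern in that example can be sketched with a job queue: submission returns a job id immediately, and a background worker does the heavy lifting. A toy in-process version, purely illustrative (a real system would use a durable queue and object storage):

```python
import queue
import threading
import uuid

jobs = {}             # job_id -> {"status", "result"}
work = queue.Queue()  # pending export requests

def submit_export(query):
    """Return a job id immediately; the export runs in the background."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "pending", "result": None}
    work.put((job_id, query))
    return job_id

def worker():
    while True:
        job_id, q = work.get()
        # Placeholder for the expensive part, e.g. streaming 50K+ rows to a file
        jobs[job_id] = {"status": "done", "result": f"export-for-{q}.csv"}
        work.task_done()

threading.Thread(target=worker, daemon=True).start()
```

The customer-facing change is the contract: instead of a request that times out on big exports, they get a job id to poll and a download link when it is ready.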
Incident: Auth service timeout during login spike
→ Product learned: Customers have “all hands” meetings where everyone logs in simultaneously
→ Action: Improved session management, added “remember me” feature
→ Result: Better user experience, fewer support tickets
Incident: Search service overload
→ Product learned: Customers use search as their primary navigation (not the menu we designed)
→ Action: Redesigned navigation around search-first paradigm
→ Result: User engagement up 15%
Incident: API quota exceeded
→ Product learned: Customers want programmatic access way more than we thought
→ Action: Built proper API product with tiered pricing
→ Result: New revenue stream, $500K+ ARR
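A tiered API product like that often starts as nothing fancier than a per-tier quota check. A toy sketch, with hypothetical tier names and daily limits:

```python
# Hypothetical daily request quotas per pricing tier
TIER_QUOTAS = {"free": 1_000, "pro": 50_000, "enterprise": 1_000_000}

def check_quota(tier, used_today):
    """Return (allowed, remaining) for a request under the tier's daily quota.
    Unknown tiers get no quota."""
    quota = TIER_QUOTAS.get(tier, 0)
    remaining = max(0, quota - used_today)
    return remaining > 0, remaining
```

Returning `remaining` alongside the decision matters for the product experience: it lets you surface usage in headers or a dashboard, so customers see the upgrade path instead of just hitting a wall.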
The Mental Shift
The key shift is reframing incidents:
From: “Something broke, let’s fix it”
To: “We learned something unexpected about our customers and systems”
Every incident is expensive user research. You’re paying for it (in downtime, customer impact, engineering time). You might as well extract the product insights.
The Surprising Cultural Benefit
One thing I didn’t expect: when product participates in incident response, engineering feels less blamed.
Why? Because it shifts the conversation from “engineering broke something” to “we all learned something about our product and customers.”
The shared ownership of learning reduces the implicit blame. Product saying “this incident helped us understand our customers better” reframes it from failure to insight.
Questions for Product Leaders
- Do you review incident postmortems?
- Do you know what the last 5 incidents revealed about customer behavior?
- Are incident insights feeding into your roadmap?
- Is customer success involved in incident reviews?
If not, you’re leaving product intelligence on the table.
Questions for Engineering Leaders
- Do you invite product to incident postmortems?
- Do you frame incidents as learning opportunities beyond technical fixes?
- Do you track business/product insights from incidents?
Cross-functional incident learning benefits everyone.
The Competitive Advantage
Here’s the strategic angle: our competitors have incidents too. But they treat them as purely engineering problems.
We treat them as organizational learning opportunities. We extract product insights, customer behavior patterns, and business intelligence.
Over time, this compounds. We understand our customers better. We build better products. We make smarter roadmap decisions.
Incidents aren’t just problems to solve. They’re data to learn from.
Call to Action
If you’re a product leader: attend your next major incident postmortem. Don’t just skim the doc later - actually participate in the discussion.
Ask:
- What does this tell us about our customers?
- What assumptions did this challenge?
- What opportunities does this reveal?
- Should this change our roadmap?
I guarantee you’ll learn something valuable.
And if you’re an engineering leader: invite product to your next postmortem. Frame it as “help us understand the business context and customer impact.”
The best product companies learn from every signal. Incidents are loud, expensive signals. Let’s not waste them.
What do you think? Do other product teams do this? Am I missing something?