Prod goes down. Your company loses some money. That’s sad, boo hoo, etc.
If your company does post mortems well, then that outage is a gift. It’s a big fat red arrow pointing at something that needs hardening. Maybe it’s a process, or an architecture, or some infrastructure.
The outage wasn’t money your company lost. It was money your company spent on acquiring knowledge. Think of it as paying a third party to investigate and point out your weak spots. Seems pretty valuable, yeah? I agree.
And finally, always remember to start each post mortem with The Prime Directive:
Regardless of what we discover, we understand and truly believe that everyone did the best job they could, given what they knew at the time, their skills and abilities, the resources available, and the situation at hand.
Norm Kerth