
Post-Mortem

A Post-Mortem is a written investigation of an Incident. It documents what happened, why it happened, how the team responded and, most importantly, what measures are being taken to prevent recurrence.

The fundamental prerequisite is a Blameless Culture: we look for systemic weaknesses, not guilty individuals (see Blameless Culture).

Anti-Patterns: The Missed Opportunity

  • Blame: "Who typed that command?" only ensures that people hide mistakes in the future.
  • No Action-Items: The failure gets discussed, but nothing changes in the processes or the technology. The same failure happens again three months later.
  • Secrecy: The analysis stays within a small group instead of letting the entire organisation learn from the Incident.

The Learning Loop

  1. Define the trigger: Decide the severity level (e.g. data loss or an outage longer than 1 hour) from which a formal Post-Mortem is mandatory.
  2. Timeline reconstruction: Objective capture of all events based on logs and chat histories.
  3. Root-Cause Analysis (The 5 Whys): Why did the failure occur? Why did monitoring not alert? Why was there no automated backup?
  4. Prioritise Action-Items: Every root cause must lead to a concrete task: a code change, an extended test, or a process adjustment.
  5. Transparent publication: The report is published for all developers in the organisation (e.g. in Neuland or the Wiki).
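
Parts of this loop can be backed by simple tooling. The following is a minimal sketch, not a prescribed implementation: it assumes the ">1h outage or data loss" trigger from step 1, and the names Incident, ActionItem, requires_postmortem, build_timeline and uncovered_root_causes are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple

# Assumed threshold, taken from the ">1h outage" example; adjust to local policy.
OUTAGE_THRESHOLD = timedelta(hours=1)


@dataclass
class Incident:
    """Hypothetical Incident record; the field names are illustrative."""
    outage: timedelta
    data_loss: bool = False


@dataclass
class ActionItem:
    """A concrete task that addresses exactly one root cause."""
    root_cause: str
    task: str


# (timestamp, source, message), e.g. source = "log" or "chat"
Event = Tuple[datetime, str, str]


def requires_postmortem(incident: Incident) -> bool:
    """Step 1: mandatory on data loss or an outage longer than the threshold."""
    return incident.data_loss or incident.outage > OUTAGE_THRESHOLD


def build_timeline(*sources: List[Event]) -> List[Event]:
    """Step 2: merge events from logs and chat histories chronologically."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda event: event[0])


def uncovered_root_causes(root_causes: List[str],
                          items: List[ActionItem]) -> List[str]:
    """Step 4: return the root causes that have no Action-Item yet."""
    covered = {item.root_cause for item in items}
    return [cause for cause in root_causes if cause not in covered]
```

A report could, for example, be held back from publication until uncovered_root_causes returns an empty list, which guards against the "No Action-Items" anti-pattern described above.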

The Focus: Prevention Over Punishment

A Post-Mortem is a gift to the future of the organisation. Learning systematically from failures is how the reliability of complex systems genuinely improves over time.

FAQ

Don't we waste too much time writing these reports?

Preventing a single major outage can save far more time than the report takes to write. A Post-Mortem is among the cheapest training you can give your team and your systems.

Should you feel ashamed if your name appears in the Timeline?

Absolutely not. In a Blameless Culture the principle holds: if a human could make a fatal mistake, that is a design flaw in the system that failed to prevent it. We thank you for your honesty during the analysis.

Reference Guide

  • Google SRE Book — Postmortem Culture: The industry standard. sre.google
  • The Field Guide to Understanding 'Human Error': Sidney Dekker on systemic thinking. CRC Press
  • Postmortems.io: A collection of public Post-Mortems from well-known tech companies. postmortems.io
