Write a postmortem that someone outside your team will actually read
The problem is not the template
The format is not the problem. Most engineering teams know how to structure a postmortem: executive summary, timeline, root cause, action items. Many have a shared template refined through a dozen incidents. The doc gets written within 48 hours, filed, and linked in the incident channel. And then nobody outside the team reads it.
That is a writing failure, not a process failure. The postmortem was written for the people who were already in the room. For anyone who was not there (an engineer two time zones away, an executive who needs to understand the scope, a new hire nine months from now), the document is opaque at best and misleading at worst. What follows are the specific writing choices that fix this.
A postmortem nobody reads is just a ritual
When a postmortem is only read by the team that wrote it, it is functioning as a closure ritual, not as institutional memory. The meeting happens, the doc gets filed, the action items land in Jira, and the organisation learns nothing that travels.
The test for a good postmortem is whether someone who was not in the incident — another engineer, a team lead, a founder — can read it cold and come away with a clear picture of what broke, why the system allowed it, and what changed. Most postmortems fail this test because every sentence assumes the reader already has the context the author has.
The fix is not more detail. It is different detail — the kind organised around what the reader needs to understand, not what the author experienced in sequence.
“A postmortem is a communication artifact. Distribution is as important as the writing.”
Start with the sentence nobody writes
Open with a single paragraph that stands alone. If someone reads only this paragraph, they should know what failed, for how long, how many users or systems were affected, and the single most important change made as a result.
Here is what most postmortems actually open with:
At 14:32 UTC, an engineer on the infrastructure team noticed elevated error rates in the payment processing service and paged the on-call. Initial triage focused on the database layer...
Here is what they should open with:
Our payment service was unavailable for 47 minutes on March 14, affecting roughly 1,200 active sessions and causing an estimated 14,000 euros in failed transactions. The root cause was a migration script that did not account for lock acquisition behaviour in Postgres 16. We fixed the immediate issue and added a migration dry-run step to our release checklist.
The first version is chronological narration. The second is a briefing. The first requires shared context to parse; the second supplies the context. This paragraph takes five minutes to write, yet most teams skip it, because it requires knowing the conclusion, and the conclusion only becomes clear once the rest of the document exists. So write the rest first, then come back and write this.
One more thing this paragraph does: it defines the scope. Executives and other teams often read only this paragraph. If it is absent or buried, the people who most need a summary go without one.
The timeline belongs after the root cause, not before it
Engineering teams reach for timelines because they are accurate: this happened, then this, then this. A timeline is a faithful record. The problem is that readers follow a timeline passively, accumulating events without forming a model of what actually went wrong. By the time they reach the root cause section, they are carrying a pile of disconnected facts with no frame for organising them.
Structure the document around explanation, not chronology. The narrative section should answer 'why was this possible' before it answers 'what happened when.' Put the minute-by-minute timeline after the root cause or in an appendix. Make it available for anyone who wants to audit the sequence, but do not use it as the skeleton of the document.
The reader needs to understand the shape of the failure before they are handed the play-by-play. Once they have that model, the timeline is useful context. Without it, the timeline is just a long list of things that happened.
Root cause is a systems question, not a blame trail
The blameless postmortem concept is well understood in theory and badly implemented in practice. Most teams know they should not write 'the engineer deleted the wrong file.' What they do not know is what to write instead.
| Version | What it says | What it teaches |
|---|---|---|
| Blaming | An engineer deleted the wrong environment variable during a manual deployment. | Nothing. The next engineer makes the same mistake. |
| Adequate | A manual deployment step lacked validation for required environment variables. | There is a missing check. Someone will add it. |
| Good | Our deployment process treated staging and production as environment-symmetric, but they are not — production has three variables staging lacks. This was undocumented and unvalidated at deploy time. The latent failure predated this incident by months. | Everyone who touches deployments now understands a structural truth about the system that was true before the incident happened. |
The 'good' version describes a condition that already existed in the system before the incident. That is what creates organisational learning: not knowing what went wrong in sequence, but knowing what was already broken before the event that triggered it.
After you have written your root cause section, ask this: could an engineer who joins the team six months from now read this and understand what to watch for? If not, the root cause is still written for the people who were in the room.
Five action items that close, not fifteen that accrue
Pick five or fewer action items. Each one needs a concrete action in a single sentence, one named owner, and a deadline. 'Improve monitoring' fails all three. 'Add an alert on payment service error rate above 0.5% for more than 90 seconds, owned by [Name], done by [Date]' passes all three.
If an action item cannot fit that format, it is not an action item yet. It is a concern. Write concerns in a separate section called 'open questions' and assign someone to convert them into tickets later. Vague action items are worse than no action items, because they create the appearance of follow-through without the reality.
Before you publish, go through each action item and ask whether it could plausibly appear closed in a sprint review three weeks from now. If you cannot picture that, cut it or sharpen it.
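Teams that keep postmortems as files in a repository can lint the action-item rules above before publishing. What follows is a minimal Python sketch under stated assumptions: the `ActionItem` structure, the `check_action_items` function and the list of vague openings are all hypothetical, and the one-sentence check is a crude punctuation heuristic, not a real sentence parser. It flags what a reviewer would otherwise catch by hand: more than five items, a missing owner, a missing deadline, and phrasing like 'Improve monitoring'.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

MAX_ITEMS = 5

# Hypothetical heuristic: openings that usually signal a concern, not an action.
VAGUE_OPENINGS = ("improve", "investigate", "look into", "consider", "revisit")

# Sentence-ending punctuation followed by whitespace or end of string.
# A decimal such as "0.5%" does not match, because its period is followed by a digit.
SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


@dataclass
class ActionItem:
    action: str
    owner: Optional[str] = None
    due: Optional[date] = None


def check_action_items(items: list[ActionItem]) -> list[str]:
    """Return human-readable problems; an empty list means the set passes."""
    problems: list[str] = []
    if len(items) > MAX_ITEMS:
        problems.append(
            f"{len(items)} action items; keep {MAX_ITEMS} or fewer "
            "and move the rest to open questions"
        )
    for i, item in enumerate(items, start=1):
        text = item.action.strip()
        # More than one sentence terminator suggests the action is not one sentence.
        if len(SENTENCE_END.findall(text)) > 1:
            problems.append(f"item {i}: action is longer than one sentence")
        # str.startswith accepts a tuple of prefixes.
        if text.lower().startswith(VAGUE_OPENINGS):
            problems.append(f"item {i}: '{text}' reads like a concern, not an action")
        if not item.owner:
            problems.append(f"item {i}: no named owner")
        if item.due is None:
            problems.append(f"item {i}: no deadline")
    return problems


if __name__ == "__main__":
    vague = ActionItem("Improve monitoring")
    concrete = ActionItem(
        "Add an alert on payment service error rate above 0.5% for more than 90 seconds",
        owner="[Name]",          # placeholder owner, as in the text above
        due=date(2025, 4, 1),    # arbitrary example deadline
    )
    # Prints three problems, all for item 1; the concrete item passes.
    for problem in check_action_items([vague, concrete]):
        print(problem)
```

A check like this cannot judge whether an action is concrete enough to close; it only catches the obvious misses, so the sprint-review question still has to be asked by a person.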
Getting the doc read after you publish it
Publishing a postmortem is not the same as distributing it. A link dropped in the incident Slack channel at 2 AM will be read by the people who were awake and already know the story. The people who most need to learn from it are usually not in that group.
Post with a three-sentence summary
Any time you share the link, write three sentences alongside it: what failed, who was affected, the single most important change. Most people decide whether to open the full document based on that message. The doc carries the depth; the summary carries the reach.
Include it in engineering updates
A well-written postmortem from three weeks ago is still worth reading. Include it in your weekly engineering update or monthly newsletter, with one sentence of context for why it is worth eight minutes. Systemic lessons stay relevant long after the incident itself has been forgotten.
Surface it during onboarding
New engineers should read postmortems about systems they will be working on. A postmortem from eighteen months ago about a database migration failure teaches more about a system's real behaviour under pressure than any runbook does. Keep a short list of the postmortems that are still accurate and still instructive; the list is worth the effort of keeping current.
Cite it when the same failure class reappears
The best time to reference a postmortem is when the same class of failure shows up again, or looks likely to. Saying 'this looks like the same trigger as the March incident' and linking the doc is more persuasive than a general warning. It also demonstrates that the organisation retained something from writing the postmortem in the first place.
Organisations that write postmortems well do not just recover faster from incidents. They accumulate a specific kind of knowledge: the kind that lives in written sentences, not in the heads of the people who happened to be on-call that night. The template is the easy part. Writing for someone who was not in the room is the work.