Frequently asked questions

How long should a postmortem be?

Length is not a useful target. The executive brief should be 150-200 words. The forensic timeline and root-cause section should be as long as they need to be to give the incident team a complete record. A 1,200-word postmortem with a clear structure is more useful than a 500-line one with none.

Who should write the postmortem?

The engineer closest to the incident should not write it alone — they have too much context and cannot see where the document assumes knowledge the reader does not have. A second engineer who was present but not leading the response, or a manager, should review the executive brief specifically and flag every sentence that requires prior knowledge to understand.

When should the postmortem be published?

Within five working days of the incident being resolved, while the forensic detail is still accessible from memory and logs. Action items should be created in the team's work tracker on the same day as publication — not after a review cycle.
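The five-working-day window can be worked out mechanically. The sketch below is one way to do it, assuming a Monday-to-Friday working week and ignoring public holidays; the function name `publish_deadline` is illustrative, not part of any tool mentioned here.

```python
from datetime import date, timedelta


def publish_deadline(resolved: date, working_days: int = 5) -> date:
    """Return the date that falls `working_days` working days after `resolved`.

    Assumption: Monday to Friday are working days; public holidays are ignored.
    """
    current = resolved
    remaining = working_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current
```

An incident resolved on a Friday would, under these assumptions, be due for publication on the following Friday.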

What is the difference between the root cause and the underlying cause?

The root cause is the proximate technical fact — replication lag, a config change, a missing index. The underlying cause is the decision or system that allowed the proximate cause to reach production — a release process that did not validate under campaign-level traffic, an alerting gap that had been open for three months. Postmortems that identify only the proximate cause are the ones where the same class of incident recurs.

Best Practices · May 16, 2026 · 5 min read · Reviewed May 16, 2026

Write a postmortem that someone outside your team will actually read

Most postmortems fail their readers not because the analysis is wrong, but because the document assumes one audience when it has three.

By FlowVerify Editorial Team

A 500-line Confluence page. Seven bullet points under Root Cause Analysis. Eleven action items, eight of them assigned to Team. One month later, the page has four emoji reactions and the monitoring alert that triggered the incident still fires every Tuesday at 3 AM.

This is the default postmortem. It documents everything and changes nothing.

The failure is not in the analysis. Most teams that run structured incident reviews understand the technical chain of events by the time the postmortem is written. The failure is in the document design.

The forensic format is written for the people who were in the room

Most postmortem templates evolved from SRE practice, where the goal is institutional memory: every step of the incident recorded so the on-call team can reconstruct what happened if it recurs. That is a legitimate goal. But that goal produces a document that is nearly impossible to read if you were not already part of the incident.

The engineers who built the system know what 'the replication lag spiked at 14:32' means. Their manager does not. The on-call engineer who joins six months from now does not. The customer success manager trying to explain the outage to a frustrated enterprise customer definitely does not.

Most postmortem guides do not acknowledge this. They assume one reader, and it is the reader who was already there.

You have three readers, not one

Before writing the first sentence, name who will actually read the document.

The incident team: the people who were on the call or in the Slack thread. They need the forensic detail: exact timestamps, specific queries, the alerting gap. They are also the most likely to read the whole document.

Leadership and stakeholders: your VP of Engineering, your CEO if the outage was significant, the customer success team. They need to know what happened to users, for how long, and why the situation is now different. Not 14:32 and replication lag. Customer impact and remediation.

The future engineer: the person joining in eight months who hits the same class of problem. This is the reader most postmortem templates neglect. Their question is: what decision was made, and why, that led here?

These three readers want different things. A single 500-line document serves none of them well.

The two-document approach, in one document

The solution is not to write three separate documents. It is to structure one document so each reader can find what they need quickly and stop reading when they have it.

Put the executive brief at the top. Write it last. Cap it at 200 words. This is the only section leadership needs to read. Below the brief: the forensic record, with the full technical timeline, root cause, and contributing factors. At the bottom, not scattered throughout: action items, each with an owner and a due date.

The structural separation does two things. It tells each audience where to look. And it forces the writer to think about customer impact separately from technical explanation — which produces better prose in both sections.
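As a rough illustration of that layout, the sketch below models a postmortem as three parts and renders them in reading order: brief first, forensic record second, action items last. The class and field names (`Postmortem`, `ActionItem`, `ticket_url`) are assumptions made for this example, not an established template.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ActionItem:
    description: str
    owner: str            # one named person
    due: date
    ticket_url: str = ""  # link to the tracker issue, if already created


@dataclass
class Postmortem:
    title: str
    executive_brief: str  # written last, placed first
    forensic_record: str  # timeline, root cause, contributing factors
    action_items: list[ActionItem] = field(default_factory=list)

    def render(self) -> str:
        """Render the sections in reading order, with action items at the bottom."""
        lines = [
            self.title, "",
            "Executive brief", "", self.executive_brief, "",
            "Forensic record", "", self.forensic_record, "",
            "Action items", "",
        ]
        for item in self.action_items:
            line = f"- {item.description} (owner: {item.owner}, due {item.due.isoformat()})"
            if item.ticket_url:
                line += f" [{item.ticket_url}]"
            lines.append(line)
        return "\n".join(lines)
```

Keeping the action items in a single list on the object, rather than inside the forensic text, is what stops them from being scattered through the document.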

Rewriting the postmortem: before and after

The prose problem in most postmortems is chronological drift: the document follows the sequence of the incident rather than the sequence of understanding.

Section	Forensic version (written for the incident team)	Readable version (written for leadership and the future engineer)
Opening	At 14:31 UTC, the primary database experienced elevated replication lag, triggering an alert at 14:34 which was acknowledged by the on-call engineer.	For 47 minutes on 12 May, approximately 3,200 users received errors when placing orders.
Root cause	TRANSACTION_ISOLATION was set to READ_UNCOMMITTED in the replica config, causing dirty reads under concurrent write load.	A database configuration change from the previous week behaved correctly under normal load but failed under the write pattern produced by a marketing campaign.
Action item	Improve monitoring for database issues.	Add alert for write queue depth > 10,000 on the order-creation topic, paged to on-call within 2 minutes — @sre-team, due 30 May.

The same postmortem section, written two ways

The single most useful rewrite: lead with customer impact, not the technical event. 'Our database had elevated replication lag' is the technical event. 'Order placement was unavailable for 3,200 users for 47 minutes' is the customer impact. Start with the second. Explain the first below it.

The forensic detail is still in the document. It lives in the timeline section, where the incident team will find it. It does not need to appear in the opening sentences, which are the ones every reader actually reads.

Action items: where postmortems go to die

Action items that live only in the postmortem document do not get done. The fix is mechanical: create the Jira, Linear, or GitHub issues the day the postmortem is published, and link to them from the document. The postmortem is the narrative record; the tracker is where work lives. Keep them connected.

Three other constraints that matter:

Be specific. 'Improve monitoring' is not an action item. 'Add an alert for write queue depth above 10,000 messages on the order-creation topic, paged to on-call within two minutes' is an action item.
One named owner per item. Not Team. Not DevOps. A person who is aware they own it.
Cap the list. A postmortem with eleven action items is likely to see none of them completed. Three specific, owned items with dates tend to outperform eleven vague ones.

When engineers see that postmortems produce action items that never get done, they learn the postmortem is a ritual to survive, not something to engage with honestly. The next postmortem gets shallower. The monitoring gap that caused the original incident stays unfixed. The team responds slightly slower in the next incident because the institutional learning that should have happened did not.
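The constraints in this section are mechanical enough to check before publishing. The sketch below is a minimal linter under stated assumptions: the group-owner names, the list of vague opening verbs, and the cap of three are illustrative values chosen for this example, and the vagueness test is a crude heuristic rather than a real measure of specificity.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Illustrative values; adjust them to your own organisation.
GROUP_OWNERS = {"team", "devops", "sre", "engineering", "everyone"}
VAGUE_OPENERS = ("improve", "investigate", "look into", "review")
MAX_ITEMS = 3


@dataclass
class ActionItem:
    description: str
    owner: str
    due: Optional[date] = None


def lint_action_items(items: list[ActionItem]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []
    if len(items) > MAX_ITEMS:
        problems.append(f"{len(items)} action items; cap the list at {MAX_ITEMS}")
    for number, item in enumerate(items, start=1):
        owner = item.owner.strip().lstrip("@").lower()
        if not owner or owner in GROUP_OWNERS:
            problems.append(f"item {number}: owner must be one named person, got {item.owner!r}")
        if item.due is None:
            problems.append(f"item {number}: missing due date")
        # Heuristic only: flags items that open with a vague verb.
        if item.description.strip().lower().startswith(VAGUE_OPENERS):
            problems.append(f"item {number}: description is too vague to act on")
    return problems
```

Run against 'Improve monitoring for database issues.' owned by Team with no date, this reports three problems for that one item. A check like this catches only the obvious cases; a reviewer still has to judge whether an item is genuinely actionable.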

The 90-second executive read

“Write the executive brief after everything else is done. Put it at the top. It should answer four questions in 200 words or fewer.”

— FlowVerify

What broke: in plain English, with customer-first framing.
Scope: the number of users affected, and for how long.
Why: the underlying cause, not the proximate one. 'A configuration change that was not validated under campaign-level traffic' is the underlying cause. 'Replication lag' is the proximate one.
What is different now: one to three specific changes, not aspirational statements.

'We will improve our testing practices' is not a change — it is an aspiration. 'We have added a load test against campaign-level traffic to the deployment checklist, and it runs in CI before every production deploy' is a change.

If you have written this brief and a non-engineer cannot understand what happened to customers from it, rewrite before publishing. The purpose of the executive brief is not to protect the engineering team. It is to give a stakeholder a complete picture in 90 seconds.
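A few of these checks can be automated as a first pass before the read-through with a non-engineer. The sketch below assumes a 200-word cap and treats clock timestamps and a short, hypothetical list of forensic terms as signs that technical detail has leaked into the brief. It does not replace a human reader.

```python
import re

MAX_BRIEF_WORDS = 200
# Hypothetical list of forensic terms; extend it with your own system's jargon.
FORENSIC_TERMS = ("replication lag", "read_uncommitted", "isolation level")
CLOCK_TIME = re.compile(r"\b\d{1,2}:\d{2}\b")


def check_brief(brief: str) -> list[str]:
    """Return problems found in an executive brief; an empty list means no findings."""
    problems = []
    word_count = len(brief.split())
    if word_count > MAX_BRIEF_WORDS:
        problems.append(f"{word_count} words; cap the brief at {MAX_BRIEF_WORDS}")
    if CLOCK_TIME.search(brief):
        problems.append("contains a clock timestamp; move it to the forensic timeline")
    lowered = brief.lower()
    for term in FORENSIC_TERMS:
        if term in lowered:
            problems.append(f"uses the forensic term {term!r}; describe the customer impact instead")
    return problems
```

The customer-first opening from the table earlier passes these checks, while the forensic opening is flagged for both its timestamp and 'replication lag'.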

One more thing: who writes it

The engineer closest to the incident should not write the postmortem alone. They have too much context — the forensic version is the only version they can see clearly. A second engineer who was present but not leading the response, or an engineering manager, often catches where the document assumes knowledge the reader does not have.

Pair-writing the executive brief specifically is worth the twenty minutes it takes. One person writes a draft, another reads it aloud and flags where they need to stop and ask a question. Every place they stop is a sentence to rewrite.

The postmortem that actually gets read is not longer or more detailed than the forensic one. It is structured for three audiences who encounter it at different times with different questions. Separate the narrative from the forensics. Lead with customer impact. Write the executive brief last and put it first. Create action items as tickets on the day of publication. The analysis your team already does is usually the right analysis — the document is what needs redesigning.

