What is wrong with equal on-call rotation?

Equal rotation distributes paging time evenly but not actual burden. Engineers with deep system knowledge resolve incidents faster, do more follow-up work, and generate most of the alert improvement backlog — none of which appears in a standard rotation count.

How do you measure on-call load more fairly?

Track P95 resolution time per engineer, the ratio of noise pages to actionable pages per rotation, and post-incident follow-up work alongside standard rotation counts. These signals together reveal who is actually carrying the load.

Should you route all alerts to the engineers who know the system best?

Not permanently — that creates knowledge silos and burns out your most experienced engineers faster. The goal is to route based on system familiarity while actively spreading that knowledge, so the routing becomes more even as the team's expertise broadens.

Editorial · May 25, 2026 · 5 min read · Reviewed May 25, 2026

Your on-call rotation punishes the engineers who care most

Equal paging counts feel fair. They measure the wrong thing.

By FlowVerify Editorial Team

The rotation schedule looks balanced. Every engineer gets the same number of on-call weeks per quarter. The incident dashboard shows a tidy distribution. The engineering manager presents the numbers at the next planning meeting and calls it equitable.

It is not equitable. It is equal. These are different things.

What your on-call rotation metric actually measures

A rotation schedule tracks time on-call per engineer, incidents routed per engineer, and — if the team is diligent — incidents closed per engineer. It does not track: how long the engineer was actually awake, whether they understood the problem or simply restarted the service, what they did the following morning, or how much of the incident resolution required genuine system knowledge versus following a runbook.

These omissions are not details. They are where the burden actually lives.

The knowledge gap you are not routing around

When an alert fires at 2am, the paged engineer either built part of the affected system and resolves it in eleven minutes, or is reading the runbook for the first time and escalating within twenty because they are not confident about what the metric is telling them. The rotation log records the same number of disrupted nights. The human experience is completely different.

High-knowledge engineers do not spiral during incidents. They wake up, read the alert, and already know the shape of the problem. They have seen this failure before, or something close to it. Lower-knowledge engineers start the incident with a set of open questions and work through them in real time, under time pressure, in the middle of the night. That is not a criticism of their skill. It is a description of what system-specific knowledge does and does not transfer. You cannot rotate institutional knowledge the same way you rotate names on a schedule.

The gap compounds over time. The engineer with deep context gets paged, resolves quickly, and goes back to sleep. The engineer without it gets paged, stays awake longer, escalates more, and — if the incident is complex — ends up involving the high-knowledge engineer anyway. At which point both engineers have been on-call, and only one of them appears in the rotation count.

Three things that never appear in the dashboard

Resolution quality is the first. Some engineers resolve incidents by restarting the service. Others resolve them by identifying the root cause, documenting it, and writing a note that the next engineer can actually use. The first approach costs twenty minutes and defers the next three incidents. The second costs two hours and prevents them. Both appear identically in the incident log: closed.

Follow-up work is the second. The engineers who care most open tickets after incidents. They fix the runbook. They add the alert label that would have made the 3am page more specific. They file the issue about the flaky external dependency. None of this is tracked in on-call rotation metrics. If the team makes resourcing decisions based on those metrics, this work is invisible — which means the people doing it are effectively doing untracked labour on top of their rotation count.

Responsiveness variation is the third. A twenty-second acknowledgment and a four-minute one represent meaningfully different states of mind. A resolution at 2:18am and one at 3:52am represent meaningfully different amounts of lost sleep. Engineers with high system familiarity tend to resolve faster. Engineers with less context are still paged at the same rate. Equal paging counts do not capture this.

Alert noise is not distributed equally either

Noisy alerting does not affect all engineers equally. It affects the engineers who actually look at the alerts.

If your on-call rotation produces 300 alerts per week and 180 of them fire between midnight and 6am, the engineer who clicks acknowledge and returns to sleep is in a different rotation than the one who reads the alert, checks whether it is a real signal, concludes it probably is not, and then lies awake for thirty minutes anyway. Both engineers are recorded as having been on-call. Only one of them was.

The team almost certainly knows the alerts are noisy. There is a backlog of alert improvements that has been there for several months. The engineers generating items in that backlog are the ones who actually read the alerts — which is to say, the same engineers who will be on-call again before the backlog is acted on.

What a different model looks like in practice

Three changes, in rough order of impact.

First: stop using rotation count as the primary fairness metric. Add at minimum two more signals: P95 resolution time per engineer across the last quarter, and the ratio of noise pages to actionable pages per engineer per rotation. These will show you quickly whether the rotation is actually balanced or just equally scheduled.
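As a rough illustration of those two signals, here is a minimal Python sketch. It assumes you can export one record per page with the engineer's name, minutes to resolution, and a flag for whether the page was actionable. The `Page` record, its field names, and the nearest-rank percentile method are assumptions made for this example, not part of any particular paging tool.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Page:
    """One page from the on-call log (hypothetical record shape)."""
    engineer: str
    minutes_to_resolve: float
    actionable: bool  # False means the page turned out to be noise


def p95(values):
    """95th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def fairness_signals(pages):
    """Per engineer: page count, P95 resolution time, noise-to-actionable ratio."""
    by_engineer = defaultdict(list)
    for page in pages:
        by_engineer[page.engineer].append(page)

    signals = {}
    for engineer, own in by_engineer.items():
        noise = sum(1 for p in own if not p.actionable)
        actionable = len(own) - noise
        signals[engineer] = {
            "pages": len(own),
            "p95_minutes": p95([p.minutes_to_resolve for p in own]),
            # Infinite when every page in the period was noise.
            "noise_per_actionable": noise / actionable if actionable else float("inf"),
        }
    return signals


pages = [
    Page("ana", 11, True), Page("ana", 12, True), Page("ana", 90, False),
    Page("ben", 20, True), Page("ben", 25, True), Page("ben", 30, True),
]
signals = fairness_signals(pages)
```

In this sample both engineers have the same page count, which is all a rotation dashboard would show. The other two columns are where they start to differ.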

Second: route alerts based on system familiarity where it matters. If four engineers have deep knowledge of the billing system and ten do not, billing-related alerts should go to that group more often while the team deliberately spreads that knowledge. This is not a permanent arrangement; it is an acknowledgment that your on-call policy should account for where the knowledge actually sits, not assume it is evenly distributed.
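One way to picture familiarity-based routing that still spreads knowledge is sketched below. It is an illustration under stated assumptions, not a recommended policy. The `FamiliarityRouter` class and the `learner_every` knob are hypothetical names. The idea is that most alerts for a system go to engineers who know it, while every Nth alert goes to a less familiar engineer as primary with an expert as backup.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Optional


@dataclass
class Assignment:
    primary: str
    backup: Optional[str]  # set only when a learner takes the page


class FamiliarityRouter:
    """Round-robin over experts; every Nth alert goes to a learner with expert backup."""

    def __init__(self, experts, learners, learner_every=4):
        self._experts = cycle(experts)
        self._learners = cycle(learners)
        self._learner_every = learner_every  # hypothetical tuning knob
        self._count = 0

    def route(self):
        self._count += 1
        expert = next(self._experts)
        if self._count % self._learner_every == 0:
            # The learner takes the page; the expert is on the hook only if needed.
            return Assignment(primary=next(self._learners), backup=expert)
        return Assignment(primary=expert, backup=None)


router = FamiliarityRouter(experts=["ana", "ben"], learners=["cy"], learner_every=3)
assignments = [router.route() for _ in range(6)]
```

Lowering `learner_every` over time is one way to express the "not forever" part: as familiarity spreads, the split drifts toward an even rotation.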

Third: make follow-up work visible. If an engineer spends three hours the morning after an incident improving the runbook, filing root-cause tickets, and adding alert context, those three hours should be tracked somewhere that influences how resourcing decisions are made. If they are not, you are hiding the true cost of on-call from the people responsible for staffing it.
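Making follow-up work visible can start with something as small as a shared ledger. The sketch below assumes engineers log hours as `(engineer, kind, hours)` tuples, where `kind` is either `"incident"` or `"follow_up"`. That format and the `load_by_engineer` helper are invented for this example.

```python
from collections import defaultdict


def load_by_engineer(entries):
    """Sum logged hours per engineer, split into incident and follow-up work.

    Raises KeyError for any kind other than "incident" or "follow_up".
    """
    totals = defaultdict(lambda: {"incident": 0.0, "follow_up": 0.0})
    for engineer, kind, hours in entries:
        totals[engineer][kind] += hours
    return {
        engineer: {**hours, "total": hours["incident"] + hours["follow_up"]}
        for engineer, hours in totals.items()
    }


entries = [
    ("ana", "incident", 1.5),
    ("ana", "follow_up", 3.0),  # runbook fixes and root-cause tickets the next morning
    ("ben", "incident", 1.5),
]
load = load_by_engineer(entries)
```

Here both engineers spent the same time in the incident itself, but the total that should inform staffing differs by the three follow-up hours.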

Dimension	Equal rotation	Fair rotation
What is measured	Weeks on-call per engineer	Disrupted hours, resolution speed, follow-up work
Alert routing	Same queue for all engineers	Routed by system familiarity where stakes are high
Resolution quality	Not tracked; all closes look equal	Visible through runbook updates, ticket quality
Knowledge assumption	Engineers are interchangeable	System knowledge is unevenly distributed and matters
Follow-up work	Not counted as on-call burden	Counted; influences rotation frequency

Equal rotation vs fair rotation: the five dimensions that differ

What the burnout signal is actually telling you

When your most reliable engineers say they are exhausted and your dashboard shows equal rotation, the instinct is to look elsewhere. Sprint pace, maybe. A difficult cross-functional dynamic. Some personal situation. Equal numbers on a dashboard are reassuring. They suggest the system is working.

Check the P95 resolution time per engineer over the last six months. Look at who opens post-incident tickets. Count who has generated items in the alert improvement backlog. In most teams, these three checks point to the same cluster of people.
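The overlap between those three checks is easy to compute once each has been reduced to a list of names. The sketch below assumes you have already decided which engineers each check flags. How to flag someone on P95 resolution time is left to you, and the `overlapping_cluster` function and its `min_signals` parameter are hypothetical.

```python
from collections import Counter


def overlapping_cluster(*flagged_groups, min_signals=2):
    """Return engineers who appear in at least `min_signals` of the flagged groups."""
    counts = Counter()
    for group in flagged_groups:
        counts.update(set(group))  # count each engineer at most once per check
    return sorted(name for name, seen in counts.items() if seen >= min_signals)


p95_flagged = ["ana", "ben"]
ticket_openers = ["ana", "cy"]
backlog_contributors = ["ana", "ben"]

in_all_three = overlapping_cluster(
    p95_flagged, ticket_openers, backlog_contributors, min_signals=3
)
in_two_or_more = overlapping_cluster(
    p95_flagged, ticket_openers, backlog_contributors
)
```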

Those engineers are not complaining because they got unlucky with incident timing. They are carrying a structural load that the rotation metric cannot see.

“Equal rotation is a scheduling decision. Fair rotation is a system design problem.”

— FlowVerify

The policy is straightforward to write and implement. The design requires acknowledging that engineers are not interchangeable inputs in a rotation queue, that system knowledge concentrates rather than distributes evenly, and that the engineers most likely to flag the problem are the ones least likely to be taken seriously when the dashboard says everything is balanced.

Fix the metric first. The rotation will follow.


