Every Postgres isolation level, and the production bug it's designed to prevent
A practical map of Read Committed, Repeatable Read, and Serializable — with the concurrency bug each one closes off
Most Postgres users have never changed the isolation level on a transaction. Read Committed, the default, ships with Postgres, works fine for most applications, and stays invisible until something goes wrong. Two concurrent payment requests, one for 400 and one for 300, both read a balance of 500, both check that it's sufficient, and both proceed. The account ends the day at -200.
Isolation levels are one of those database topics that appear in documentation, get mentioned in interviews, and then disappear from production code. Engineers reach for them only after an incident. This article takes a different approach: each isolation level, the concurrency anomaly it permits, and the specific production scenario where that anomaly becomes a real bug.
What isolation levels actually control
Postgres uses Multiversion Concurrency Control. When a row is updated, Postgres writes a new version of the row and keeps the old version visible to transactions that started before the update. Reads never block writes; writes never block reads. Each transaction works from a snapshot of the database. The isolation level determines which snapshot a transaction sees, and when that snapshot is taken.
The SQL standard defines four isolation levels in terms of three concurrency phenomena each level must prevent: dirty reads, non-repeatable reads, and phantom reads. Write skew is not on that list; it is a serialization anomaly that only Serializable rules out. Postgres accepts all four level names but implements three distinct behaviors, treating Read Uncommitted as Read Committed. Its Repeatable Read is stricter than the standard requires, which changes how you think about what each level prevents.
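As a quick reference, the level can be chosen per transaction or per session. The statements below are a minimal sketch meant to be run in psql against any database; nothing in them depends on a particular schema.

```sql
-- Show the level new transactions get by default
-- (read committed unless the server or session was reconfigured).
SHOW default_transaction_isolation;

-- Option 1: pick the level when the transaction starts.
BEGIN ISOLATION LEVEL REPEATABLE READ;
SHOW transaction_isolation;  -- reports the level of the current transaction
COMMIT;

-- Option 2: set it as the first statement inside the transaction,
-- before any query has run.
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
COMMIT;

-- Option 3: change the default for the rest of the session.
SET default_transaction_isolation = 'serializable';
```

Setting the level per transaction keeps the stricter behavior scoped to the code paths that need it.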
Read Committed: the default and what it silently permits
Each statement in a Read Committed transaction takes a fresh snapshot: the most recently committed data at the moment that statement begins. Not at transaction start: at statement start. Two SELECT statements in the same transaction, run seconds apart, can return different results if another transaction commits between them. This is the non-repeatable read anomaly.
It creates a class of bugs in check-then-act patterns: an application reads some state, makes a decision based on it, then acts. But the state has changed by the time the action runs, because the read and the write are two separate snapshots.
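The per-statement snapshot can be observed directly with two psql sessions. This is a sketch that assumes a `wallets(id, balance)` table like the one used in the example that follows; the row id and amount are illustrative.

```sql
-- Session 1 (default Read Committed)
BEGIN;
SELECT balance FROM wallets WHERE id = 7;  -- sees the value committed so far

-- Session 2, meanwhile (autocommit, so this commits immediately)
UPDATE wallets SET balance = balance - 400 WHERE id = 7;

-- Session 1, same transaction
SELECT balance FROM wallets WHERE id = 7;  -- now sees the value after Session 2's commit
COMMIT;
```

Nothing failed and nothing blocked; the second SELECT simply ran against a newer snapshot than the first.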
A concrete example: two concurrent charge requests targeting the same wallet.
-- Wallet balance: 500
-- Transaction A: deduct 400. Transaction B: deduct 300.
-- Application logic (both transactions, running concurrently):
BEGIN;
SELECT balance FROM wallets WHERE id = 7; -- Both see 500
-- Application checks: 500 >= deduction amount → both proceed
UPDATE wallets
SET balance = balance - :deduction_amount
WHERE id = 7;
COMMIT;
-- What actually happens if A commits first:
-- A's UPDATE runs on 500 → balance = 100. Committed.
-- B's UPDATE: Read Committed takes a fresh snapshot for this statement.
-- Current committed value is now 100.
-- 100 - 300 = -200. Committed.
-- Both charge requests succeeded. Balance is -200.
The SELECT approved both transactions. The UPDATE applied to whatever the row held at the moment the UPDATE statement ran. For Transaction B, that was 100, not the 500 its application code had checked. The constraint (balance must not go negative) lived in application code, and application code saw stale data.
The standard fix within Read Committed is SELECT ... FOR UPDATE. This acquires a row-level lock on the selected rows, so Transaction B waits at the SELECT until Transaction A commits. B then re-reads balance = 100, the application rejects the 300 debit, and the invariant holds. A unique constraint or database-level CHECK constraint is often the stronger fix: the database enforces atomically what application code enforces sequentially.
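Both fixes can be sketched against the same `wallets` table. The constraint name is an illustrative choice, and `:deduction_amount` is the same placeholder used in the example above.

```sql
-- Fix 1: lock the row at read time so the check and the write see the same value.
BEGIN;
SELECT balance FROM wallets WHERE id = 7 FOR UPDATE;  -- a concurrent transaction blocks here
-- Application checks: balance >= deduction amount; ROLLBACK if not.
UPDATE wallets
SET balance = balance - :deduction_amount
WHERE id = 7;
COMMIT;

-- Fix 2: let the database enforce the invariant.
ALTER TABLE wallets
  ADD CONSTRAINT wallets_balance_non_negative CHECK (balance >= 0);
-- With the constraint in place, an UPDATE that would take the balance below zero
-- fails with a check-constraint violation instead of committing.
```

The two fixes are complementary: the lock gives the application a clean rejection path, and the constraint is a backstop for any code path that forgets the lock.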
Read Committed is the right level for the vast majority of application reads and writes. Standard CRUD (insert a record, update a field, delete a row) has no cross-statement invariant to protect. Problems arise when you introduce a check-then-act pattern without locking the rows you're checking.
Repeatable Read: a snapshot that holds across statements
Repeatable Read gives a transaction a single snapshot, taken when the transaction's first query runs, that holds for every statement in the transaction. Two SELECTs run seconds apart return the same data, regardless of what other transactions commit between them. The non-repeatable read anomaly disappears.
Postgres's Repeatable Read is stronger than the SQL standard requires: it also prevents phantom reads. A phantom read occurs when a query returns a different set of rows between two executions because another transaction inserted or deleted matching rows. Under Postgres's MVCC-based Repeatable Read, this cannot happen: the snapshot is fixed for the life of the transaction.
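A two-session sketch of the fixed snapshot, again assuming the `wallets(id, balance)` table from the earlier example and that id 8 is unused:

```sql
-- Session 1
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT COUNT(*) FROM wallets WHERE balance > 0;  -- first query: the snapshot is taken here

-- Session 2, meanwhile (autocommit)
INSERT INTO wallets (id, balance) VALUES (8, 250);

-- Session 1, same transaction
SELECT COUNT(*) FROM wallets WHERE balance > 0;  -- same count as before: the new row is not visible
COMMIT;

-- Session 1, new transaction: the inserted row is now counted.
SELECT COUNT(*) FROM wallets WHERE balance > 0;
```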
Repeatable Read also catches the wallet example from above: if both payment transactions run under Repeatable Read, Transaction B's UPDATE targets a row that Transaction A has already committed a change to. Postgres detects the conflict and aborts B with a serialization failure (SQLSTATE 40001), rather than silently producing a wrong result. Postgres does not retry on its own; the application has to run B again. On the retry, B sees balance = 100 in its new snapshot, and the application rejects the 300 debit.
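Replayed in two psql sessions, the wallet scenario looks roughly like this. It is a sketch that assumes the same starting balance of 500 as the earlier example.

```sql
-- Session A
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM wallets WHERE id = 7;  -- 500

-- Session B
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM wallets WHERE id = 7;  -- 500

-- Session A
UPDATE wallets SET balance = balance - 400 WHERE id = 7;
COMMIT;

-- Session B
UPDATE wallets SET balance = balance - 300 WHERE id = 7;
-- ERROR:  could not serialize access due to concurrent update
ROLLBACK;  -- the transaction is aborted; the application starts it again from BEGIN
```

If A has not committed yet when B issues its UPDATE, B waits on A's row lock and receives the same error once A commits.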
Repeatable Read is the right choice for long-running reporting queries that must see a consistent view of the data, financial calculations that read the same rows more than once, and check-then-act patterns on a single row where you can't add SELECT FOR UPDATE. For check-then-act patterns across multiple rows, or where the constraint spans rows that different writers can modify, Repeatable Read has a gap.
Write skew: the anomaly Repeatable Read cannot catch
Write skew is the subtlest concurrency anomaly. It occurs when two transactions each read an overlapping set of rows, each make a decision based on what they read, and each write to a different row. The combined result violates a business rule, yet neither transaction modified a row the other wrote. There is no conflicting update for Repeatable Read to detect.
A concrete example: a document approval workflow in which at least one approver must remain active after any withdrawal.
-- Business rule: at least one approver must remain active.
-- Two approvers (A and B) are both active. Both request withdrawal simultaneously.
-- Approver A's transaction:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT COUNT(*) FROM approvers WHERE active = true; -- returns 2
-- 2 > 1, safe to withdraw: proceed
UPDATE approvers SET active = false WHERE id = 'approver_a';
COMMIT;
-- Approver B's transaction (runs concurrently; its snapshot also predates A's commit):
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT COUNT(*) FROM approvers WHERE active = true; -- also returns 2
-- 2 > 1, safe to withdraw: proceed
UPDATE approvers SET active = false WHERE id = 'approver_b';
COMMIT;
-- Both committed. Zero approvers active. Business rule violated.
-- Neither UPDATE conflicted at the row level. Each wrote a different row.
Both transactions read the same aggregate, both saw the constraint satisfied, both committed writes to different rows. Repeatable Read does not catch this: the writes don't conflict at the row level. The database has no information that the two decisions were logically dependent.
Write skew appears in on-call scheduling systems (minimum coverage rules), multi-line budget systems (total spend must not exceed limit, but each line item is a separate row), hotel and seat booking where a capacity constraint spans the booking table, and any pattern where a constraint is checked via an aggregate across rows that multiple concurrent writers can modify.
“Write skew doesn't look like a race condition. It looks like two correct transactions that happen to produce an impossible result.”
Serializable: SSI and what it costs
Serializable isolation guarantees that concurrent transactions execute as if they had run one after another in some serial order. Postgres implements this with Serializable Snapshot Isolation, an algorithm that tracks read/write dependencies between transactions and aborts any transaction that would produce a result incompatible with serial execution.
In the approval example: Transaction A reads the approver count (rows that Transaction B will write), and Transaction B reads the same count (rows that Transaction A will write). SSI detects this read/write dependency cycle and aborts one transaction with a serialization failure. The aborted transaction retries, finds one active approver, and the application rejects the withdrawal. The invariant is preserved without the application knowing anything about the other concurrent transaction.
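The same approval scenario under Serializable, sketched as two psql sessions. The table and ids are the ones from the write skew example; the exact statement at which the error surfaces can vary with timing.

```sql
-- Session A
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT COUNT(*) FROM approvers WHERE active = true;  -- 2

-- Session B
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT COUNT(*) FROM approvers WHERE active = true;  -- 2

-- Session A
UPDATE approvers SET active = false WHERE id = 'approver_a';
COMMIT;  -- succeeds

-- Session B: its UPDATE or its COMMIT fails with
-- ERROR:  could not serialize access due to read/write dependencies among transactions
UPDATE approvers SET active = false WHERE id = 'approver_b';
COMMIT;
```

The error carries SQLSTATE 40001, the same code as the Repeatable Read update conflict, so one retry path in the application can handle both.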
The performance cost of SSI depends on the workload. For OLTP applications with low-to-moderate write contention, SSI overhead is small: it adds bookkeeping to track read/write dependencies, and the predicate locks it takes (SIREAD locks) never block other transactions. The classic route to Serializable, strict two-phase locking with blocking read locks, is far more expensive under concurrency; SSI avoids that blocking. For workloads with high sustained write contention on the same rows, abort rates rise and retry overhead compounds. Measure the specific workload; don't assume the cost is prohibitive.
Serializable is the right choice for any transaction that enforces a cross-row constraint that can't be expressed as a database-level constraint. If you can put the invariant in a unique index, a check constraint, or a foreign key, do that first. The database enforces those atomically at all isolation levels. Serializable is for the invariants that are too complex for a single constraint.
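As one illustration of pushing an invariant into the schema, a "one live booking per seat" rule can be expressed as a partial unique index. The `bookings` table and its columns are hypothetical and not part of the examples above.

```sql
-- Hypothetical schema: one row per booking; cancelled_at is NULL while the booking is live.
CREATE TABLE bookings (
    id           bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    seat_id      bigint NOT NULL,
    customer_id  bigint NOT NULL,
    cancelled_at timestamptz
);

-- At most one live booking per seat, enforced by a partial unique index.
CREATE UNIQUE INDEX bookings_one_live_per_seat
    ON bookings (seat_id)
    WHERE cancelled_at IS NULL;

-- If two transactions insert a live booking for the same seat concurrently,
-- one commits and the other fails with a unique-violation error,
-- at any isolation level.
INSERT INTO bookings (seat_id, customer_id) VALUES (42, 1001);
```

Rules that depend on an aggregate, such as "at least one approver stays active", do not fit this shape, which is where Serializable earns its place.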
Read Uncommitted: why Postgres skips it
Read Uncommitted is the lowest isolation level in the SQL standard. It permits dirty reads: reading uncommitted data from another transaction. If that transaction rolls back, the read data never existed in the database. Dirty reads are almost never useful and frequently dangerous; they expose data that may disappear.
Postgres treats Read Uncommitted as Read Committed. It accepts SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED without complaint, but the transaction behaves exactly as it would under Read Committed. Dirty reads are not possible in Postgres, regardless of the isolation level set.
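This is easy to confirm with two psql sessions. The sketch below reuses the `wallets` row from the earlier example.

```sql
-- Session 1: change a row but do not commit yet.
BEGIN;
UPDATE wallets SET balance = 0 WHERE id = 7;

-- Session 2: ask for Read Uncommitted.
BEGIN ISOLATION LEVEL READ UNCOMMITTED;
SELECT balance FROM wallets WHERE id = 7;  -- still the last committed value, not 0
COMMIT;

-- Session 1
ROLLBACK;
```

Session 2's SELECT neither blocks nor sees the uncommitted write.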
Choosing the right isolation level
| Isolation level | Snapshot scope | Anomalies still permitted | Reach for it when |
|---|---|---|---|
| Read Committed (default) | Per statement | Non-repeatable reads, phantom reads, write skew | Standard CRUD with no cross-statement invariants |
| Repeatable Read | Per transaction | Write skew (cross-row) | Consistent reads across a transaction; row-level check-then-act |
| Serializable | Per transaction | None (aborts transactions that would violate serial order) | Cross-row constraints; approval workflows; booking systems |
| Read Uncommitted | Per statement (behaves as Read Committed) | Same as Read Committed | Nowhere; Postgres treats it as Read Committed |
The decision tree is shorter than most documentation implies. Start with Read Committed. If you have a check-then-act pattern on a single row, add SELECT FOR UPDATE. If a transaction needs one consistent view across several statements, use Repeatable Read. If the invariant spans multiple rows that multiple writers can modify, use Serializable. Under both Repeatable Read and Serializable, be ready to retry any transaction that fails with a serialization failure (SQLSTATE 40001). If you can encode the invariant as a database constraint, do that instead; it works at all isolation levels.
The gap between tests and production
Tests run sequentially. Production doesn't. The concurrency anomaly that breaks your payment flow won't appear in unit tests, won't appear in load tests running against a lightly loaded staging database, and won't appear in the first thousand production transactions. It appears at scale, under contention, in the window between the check and the act.
The right time to choose an isolation level is when you're writing the transaction and designing the invariant it protects, not during the incident that reveals you didn't. The question to ask for every transaction that involves a check: if two identical transactions run at the same moment, what does the database guarantee about the result?