Idempotency keys done right: the schema, the race, and the zombie problem
A complete Postgres implementation covering five states, concurrent-request locking, crash recovery, and TTL cleanup
The Stripe engineering blog introduced most backend engineers to idempotency keys. The pattern is clean: the client attaches a unique key to each request; the server stores that key with the result; duplicate requests get the cached result instead of re-execution. Safe retries, no double charges.
What that post doesn't cover: the exact schema, what state a key is in while the request is processing, what happens when two identical requests arrive within the same millisecond, and what to do when the worker crashes halfway through. These are not edge cases. They happen in production before they happen in tests. Here is the complete implementation.
What an idempotency key does (and what it cannot do)
An idempotency key is a client-supplied identifier that scopes a request to a unique intent. The server uses it to cache the response. If the client retries with the same key, the server checks the cache first and returns the stored result rather than processing the request again.
Two things an idempotency key does not do. First, it does not make the underlying operation safe. If your payment code charges a card twice, an idempotency key won't prevent the second charge; it just caches whatever result your code produces. The operation itself still needs to be correct. Second, it does not handle concurrent requests well by default. The naive implementation (check whether the key exists, insert if not) has a race condition that surfaces in production under modest load.
The key belongs to the client. UUIDs (v4 or v7) are standard. The server treats the key as opaque: never generate one on behalf of a client, never reuse one across requests. Clients that lose their generated key and retry with a new one should expect to be charged or processed twice. That is correct behaviour from the server's perspective.
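To make the client side concrete, here is a minimal Go sketch under stated assumptions: the endpoint URL, the `Idempotency-Key` header name, and the three-attempt retry policy are illustrative choices, not prescribed by anything above.

```go
package client

import (
	"bytes"
	"fmt"
	"net/http"

	"github.com/google/uuid"
)

// createPayment generates one key per intent and reuses it verbatim on
// every retry. Losing the key and generating a new one is a new intent.
func createPayment(body []byte) (*http.Response, error) {
	key := uuid.NewString() // generated once, before the first attempt

	var lastErr error
	for attempt := 0; attempt < 3; attempt++ {
		req, err := http.NewRequest(http.MethodPost,
			"https://api.example.com/payments", bytes.NewReader(body))
		if err != nil {
			return nil, err
		}
		req.Header.Set("Idempotency-Key", key) // same key on every retry
		req.Header.Set("Content-Type", "application/json")

		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			lastErr = err
			continue
		}
		if resp.StatusCode < 500 {
			return resp, nil // includes 202: the caller polls, never re-keys
		}
		resp.Body.Close()
		lastErr = fmt.Errorf("server returned %d", resp.StatusCode)
	}
	return nil, fmt.Errorf("create payment: %w", lastErr)
}
```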
The schema: what each column earns
Most implementations start with two columns: the key and the cached response. The schema below has more, and each column justifies itself.
```sql
CREATE TABLE idempotency_keys (
    id            uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    key           text NOT NULL,
    endpoint      text NOT NULL,
    status        text NOT NULL DEFAULT 'pending'
                  CHECK (status IN ('pending', 'processing', 'complete', 'failed')),
    request_hash  text NOT NULL,
    response_code int,
    response_body jsonb,
    created_at    timestamptz NOT NULL DEFAULT now(),
    locked_at     timestamptz,
    finished_at   timestamptz,
    expires_at    timestamptz NOT NULL DEFAULT now() + INTERVAL '24 hours'
);

CREATE UNIQUE INDEX uq_idempotency_key_endpoint
    ON idempotency_keys (key, endpoint);

CREATE INDEX idx_idempotency_cleanup
    ON idempotency_keys (expires_at)
    WHERE status IN ('complete', 'failed');

CREATE INDEX idx_idempotency_stuck
    ON idempotency_keys (locked_at)
    WHERE status = 'processing';
```

The unique index is on (key, endpoint), not key alone. A key is meaningful only within the scope of a specific operation. A client that generates one UUID and reuses it on two different endpoints should get independent results for each, not a cross-endpoint collision. This matches how Stripe scopes keys to resource types.
request_hash is a SHA-256 (or HMAC-SHA-256) of the request body. If a duplicate key arrives with a different body, the server returns 422 Unprocessable Entity rather than silently replying with a cached result from a different call. Omitting this check is the most common mistake in idempotency implementations: a client bug reusing a key across different requests gets a silently wrong response.
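Computing the hash is a one-liner. A minimal sketch, assuming a SHA-256 of the raw body stored as lowercase hex; the package and function names are mine:

```go
package idem

import (
	"crypto/sha256"
	"encoding/hex"
)

// RequestHash returns the SHA-256 of the raw request body as lowercase
// hex, matching what the handler stores in the request_hash column.
func RequestHash(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])
}
```

On a conflict, compare the stored hash against the incoming one before returning anything cached; a mismatch is the 422 case.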
locked_at tracks when a worker claimed the key. The partial index on locked_at WHERE status = 'processing' powers the zombie-key sweeper described later; it stays small and cheap because it only indexes actively-claimed rows.
expires_at is set at creation and controls eviction. It is separate from created_at because cache lifetime is policy, not history: the retention window can differ per endpoint and can be extended at completion time, while created_at never changes. Auditing and eviction are different problems with different columns.
Five states, not two
Treating idempotency as binary (exists or it doesn't) obscures the real state machine. An idempotency key moves through five distinct states, each with different semantics for what the server should return to the client.
| Status | Meaning | Next state | Client receives |
|---|---|---|---|
| pending | Key inserted; worker not yet claimed it | processing | 202 Accepted + Retry-After |
| processing | Worker has claimed the key and is running | complete or failed | 202 Accepted + Retry-After |
| complete | Operation finished; response stored | (terminal) | Cached response code and body |
| failed | Non-retryable error stored | (terminal) | Cached error response |
| expired | Past expires_at; eviction candidate | (terminal) | 404 Not Found or 410 Gone |
One note on the table: expired is not a stored status. The CHECK constraint allows only four values; expiry is derived by comparing expires_at to now() at read time. The key transition is pending to processing. A worker claims the key by running an UPDATE that sets status to 'processing' and stamps locked_at. This UPDATE is the lock: because it matches only rows still in 'pending', at most one of two racing workers gets the row back, and the other's UPDATE affects nothing. After the operation finishes, one final UPDATE stores the response code and body, and sets the terminal status.
```sql
-- Claim the key before doing any work
UPDATE idempotency_keys
SET status = 'processing', locked_at = now()
WHERE id = $1
  AND status = 'pending'
RETURNING *;

-- After work is done, store the result
UPDATE idempotency_keys
SET status = 'complete',
    response_code = $2,
    response_body = $3,
    finished_at = now()
WHERE id = $1;
```

When a client retries and the key is in pending or processing state, the server cannot return the cached result; it does not exist yet. Return 202 Accepted with a Retry-After header. Returning an error here would punish a client for doing the right thing.
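As a concrete shape for that dispatch, here is a hedged Go sketch. KeyRow and its fields are illustrative names for the row the queries above return, not types defined anywhere in this post:

```go
package idem

import "net/http"

// KeyRow mirrors the columns the handler needs from idempotency_keys.
type KeyRow struct {
	ID           string
	Status       string
	RequestHash  string
	ResponseCode int
	ResponseBody []byte
}

// Respond maps a stored key's status to the HTTP response in the state
// table: the cached result for terminal states, 202 plus Retry-After
// while work is still in flight.
func Respond(w http.ResponseWriter, row KeyRow) {
	switch row.Status {
	case "complete", "failed":
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(row.ResponseCode)
		w.Write(row.ResponseBody)
	case "pending", "processing":
		w.Header().Set("Retry-After", "5") // seconds; tune to typical latency
		w.WriteHeader(http.StatusAccepted)
	}
}
```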
The race you're probably not handling
Two retries arrive with the same key within the same millisecond, before either has been processed. Each handler reads the keys table; neither finds the row, and both attempt to insert. One wins; the other gets a unique-constraint violation that crashes the request handler. That client receives a 500, which looks identical to a real server error.
The fix is to use the INSERT itself as the lock rather than a separate read:
```sql
INSERT INTO idempotency_keys
    (key, endpoint, status, request_hash)
VALUES
    ($1, $2, 'pending', $3)
ON CONFLICT (key, endpoint)
DO UPDATE SET key = EXCLUDED.key  -- no-op: activates RETURNING on conflict
RETURNING *;
```

The DO UPDATE SET key = EXCLUDED.key clause is intentionally a no-op (updating key to its own conflict value changes nothing in the row). But it activates the RETURNING clause. With ON CONFLICT DO NOTHING, Postgres returns an empty result set for conflicting rows and RETURNING has nothing to work with. With DO UPDATE, RETURNING gives back the existing row after the (no-op) update.
This makes the INSERT atomic. The first request inserts and gets back a fresh pending row. The second request hits the conflict path and gets back the same row. Both can inspect the status field and respond appropriately: no second read, no race window, no constraint violation crash.
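Wired through Go's database/sql, the whole exchange is one query. This sketch reuses the KeyRow type from above; the narrowed RETURNING list and the COALESCE on response_code are conveniences I am assuming, not part of the query as written:

```go
package idem

import (
	"context"
	"database/sql"
)

// upsertKey runs the atomic INSERT ... ON CONFLICT and returns the row,
// whether it was freshly inserted or already existed.
func upsertKey(ctx context.Context, db *sql.DB, key, endpoint, hash string) (KeyRow, error) {
	const q = `
		INSERT INTO idempotency_keys (key, endpoint, status, request_hash)
		VALUES ($1, $2, 'pending', $3)
		ON CONFLICT (key, endpoint)
		DO UPDATE SET key = EXCLUDED.key
		RETURNING id, status, request_hash,
		          COALESCE(response_code, 0), response_body`

	var row KeyRow
	err := db.QueryRowContext(ctx, q, key, endpoint, hash).Scan(
		&row.ID, &row.Status, &row.RequestHash,
		&row.ResponseCode, &row.ResponseBody)
	return row, err
}
```

The caller then branches exactly as described: a request_hash mismatch returns 422, a terminal status replays the cache, and a pending status proceeds to the claim UPDATE from the previous section.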
One consequence: the DO UPDATE path acquires a row-level lock until the transaction commits. If your operation takes 30 seconds, requests B, C, and D queue behind that lock for up to 30 seconds. For long-running operations, decouple the insert from the processing: insert synchronously, return 202, process asynchronously, let the client poll a status endpoint.
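A sketch of that decoupled shape, continuing the same assumed package and reusing the helpers above; the jobs channel stands in for whatever queue you already run:

```go
package idem

import (
	"context"
	"database/sql"
	"net/http"
)

var jobs = make(chan string, 1024) // hand-off to background workers

// handleSlowOp commits the insert in its own short transaction, hands the
// key id to a worker, and replies 202 so the client can poll for status.
func handleSlowOp(ctx context.Context, db *sql.DB, w http.ResponseWriter,
	key, endpoint string, body []byte) {
	row, err := upsertKey(ctx, db, key, endpoint, RequestHash(body))
	if err != nil {
		http.Error(w, "internal error", http.StatusInternalServerError)
		return
	}
	if row.Status == "complete" || row.Status == "failed" {
		Respond(w, row) // cached result: no new work
		return
	}
	if row.Status == "pending" {
		jobs <- row.ID // a worker claims the key and processes it off-request
	}
	w.Header().Set("Retry-After", "5")
	w.WriteHeader(http.StatusAccepted) // insert committed; client polls
}
```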
The zombie-key problem
The worker inserts the key as pending. It claims it by setting status to 'processing' and stamping locked_at. Then it dies: out of memory, a deploy restart, a network timeout that kills the goroutine mid-flight. The key is stuck in processing state. locked_at is 20 minutes ago. Nothing is running.
The next retry arrives and sees status = 'processing'. It has no way to tell whether the original worker is still running or long dead. Without a sweeper, the key is stuck forever. Every subsequent retry returns 202 Accepted, the client keeps polling, nothing finishes.
The sweeper resets zombie keys back to pending:
```sql
UPDATE idempotency_keys
SET status = 'pending', locked_at = NULL
WHERE status = 'processing'
  AND locked_at < now() - INTERVAL '5 minutes'
RETURNING id, key, endpoint;
```

This resets any key that has been in processing state for longer than the timeout threshold. The partial index on locked_at WHERE status = 'processing' makes this fast even on a table with millions of completed rows; it only scans actively-claimed rows.
Set the timeout to at least 2 to 3 times your 99th-percentile operation time, not the mean. If legitimate operations regularly take 4 minutes and the 99th percentile is 7 minutes, a 5-minute timeout will incorrectly reset live keys under load. Instrument your operation durations before picking the number.
Run the sweeper on a regular schedule; every 60 seconds works for most APIs. Log every key it resets, and track the count as a metric. Each reset is a signal that a worker crashed; the sweeper fixes the stuck state, but the crash is still worth investigating.
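A sweeper can be as small as a ticker loop around that query. In this Go sketch the function name, interval wiring, and logging style are assumptions; hook the per-reset log line into whatever metrics pipeline you run:

```go
package idem

import (
	"context"
	"database/sql"
	"log"
	"time"
)

// RunSweeper re-runs the zombie-reset query on a fixed interval until the
// context is cancelled.
func RunSweeper(ctx context.Context, db *sql.DB, interval time.Duration) {
	const q = `
		UPDATE idempotency_keys
		SET status = 'pending', locked_at = NULL
		WHERE status = 'processing'
		  AND locked_at < now() - INTERVAL '5 minutes'
		RETURNING id, key, endpoint`

	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			rows, err := db.QueryContext(ctx, q)
			if err != nil {
				log.Printf("sweeper: query failed: %v", err)
				continue
			}
			// Every reset row is a crashed worker: log it, count it.
			for rows.Next() {
				var id, key, endpoint string
				if err := rows.Scan(&id, &key, &endpoint); err != nil {
					break
				}
				log.Printf("sweeper: reset zombie key %s (%s, %s)", id, endpoint, key)
			}
			rows.Close()
		}
	}
}
```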
TTL cleanup without evicting the wrong keys
The naive cleanup query is appealing because it's short:
```sql
-- Don't do this: deletes in-flight and still-useful keys
DELETE FROM idempotency_keys
WHERE created_at < now() - INTERVAL '24 hours';
```

It deletes pending and processing keys that are mid-flight. It also deletes completed keys before any late-arriving duplicate has had a chance to use the cached result. Neither outcome is what you want.
The correct approach separates two concerns: evicting completed records past their retention window, and expiring abandoned pending keys.
```sql
-- Target only terminal states past their retention window
DELETE FROM idempotency_keys
WHERE status IN ('complete', 'failed')
  AND expires_at < now()
RETURNING id, key;

-- Separately, expire abandoned pending keys
UPDATE idempotency_keys
SET status = 'failed',
    response_code = 408,
    response_body = '{"error":"timeout"}'::jsonb,
    finished_at = now()
WHERE status = 'pending'
  AND expires_at < now();
```

The expires_at column defines the cache lifetime. It is set at key creation. For most payment APIs, 24 hours covers the realistic retry window. For slow batch operations where clients might wait days before retrying, extend expires_at at completion time to match the documented retry window for that endpoint.
The idx_idempotency_cleanup partial index targets only complete and failed rows. The DELETE never scans pending or processing keys. On a table with 10 million rows, most of which are completed, this index keeps the cleanup query under 10 milliseconds and avoids a full table scan on every eviction run.
The second query converts old pending keys to a failed state before deletion. This preserves a terminal state for any client that still holds the key and polls after expiry. They see 'failed' with a timeout error rather than 404, which is more informative and easier to handle on the client side.
When Redis helps (and when it gets in the way)
The Redis pattern: before the Postgres INSERT, call SET idempotency:{key}:{endpoint} NX EX 86400. If the SET fails, the key exists in Redis, so short-circuit to the cached result. If it succeeds, proceed to Postgres. This adds a fast in-memory check that avoids a database round-trip for most duplicate requests at high volume.
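As a sketch of that fast path, assuming the go-redis client (any client with SET NX EX semantics works the same way):

```go
package idem

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// redisFirstSeen returns true when this request is the first holder of
// the key and should proceed to the Postgres insert.
func redisFirstSeen(ctx context.Context, rdb *redis.Client, key, endpoint string) bool {
	k := fmt.Sprintf("idempotency:%s:%s", key, endpoint)
	ok, err := rdb.SetNX(ctx, k, "1", 24*time.Hour).Result()
	if err != nil {
		// Redis unavailable: fall through to the Postgres path, which
		// must be correct on its own anyway.
		return true
	}
	return ok
}
```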
The problem: you now have two systems that must agree. Redis TTL and Postgres expires_at need to stay in sync. If Redis evicts the key before Postgres does, a client gets a 404 from the fast path on one retry and the cached result from the slow path on the next. The behaviour is inconsistent from the same endpoint. If Redis is unavailable, you fall back to the Postgres path anyway, which means the Postgres implementation must be correct regardless of whether Redis is in front of it.
For most APIs, including those handling several thousand idempotent requests per second, the Postgres-only approach with the unique index is fast enough. The INSERT against a unique index is a single B-tree lookup plus a row write. Row-lock contention only affects concurrent duplicates, not normal traffic. Measure actual lock contention before adding Redis; adding it as a default is a reliability cost disguised as a performance optimisation.
Use Redis when you have measured Postgres index contention under load and it is a bottleneck. Not before.
Related reading
Every Postgres isolation level, and the specific bug it lets through
Three isolation levels, three distinct failure modes. Most Postgres deployments run at Read Committed without knowing it. Here is what each level permits and what upgrading actually costs.
LLM database access: the RBAC gap most teams don't see
Giving an LLM access to your database is easy. The problem is that your application-layer RBAC is invisible when the model generates SQL. Here's where it goes wrong and how to fix it at the layer that enforces.
Rate limiting in production: why the algorithm you chose is probably wrong for your workload
Most rate limiting failures aren't implementation errors. They come from picking an algorithm whose properties don't match the actual traffic shape. Here's a workload-first framework for making the right choice.