Idempotency keys done right: handling concurrent in-flight requests
Every guide covers the deduplication problem. Here is the harder one: two retries arriving while the first request is still processing.
A payment API receives a request. The client's network times out before it gets a response. Standard client behaviour: retry. Now two requests are in flight for the same operation, and if your server processes both, the customer is charged twice.
Idempotency keys are the established fix. The client generates a unique key per logical operation, sends it with every attempt, and the server uses it to detect retries and return the cached response. Every implementation guide covers this. Almost none cover the specific failure mode that breaks most production implementations: two retries arriving while the first request is still in flight.
Why a unique constraint is not enough
The first version most teams write looks like this:
```python
import json

# db and stripe are illustrative handles, standing in for your database
# layer and payment client.
def create_charge(idempotency_key, amount, customer_id):
    # Check for a prior successful run
    existing = db.query_one(
        "SELECT response FROM idempotency_keys WHERE key = %s",
        idempotency_key
    )
    if existing:
        return existing["response"]

    # Process the charge
    result = stripe.charge(amount=amount, customer=customer_id)

    # Store the result
    db.execute(
        "INSERT INTO idempotency_keys (key, response) VALUES (%s, %s)",
        idempotency_key, json.dumps(result)
    )
    return result
```

This has a classic time-of-check/time-of-use race condition. Two concurrent requests with the same key can both pass the SELECT before either inserts a row. Both hit Stripe. The customer is charged twice, and the second INSERT fails on a unique constraint — quietly, after the damage.
The standard fix to this specific race is to replace the SELECT + INSERT with an atomic INSERT ... ON CONFLICT DO NOTHING. If the INSERT returns a row, you own the key. If it returns nothing, the key already exists and you read the stored response. That eliminates the check-before-insert race.
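A sketch of the claim query against the naive table above ($1 is the key):

```sql
-- Atomically claim the key: getting a row back means we own it.
INSERT INTO idempotency_keys (key, response)
VALUES ($1, NULL)
ON CONFLICT (key) DO NOTHING
RETURNING key;
```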
But it introduces a different problem, which is the one that actually surfaces in production.
The case nobody talks about: two retries, one operation still in flight
Suppose a client sends a request. The server inserts the idempotency key row and begins processing: a multi-second external call to a payment processor, a document generator, or an email service. Before that call completes, the client's retry fires. A second request arrives with the same key.
With the atomic INSERT approach, the second request tries INSERT ... ON CONFLICT DO NOTHING, gets nothing back, then reads the existing row and finds no stored response. The operation is still in flight.
What should the server do? Most implementations either:
- Block the second request until the first completes, requiring an open connection and risking both requests timing out.
- Return a vague 500 or 503, which causes the client to retry again and possibly creates a third concurrent request.
- Re-execute the operation, which defeats the whole purpose and duplicates the side effect.
None of these are correct. The right answer is HTTP 409 Conflict with a body telling the client to retry after the in-flight operation settles.
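A reasonable shape for that response (the body fields are illustrative; the Retry-After header anticipates the IETF draft discussed later):

```http
HTTP/1.1 409 Conflict
Retry-After: 2
Content-Type: application/json

{
  "error": "request_in_flight",
  "detail": "A request with this idempotency key is still processing. Retry shortly."
}
```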
> Most idempotency key guides stop at the deduplication problem. The concurrent in-flight problem is a separate failure mode, and it needs a separate mechanism.
The state machine model
An idempotency key record is not a cache entry. It is a state machine with three states:
- pending — a request is actively processing this key. Do not re-execute.
- succeeded — the operation completed. Return the stored response.
- failed — the operation failed terminally. Return the stored error.
The pending state is what gives you a clean answer for the concurrent-request case. The second request reads pending, returns 409, and tells the client to wait and retry. When it retries, it will find either succeeded or failed and get the correct response with no duplicate execution.
This also handles the case where a client retry arrives after the operation succeeds but before the caller has received the response. The second request reads succeeded, returns the cached response, and the client is satisfied.
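As a sketch, the legal transitions are few enough to write down (state names mirror the schema below):

```python
# 'pending' is the only non-terminal state; the terminal states never change.
TRANSITIONS = {
    "pending":   {"succeeded", "failed"},
    "succeeded": set(),
    "failed":    set(),
}

def can_transition(current, target):
    return target in TRANSITIONS[current]
```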
The idempotency keys schema
```sql
CREATE TABLE idempotency_keys (
    key          TEXT NOT NULL,
    status       TEXT NOT NULL DEFAULT 'pending'
                 CHECK (status IN ('pending', 'succeeded', 'failed')),
    request_hash TEXT NOT NULL,
    response     JSONB,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    expires_at   TIMESTAMPTZ NOT NULL DEFAULT now() + INTERVAL '24 hours',
    CONSTRAINT idempotency_keys_pkey PRIMARY KEY (key)
);

CREATE INDEX ON idempotency_keys (expires_at);
```

A few specific decisions worth calling out:
- request_hash is a SHA-256 of the request body, excluding the idempotency key itself. If a retry arrives with the same key but a different body, that is a client bug, not a retry. Return HTTP 422 immediately, before executing anything.
- response is JSONB, not a serialised string. Store the response object directly. If you ever need to query fields inside stored responses (for debugging, for a dashboard, for audit), you will be glad this is not a blob (see the query sketch after this list).
- expires_at drives cleanup. 24 hours is right for most API use cases. Payment APIs that support manual retries days later should use 7 to 30 days.
- The primary key on key provides the unique constraint the atomic INSERT relies on.
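To illustrate the JSONB decision: once responses are stored as JSONB, ad-hoc debugging queries stay cheap. A hypothetical example (the field names depend on what your operations return):

```sql
-- Find recent stored responses for a given customer.
SELECT key, status, response->>'id' AS charge_id
FROM idempotency_keys
WHERE response->>'customer' = 'cus_123'
  AND created_at > now() - INTERVAL '1 day';
```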
The implementation: INSERT then SELECT FOR UPDATE
The pattern has two phases. First, try to atomically claim the key with INSERT ... ON CONFLICT DO NOTHING. If that succeeds, you own the key and proceed with execution. If it returns nothing, the key already exists — lock the row with SELECT FOR UPDATE and read its current state:
```python
import hashlib, json

from psycopg2.extras import RealDictCursor

class ConflictError(Exception): pass          # concurrent request in flight -> 409
class RequestMismatchError(Exception): pass   # same key, different body -> 422
class StoredOperationError(Exception): pass   # replaying a stored terminal failure
class TerminalError(Exception): pass          # operation failed; do not retry

def handle_idempotent(conn, key, request_body, operation):
    request_hash = hashlib.sha256(
        json.dumps(request_body, sort_keys=True).encode()
    ).hexdigest()

    with conn.cursor(cursor_factory=RealDictCursor) as cur:
        # Step 1: try to claim the key atomically
        cur.execute(
            """
            INSERT INTO idempotency_keys (key, request_hash, status)
            VALUES (%s, %s, 'pending')
            ON CONFLICT (key) DO NOTHING
            RETURNING *
            """,
            (key, request_hash)
        )
        row = cur.fetchone()
        conn.commit()

        if row is None:
            # Key exists -- lock the row and read its current state
            cur.execute(
                "SELECT * FROM idempotency_keys WHERE key = %s FOR UPDATE",
                (key,)
            )
            row = cur.fetchone()
            if row["request_hash"] != request_hash:
                conn.rollback()
                raise RequestMismatchError(
                    "Idempotency key reused with different request body"
                )
            if row["status"] == "pending":
                conn.rollback()
                raise ConflictError(
                    "Concurrent request in progress -- retry in 1-2 seconds"
                )
            if row["status"] == "succeeded":
                conn.rollback()
                return row["response"]
            # The CHECK constraint leaves only 'failed'
            conn.rollback()
            raise StoredOperationError(row["response"]["error"])

    # We own the key (inserted as pending). Run the operation outside
    # any database transaction -- external calls cannot be transactional.
    try:
        result = operation()
    except TerminalError as exc:
        _mark_failed(conn, key, str(exc))
        raise
    # Any other exception leaves the key pending; the staleness TTL
    # described below is what recovers it.

    _mark_succeeded(conn, key, result)
    return result

def _mark_succeeded(conn, key, response):
    with conn.cursor() as cur:
        cur.execute(
            "UPDATE idempotency_keys SET status = 'succeeded', response = %s WHERE key = %s",
            (json.dumps(response), key)
        )
    conn.commit()

def _mark_failed(conn, key, error_message):
    with conn.cursor() as cur:
        cur.execute(
            "UPDATE idempotency_keys SET status = 'failed', response = %s WHERE key = %s",
            (json.dumps({"error": error_message}), key)
        )
    conn.commit()
```

The FOR UPDATE on the SELECT is what handles the concurrent-retry case. While Request A is executing the external call, Request B reads pending and returns 409, exactly as the state machine prescribes. If B instead arrives in the narrow window where A is committing its final UPDATE, the row lock makes B wait for that commit, so B reads the final state and returns the cached result rather than a spurious 409.
One important detail: the INSERT commits immediately, before the external operation runs. The external call happens outside any database transaction. This is intentional: you cannot wrap a Stripe charge or a third-party API call in a Postgres transaction. The state machine bridges the gap between INSERT (pending) and UPDATE (succeeded or failed).
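How these exceptions map to HTTP responses depends on your framework. A sketch assuming Flask, with the status codes this article prescribes (the handler bodies are illustrative):

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.errorhandler(ConflictError)
def on_conflict(exc):
    resp = jsonify(error="request_in_flight", detail=str(exc))
    resp.headers["Retry-After"] = "2"  # matches the IETF draft guidance below
    return resp, 409

@app.errorhandler(RequestMismatchError)
def on_mismatch(exc):
    return jsonify(error="key_reused_with_different_body", detail=str(exc)), 422

@app.errorhandler(StoredOperationError)
def on_stored_failure(exc):
    # Replay the stored error; use the status you recorded with it.
    return jsonify(error="operation_failed", detail=str(exc)), 400
```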
Recovering from stuck-pending keys
Suppose the server crashes after the INSERT but before the final UPDATE. The key is stuck in pending, and every retry gets a 409 forever. The fix is a TTL-based staleness check. Before returning 409 for a pending key, check whether the key has been in pending status longer than the operation's maximum expected duration. If it has, the original request is almost certainly dead. Reset the key's timestamps and allow re-execution:
```sql
-- Recovery: reset a stale pending key, allowing a fresh attempt.
-- Run this before returning 409 for a key that has been pending too long.
UPDATE idempotency_keys
SET created_at = now(),
    expires_at = now() + INTERVAL '24 hours'
WHERE key = $1
  AND status = 'pending'
  AND created_at < now() - INTERVAL '30 seconds'
RETURNING *;

-- If this returns a row, the key was stale and has been reset.
-- Treat the next request as a fresh claim.
```

The staleness window (30 seconds above) should be the operation's 99th-percentile duration plus a margin. For a payment charge, 10 seconds is conservative. For a document generation workflow, you might need 2 to 3 minutes.
One prerequisite: the operation you are re-executing must itself be safe to retry. For external API calls, check whether the downstream service has its own idempotency mechanism and pass your key through to it. For services that do not, add a prior-state check before executing. If you cannot make the operation re-entry-safe, the stuck-pending TTL is your only automatic recovery path.
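Stripe is the canonical example of a downstream service with its own mechanism: its write endpoints accept an idempotency key, so you can forward yours. A sketch using the stripe Python library (the charge parameters are illustrative):

```python
import stripe

def charge_operation(key, amount, customer_id):
    # Forwarding our key means a re-executed operation cannot double-charge:
    # Stripe deduplicates on its side using the same key.
    return stripe.Charge.create(
        amount=amount,
        currency="usd",
        customer=customer_id,
        idempotency_key=key,
    )
```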
Idempotency keys in async queues
Everything above applies to HTTP APIs. Async job queues have the same failure mode (processing the same message twice), but the fix is slightly different because you do not control when consumers retry.
SQS and Kafka both deliver messages at least once. A consumer crash after processing but before acknowledging causes the broker to redeliver. If your consumer has side effects, those side effects repeat.
The state machine pattern works here too, with one change: the idempotency key is the message ID, not a client-supplied header. SQS assigns a MessageId to every message; Kafka uses the topic-partition-offset tuple as a natural unique identifier.
```python
import boto3, json

sqs = boto3.client("sqs")

def process_messages(queue_url, conn):
    # conn: the same database connection handle used by handle_idempotent
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)
    for message in resp.get("Messages", []):
        key = message["MessageId"]
        body = json.loads(message["Body"])
        try:
            handle_idempotent(conn, key, body, lambda: do_work(body))
        except ConflictError:
            # Another worker is processing this message.
            # Do not delete -- let the visibility timeout expire for redelivery.
            continue
        except StoredOperationError:
            # Terminal failure stored -- remove to stop redelivery.
            sqs.delete_message(
                QueueUrl=queue_url,
                ReceiptHandle=message["ReceiptHandle"]
            )
            continue
        # Succeeded (first run or cached) -- safe to delete.
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"]
        )
```

The key difference from the HTTP case: a 409 (pending) in the queue context means leaving the message in the queue and letting it redeliver naturally. Do not delete a message whose key is in pending state. The visibility timeout acts as your retry delay.
For Kafka, the pattern is nearly identical but you manage offset commits yourself. Do not commit the offset until the idempotency key is in a terminal state. On consumer restart, Kafka replays from the last committed offset, and the idempotency key lookup short-circuits any already-completed messages.
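A sketch of that loop, assuming the confluent-kafka client and the handle_idempotent helper above (topic and group names are illustrative):

```python
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "idempotent-workers",
    "enable.auto.commit": False,  # commit only once the key is terminal
})
consumer.subscribe(["charges"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    # Topic-partition-offset is a stable identity across redeliveries
    key = f"{msg.topic()}-{msg.partition()}-{msg.offset()}"
    body = json.loads(msg.value())
    try:
        handle_idempotent(conn, key, body, lambda: do_work(body))
    except ConflictError:
        continue  # another worker owns it; the uncommitted offset covers us
    except StoredOperationError:
        pass  # terminal failure is recorded; fall through and commit
    consumer.commit(message=msg)  # offset advances only on a terminal state
```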
What the IETF draft standardises
As of late 2025, the IETF has a public draft for a standard Idempotency-Key header (draft-ietf-httpapi-idempotency-key-header-07). It is worth knowing what it does and does not specify.
What it standardises: the header name (Idempotency-Key), the format (a quoted string of 1 to 255 characters), the requirement that servers document their retention window, and the expected behaviour on key collision (409 Conflict with a Retry-After header).
What it leaves to implementers: storage mechanism, state semantics, expiry policy, concurrent-request handling, and request-body validation. The standard does not define pending state, stuck-key recovery, or queue-layer idempotency. Those are engineering problems, not protocol problems.
For most teams, the practical implication is: if you implement the state machine described above, you are already compliant with the draft for HTTP APIs. Adding a Retry-After: 2 header to your 409 responses is the only non-obvious addition the draft requires.
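A minimal request-side check based on the format constraints described above (how strictly to treat a malformed key is a policy choice, and the quote handling is an assumption about how your framework surfaces the header):

```python
def extract_idempotency_key(headers):
    # Per the draft as summarised above: a quoted string of 1-255 characters.
    raw = headers.get("Idempotency-Key")
    if raw is None:
        return None  # caller decides whether to require the header
    key = raw.strip().strip('"')
    if not 1 <= len(key) <= 255:
        raise ValueError("Idempotency-Key must be 1 to 255 characters")
    return key
```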
Expiration and cleanup
Idempotency key rows accumulate quickly under load. Without cleanup, index scans slow down and table bloat compounds. The cleanup job is a periodic DELETE on expires_at:
```sql
-- Run on a schedule (every 15 minutes is sufficient for most workloads)
DELETE FROM idempotency_keys
WHERE expires_at < now();
```

Run this every 10 to 15 minutes off the critical path. Also increase VACUUM frequency on this table — dead rows pile up faster than autovacuum's default schedule handles in high-throughput environments. If you are on Postgres 13 or later, configuring autovacuum_vacuum_insert_scale_factor to a smaller value for this table specifically reduces the lag between insert volume and vacuum cycles.
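The per-table setting mentioned above looks like this (the value is a starting point; tune it against your insert rate):

```sql
-- Trigger an insert-driven vacuum after ~5% of the table is new rows,
-- instead of the global default.
ALTER TABLE idempotency_keys
  SET (autovacuum_vacuum_insert_scale_factor = 0.05);
```

Here is how the full set of failure scenarios resolves under the state machine: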
| Scenario | Key state after event | What the next request receives |
|---|---|---|
| Client retry after network timeout (operation already succeeded) | succeeded | Cached response; no re-execution |
| Two concurrent retries; first still in flight | pending (first owns it) | 409 Conflict; retry after 1-2 s |
| Server crashes after INSERT, before UPDATE | pending (stale) | 409 until staleness TTL; then re-execution |
| Same key, different request body | pending or succeeded | 422 Unprocessable; client bug surfaced immediately |
| Client retries a terminally failed operation | failed | Stored error; no retry attempted |
| SQS message redelivered after consumer crash | succeeded (if first run finished) | Short-circuit; message deleted; no duplicate work |
The state machine adds one row per operation and one round-trip for the claim. For most APIs, that overhead is negligible next to the cost of a duplicate charge, a duplicate email, or a corrupted ledger. The cases where it becomes a bottleneck — tens of thousands of idempotent requests per second to a single Postgres instance — are also the cases where you would be sharding your writes anyway.
Related reading
Every Postgres isolation level, and the specific bug it lets through
Three isolation levels, three distinct failure modes. Most Postgres deployments run at Read Committed without knowing it. Here is what each level permits and what upgrading actually costs.
LLM database access: the RBAC gap most teams don't see
Giving an LLM access to your database is easy. The problem is that your application-layer RBAC is invisible when the model generates SQL. Here's where it goes wrong and how to fix it at the layer that enforces.
Rate limiting in production: why the algorithm you chose is probably wrong for your workload
Most rate limiting failures aren't implementation errors. They come from picking an algorithm whose properties don't match the actual traffic shape. Here's a workload-first framework for making the right choice.