Idempotency keys: the layer you're protecting isn't the one that bites you
Five layers. Each has a different failure mode. Here is the map.
The standard idempotency tutorial covers a payment API scenario: your client generates a UUID, attaches it to the request as an Idempotency-Key header, and your server stores the result the first time it processes it. Any retry sends the same UUID, the server returns the cached response, and the charge doesn't land twice. The pattern is correct. For a synchronous HTTP handler with a single database write and no downstream calls, it's close to complete.
The problem: most production backends don't have that shape. A typical payment or notification flow receives an HTTP request, writes to a database, publishes a message to a queue, calls an external payment provider, and when something fails mid-way, runs compensating logic. Each of those steps can produce a duplicate operation. An idempotency key at the HTTP layer handles exactly one of them.
This is the map of all five layers, what breaks at each one, and what fix actually applies.
Layer 1: the HTTP key (the one everyone adds)
The HTTP layer key is well-understood: generate a UUID, include it in the Idempotency-Key header, do an atomic check on the server — if the key exists, return the cached response; if not, process and store. The things most tutorials cover correctly:
- Atomic storage. Use INSERT ... ON CONFLICT DO NOTHING or Redis SETNX, not SELECT then INSERT. A non-atomic check lets two concurrent requests with the same key both proceed (see the sketch after this list).
- Full response storage. Store the HTTP status code and the response body, not a processed flag. Replaying means returning exactly the same response, not re-running the handler.
- TTL aligned to your retry window. Across payment processors and mobile clients, 24 hours is a reasonable floor. For multi-day workflows, align TTL per operation type to the longest retry window that operation might see.
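A minimal sketch of the first two points together, assuming a psycopg2-style connection and an illustrative idempotency_keys table with key, status_code, and response_body columns (the schema and names are assumptions, not a prescribed design):

```python
def claim_or_replay(conn, key: str):
    """Atomically claim an idempotency key.

    Returns None if this request owns the key (caller processes it, then
    stores the response), or the cached (status_code, body) pair to replay.
    """
    with conn.cursor() as cur:
        # Atomic claim: of two concurrent requests with the same key,
        # exactly one insert succeeds.
        cur.execute(
            "INSERT INTO idempotency_keys (key) VALUES (%s) "
            "ON CONFLICT (key) DO NOTHING",
            (key,),
        )
        if cur.rowcount == 1:
            return None  # first time seen: caller runs the handler
        # Key already claimed: replay the stored response verbatim.
        # A production version also needs an in-flight state for keys
        # claimed before their response has been stored.
        cur.execute(
            "SELECT status_code, response_body "
            "FROM idempotency_keys WHERE key = %s",
            (key,),
        )
        return cur.fetchone()
```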
One thing most tutorials understate: the client bears a real burden. The key must be generated and persisted before the first network call. If a mobile app generates the key in memory and then crashes before saving it to local storage, the retry arrives with a different key and the server processes it again. On iOS and Android, this is not an edge case — it's a common pattern in background sync and offline-first apps. Most SDK retry implementations handle the key correctly; most hand-rolled fetch wrappers do not.
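A sketch of the client-side ordering, using sqlite3 as a stand-in for the app's local storage and a placeholder network call; every name here is illustrative, the point is only persist first, send second:

```python
import sqlite3
import uuid

db = sqlite3.connect("pending_requests.db")
db.execute("CREATE TABLE IF NOT EXISTS pending (key TEXT PRIMARY KEY, payload TEXT)")

def send_with_key(key: str, payload: str) -> None:
    """Placeholder for the real HTTP call carrying the Idempotency-Key header."""

def submit(payload: str) -> str:
    # 1. Generate AND durably persist the key before any network call.
    key = str(uuid.uuid4())
    db.execute("INSERT INTO pending (key, payload) VALUES (?, ?)", (key, payload))
    db.commit()
    # 2. Only now send. If the app dies anywhere after the commit, the
    #    retry loop reads the same key back instead of minting a new one.
    send_with_key(key, payload)
    return key
```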
A practical note on key format: UUID v4 (random) works fine for most uses. For high-volume APIs where idempotency keys are stored in a database, UUID v7 (time-ordered) performs better on insert because it doesn't fragment B-tree indexes. The difference matters at tens of thousands of requests per second; below that, it doesn't.
This layer is where most teams correctly invest effort. It's not where most production incidents originate.
Layer 2: the database write race condition
Your HTTP idempotency check passes and the request proceeds to write to your database. At this point, you're relying on the assumption that only one request is doing this write at a time. That assumption frequently breaks.
Consider a subscription activation flow. Your API receives a valid payment confirmation, the idempotency check passes, and the handler sets subscription_status to 'active'. At the same moment, a webhook from your payment processor also fires — a different HTTP request, with a different idempotency key, also valid, also authorized. Both requests arrive within milliseconds of each other. Neither is a retry in the HTTP sense. Both pass their respective idempotency checks. Both attempt to write the same state transition.
The fix is not another idempotency key. It's a database-level constraint: a conditional update that only proceeds if the row is in the expected state.
```sql
UPDATE subscriptions
SET status = 'active', activated_at = now(), version = version + 1
WHERE user_id = $1
  AND status = 'pending'
  AND version = $2;
-- If rowcount = 0, something else already ran this transition.
-- Check the current state and handle it explicitly.
```

If this update returns zero rows affected, another request already ran the transition. The right response depends on your domain: sometimes you return success (the desired state was achieved), sometimes you return a conflict error. What you don't do is silently ignore the zero-row result; that's how partial updates slip through undetected.
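A sketch of the handler around that update, assuming a DB-API-style cursor; the explicit zero-row branch is the part that tends to get skipped:

```python
class ConflictError(Exception):
    pass

def activate_subscription(conn, user_id: str, expected_version: int) -> str:
    with conn.cursor() as cur:
        cur.execute(
            "UPDATE subscriptions "
            "SET status = 'active', activated_at = now(), version = version + 1 "
            "WHERE user_id = %s AND status = 'pending' AND version = %s",
            (user_id, expected_version),
        )
        if cur.rowcount == 1:
            return "activated"
        # Zero rows: another request already ran the transition, or the row
        # is in a state this handler doesn't expect. Read it back and decide.
        cur.execute(
            "SELECT status FROM subscriptions WHERE user_id = %s", (user_id,)
        )
        row = cur.fetchone()
        if row and row[0] == "active":
            return "already_active"  # desired state reached: often a success
        raise ConflictError(f"unexpected subscription state: {row}")
```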
This is an orthogonal guarantee to the HTTP layer key. The HTTP key says 'this request has been seen.' The database constraint says 'this state transition has already completed.' You need both, and neither substitutes for the other.
Layer 3: message queue consumers and at-least-once delivery
When your HTTP handler publishes a message to a queue — Kafka, SQS, RabbitMQ, Pub/Sub — you get at-least-once delivery semantics. The queue guarantees the message will be delivered at least once. If your consumer crashes after processing but before acknowledging, or if a consumer group rebalance happens at the wrong moment, the same message arrives again.
The idempotency key from the original HTTP request is not automatically in the message. Even if you include it, the consumer has to be written to check it. Most queue consumers are not idempotent by default — they're written to process each message as if it's the first time.
The standard fix: include a deterministic message-scoped identifier in the payload, and check it against a processed-messages table before doing any work.
```sql
-- Before processing the message, atomically claim it:
INSERT INTO processed_messages (message_id, processed_at)
VALUES ($1, now())
ON CONFLICT (message_id) DO NOTHING;
-- If rowcount = 0, another consumer instance already processed this message.
-- Acknowledge the message and return without doing work.
```

There's a subtlety: this pattern has the same race condition as the HTTP layer. Two consumer instances reading the same message during a rebalance can both pass the 'have I seen this?' check before either has written to the dedup table. The insert-on-conflict pattern handles this because the database's unique constraint enforces the serialization. If both instances attempt the insert simultaneously, exactly one succeeds; the other gets zero rows and skips processing.
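Putting the claim into a consumer, assuming a psycopg2-style connection and a message object with id and ack(); both are illustrative stand-ins for your queue client:

```python
def handle_message(conn, message) -> None:
    with conn:  # one transaction: a crash before commit releases the claim
        with conn.cursor() as cur:
            # Atomic claim; the unique constraint on message_id serializes
            # concurrent consumers, so exactly one insert succeeds.
            cur.execute(
                "INSERT INTO processed_messages (message_id, processed_at) "
                "VALUES (%s, now()) ON CONFLICT (message_id) DO NOTHING",
                (message.id,),
            )
            if cur.rowcount == 0:
                message.ack()  # already processed elsewhere: ack, do no work
                return
            do_work(cur, message)  # business logic in the same transaction
    # Commit succeeded. If this ack is lost, redelivery hits the dedup row.
    message.ack()

def do_work(cur, message) -> None:
    """Placeholder for the consumer's actual side effects."""
```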
The failure mode when this layer is absent is specific: a customer gets charged once at the HTTP layer, but the order creation message is processed twice, resulting in two orders. The HTTP idempotency key is clean. The queue consumer is the gap.
Layer 4: external API calls and the unknown result
Your service calls an external payment processor. The request goes out, then your network connection drops, or the provider's response is lost in transit. You receive no response. You don't know whether the charge went through.
Two things break in practice here, and they're worth naming separately.
First: many retry implementations generate a new idempotency key for each attempt. This defeats the purpose. If you send the same charge to Stripe with two different idempotency keys, Stripe will process it twice. Check your payment SDK's retry configuration explicitly; behavior varies by library and version, and a hand-rolled retry loop around the SDK will almost always mint a fresh key per attempt. The safe pattern: generate the key before making any call, store it alongside the charge record, and pass it explicitly on every attempt for that charge.
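A sketch of that pattern with the Stripe Python library; the charges table and its columns are assumptions, while the idempotency_key parameter is Stripe's documented mechanism:

```python
import uuid
import stripe

def create_charge(conn, order_id: str, amount_cents: int):
    with conn.cursor() as cur:
        # Generate the key once and persist it BEFORE calling the provider.
        # A retry for the same order reuses the stored key instead.
        cur.execute(
            "INSERT INTO charges (order_id, idempotency_key, status) "
            "VALUES (%s, %s, 'pending') ON CONFLICT (order_id) DO NOTHING",
            (order_id, str(uuid.uuid4())),
        )
        cur.execute(
            "SELECT idempotency_key FROM charges WHERE order_id = %s",
            (order_id,),
        )
        key = cur.fetchone()[0]
    conn.commit()  # the key must survive a crash that happens mid-call
    # Every attempt for this charge carries the same key, so the provider
    # can collapse duplicates even when our process restarts between tries.
    return stripe.PaymentIntent.create(
        amount=amount_cents,
        currency="usd",
        idempotency_key=key,
    )
```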
Second: the HTTP key on your own API and the idempotency key you pass to the payment provider are separate. Your system's idempotency key says 'I won't create two charge records.' The provider's idempotency key says 'I won't charge the card twice.' When you receive a timeout, your local charge record is in an indeterminate state. The provider's charge status is unknown.
The resolution path is not optional: after a timeout, query the provider for the charge by reference ID, check its status, and use that to resolve your local record. Teams that defer this to 'future work' end up resolving it manually during incidents. The reconciliation query should be a first-class part of your payment integration, with error handling that covers 'charge not found,' 'charge succeeded,' and 'charge failed' as distinct states.
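A sketch of that reconciliation path; provider.get_charge, NotFoundError, and the returned status strings are hypothetical stand-ins for your provider client and local schema:

```python
class NotFoundError(Exception):
    pass

def reconcile_charge(provider, charge) -> str:
    """Resolve a local charge record left indeterminate by a timeout."""
    try:
        remote = provider.get_charge(reference_id=charge.reference_id)
    except NotFoundError:
        # The request never reached the provider: safe to retry the create,
        # reusing the SAME idempotency key stored before the first attempt.
        return "retry_create"
    if remote.status == "succeeded":
        return "succeeded"   # the card WAS charged: mark the local record paid
    if remote.status in ("failed", "canceled"):
        return "failed"      # safe to retry fresh, or surface the error
    return "pending"         # still settling: poll again later
```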
Layer 5: saga compensation must be idempotent too
Sagas — sequences of local transactions coordinated without a distributed lock — are the standard pattern for workflows that span multiple services. If a step fails, the saga runs compensating transactions on the steps that already succeeded. The compensation for 'reserve inventory' is 'release inventory.' The compensation for 'create order' is 'cancel order.'
Compensating transactions must be idempotent. An orchestrator that retries a compensation step after a partial failure will call the same compensation twice. Releasing inventory twice means more inventory than you actually have. Cancelling an order that's already been cancelled can trigger a second cancellation notification to the customer.
The fix: scope idempotency to each saga step independently. A common approach is deriving a deterministic key from the saga instance ID and the step name:
```python
import hashlib

def step_key(saga_id: str, step_name: str) -> str:
    return hashlib.sha256(f"{saga_id}::{step_name}".encode()).hexdigest()[:32]

# Usage:
# compensation_key = step_key(saga.id, "release_inventory")
# Pass this key to the inventory service as its idempotency key.
```

This key is stable across retries, unique per step, and derivable without storing additional state.
The deeper problem: compensation logic is usually written late in the development cycle, tested on the happy path only, and deployed once. Idempotency bugs in compensation surface during incidents, when the orchestrator is retrying because something went wrong and you're least equipped to debug it.
Test compensation paths explicitly. For each saga step, write a test that runs the compensation twice and asserts the downstream effect happens exactly once. This is harder to set up than unit tests; it requires real or realistic downstream services. But it's the only way to confirm the guarantee holds under retry conditions.
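A sketch of such a test in pytest style; inventory_service and saga are hypothetical fixtures backed by a real or realistic inventory service, and step_key is the helper above:

```python
STARTING_STOCK = 100

def test_release_inventory_applies_once(inventory_service, saga):
    # Forward step: reserve 5 units.
    inventory_service.reserve(
        sku="widget", qty=5, key=step_key(saga.id, "reserve_inventory")
    )
    # Run the compensation twice, as a retrying orchestrator would after
    # a partial failure. Both calls carry the same step-scoped key.
    comp_key = step_key(saga.id, "release_inventory")
    inventory_service.release(sku="widget", qty=5, key=comp_key)
    inventory_service.release(sku="widget", qty=5, key=comp_key)
    # The release must apply exactly once: stock returns to the starting
    # level, not 5 units above it.
    assert inventory_service.stock("widget") == STARTING_STOCK
```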
What 'done right' looks like across all five layers
Idempotency is not a feature you add to your API endpoint. It's a property you have to achieve independently at each layer, because each layer has a different failure mode and a different fix.
| Layer | What breaks without it | Canonical fix |
|---|---|---|
| HTTP API | Two concurrent requests with the same key both land a create | Atomic SETNX or INSERT ON CONFLICT; store full response with status code |
| Database write | Two valid operations race to update the same row | Conditional update with version check; unique constraint on the transition |
| Queue consumer | At-least-once delivery runs consumer twice; downstream effect doubles | Dedup table with unique message ID; atomic insert-on-conflict before processing |
| External API call | Timeout leaves result ambiguous; retry without stable key charges twice | Caller-controlled stable idempotency key; build the reconciliation query path |
| Saga compensation | Orchestrator retries compensation; compensating action applies twice | Step-scoped key derived from saga ID and step name; test compensation under retry |
None of these guarantees subsumes another. A team that correctly implements HTTP-layer idempotency and skips the queue consumer dedup will hit a production incident that the HTTP metrics won't show. A team that gets all five layers right for the happy path but skips saga compensation testing will discover the gap during an incident.
Idempotency bugs are hard to reproduce because they depend on specific timing at a specific layer. Fixing the wrong layer first is the most common response: teams add an idempotency key to the API endpoint after a double charge, when the actual duplicate was in the queue consumer that processed the payment confirmation event. The HTTP key is clean. The consumer is the gap.
Audit each layer independently. The five-minute version: for each layer in your flow, ask whether the same operation landing twice would produce a duplicate effect, and whether you have a mechanism that prevents that. If you're not sure, the answer is no.