What's the difference between idempotency and exactly-once delivery?

Idempotency means the same operation can be applied multiple times without changing the result beyond the first application. Exactly-once delivery is a stronger guarantee at the messaging layer — a message is processed exactly once, not merely delivered once. Exactly-once delivery is hard to implement correctly in distributed systems. Idempotent operations are the practical alternative: design your operations so duplicate delivery doesn't matter, rather than trying to prevent duplicate delivery.

Can I use Redis instead of Postgres for the deduplication store?

Yes. Redis's SET with the NX option (set if not exists), usually paired with an expiry, makes the atomic claim operation simpler than INSERT ... ON CONFLICT. The trade-off is durability: if Redis restarts without persistence, or fails over before a write has replicated, in-flight pending keys are lost, and new requests proceed without deduplication until the store recovers. For teams already running Redis as a primary cache, the operational risk is often acceptable. For teams where Postgres is the only stateful dependency, keeping everything in one place reduces operational complexity.

How should clients generate idempotency keys?

Generate a UUID v4 at the point of user intent, not at the point of the network call. Store it client-side before the first request goes out. If the network times out, retry with the same UUID. If the user intentionally retries after fixing an error, generate a new UUID — that's a new logical intent, not a retry. Key reuse across distinct user intentions is the most common client-side mistake, and it causes legitimate operations to be silently dropped.

What happens when the operation fails partway through?

Set the key status to 'failed' and re-raise the exception. A second request for a failed key then has two options: re-execute the operation if the failure was transient and the operation is safe to retry, or return the original failure if the failure was permanent. For payment processing, re-execution on transient failures is usually correct. For validation errors, returning the cached failure is correct. Decide per endpoint based on whether the operation is safe to retry.

Engineering | Jun 13, 2026 | 8 min read | Reviewed Jun 13, 2026

Idempotency keys in production: what the tutorials don't cover

The check-then-act race condition, deduplication table bottlenecks, and key scoping across services

By FlowVerify Editorial Team

An API endpoint is idempotent if making the same request twice produces the same result as making it once. GET requests are defined as safe and idempotent by HTTP semantics, so a well-behaved GET handler gets this for free. POST requests (creating a payment, sending an email, provisioning a resource) do not. Idempotency keys are the mechanism that makes them idempotent: the client generates a unique key for each logical operation, sends it with the request, and the server uses it to deduplicate retries.

The pattern looks simple. It isn't. Most tutorial implementations are correct in the happy path and wrong in the three cases that matter most in production.

What idempotency keys actually guarantee

The textbook definition (same inputs, same outputs) understates what's required at the API layer. An idempotent endpoint must also survive the case where the first request completed successfully but the client never received the response. Network partitions, load balancer timeouts, and client-side abort on slow response all produce this. The client cannot distinguish 'request never arrived' from 'request arrived, processed, response lost in transit'. It retries.

Idempotency keys give the server a way to recognise the retry and return the original result without re-executing the operation. The key is the client's assertion: this is the same logical operation as the one I sent before. The server's job is to honour that assertion efficiently and safely.

'Safely' is doing real work here. The server has to check whether the key exists, and in a concurrent system, that check has to be atomic with the claim. Most tutorials stop before explaining why.

The check-then-act race condition

Every tutorial implements idempotency like this:

1. Extract the key from the request header.
2. Query the deduplication store: does this key already exist?
3. If yes: return the cached response.
4. If no: execute the operation, store the result, return it.

The bug lives between the check in step 2 and the write in step 4. In concurrent systems, two requests carrying the same key can both reach step 2 before either completes step 4. Both see 'key doesn't exist'. Both execute the operation. The charge goes through twice.

This isn't a corner case. It's what happens under any exponential-backoff retry pattern when the first request is slow: the client gives up and retries, the server is still executing the first request, and both are now racing.
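
The interleaving described above can be reproduced deterministically. The following is a minimal sketch, not a server implementation: the in-memory dict stands in for the deduplication store, the `charges` list stands in for the side effect, and a `threading.Barrier` forces both requests to finish their check before either writes. The function names are hypothetical.

```python
import threading


def run_naive() -> int:
    """Check-then-act: both requests check before either one writes."""
    store: dict[str, str] = {}
    charges: list[str] = []
    both_checked = threading.Barrier(2)

    def handle(key: str) -> None:
        seen = key in store        # step 2: check
        both_checked.wait()        # force the unlucky interleaving
        if not seen:
            charges.append(key)    # step 4: execute the operation...
            store[key] = "done"    # ...and only now record the key

    threads = [threading.Thread(target=handle, args=("key-1",)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(charges)


def run_atomic() -> int:
    """Claim-then-act: the check and the claim happen as one atomic step."""
    store: dict[str, str] = {}
    charges: list[str] = []
    claim_lock = threading.Lock()
    both_started = threading.Barrier(2)

    def handle(key: str) -> None:
        both_started.wait()        # both requests arrive together
        with claim_lock:           # check and claim cannot be interleaved
            won = key not in store
            if won:
                store[key] = "pending"
        if won:
            charges.append(key)
            store[key] = "done"

    threads = [threading.Thread(target=handle, args=("key-1",)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(charges)
```

With two concurrent requests for the same key, `run_naive()` records two charges and `run_atomic()` records one. In a real deployment the lock would have to live in the shared store rather than in one process, which is what the options that follow provide.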

The fix requires making the claim atomic. Three practical options:

Option 1: unique constraint and conflict detection. Insert the key before executing the operation, using INSERT ... ON CONFLICT DO NOTHING. Check whether a row was actually inserted. If it was, this request owns the key and proceeds. If not, another request owns it: poll until it finishes and return its result.

claim.sql

-- Claim the key atomically before any work is done
INSERT INTO idempotency_keys (key, status, expires_at)
VALUES ($1, 'pending', now() + interval '30 minutes')
ON CONFLICT (key) DO NOTHING;

-- rows_affected = 0 means another request owns this key
-- poll for its result rather than proceeding
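
To see the rows-affected check in action without a Postgres instance, here is a small sketch that uses SQLite from the Python standard library as a stand-in. SQLite accepts the same ON CONFLICT ... DO NOTHING clause, but the choice of SQLite, the reduced table definition, and the helper names `make_store` and `try_claim` are assumptions made for illustration only.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory table with a unique constraint on the key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE idempotency_keys ("
        " key TEXT PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def try_claim(conn: sqlite3.Connection, key: str) -> bool:
    """Return True if this caller inserted the row and therefore owns the key."""
    cur = conn.execute(
        "INSERT INTO idempotency_keys (key) VALUES (?) "
        "ON CONFLICT (key) DO NOTHING",
        (key,),
    )
    conn.commit()
    # One affected row: we won the claim. Zero: another request owns the key.
    return cur.rowcount == 1
```

Calling `try_claim(conn, "k1")` twice returns `True` the first time and `False` the second, which is the signal the caller uses to choose between executing the operation and polling for the owner's result.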

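Option 2: SELECT FOR UPDATE on the key row. This locks the row exclusively so only one concurrent request proceeds through the check-and-execute path. It works reliably once the row exists, but a row that has not been inserted yet cannot be locked, so the very first claim still needs the unique constraint from option 1. It also serialises all requests sharing a key, which is acceptable when retries are rare and lock hold time is short.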

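Option 3: Postgres advisory locks. pg_try_advisory_xact_lock(key_hash) tries to acquire a transaction-scoped lock on a 64-bit integer derived from the key and returns false immediately if another transaction already holds it. It is fast, releases automatically on commit or rollback, and does not require a row to exist first. The limitation is connection affinity. Session-level advisory locks (pg_advisory_lock) belong to the backend connection, so behind PgBouncer in transaction pooling mode they can be taken on one server connection and released on another. The transaction-scoped variant only protects work done inside that same transaction. If the claim and the operation cannot share one transaction, use option 1 instead.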
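
The advisory lock functions take a signed 64-bit integer (bigint), so a string key has to be mapped into that range first. One possible derivation, assuming SHA-256 as the hash and a hypothetical helper name `advisory_lock_id`:

```python
import hashlib


def advisory_lock_id(key: str) -> int:
    """Map an idempotency key to a signed 64-bit integer for use as a lock id."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Take the first 8 bytes and interpret them as a signed big-endian integer
    # so the value always fits in a Postgres bigint.
    return int.from_bytes(digest[:8], byteorder="big", signed=True)
```

Two different keys can hash to the same 64-bit value. With a try-lock, a collision makes an unrelated request look as if its key were already claimed, so the lock should only gate the check and never serve as the record of the result.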

Of the three, option 1 is the most portable. The unique constraint does the deduplication atomically at the database level, without relying on lock scoping or connection affinity.

The deduplication table as a second bottleneck

Once the race condition is addressed, the next problem is operational. The deduplication table grows without bound. At 5,000 requests per minute, that's 300,000 rows per hour, or about 7.2 million per day. Cleanup is not optional, and how you clean up matters as much as how you insert.

The naive fix is a periodic job: DELETE FROM idempotency_keys WHERE expires_at < now(). This works until you have millions of rows to delete. A bulk delete of 500,000 rows in a single transaction holds locks for seconds and leaves the vacuum process a large dead-tuple job, which generates I/O spikes during peak traffic.

Better approaches:

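Time partitioning. Partition the table by created_at, daily or weekly. Dropping an old partition is close to a metadata operation: no row-by-row delete, no vacuum, no dead tuples, only a brief exclusive lock while the partition is removed. At 5,000 req/min with 7-day retention, you maintain seven active daily partitions and drop the oldest each morning. The drop itself takes milliseconds. One caveat: Postgres requires every unique constraint on a partitioned table to include the partition column, so the key's uniqueness has to be declared together with created_at.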

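Bounded deletes. If partitioning is too complex for your setup, run small frequent deletes: for example, 1,000 expired rows every minute. Postgres has no DELETE ... LIMIT, so select the batch in a subquery: DELETE FROM idempotency_keys WHERE key IN (SELECT key FROM idempotency_keys WHERE expires_at < now() LIMIT 1000). Small transactions minimise lock pressure and keep the dead-tuple volume manageable. Set the limit based on your write rate: the cleanup rate needs to stay above the expiry rate.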
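
A sketch of the bounded-delete loop, again using SQLite from the standard library as a stand-in for Postgres. The integer timestamps, the batch size, and the helper names `delete_expired_batch` and `drain` are assumptions for illustration. The subquery form is used because it works in both databases.

```python
import sqlite3


def delete_expired_batch(conn: sqlite3.Connection, now: int, batch_size: int = 1000) -> int:
    """Delete at most batch_size expired rows; return how many were removed."""
    cur = conn.execute(
        "DELETE FROM idempotency_keys WHERE key IN ("
        " SELECT key FROM idempotency_keys WHERE expires_at < ? LIMIT ?)",
        (now, batch_size),
    )
    conn.commit()  # one short transaction per batch
    return cur.rowcount


def drain(conn: sqlite3.Connection, now: int, batch_size: int = 1000) -> list[int]:
    """Run batches until nothing is left to delete; return the batch sizes."""
    sizes = []
    while True:
        deleted = delete_expired_batch(conn, now, batch_size)
        if deleted == 0:
            return sizes
        sizes.append(deleted)


def demo() -> tuple[list[int], int]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY, expires_at INTEGER NOT NULL)"
    )
    # 2,500 rows already expired (expires_at=100), 10 still live (expires_at=10000)
    conn.executemany(
        "INSERT INTO idempotency_keys VALUES (?, ?)",
        [(f"old-{i}", 100) for i in range(2500)] + [(f"live-{i}", 10_000) for i in range(10)],
    )
    sizes = drain(conn, now=1000)
    remaining = conn.execute("SELECT COUNT(*) FROM idempotency_keys").fetchone()[0]
    return sizes, remaining
```

In production the loop would typically pause between batches, or run one batch per scheduler tick, rather than draining in a tight loop.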

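Keep the key column small. Store UUIDs in the native uuid type (16 bytes) rather than as 36-character text, or hash longer keys to a fixed-width value, bearing in mind that a 64-bit BIGINT hash can collide and a collision looks like a duplicate request. Keep response payloads out of the indexed columns: store response data in a separate JSONB column, where TOAST moves large values out of line, or in a separate table. A hot index over large values slows every write.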

Strategy	Lock impact	Vacuum pressure	Operational complexity
Bulk DELETE	High (long transaction)	High (many dead tuples)	Low
Bounded DELETE (1,000 rows/run)	Low (short transactions)	Moderate	Low
Time-partitioned table	Brief (DROP TABLE on the old partition)	None	Medium (partition mgmt)

Cleanup strategy trade-offs at scale

Key scoping across service boundaries

Single-service idempotency is straightforward. In a microservice architecture, a single user-facing request fans out to multiple downstream services, and the scoping question becomes harder.

Consider a checkout flow: the client sends one request with one idempotency key. The checkout service calls inventory to reserve stock, then calls payments to charge the card, then calls notifications to send a receipt. Which key goes where?

Do not propagate the raw key. If the inventory service and the payments service both receive the same key value, they deduplicate against different stores with different semantics, and nothing in the key says which operation it belongs to. Worse: wherever two operations share a deduplication store, a request to payments can match a key that was stored for an inventory reservation. That is a false-positive deduplication that suppresses a legitimate charge.

Derive child keys deterministically. Compose the child key from the parent key and the service boundary:

child_keys.py

import hashlib

def child_key(parent_key: str, service: str) -> str:
    return hashlib.sha256(
        f"{parent_key}:{service}".encode()
    ).hexdigest()[:32]

# request_key is the idempotency key taken from the incoming client request.
# Each service gets a deterministic key that is unique per operation type.
inventory_key = child_key(request_key, "inventory:reserve")
payment_key   = child_key(request_key, "payment:charge")
notify_key    = child_key(request_key, "notification:receipt")

Each child key is globally unique to that operation type but fully deterministic from the parent. A retry of the parent request produces identical child keys. Each downstream service deduplicates independently, and the parent service does not need to coordinate across them.

The parent key covers the full fan-out. If the checkout service retries only the payments call because inventory already succeeded, it uses the same derived payment key. The payments service correctly recognises it as a duplicate and returns the cached result without re-charging.

Expiry windows and why 24 hours is not universal

Most tutorials suggest 24 to 72 hours as the idempotency key TTL. This is a starting point, not a derived value.

The right TTL is your retry window plus a buffer. If your client retries three times with exponential backoff that tops out at two minutes per attempt, the full retry sequence completes within about ten minutes. A 30-minute TTL covers it with headroom. A 24-hour TTL protects against a user submitting the same form a day apart — which is usually a different user intent, not a retry you want to deduplicate.
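
The arithmetic can be made explicit. The sketch below computes an upper bound on the retry window and a TTL with headroom. The 60-second per-request timeout and the 3x buffer factor are assumptions chosen so the result lines up with the figures in the paragraph above; substitute your own client settings.

```python
def retry_window_seconds(retries: int, max_backoff_s: float, request_timeout_s: float) -> float:
    """Upper bound on the time between the first attempt and the end of the last one."""
    attempts = retries + 1  # the original request plus each retry
    # Worst case: every backoff wait hits the cap and every attempt runs to its timeout.
    return retries * max_backoff_s + attempts * request_timeout_s


def suggested_ttl_seconds(window_s: float, buffer_factor: float = 3.0) -> float:
    """Scale the retry window by a safety factor to get a key TTL."""
    return window_s * buffer_factor


window = retry_window_seconds(retries=3, max_backoff_s=120, request_timeout_s=60)
ttl = suggested_ttl_seconds(window)
```

Under these assumptions `window` is 600 seconds (ten minutes) and `ttl` is 1,800 seconds (thirty minutes).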

Overly long TTLs cause storage growth and can produce confusing failures. A user attempts a payment, gets a transient error, fixes their payment method, and retries 12 hours later with the same client-generated key. The server correctly deduplicates it and returns the original failure. The user sees a stale error rather than a fresh attempt. This is spec-compliant but wrong.

The fix is a client-side convention: generate a new key for each new logical attempt, not for each network call. If the user intentionally retries after fixing an error, the client generates a new key. If the network drops mid-request, the client retries with the same key. Most SDK implementations get this right; most hand-rolled implementations do not.
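
A minimal client-side sketch of that convention: the key is created once per logical attempt and reused for every network retry of that attempt. The `Attempt` class, the `send_with_retry` helper, and the shape of the `send` callable are hypothetical, not taken from any particular SDK.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Attempt:
    """One logical user intent. The key is generated once, at creation."""
    payload: dict
    key: str = field(default_factory=lambda: str(uuid.uuid4()))


def send_with_retry(
    attempt: Attempt,
    send: Callable[[str, dict], dict],
    max_tries: int = 3,
) -> dict:
    """Retry on timeout, always with the attempt's original key (max_tries >= 1)."""
    last_error: Exception | None = None
    for _ in range(max_tries):
        try:
            return send(attempt.key, attempt.payload)
        except TimeoutError as exc:
            last_error = exc  # outcome unknown: retry with the same key
    assert last_error is not None
    raise last_error
```

When the user fixes their input and submits again, the caller constructs a new `Attempt`, which carries a fresh key. A timeout inside `send_with_retry` never creates one.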

A two-tier TTL is worth considering: a short active window (15 to 30 minutes) for deduplication during retry sequences, and a longer audit window (7 days) for response caching to support debugging and support queries. Store them in separate columns with separate cleanup schedules — the active window deletes fast, the audit window stays for a week.

What to measure once it's running

A deduplication system you cannot observe is one you cannot trust. Four metrics worth instrumenting from day one:

Dedup rate: duplicate hits divided by total requests, as a percentage. A healthy system stays well under 1%. A sudden spike means client-side retry logic is misbehaving or a network layer is duplicating requests. This metric makes the misbehaviour visible before users notice a charge doubled.

Pending key age: P95 and P99 of how long a key stays in pending status. Under normal conditions, this should be low — seconds at most. Keys that stay pending for minutes indicate a stuck request or a cleanup failure. These are the rows that cause the polling path to wait indefinitely.

Table size: row count and bytes on disk. At a known write rate and TTL, you can project the expected steady-state size. If actuals exceed the projection, the cleanup job is falling behind. Set an alert before the table hits a size that will cause index scans to degrade.
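
The projection is a single multiplication: steady-state rows equal the write rate times the retention period. A sketch with a hypothetical alert check, where the 1.5x tolerance is an assumption:

```python
def steady_state_rows(requests_per_minute: float, retention_minutes: float) -> float:
    """Expected row count once inserts and expiries balance out."""
    return requests_per_minute * retention_minutes


def cleanup_is_behind(actual_rows: float, expected_rows: float, tolerance: float = 1.5) -> bool:
    """Flag the table when it is noticeably larger than the projection."""
    return actual_rows > expected_rows * tolerance


# 5,000 req/min with a 30-minute TTL, and with 7-day retention
short_window = steady_state_rows(5000, 30)
week_window = steady_state_rows(5000, 7 * 24 * 60)
```

At 5,000 requests per minute this gives 150,000 rows for a 30-minute TTL and 50.4 million rows for 7-day retention.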

Lock contention: if you're using SELECT FOR UPDATE, track wait time on that query. Near-zero under normal conditions. Elevated wait means concurrent retries are genuinely racing — understand the cause before assuming it's expected load.

Most APM tools will not instrument the deduplication table automatically. Add explicit counters at the application layer: increment idempotency.miss when a new key is claimed, idempotency.hit when a duplicate is caught, and idempotency.pending_timeout when the polling path gives up waiting. Those three counters surface most production incidents before they reach users.
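
The dedup rate follows directly from the hit and miss counters. A small sketch, assuming the counters are read as plain integers over the same time window. The function name is hypothetical.

```python
def dedup_rate_percent(hits: int, misses: int) -> float:
    """Duplicate hits as a percentage of all keyed requests in the window."""
    total = hits + misses
    if total == 0:
        return 0.0  # no traffic in the window: report zero rather than divide by zero
    return 100.0 * hits / total
```

For example, 40 hits against 9,960 misses is 40 duplicates in 10,000 requests, a rate of 0.4%.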

A production-ready idempotency key pattern

Pulling this together: a Postgres-backed implementation that handles the race condition atomically, cleans up without vacuum pressure, and emits the metrics you need.

schema.sql

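CREATE TABLE idempotency_keys (
  key         TEXT        NOT NULL,
  status      TEXT        NOT NULL DEFAULT 'pending', -- pending | done | failed
  response    JSONB,
  -- Day bucket (midnight UTC), supplied by the application on insert
  created_at  TIMESTAMPTZ NOT NULL,
  expires_at  TIMESTAMPTZ NOT NULL,
  -- Postgres requires unique constraints on a partitioned table to include
  -- the partition column, so PRIMARY KEY (key) alone is rejected.
  -- Uniqueness is therefore per key per day bucket: a retry that crosses
  -- midnight UTC is not caught by this constraint and needs a separate check.
  PRIMARY KEY (key, created_at)
) PARTITION BY RANGE (created_at);

-- One partition per day; automate creation with pg_partman
CREATE TABLE idempotency_keys_2026_06_13
  PARTITION OF idempotency_keys
  FOR VALUES FROM ('2026-06-13 00:00+00') TO ('2026-06-14 00:00+00');

-- Index for the cleanup job
CREATE INDEX ON idempotency_keys (expires_at)
  WHERE status IN ('done', 'failed');

idempotency.py

import asyncio, json
from datetime import datetime, timedelta, timezone

# Assumptions: `db` is an asyncpg connection, and `metrics` is your
# StatsD-style client, configured elsewhere in the application.

class IdempotencyKeyFailedError(Exception):
    """The request that owned this key failed."""

class IdempotencyPendingTimeoutError(Exception):
    """The request that owns this key did not finish in time."""

async def with_idempotency(db, key: str, ttl_minutes: int, operation):
    now = datetime.now(timezone.utc)
    expires_at = now + timedelta(minutes=ttl_minutes)
    # Day bucket: identical for every retry made on the same UTC day
    bucket = now.replace(hour=0, minute=0, second=0, microsecond=0)

    # Atomic claim: only one concurrent request wins this INSERT
    status = await db.execute("""
        INSERT INTO idempotency_keys (key, status, created_at, expires_at)
        VALUES ($1, 'pending', $2, $3)
        ON CONFLICT (key, created_at) DO NOTHING
    """, key, bucket, expires_at)

    # asyncpg returns the command tag; 'INSERT 0 0' means no row was inserted
    if status == "INSERT 0 0":
        # Another request owns this key. Poll for its result.
        return await poll_for_result(db, key, bucket, timeout_seconds=10)

    try:
        response = await operation()
        await db.execute("""
            UPDATE idempotency_keys
            SET status = 'done', response = $1
            WHERE key = $2 AND created_at = $3
        """, json.dumps(response), key, bucket)
        metrics.increment('idempotency.miss')
        return response
    except Exception:
        # Record the failure so waiting retries stop polling, then re-raise
        await db.execute(
            "UPDATE idempotency_keys SET status = 'failed' "
            "WHERE key = $1 AND created_at = $2",
            key, bucket
        )
        raise

async def poll_for_result(db, key: str, bucket: datetime, timeout_seconds: int):
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_seconds
    while loop.time() < deadline:
        row = await db.fetchrow(
            "SELECT status, response FROM idempotency_keys "
            "WHERE key = $1 AND created_at = $2",
            key, bucket
        )
        if row and row['status'] == 'done':
            metrics.increment('idempotency.hit')
            return json.loads(row['response'])
        if row and row['status'] == 'failed':
            raise IdempotencyKeyFailedError(key)
        await asyncio.sleep(0.5)
    metrics.increment('idempotency.pending_timeout')
    raise IdempotencyPendingTimeoutError(key)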

The critical design decision: claim first, execute second. The unique constraint does the deduplication atomically. No manual locking, no application-level race. The polling path handles the window where the first request is still in flight.

Time partitioning handles cleanup without long-held locks. The cleanup job drops the oldest partition once every key in it has expired: no DELETE, no vacuum, no dead tuples. At moderate scale, up to several million keys per day, this keeps the table fast as it ages.

The three metrics (miss, hit, pending_timeout) give you the signal you need to trust the system is working. The dedup rate tells you if clients are misbehaving. The pending timeout rate tells you if the first request is stuck. If both are near zero, the system is healthy.

Frequently asked questions


