Idempotency keys done right: handling concurrent in-flight requests
Every guide covers the deduplication problem. Here is the harder one: two retries arriving while the first request is still processing.
A payment API receives a request. The client's network times out before it gets a response. Standard client behaviour: retry. Now two requests are in flight for the same operation, and if your server processes both, the customer is charged twice.
Idempotency keys are the established fix. The client generates a unique key per logical operation, sends it with every attempt, and the server uses it to detect retries and return the cached response. Every implementation guide covers this. Almost none cover the specific failure mode that breaks most production implementations: two retries arriving while the first request is still in flight.
Why a unique constraint is not enough
The first version most teams write looks like this:
```python
import json

# db and stripe are illustrative handles, standing in for your database
# layer and payment client.
def create_charge(idempotency_key, amount, customer_id):
    # Check for a prior successful run
    existing = db.query_one(
        "SELECT response FROM idempotency_keys WHERE key = %s",
        idempotency_key
    )
    if existing:
        return existing["response"]

    # Process the charge
    result = stripe.charge(amount=amount, customer=customer_id)

    # Store the result
    db.execute(
        "INSERT INTO idempotency_keys (key, response) VALUES (%s, %s)",
        idempotency_key, json.dumps(result)
    )
    return result
```

This has a classic time-of-check/time-of-use race condition. Two concurrent requests with the same key can both pass the SELECT before either inserts a row. Both hit Stripe. The customer is charged twice, and the second INSERT fails on a unique constraint — quietly, after the damage.
The standard fix to this specific race is to replace the SELECT + INSERT with an atomic INSERT ... ON CONFLICT DO NOTHING. If the INSERT returns a row, you own the key. If it returns nothing, the key already exists and you read the stored response. That eliminates the check-before-insert race.
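A sketch of the claim query against the naive table above ($1 is the key):

```sql
-- Atomically claim the key: getting a row back means we own it.
INSERT INTO idempotency_keys (key, response)
VALUES ($1, NULL)
ON CONFLICT (key) DO NOTHING
RETURNING key;
```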
But it introduces a different problem, which is the one that actually surfaces in production.
The case nobody talks about: two retries, one operation still in flight
Suppose a client sends a request. The server inserts the idempotency key row and begins processing: a multi-second external call to a payment processor, a document generator, or an email service. Before that call completes, the client's retry fires. A second request arrives with the same key.
With the atomic INSERT approach, the second request tries INSERT ... ON CONFLICT DO NOTHING, gets nothing back, then reads the existing row and finds no stored response. The operation is still in flight.
What should the server do? Most implementations either:
- Block the second request until the first completes, requiring an open connection and risking both requests timing out.
- Return a vague 500 or 503, which causes the client to retry again and possibly creates a third concurrent request.
- Re-execute the operation, which defeats the whole purpose and duplicates the side effect.
None of these are correct. The right answer is HTTP 409 Conflict with a body telling the client to retry after the in-flight operation settles.
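A reasonable shape for that response (the body fields are illustrative; the Retry-After header anticipates the IETF draft discussed later):

```http
HTTP/1.1 409 Conflict
Retry-After: 2
Content-Type: application/json

{
  "error": "request_in_flight",
  "detail": "A request with this idempotency key is still processing. Retry shortly."
}
```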
> Most idempotency key guides stop at the deduplication problem. The concurrent in-flight problem is a separate failure mode, and it needs a separate mechanism.
The state machine model
An idempotency key record is not a cache entry. It is a state machine with three states:
- pending — a request is actively processing this key. Do not re-execute.
- succeeded — the operation completed. Return the stored response.
- failed — the operation failed terminally. Return the stored error.
The pending state is what gives you a clean answer for the concurrent-request case. The second request reads pending, returns 409, and tells the client to wait and retry. When it retries, it will find either succeeded or failed and get the correct response with no duplicate execution.
This also handles the case where a client retry arrives after the operation succeeds but before the caller has received the response. The second request reads succeeded, returns the cached response, and the client is satisfied.
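As a sketch, the legal transitions are few enough to write down (state names mirror the schema below):

```python
# 'pending' is the only non-terminal state; the terminal states never change.
TRANSITIONS = {
    "pending":   {"succeeded", "failed"},
    "succeeded": set(),
    "failed":    set(),
}

def can_transition(current, target):
    return target in TRANSITIONS[current]
```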
The idempotency keys schema
```sql
CREATE TABLE idempotency_keys (
    key          TEXT NOT NULL,
    status       TEXT NOT NULL DEFAULT 'pending'
                 CHECK (status IN ('pending', 'succeeded', 'failed')),
    request_hash TEXT NOT NULL,
    response     JSONB,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    expires_at   TIMESTAMPTZ NOT NULL DEFAULT now() + INTERVAL '24 hours',
    CONSTRAINT idempotency_keys_pkey PRIMARY KEY (key)
);

CREATE INDEX ON idempotency_keys (expires_at);
```

A few specific decisions worth calling out:
- request_hash is a SHA-256 of the request body, excluding the idempotency key itself. If a retry arrives with the same key but a different body, that is a client bug, not a retry. Return HTTP 422 immediately, before executing anything.
- response is JSONB, not a serialised string. Store the response object directly. If you ever need to query fields inside stored responses (for debugging, for a dashboard, for audit), you will be glad this is not a blob (see the query sketch after this list).
- expires_at drives cleanup. 24 hours is right for most API use cases. Payment APIs that support manual retries days later should use 7 to 30 days.
- The primary key on key provides the unique constraint the atomic INSERT relies on.
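To illustrate the JSONB decision: once responses are stored as JSONB, ad-hoc debugging queries stay cheap. A hypothetical example (the field names depend on what your operations return):

```sql
-- Find recent stored responses for a given customer.
SELECT key, status, response->>'id' AS charge_id
FROM idempotency_keys
WHERE response->>'customer' = 'cus_123'
  AND created_at > now() - INTERVAL '1 day';
```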
The implementation: INSERT then SELECT FOR UPDATE
The pattern has two phases. First, try to atomically claim the key with INSERT ... ON CONFLICT DO NOTHING. If that succeeds, you own the key and proceed with execution. If it returns nothing, the key already exists — lock the row with SELECT FOR UPDATE and read its current state:
```python
import hashlib, json

from psycopg2.extras import RealDictCursor

class ConflictError(Exception): pass          # concurrent request in flight -> 409
class RequestMismatchError(Exception): pass   # same key, different body -> 422
class StoredOperationError(Exception): pass   # replaying a stored terminal failure
class TerminalError(Exception): pass          # operation failed; do not retry

def handle_idempotent(conn, key, request_body, operation):
    request_hash = hashlib.sha256(
        json.dumps(request_body, sort_keys=True).encode()
    ).hexdigest()

    with conn.cursor(cursor_factory=RealDictCursor) as cur:
        # Step 1: try to claim the key atomically
        cur.execute(
            """
            INSERT INTO idempotency_keys (key, request_hash, status)
            VALUES (%s, %s, 'pending')
            ON CONFLICT (key) DO NOTHING
            RETURNING *
            """,
            (key, request_hash)
        )
        row = cur.fetchone()
        conn.commit()

        if row is None:
            # Key exists -- lock the row and read its current state
            cur.execute(
                "SELECT * FROM idempotency_keys WHERE key = %s FOR UPDATE",
                (key,)
            )
            row = cur.fetchone()
            if row["request_hash"] != request_hash:
                conn.rollback()
                raise RequestMismatchError(
                    "Idempotency key reused with different request body"
                )
            if row["status"] == "pending":
                conn.rollback()
                raise ConflictError(
                    "Concurrent request in progress -- retry in 1-2 seconds"
                )
            if row["status"] == "succeeded":
                conn.rollback()
                return row["response"]
            # The CHECK constraint leaves only 'failed'
            conn.rollback()
            raise StoredOperationError(row["response"]["error"])

    # We own the key (inserted as pending). Run the operation outside
    # any database transaction -- external calls cannot be transactional.
    try:
        result = operation()
    except TerminalError as exc:
        _mark_failed(conn, key, str(exc))
        raise
    # Any other exception leaves the key pending; the staleness TTL
    # described below is what recovers it.

    _mark_succeeded(conn, key, result)
    return result

def _mark_succeeded(conn, key, response):
    with conn.cursor() as cur:
        cur.execute(
            "UPDATE idempotency_keys SET status = 'succeeded', response = %s WHERE key = %s",
            (json.dumps(response), key)
        )
    conn.commit()

def _mark_failed(conn, key, error_message):
    with conn.cursor() as cur:
        cur.execute(
            "UPDATE idempotency_keys SET status = 'failed', response = %s WHERE key = %s",
            (json.dumps({"error": error_message}), key)
        )
    conn.commit()
```

The FOR UPDATE on the SELECT is what handles the concurrent-retry case. While Request A is executing the external call, Request B reads pending and returns 409, exactly as the state machine prescribes. If B instead arrives in the narrow window where A is committing its final UPDATE, the row lock makes B wait for that commit, so B reads the final state and returns the cached result rather than a spurious 409.
One important detail: the INSERT commits immediately, before the external operation runs. The external call happens outside any database transaction. This is intentional: you cannot wrap a Stripe charge or a third-party API call in a Postgres transaction. The state machine bridges the gap between INSERT (pending) and UPDATE (succeeded or failed).
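How these exceptions map to HTTP responses depends on your framework. A sketch assuming Flask, with the status codes this article prescribes (the handler bodies are illustrative):

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.errorhandler(ConflictError)
def on_conflict(exc):
    resp = jsonify(error="request_in_flight", detail=str(exc))
    resp.headers["Retry-After"] = "2"  # matches the IETF draft guidance below
    return resp, 409

@app.errorhandler(RequestMismatchError)
def on_mismatch(exc):
    return jsonify(error="key_reused_with_different_body", detail=str(exc)), 422

@app.errorhandler(StoredOperationError)
def on_stored_failure(exc):
    # Replay the stored error; use the status you recorded with it.
    return jsonify(error="operation_failed", detail=str(exc)), 400
```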
Recovering from stuck-pending keys
Suppose the server crashes after the INSERT but before the final UPDATE. The key is stuck in pending, and every retry gets a 409 forever. The fix is a TTL-based staleness check. Before returning 409 for a pending key, check whether the key has been in pending status longer than the operation's maximum expected duration. If it has, the original request is almost certainly dead. Reset the key's timestamps and allow re-execution:
```sql
-- Recovery: reset a stale pending key, allowing a fresh attempt.
-- Run this before returning 409 for a key that has been pending too long.
UPDATE idempotency_keys
SET created_at = now(),
    expires_at = now() + INTERVAL '24 hours'
WHERE key = $1
  AND status = 'pending'
  AND created_at < now() - INTERVAL '30 seconds'
RETURNING *;

-- If this returns a row, the key was stale and has been reset.
-- Treat the next request as a fresh claim.
```

The staleness window (30 seconds above) should be the operation's 99th-percentile duration plus a margin. For a payment charge, 10 seconds is conservative. For a document generation workflow, you might need 2 to 3 minutes.
One prerequisite: the operation you are re-executing must itself be safe to retry. For external API calls, check whether the downstream service has its own idempotency mechanism and pass your key through to it. For services that do not, add a prior-state check before executing. If you cannot make the operation re-entry-safe, the stuck-pending TTL is your only automatic recovery path.
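Stripe is the canonical example of a downstream service with its own mechanism: its write endpoints accept an idempotency key, so you can forward yours. A sketch using the stripe Python library (the charge parameters are illustrative):

```python
import stripe

def charge_operation(key, amount, customer_id):
    # Forwarding our key means a re-executed operation cannot double-charge:
    # Stripe deduplicates on its side using the same key.
    return stripe.Charge.create(
        amount=amount,
        currency="usd",
        customer=customer_id,
        idempotency_key=key,
    )
```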
Idempotency keys in async queues
Everything above applies to HTTP APIs. Async job queues have the same failure mode (processing the same message twice), but the fix is slightly different because you do not control when consumers retry.
SQS and Kafka both deliver messages at least once. A consumer crash after processing but before acknowledging causes the broker to redeliver. If your consumer has side effects, those side effects repeat.
The state machine pattern works here too, with one change: the idempotency key is the message ID, not a client-supplied header. SQS assigns a MessageId to every message; Kafka uses the topic-partition-offset tuple as a natural unique identifier.
```python
import boto3, json

sqs = boto3.client("sqs")

def process_messages(queue_url, conn):
    # conn: the same database connection handle used by handle_idempotent
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)
    for message in resp.get("Messages", []):
        key = message["MessageId"]
        body = json.loads(message["Body"])
        try:
            handle_idempotent(conn, key, body, lambda: do_work(body))
        except ConflictError:
            # Another worker is processing this message.
            # Do not delete -- let the visibility timeout expire for redelivery.
            continue
        except StoredOperationError:
            # Terminal failure stored -- remove to stop redelivery.
            sqs.delete_message(
                QueueUrl=queue_url,
                ReceiptHandle=message["ReceiptHandle"]
            )
            continue
        # Succeeded (first run or cached) -- safe to delete.
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"]
        )
```

The key difference from the HTTP case: a 409 (pending) in the queue context means leaving the message in the queue and letting it redeliver naturally. Do not delete a message whose key is in pending state. The visibility timeout acts as your retry delay.
For Kafka, the pattern is nearly identical but you manage offset commits yourself. Do not commit the offset until the idempotency key is in a terminal state. On consumer restart, Kafka replays from the last committed offset, and the idempotency key lookup short-circuits any already-completed messages.
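A sketch of that loop, assuming the confluent-kafka client and the handle_idempotent helper above (topic and group names are illustrative):

```python
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "idempotent-workers",
    "enable.auto.commit": False,  # commit only once the key is terminal
})
consumer.subscribe(["charges"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    # Topic-partition-offset is a stable identity across redeliveries
    key = f"{msg.topic()}-{msg.partition()}-{msg.offset()}"
    body = json.loads(msg.value())
    try:
        handle_idempotent(conn, key, body, lambda: do_work(body))
    except ConflictError:
        continue  # another worker owns it; the uncommitted offset covers us
    except StoredOperationError:
        pass  # terminal failure is recorded; fall through and commit
    consumer.commit(message=msg)  # offset advances only on a terminal state
```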
What the IETF draft standardises
As of late 2025, the IETF has a public draft for a standard Idempotency-Key header (draft-ietf-httpapi-idempotency-key-header-07). It is worth knowing what it does and does not specify.
What it standardises: the header name (Idempotency-Key), the format (a quoted string of 1 to 255 characters), the requirement that servers document their retention window, and the expected behaviour on key collision (409 Conflict with a Retry-After header).
What it leaves to implementers: storage mechanism, state semantics, expiry policy, concurrent-request handling, and request-body validation. The standard does not define pending state, stuck-key recovery, or queue-layer idempotency. Those are engineering problems, not protocol problems.
For most teams, the practical implication is: if you implement the state machine described above, you are already compliant with the draft for HTTP APIs. Adding a Retry-After: 2 header to your 409 responses is the only non-obvious addition the draft requires.
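A minimal request-side check based on the format constraints described above (how strictly to treat a malformed key is a policy choice, and the quote handling is an assumption about how your framework surfaces the header):

```python
def extract_idempotency_key(headers):
    # Per the draft as summarised above: a quoted string of 1-255 characters.
    raw = headers.get("Idempotency-Key")
    if raw is None:
        return None  # caller decides whether to require the header
    key = raw.strip().strip('"')
    if not 1 <= len(key) <= 255:
        raise ValueError("Idempotency-Key must be 1 to 255 characters")
    return key
```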
Expiration and cleanup
Idempotency key rows accumulate quickly under load. Without cleanup, index scans slow down and table bloat compounds. The cleanup job is a periodic DELETE on expires_at:
```sql
-- Run on a schedule (every 15 minutes is sufficient for most workloads)
DELETE FROM idempotency_keys
WHERE expires_at < now();
```

Run this every 10 to 15 minutes off the critical path. Also increase VACUUM frequency on this table — dead rows pile up faster than autovacuum's default schedule handles in high-throughput environments. If you are on Postgres 13 or later, configuring autovacuum_vacuum_insert_scale_factor to a smaller value for this table specifically reduces the lag between insert volume and vacuum cycles.
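The per-table setting mentioned above looks like this (the value is a starting point; tune it against your insert rate):

```sql
-- Trigger an insert-driven vacuum after ~5% of the table is new rows,
-- instead of the global default.
ALTER TABLE idempotency_keys
  SET (autovacuum_vacuum_insert_scale_factor = 0.05);
```

Here is how the full set of failure scenarios resolves under the state machine: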
| Scenario | Key state after event | What the next request receives |
|---|---|---|
| Client retry after network timeout (operation already succeeded) | succeeded | Cached response; no re-execution |
| Two concurrent retries; first still in flight | pending (first owns it) | 409 Conflict; retry after 1-2 s |
| Server crashes after INSERT, before UPDATE | pending (stale) | 409 until staleness TTL; then re-execution |
| Same key, different request body | pending or succeeded | 422 Unprocessable; client bug surfaced immediately |
| Client retries a terminally failed operation | failed | Stored error; no retry attempted |
| SQS message redelivered after consumer crash | succeeded (if first run finished) | Short-circuit; message deleted; no duplicate work |
The state machine adds one row per operation and one round-trip for the claim. For most APIs, that overhead is negligible next to the cost of a duplicate charge, a duplicate email, or a corrupted ledger. The cases where it becomes a bottleneck — tens of thousands of idempotent requests per second to a single Postgres instance — are also the cases where you would be sharding your writes anyway.
Related reading
Every Postgres isolation level, and the specific bug it lets through
Three isolation levels, three distinct failure modes. Most Postgres deployments run at Read Committed without knowing it. Here is what each level permits and what upgrading actually costs.
LLM database access: the RBAC gap most teams don't see
Giving an LLM access to your database is easy. The problem is that your application-layer RBAC is invisible when the model generates SQL. Here's where it goes wrong and how to fix it at the layer that enforces.
Rate limiting in production: why the algorithm you chose is probably wrong for your workload
Most rate limiting failures aren't implementation errors. They come from picking an algorithm whose properties don't match the actual traffic shape. Here's a workload-first framework for making the right choice.