eSign webhook integration: five failure modes that don’t appear in vendor docs
What breaks silently in production — and the receive-endpoint checklist that catches it before you ship
When a payment webhook fails, the customer gets a delayed notification. When an eSign webhook fails, contracts end up in mismatched states across systems: marked completed in the CRM while still pending in the signing platform, post-signing workflows triggered twice, compliance records that contradict the audit trail.
eSign webhook integration is a different category of problem from generic API webhooks. The vendor docs cover how to register an endpoint, what events you’ll receive, and how to verify the signature. They don’t cover the five failure modes that only surface in production. Here is what actually breaks, how to detect each one, and the receive-endpoint checklist that catches them before you ship.
Why eSign webhook integration breaks differently from generic APIs
Generic webhooks fire after informational events: a payment settled, a form submitted. eSign webhooks fire after legally significant state transitions: a document sent, viewed, signed, completed, declined. That distinction changes the failure economics.
Providers treat eSign events as important enough to retry delivery for hours or days, not minutes. That extended retry window is a compliance decision. Event delivery needs to be as reliable as the signing act itself. The side effect is that any endpoint outage becomes a multi-day deduplication problem. Your handler will receive the same event multiple times. That is not a bug in the provider; it is the correct behaviour given the retry contract.
The other difference is that eSign events trigger real-world downstream actions. A ‘completed’ event might release a countersigned PDF to a signatory, mark a deal closed in a CRM, or trigger a payment. Duplicate delivery of that event doesn’t just produce duplicate records — it produces duplicate actions that are expensive to unwind. Both facts point to the same design requirement: treat every incoming event as if it might arrive more than once, in arbitrary order, with a signature you need to verify before you parse.
Failure mode 1: HMAC verification runs after the body is consumed
eSign providers sign their webhook payloads with HMAC-SHA256. FlowVerify sends the signature in the X-FlowVerify-Signature header; other providers use similar headers. The signature is computed over the raw request body bytes: the exact octets that arrived over the wire, before any parsing or re-serialisation.
The trap: web frameworks process the body before your route handler runs. In Express.js, express.json() reads the entire body stream, parses it into a JavaScript object, and sets req.body. By the time your verification middleware executes, the raw bytes are gone unless you explicitly saved them. Your code then computes HMAC over req.body: either undefined, or a re-stringified object with different whitespace and key ordering than the original. The computed hash never matches the signature in the header.
Both outcomes are bad. Teams either disable signature verification entirely after watching it fail on every request, or wire the check incorrectly and ship a handler that accepts any POST request regardless of origin. Either way, the security control is gone.
```javascript
const crypto = require('crypto');

// Save the raw body during parsing, before any route handler runs
app.use(express.json({
  verify: (req, _res, buf) => {
    req.rawBody = buf; // Buffer, not string
  }
}));

// Verification middleware — must run before the route handler
app.use('/webhook', (req, res, next) => {
  const signature = req.headers['x-flowverify-signature'];
  const computed = crypto
    .createHmac('sha256', process.env.WEBHOOK_SECRET)
    .update(req.rawBody) // raw Buffer, not req.body
    .digest('hex');
  const expected = Buffer.from(computed);
  const received = Buffer.from(signature || '');
  // timingSafeEqual throws on length mismatch, so check length first
  if (received.length !== expected.length ||
      !crypto.timingSafeEqual(expected, received)) {
    return res.status(400).json({ error: 'Invalid signature' });
  }
  next();
});
```

The timingSafeEqual comparison is not cosmetic. A naive string comparison leaks information about where two strings diverge through timing differences, which can be exploited to forge signatures incrementally. Use the constant-time comparison.
Failure mode 2: Slow handlers trigger retry storms
eSign providers expect a 200 response within a timeout window. If your endpoint doesn’t respond in time, the provider retries delivery. The problem is that a handler that responds in 200ms under normal conditions can breach that threshold under load, particularly if it does synchronous work before returning.
The failure cascade: your endpoint does a database write plus a downstream API call before returning 200. Under normal conditions this takes 600ms, well within the window. Under load, it takes 18 seconds. The provider doesn’t get a 200, marks the delivery failed, and retries after a backoff interval. The retry hits the same overloaded endpoint. Another timeout. Another retry. By the time load drops, you’ve received the same ‘signed’ event five times, and each delivery triggered a downstream action.
In an eSign context this is worse than it sounds. A ‘completed’ event might release a signed document to the counterparty, fire a notification to both parties, or close a deal stage. Running that sequence five times creates inconsistencies that require manual intervention to clean up.
The fix is architectural: separate receiving from processing. Your webhook endpoint does exactly two things: verify the signature, then enqueue the raw payload. Everything slow goes into the background worker.
```python
import json

from fastapi import FastAPI, Request, Response, BackgroundTasks

from app.security import verify_signature

app = FastAPI()

@app.post('/webhook')
async def receive_webhook(request: Request, background_tasks: BackgroundTasks):
    payload = await request.body()
    verify_signature(payload, dict(request.headers))  # fast and synchronous
    background_tasks.add_task(process_event, payload)  # non-blocking
    return Response(status_code=200)  # 200 before any downstream work

async def process_event(payload: bytes):
    # All slow operations happen here, after the 200 is already sent
    event = json.loads(payload)
    await deduplicate(event['eventId'])  # check + insert before processing
    await update_crm(event)
    await notify_parties(event)
```

Failure mode 3: Events arrive out of order
An envelope progresses through states in a defined sequence: sent, delivered, viewed, signed, completed. The provider generates events in that order. It does not guarantee delivery in that order.
The scenario: ‘signed’ has a transient delivery failure on first attempt. ‘Completed’ delivers successfully shortly after. Your handler processes ‘completed’ first. If your state machine uses delivery order to track document status, the document transitions directly from ‘viewed’ to ‘completed’, skipping ‘signed’ entirely. Depending on downstream logic, this can mean a document that triggers completion actions without ever recording a signature event.
The failure is subtle because it only manifests when one event in a sequence had a retry and others didn’t. That scenario is most likely during peak load, which is precisely when you’d least want it.
The fix: don’t use delivery order. Every eSign event payload includes a timestamp of when the event occurred on the provider’s side. Use that field for sequencing, not the order in which your handler received the events. Design your state machine to handle out-of-order transitions without corrupting document state: either queue events until their predecessor arrives, or accept any terminal state transition regardless of intermediate steps and backfill the missing states.
A state machine that can only process events in the expected sequence is a state machine that will fail in production. Assume arbitrary delivery order from the start and design accordingly.
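A minimal sketch of the second approach — advance on any forward transition, sequence by the event's own timestamp, and backfill skipped states — could look like the following. The state names and `occurredAt` field follow the FlowVerify examples in this article; `apply_event` and the dictionary-based document record are illustrative, not a prescribed schema.

```python
from datetime import datetime

# Envelope states in their defined order; list index doubles as a rank.
STATE_ORDER = ['sent', 'delivered', 'viewed', 'signed', 'completed']
RANK = {state: i for i, state in enumerate(STATE_ORDER)}

def apply_event(doc: dict, event: dict) -> dict:
    """Apply an event using its occurredAt timestamp, not delivery order.

    Late-arriving events for earlier states are recorded in the
    'observed' set but never move the document backwards.
    """
    occurred_at = datetime.fromisoformat(event['occurredAt'])
    new_state = event['eventType']

    # Record every observed state so the history stays complete,
    # even when an event arrives after a later one.
    doc.setdefault('observed', set()).add(new_state)

    # Only advance if this event represents a later state than current.
    if RANK[new_state] > RANK[doc['state']]:
        # Backfill intermediate states we never received an event for.
        for skipped in STATE_ORDER[RANK[doc['state']] + 1 : RANK[new_state]]:
            doc['observed'].add(skipped)
        doc['state'] = new_state
        doc['state_changed_at'] = occurred_at
    return doc

# 'completed' delivered before 'signed': the document still ends up
# completed, with 'signed' backfilled rather than lost.
doc = {'state': 'viewed'}
apply_event(doc, {'eventType': 'completed', 'occurredAt': '2026-01-10T12:00:05+00:00'})
apply_event(doc, {'eventType': 'signed', 'occurredAt': '2026-01-10T12:00:01+00:00'})
```

The deliberate choice here is that a terminal event is never queued waiting for its predecessors: the document reaches 'completed' immediately, and the late 'signed' event lands in the observed history instead of corrupting the state.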
Failure mode 4: You are responsible for deduplication
Most eSign providers include a unique identifier per event. FlowVerify includes an eventId field in every payload and sends the same value in an Idempotency-Key header across all retries for that event. Other providers use similar patterns. What providers generally don’t include is a guarantee that a given event will only be delivered once.
With retry windows spanning days, any endpoint downtime during that window results in duplicate deliveries when the endpoint recovers. The event IDs are consistent across retries, so deduplication is possible, but you have to implement it. The provider hands you the key; the lock is yours to build.
The pattern: store delivered event IDs in a table with a unique constraint. On each incoming event, attempt to insert before processing. If the insert fails on the uniqueness constraint, the event is a duplicate. Return 200 and stop. Do not return 4xx on a duplicate; that would trigger another retry, which defeats the deduplication.
```sql
-- Deduplication table with a TTL-friendly structure
CREATE TABLE processed_webhook_events (
  event_id     TEXT PRIMARY KEY,
  envelope_id  TEXT NOT NULL,
  event_type   TEXT NOT NULL,
  processed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- In your handler: attempt the insert before any processing
INSERT INTO processed_webhook_events (event_id, envelope_id, event_type)
VALUES ($1, $2, $3)
ON CONFLICT (event_id) DO NOTHING
RETURNING event_id;

-- If the query returns no rows, it was a duplicate. Return 200, skip processing.
```

Add a housekeeping job to delete rows older than the provider’s maximum retry window. For a 72-hour retry window, 7 days of retention is conservative. Beyond that, rows consume space without providing any deduplication value, since the provider will no longer retry events that old.
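The insert-before-process flow in the handler can be sketched as follows. This uses SQLite's `INSERT OR IGNORE` purely so the example is self-contained; against Postgres you would run the `ON CONFLICT ... DO NOTHING` statement above instead, and the `mark_processed` helper and its in-memory table are illustrative.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE processed_webhook_events (
        event_id     TEXT PRIMARY KEY,
        envelope_id  TEXT NOT NULL,
        event_type   TEXT NOT NULL,
        processed_at TEXT NOT NULL DEFAULT (datetime('now'))
    )
""")

def mark_processed(event: dict) -> bool:
    """Insert the event ID; return False if it was already processed."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed_webhook_events "
        "(event_id, envelope_id, event_type) VALUES (?, ?, ?)",
        (event['eventId'], event['envelopeId'], event['eventType']),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows affected means duplicate

event = {'eventId': 'evt_123', 'envelopeId': 'env_9', 'eventType': 'completed'}
first = mark_processed(event)   # True — safe to process
second = mark_processed(event)  # False — duplicate: return 200, skip processing
```

Because the unique constraint does the arbitration, this stays correct even when two retries of the same event land on different workers at the same moment: exactly one insert succeeds.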
One edge case worth covering: if you use the Idempotency-Key header as your deduplication key instead of the payload’s eventId, make sure the header is present before you read it. FlowVerify includes it consistently; build a fallback to eventId for other providers that may only send it on retries.
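A small helper for that fallback might look like this. The header name and `eventId` field follow the FlowVerify description above; for other providers, treat the exact names as assumptions to verify against their docs.

```python
import json

def dedup_key(headers: dict, raw_body: bytes) -> str:
    """Prefer the Idempotency-Key header; fall back to the payload's eventId."""
    # Normalise header names: HTTP headers are case-insensitive.
    normalised = {k.lower(): v for k, v in headers.items()}
    key = normalised.get('idempotency-key')
    if key:
        return key
    # Header absent: fall back to the eventId inside the signed payload.
    event = json.loads(raw_body)
    return event['eventId']

# With the header present, the header wins:
dedup_key({'Idempotency-Key': 'evt_42'}, b'{"eventId": "evt_42"}')
# Without it, the payload's eventId is used:
dedup_key({}, b'{"eventId": "evt_42"}')
```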
Failure mode 5: Webhook registrations go stale
This failure mode is the hardest to detect because it produces no errors. Webhook delivery stops. The provider’s configuration dashboard still shows the registration as active. Events are generated on the provider’s side. Nothing arrives.
Two common triggers. The first is SSL certificate renewal. If the provider validates the TLS certificate at registration time, a certificate change, even a routine renewal, can cause the provider’s delivery system to reject the connection on its next attempt. No error is surfaced on either side. Events stop arriving.
The second trigger is credential rotation. Some providers tie webhook registrations to the API credentials used to create them. Rotating an API key or OAuth token can orphan the registration. Again: no error, no notification. The configuration looks healthy. Events stop.
Active monitoring is the only reliable countermeasure. The provider’s ‘active’ status tells you the registration exists; it does not tell you events are actually being delivered. Track ingestion rate in your own system. If your platform normally processes 50 events a day and yesterday’s count was zero, that warrants investigation, even on a slow day.
```sql
-- Check event ingestion rate for the past 24 hours
SELECT
  COUNT(*) AS events_received,
  MAX(processed_at) AS last_event_at
FROM processed_webhook_events
WHERE processed_at > NOW() - INTERVAL '24 hours';

-- Wire this into a monitoring check or a scheduled alert.
-- Alert if events_received = 0 during hours when activity is expected.
```

Include webhook re-registration as a step in your certificate renewal runbook and credential rotation procedure. Treat it the same way you treat updating a secrets manager entry: it is a deployment action, not an afterthought.
What FlowVerify does with webhook delivery
Every FlowVerify webhook event includes an eventId that stays consistent across retries for the same event. The payload is signed with HMAC-SHA256; the signature arrives in the X-FlowVerify-Signature header, computed over the raw JSON body. An Idempotency-Key header carries the same value as the eventId, so you can use either as your deduplication key, whichever is easier to extract in your stack.
Retries use exponential backoff with a 72-hour maximum window. Each event payload includes an occurredAt field in ISO 8601 format. Use that for event ordering, not the time your handler received the delivery.
Webhook delivery status per envelope is visible in the FlowVerify dashboard under the audit trail. If your endpoint is returning errors, the dashboard shows which HTTP status codes it returned and at which delivery attempt. This makes it significantly faster to diagnose which of the five failure modes above is responsible. The log shows the pattern of 400s (HMAC problem), 200s that still produced duplicates (deduplication gap), or a clean delivery history with no events during an outage window (stale registration).
| Failure mode | How common | Visible in logs? | Typical time to diagnose |
|---|---|---|---|
| HMAC after body parsing | Very common | No — check silently passes or is disabled | Hours to days |
| Retry storm from slow handler | Common under load | Yes — duplicate records appear | Minutes to hours |
| Out-of-order delivery | Occasional | No — manifests as impossible state | Days |
| Missing deduplication | Always needed | Yes — downstream actions fire twice | Minutes to hours |
| Stale webhook registration | Low frequency, high impact | No — zero events, no error | Days or never |
The receive-endpoint checklist
Work through each item before wiring this into production. Every item that’s unchecked is a failure mode waiting for a bad week to surface it.
Signature verification
- Raw body is saved before any JSON parsing middleware runs
- HMAC is computed against raw bytes, not the parsed object or a re-stringified version
- Comparison uses a constant-time function, not a plain string equality check
- Invalid signatures return 400 and log the failure; they never silently pass
Response timing
- Endpoint returns 200 before any downstream processing begins
- Database writes, API calls, and PDF operations happen in a background queue
- 200 is returned for confirmed duplicate events, not 4xx
Deduplication
- Incoming eventId is stored with a unique constraint before processing
- Duplicate inserts are caught and discarded without triggering downstream actions
- TTL on the deduplication store covers at least the provider’s maximum retry window
State machine ordering
- State transitions use the event’s occurredAt timestamp, not delivery sequence
- State machine accepts out-of-order events without entering an impossible state
Registration hygiene
- Alert exists for zero-event periods during hours when activity is expected
- Certificate renewal procedure includes webhook endpoint re-registration
- Credential rotation procedure includes webhook re-registration or token refresh
Pre-ship testing
- Provider test mode used to fire each event type the handler will receive in production
- Duplicate delivery tested and confirmed to return 200 with no side effects
- Out-of-order delivery tested against the state machine
If you are integrating with FlowVerify and hitting delivery issues, the per-envelope audit trail in the dashboard shows each delivery attempt with its HTTP status code and timestamp, which is usually enough to identify which item on this checklist was missed.