Redis writes at scale: what benchmarks don't capture
Three failure modes — AOF rewrites, expiry stampedes, and Cluster rebalancing — that only surface in production
Redis writes at scale expose failure modes that the benchmark your team relied on never tested. The benchmark said 1.2 million operations per second. That number is not wrong, but it was measured under conditions that do not exist in production: an empty keyspace, no persistence, no replicas, and no key expiry. Production has a populated keyspace, persistence, replicas, and expiring keys.
Then someone fires 5,000 writes per second at a key cluster for six hours and starts seeing P99 latency spikes of 400ms from a system that is supposed to answer in under 5ms.
Here is what actually happens inside Redis when you write, and three specific failure modes that only appear once you are running at scale.
How Redis handles a write: the actual path
A write to Redis is not one operation. Depending on your configuration, it is three:
- The in-memory write: the key is set in the hash table. This is the fast part: O(1) for a simple SET, bounded by memory bandwidth. This is what the benchmark measures.
- The persistence write: if you are running AOF (Append Only File), the write is also appended to the AOF file on disk. If you are running RDB, nothing goes to disk at write time; the change counts toward the thresholds in the save schedule, and the next snapshot captures it. If you are running both (the recommended production configuration), it does both. If you are running neither, it does neither, and a restart loses everything. Stock Redis defaults enable RDB snapshots and leave AOF off, but deployment tooling and custom configs can override that, so check rather than assume.
- The replication write: if you have replicas, the write is sent to each one asynchronously. Nearly invisible in normal operation. Matters when a replica falls behind.
Steps 1 and 3 get attention. Step 2 is where the interesting failures live.
| Mode | What happens on write | Risk on crash | Latency impact |
|---|---|---|---|
| No persistence | Memory only | Lose the entire dataset | None |
| RDB only | Memory write; periodic disk snapshot via fork() | Lose all writes since last snapshot | Spike during fork() |
| AOF (everysec) | Memory + fsync to AOF file once per second | Lose up to 1 second of writes | Low baseline; pause during rewrite |
| AOF (always) | Memory + fsync on every individual write | Lose at most one write | High sustained latency |
| AOF + RDB | Both of the above | Minimal | Combined impact of both |
Which mode are you actually running? Most teams do not know without checking:
redis-cli CONFIG GET save # RDB snapshot schedule
redis-cli CONFIG GET appendonly # AOF enabled?
redis-cli CONFIG GET appendfsync # always, everysec, or no
redis-cli INFO persistence # current AOF size, rewrite status
If appendonly is no and save is empty, you are running with no persistence. Every restart is a cold cache. A large fraction of teams discover this at 3am during an incident.
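If you want to fold that check into a script, here is a minimal Python sketch that maps the three CONFIG GET values onto the rows of the table above. The function name and the returned labels are illustrative assumptions, and it expects you to pass in the raw string values however your client returns them.

```python
def classify_persistence(save: str, appendonly: str, appendfsync: str) -> str:
    """Map CONFIG GET values onto the persistence modes in the table.

    save        -- value of `CONFIG GET save` (empty string means no RDB schedule)
    appendonly  -- value of `CONFIG GET appendonly` ("yes" or "no")
    appendfsync -- value of `CONFIG GET appendfsync` ("always", "everysec", "no")
    """
    rdb_enabled = save.strip() != ""
    aof_enabled = appendonly.strip().lower() == "yes"

    if aof_enabled and rdb_enabled:
        return f"AOF ({appendfsync}) + RDB"
    if aof_enabled:
        return f"AOF ({appendfsync})"
    if rdb_enabled:
        return "RDB only"
    return "No persistence"
```

Calling classify_persistence("", "no", "everysec") returns "No persistence", which is the case worth alerting on.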
AOF rewrite: the pause that does not show up in latency graphs
The AOF file grows continuously. Every write appends a line. Left unchecked, it grows until disk fills. So Redis periodically rewrites the AOF file: it forks a child process, the child writes a compact version of the current keyspace to a new file, then atomically replaces the old AOF.
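The fork itself is fast. What is not fast is what comes after: while the child is writing, the parent continues accepting writes, which are tracked in an in-memory rewrite buffer. When the child finishes, the parent appends that buffer to the new AOF file before the file swap. At high write rates, this buffer gets large, and appending it blocks the main thread. This describes Redis before 7.0. Redis 7.0 introduced multi-part AOF, which drops the rewrite buffer by sending incoming writes to a separate incremental file during the rewrite; the fork and the disk contention from the child remain.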
If you are writing 50,000 keys per second and an AOF rewrite takes 8 seconds (realistic for a 10GB keyspace), the buffer holds roughly 400,000 operations when the rewrite completes. Flushing it can take hundreds of milliseconds. During that time, all writes queue behind it.
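The arithmetic above is worth running against your own numbers. A minimal Python sketch follows; avg_command_bytes is an assumed figure you would have to estimate for your workload, not something the article or Redis supplies.

```python
def rewrite_buffer_ops(writes_per_second: int, rewrite_seconds: float) -> int:
    """Operations accumulated in the rewrite buffer while the child is writing."""
    return int(writes_per_second * rewrite_seconds)


def rewrite_buffer_bytes(ops: int, avg_command_bytes: int) -> int:
    """Rough buffer size; avg_command_bytes is an assumption about your commands."""
    return ops * avg_command_bytes


ops = rewrite_buffer_ops(50_000, 8)      # the example from the text
size = rewrite_buffer_bytes(ops, 100)    # assuming ~100 bytes per command
print(ops, size)
```

With the figures from the text this gives 400,000 buffered operations; at an assumed 100 bytes per command that is about 40 MB to append before the swap.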
Watch for this in INFO persistence:
aof_rewrite_in_progress:1
aof_current_size:2853123104
aof_base_size:142657843
aof_pending_rewrite:0
When aof_rewrite_in_progress flips from 1 to 0, watch your latency graph. If you see unexplained 200-500ms P99 spikes uncorrelated with traffic, correlate them against rewrite completion times. This is invisible to most monitoring setups, which track request latency without referencing Redis internal events.
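One way to make rewrite completions visible is to poll INFO persistence and emit an event on the 1 to 0 transition. The sketch below shows only the parsing and the transition check, in Python; how you fetch the INFO text and where you send the event are left out, and both function names are illustrative.

```python
def parse_info(text: str) -> dict:
    """Parse the `key:value` lines of an INFO section into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks and section headers
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def rewrite_just_finished(previous: dict, current: dict) -> bool:
    """True when aof_rewrite_in_progress went from 1 to 0 between two polls."""
    return (
        previous.get("aof_rewrite_in_progress") == "1"
        and current.get("aof_rewrite_in_progress") == "0"
    )
```

Annotating your latency dashboard with the timestamps where rewrite_just_finished returns True gives you the correlation the paragraph above describes. The polling interval bounds how precisely you can place the event.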
There is no cost-free fix, but three practical mitigations:
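- Raise auto-aof-rewrite-percentage and auto-aof-rewrite-min-size so automatic rewrites fire less often, and trigger BGREWRITEAOF from a scheduled job during low-traffic windows rather than whenever the file doubles. The two settings are size thresholds only; they know nothing about time of day.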
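- Set no-appendfsync-on-rewrite yes to skip fsyncs while a rewrite child is running. This keeps the main thread from stalling on a disk the child is saturating, at the cost of durability: a crash during the rewrite can lose more than the usual one second of writes (up to about 30 seconds with default Linux settings, according to the comments in redis.conf).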
- Separate write-heavy keys from read-heavy keys onto different Redis instances so a write-heavy instance's rewrite cycle does not spike read latency.
The write-expiry stampede
The read-cache stampede is well documented: when a hot key expires, all readers simultaneously find a cache miss and rush the backend. The write-expiry variant is less discussed.
Consider a write-heavy counter: a per-user rate limit bucket, a rolling window aggregate, or a page-level view counter. A common implementation pattern:
GET key
if missing: SET key 0 EX 60
INCR key
At low traffic, this works. At high traffic, the expiry creates a thundering-herd problem on the write path. When the key expires with 800 concurrent writers active, all 800 find the key missing, all 800 issue SET key 0, and INCR is now racing against a key that 800 processes are simultaneously resetting. The first few hundred INCRs hit the value just set; then the key expires and the cycle repeats. Your counters are garbage.
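To see how bad this can get, here is a small Python simulation of the pattern above under one worst-case interleaving: every writer runs GET before any writer runs SET. It uses a plain dict in place of Redis and ignores the TTL, so it models only the ordering problem, not real timing.

```python
def simulate_racy_counter(n_writers: int) -> int:
    """Replay GET / SET 0 / INCR for n writers under a worst-case interleaving."""
    store = {}
    key = "counter"

    # Phase 1: every writer issues GET and finds the key missing.
    saw_missing = [key not in store for _ in range(n_writers)]

    # Phase 2: each writer then acts on what it saw: SET key 0, then INCR key.
    for missing in saw_missing:
        if missing:
            store[key] = 0                       # resets whatever was counted so far
        store[key] = store.get(key, 0) + 1       # INCR

    return store[key]


print(simulate_racy_counter(800))
```

Under this interleaving, 800 writes leave the counter at 1. Real schedules are less extreme, but every SET that lands after someone else's INCR discards counts in the same way.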
The correct pattern uses an atomic Lua script:
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return current
This ensures the expiry is set exactly once, by the writer who created the key, atomically. The Lua script executes as a single Redis command, so no client can interleave between the INCR and EXPIRE.
The scale-specific problem: at low throughput, the race window is small enough that your counter is slightly off but rarely resets mid-window. At 50,000 writes per second on a 60-second TTL key, the race is constant. The key resets dozens of times per cycle.
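Calling the Lua script from application code is a one-liner in most clients. The Python sketch below assumes a client object exposing eval(script, numkeys, *keys_and_args), which is the shape redis-py uses; the helper name incr_with_ttl and the key naming are illustrative, and no Redis library is imported so the snippet stays self-contained.

```python
INCR_WITH_TTL = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return current
"""


def incr_with_ttl(client, key: str, ttl_seconds: int) -> int:
    """Increment `key`, setting its TTL only when this call created it.

    `client` is assumed to provide eval(script, numkeys, *keys_and_args).
    """
    return int(client.eval(INCR_WITH_TTL, 1, key, ttl_seconds))
```

Usage would look like incr_with_ttl(client, "rate:user:42", 60). In a real service you would likely load the script once and call it by hash rather than sending the source on every request.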
Redis Cluster rebalancing and write latency
Redis Cluster shards data across nodes using 16,384 hash slots: each key maps to a slot (CRC16 of the key, modulo 16,384), and each node owns a subset of the slots. When you add or remove nodes, slots migrate between them. During migration, writes to a migrating slot take a different path:
- Client sends a write to the source node.
- Source node checks whether the slot is migrating.
- If the key exists on source, the write proceeds as normal.
- If the key is not on the source, either because it has already migrated to the destination or because it is new, source returns an ASK redirect.
- Client re-issues ASKING plus the original command to the destination node, adding one extra round trip.
The ASK redirect adds a full network round trip to every write that hits a key no longer on the source node. If your client handles only MOVED redirects and not ASK, it will loop or fail. If it handles ASK correctly, affected writes take roughly 2x the latency of a normal write.
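The routing steps above reduce to a small decision. The Python sketch below is a simplified model for single-key commands only: the names are illustrative and it is not how Redis itself is implemented.

```python
def source_node_reply(slot_migrating: bool, key_on_source: bool) -> str:
    """What the source node does with a single-key write.

    Returns "EXECUTE" when the source serves the write itself, or "ASK" when
    it redirects the client to the destination node.
    """
    if not slot_migrating or key_on_source:
        return "EXECUTE"
    # Slot is migrating and the key is not here: it has moved, or it is new.
    return "ASK"


def round_trips(reply: str) -> int:
    """Network round trips the client pays: one normally, two after an ASK."""
    return 1 if reply == "EXECUTE" else 2
```

This is why the extra latency tracks the fraction of keys already moved: early in a slot's migration most writes still execute on the source, and late in it most are redirected.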
In production, rebalancing typically takes minutes to hours depending on keyspace size and migration batch settings. During that window, P99 write latency can be 2-4x higher than baseline, depending on what fraction of your keyspace sits in migrating slots.
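To work out which of your keys sit in a migrating slot, you can compute the slot client-side. The Python sketch below follows the key-to-slot rule from the Redis Cluster specification (CRC16 in its XMODEM variant, modulo 16,384, with hash-tag handling); it relies on binascii.crc_hqx from the standard library, and the function name key_slot is illustrative.

```python
import binascii


def key_slot(key: str) -> int:
    """Return the Redis Cluster hash slot (0..16383) for a key."""
    data = key.encode()

    # Hash tags: if the key contains "{...}" with at least one character
    # between the first "{" and the next "}", only that part is hashed.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            data = data[start + 1:end]

    return binascii.crc_hqx(data, 0) % 16384


print(key_slot("foo"), key_slot("{user1000}.following"))
```

Keys that share a hash tag land in the same slot, so they migrate together and see ASK redirects at the same time.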
The diagnosis is simple: run redis-cli --cluster check <host>:<port> against any node during a latency event to see slot migration status, including slots left in a migrating or importing state. Mitigation requires planning rather than reaction:
- Rebalance during known-quiet windows, not reactively during traffic spikes.
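- Use the --cluster-pipeline option of redis-cli --cluster reshard or rebalance to control how many keys move per MIGRATE batch (the default is 10). Smaller batches block the source node for less time per step and reduce the blast radius, at the cost of a longer migration.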
- Monitor the ASK redirect rate in your client library's metrics. A spike in ASK redirects is a direct signal of active slot migration.
What to reach for when Redis writes are the bottleneck
The most common piece of bad advice here is to replace Redis with Postgres. That is correct for a specific case: small-to-medium datasets, reads and writes against the same records, and a workload where network latency to the database is not the bottleneck. It is wrong in most situations where you have actually hit Redis write limits.
The more useful frame is: what is the write pattern?
Write-heavy with time-series semantics
Metrics, counters, events: Redis Streams fits better than SET/INCR here. It is designed for append-only writes and has native consumer group semantics. For write rates above 100,000 events per second where you also need to query the data, TimescaleDB or ClickHouse handle high-write-rate time series with better compression and without the AOF rewrite problem.
Write-heavy with strong consistency
Redis is not the right tool. Replication is asynchronous, so a write the primary has acknowledged can be lost if a failover promotes a replica that never received it, and a network partition can leave two nodes accepting writes for a while. This is where you want a Raft-based store (etcd or TiKV) or Postgres with synchronous replication and connection pooling.
Write-heavy hot-key pattern
Many clients writing to the same key simultaneously: this is a data model problem more than a Redis problem. Redis handles roughly 200,000 writes per second to a single key before the event loop becomes the bottleneck. If you are at that limit, shard the key space: partition your counter into N sub-keys, write to key:hash(writer_id) % N, and aggregate on read. This is the approach Redis itself documents for hot-key scenarios.
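A minimal Python sketch of that sharding scheme follows. It uses zlib.crc32 as the hash because Python's built-in hash() is randomised per process and would send the same writer to different shards after a restart. All function names are illustrative, and the Redis calls themselves (INCR on the shard key, MGET on read) are left to your client.

```python
import zlib


def shard_key(base_key: str, writer_id: str, n_shards: int) -> str:
    """Pick the sub-key a given writer should INCR."""
    shard = zlib.crc32(writer_id.encode()) % n_shards
    return f"{base_key}:{shard}"


def all_shard_keys(base_key: str, n_shards: int) -> list:
    """Every sub-key, for fetching in one MGET on the read path."""
    return [f"{base_key}:{i}" for i in range(n_shards)]


def aggregate(values) -> int:
    """Sum MGET results; missing shards come back as None and count as zero."""
    return sum(int(v) for v in values if v is not None)
```

On write you INCR shard_key("views:page:7", writer_id, 16); on read you MGET all_shard_keys("views:page:7", 16) and pass the result to aggregate. The trade-off is that reads cost N lookups and the total is no longer a single atomic value. In Redis Cluster the sub-keys will usually land in different slots, which spreads the load but means a single MGET across them may need to become per-key reads.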
The diagnostic question is not 'is Redis wrong?' It is 'which write pattern does my workload fit, and am I using the right Redis features for that pattern?' Most of the time, the answer is the wrong data structure or the wrong persistence mode rather than the wrong database.
The diagnostic checklist
If you are seeing unexplained write latency spikes in a Redis deployment, check these in order:
- Check your persistence mode first. Run redis-cli INFO persistence. If AOF is enabled, look at aof_rewrite_in_progress. Correlate rewrite completion times with latency spikes in your monitoring.
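- Check the slowlog. Run redis-cli SLOWLOG GET 25. Any command over 10ms is a candidate for investigation. The slowlog records execution time only, so time a command spent queued behind another operation will not appear there.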
- Check the ASK redirect rate in your client library's metrics. A sudden spike means active cluster rebalancing.
- Check write patterns for atomicity gaps. If you are doing GET then check then SET then INCR sequences, convert them to Lua scripts or atomic Redis commands such as SET key value NX EX seconds.
- Check key TTL distribution. If a large fraction of your write-heavy keys expire at the same clock minute (all set with EX 3600 at startup), you get synchronised stampedes every hour. Add jitter: EX followed by 3600 plus a random offset of up to 300 seconds.
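- Check replica lag. Run redis-cli INFO replication on the primary and compare master_repl_offset with the offset reported for each replica (slave_repl_offset when queried on the replica itself). Significant lag means a replica is not keeping up with the write stream, which grows the primary's replication output buffers and risks a full resync.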
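The TTL jitter suggested in the checklist takes only a few lines. Here is a minimal Python sketch using the 3600-second base and 300-second maximum offset from the text; the function name and the injectable rng parameter are illustrative choices.

```python
import random


def jittered_ttl(base_seconds: int = 3600, max_jitter_seconds: int = 300, rng=random) -> int:
    """Return base_seconds plus a random offset in [0, max_jitter_seconds]."""
    return base_seconds + rng.randint(0, max_jitter_seconds)


ttl = jittered_ttl()
print(ttl)  # pass this as the EX argument when setting the key
```

Keys written in the same burst then expire across a five-minute spread instead of in the same second. Passing a seeded random.Random instance as rng makes the behaviour reproducible in tests.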
Redis is genuinely fast. The in-memory write path is hard to beat for the right workload. The failure modes above are predictable — they emerge at scale because that is when persistence, replication, and cluster mechanics become the dominant cost rather than the memory operation itself. Know your persistence mode, audit your write patterns for atomicity, and plan cluster rebalancing as scheduled maintenance rather than a reactive emergency.