Why your Node.js streaming pipeline crashes under load (and how backpressure fixes it)
The buffer trap, how to escape it, and the three patterns that implement backpressure correctly
Every Node.js streaming pipeline has a hidden failure mode. When the consumer falls behind the producer, data accumulates in heap buffers. There is no warning. Memory grows silently until the process exceeds V8's heap limit and crashes with an OOM error. The log shows the heap was full. It doesn't say why it filled.
The quiet failure mode: when producers outrun consumers
The OOM crash is the symptom. The cause is a rate differential. The producer (reading a large file, streaming database rows, or forwarding tokens from an LLM API) generates chunks faster than the consumer can process them. The consumer might be writing to a compressed stream, proxying to a slow mobile client, or running an expensive transform. The gap between generation speed and processing speed accumulates in memory.
This failure mode appears in more places than engineers expect: CSV exports with slow database cursors, Kafka consumers whose processing step takes longer than the poll interval, AI token-streaming proxies where the LLM generates at inference speed and the client reads at mobile-network speed, log-tail endpoints that aggregate before forwarding. The crash arrives after minutes or hours of gradual growth. Engineers go looking for a memory leak. They add --max-old-space-size and the crash reappears later, with more memory used.
The root cause, a rate differential between producer and consumer, stays hidden because nothing in the crash log names it. This is the backpressure problem.
Why developers reach for buffers, and why that makes it worse
The intuitive fix is a buffer. If the consumer cannot keep up right now, give it more time by storing more data. Increase highWaterMark. Add an in-process queue. Allocate more heap.
This delays the crash. It doesn't prevent it.
A buffer has a finite capacity. If the producer consistently outpaces the consumer (not just in bursts, but in a sustained way), any buffer will eventually fill. You've bought time at the cost of memory. The crash arrives later with a much larger heap footprint.
The buffer trap is convincing in demos. Test environments usually run on fast machines on the same network where the consumer keeps up, the buffer never fills. The problem only surfaces under realistic conditions: slow clients, high concurrency, or large payloads under sustained load.
How backpressure works in Node.js streaming pipelines
Node.js streams implement backpressure through two signals. A Writable stream's write() method returns false when its internal buffer reaches the highWaterMark. This means: stop sending. The stream emits a drain event when it has flushed its buffer and is ready to accept more. A Readable stream can be paused, stopping chunk delivery until resumed.
The contract is straightforward: if write() returns false, pause the Readable. Wait for drain. Resume. Here is what that looks like without using pipe():
// Wrong: ignores the backpressure signal
readable.on('data', chunk => {
writable.write(chunk); // return value silently discarded
});
// Right: honour the signal
readable.on('data', chunk => {
const ok = writable.write(chunk);
if (!ok) {
readable.pause();
writable.once('drain', () => readable.resume());
}
});
// Best for a single producer -> consumer: pipe() does the above automatically
readable.pipe(writable);The pipe() method implements the pause/drain/resume contract automatically. Ignoring write()'s return value while using an event listener is the single most common backpressure mistake in Node.js code. The stream signals correctly; the code just doesn't listen.
The propagation problem: backpressure must travel upstream
A single producer piping to a single consumer is manageable. A multi-stage pipeline is harder.
Consider a realistic pipeline: database cursor to JSON transform to gzip compress to HTTP response. If the HTTP response is slow, the gzip stage needs to slow down. The gzip stage slowing down needs to signal the JSON transform to slow down. The JSON transform slowing down needs to signal the database cursor to slow down. Backpressure has to propagate all the way back to the source.
Manual pipe() chains don't reliably handle this across multiple stages. The correct approach is stream.pipeline(), available since Node.js 10:
const { pipeline } = require('node:stream');
const { promisify } = require('node:util');
const pipe = promisify(pipeline);
// Backpressure propagates through every stage automatically.
// Error handling and cleanup also propagate -- no fd leaks on failure.
await pipe(
dbCursor,
new JsonTransform(),
zlib.createGzip(),
response
);pipeline() wires up error propagation and backpressure across every stage. It also cleans up all streams on error, which is where manual pipe() chains often leak file descriptors or sockets.
Transform streams that do async work introduce a subtlety. The _transform method receives a callback; calling that callback signals readiness for the next chunk. Calling the callback before async work completes (to get out of the way faster) breaks backpressure: the stream requests the next chunk before the current one is processed. Call the callback only after push() and only after all async work for that chunk is done.
Three patterns that implement backpressure correctly
Each pattern fits a different context. Choose based on what you're building, not by what seems simplest.
| Pattern | Best for | What it handles automatically |
|---|---|---|
| stream.pipeline() | Multi-stage transforms with cleanup requirements | Backpressure propagation, error handling, stream cleanup |
| Manual pause/resume + drain | Push-based protocols (WebSocket, SSE) reading from a Readable | Gives direct control when pipe() does not fit |
| Async iteration (for await...of) | Simple async transforms without explicit stream wiring | Natural backpressure via await; no pause/resume logic needed |
The pattern to avoid: converting a Readable to an array before processing, collecting all EventEmitter events into a queue before handling, or resolving a DB query into memory before writing. These eliminate backpressure entirely by loading the full dataset into heap first.
When backpressure breaks
Backpressure requires both ends of a channel to honour the signals. Several common architectures break the assumption silently.
EventEmitter-based producers. EventEmitter has no concept of highWaterMark or flow control. If you emit events faster than listeners process them, the call stack grows unbounded. The fix: don't use an EventEmitter when you need a Readable. Convert to a stream, or gate emissions with an explicit queue that the listener drains at a controlled rate.
Database drivers that return arrays. If your database client returns all rows as an array (the query() pattern in most drivers), you've bypassed backpressure before the stream even starts. Use cursor-based APIs: pg's query().cursor(), Prisma's .stream(), or Knex's .stream(). These return Readable streams that respect downstream backpressure.
HTTP response buffering middleware. If upstream middleware buffers res.write() calls, response.write() will always return true even when the client is slow. The underlying socket's backpressure is absorbed by the middleware buffer. Check whether compression middleware, a proxy, or your framework's response object is buffering before assuming backpressure reaches the network layer.
Kafka consumers. Kafka doesn't push messages to consumers; consumers pull. There's no built-in backpressure from broker to consumer. Within the consumer, the correct pattern is: process to completion, then commit offsets. Committing before processing completes means a crash after commit loses those messages permanently. The rate of poll() calls naturally limits throughput; don't commit ahead of processing.
What this looks like in an AI token-streaming Node.js pipeline
Anthropic, OpenAI, and most other inference providers stream responses as Server-Sent Events. A backend proxy receives SSE chunks from the upstream API, parses tokens, and forwards them to the browser. This is exactly the scenario where backpressure matters most.
The http.IncomingMessage you get from the upstream HTTPS request is a Readable stream. The client's ServerResponse is a Writable stream. If the client's connection is slow (mobile on a congested network), response.write() starts returning false. If you ignore that signal, you buffer upstream tokens in Node.js heap. The LLM inference API produces at inference speed; the mobile client drains at network speed. Memory grows until the process crashes.
const { pipeline } = require('node:stream');
const { promisify } = require('node:util');
const pipe = promisify(pipeline);
// Single-stage: pipe() handles backpressure automatically.
// response.write() returning false will pause llmApiResponse.
app.get('/stream', async (req, res) => {
res.setHeader('Content-Type', 'text/event-stream');
const llmApiResponse = await callLlmApi(req.body);
await pipe(llmApiResponse, res);
});
// Multi-stage with parse and filter:
app.get('/stream-filtered', async (req, res) => {
res.setHeader('Content-Type', 'text/event-stream');
const llmApiResponse = await callLlmApi(req.body);
await pipe(
llmApiResponse,
new SseParseTransform(), // parse SSE frames
new TokenRedactTransform(), // strip PII
res
);
});pipe() will pause the upstream read when response.write() returns false and resume on drain. The LLM API's data sits in TCP buffers, where OS-level flow control applies pressure back to the API server, rather than accumulating in Node.js heap. The pipeline slows to the rate of the slowest stage (the client's network), rather than buffering at every stage boundary.
The check to run before you deploy
Before shipping a streaming endpoint, validate backpressure manually. Set highWaterMark to something very small (64 bytes works), then connect a consumer that deliberately delays processing with a short sleep between writes. Observe memory over 60 seconds using process.memoryUsage(). If heap grows linearly, backpressure isn't propagating. If it rises briefly then stabilises, it's working correctly.
The Node.js --inspect flag and Chrome DevTools' heap snapshot tool will show which buffers are growing. Look for Buffer or SlowBuffer objects, or the internal chunk arrays on Readable and Writable stream objects. Backpressure isn't a performance optimisation. It's the mechanism that lets a streaming pipeline run indefinitely within a bounded memory footprint. Without it, the only question is when the crash happens, not if.
Frequently asked questions
Related reading
Every Postgres index type, and the bug you get when you pick wrong
B-tree is the default, but it is the wrong choice more often than you expect. This guide covers all six Postgres index types, the bug each was built to prevent, and the gotcha that disables each one silently.
Why boring technology is more important in 2026, not less
AI coding tools made adopting new technology feel fast. That's not the same as making it safe, and the distinction matters more now than it did before.
You adopted Kafka to decouple your services. Here's the coupling that came back.
Event-driven architecture replaces synchronous coupling with three quieter kinds: schema coupling, ordering coupling, and observability coupling. Each one is invisible in a demo and surfaces as a production incident.