What is highWaterMark and how does it relate to backpressure?

highWaterMark is the threshold at which a stream signals it's full by returning false from write(). It's not a hard memory cap; it's a signal threshold. Raising it delays the signal, which can increase throughput at the cost of buffering more data before backpressure kicks in. Lowering it makes the pipeline more responsive to slow consumers but increases the frequency of pause/resume cycles.

Does pipe() always handle backpressure automatically?

For a single producer piped to a single consumer, yes. For multi-stage pipelines with Transform streams, use stream.pipeline() instead, which propagates backpressure and errors through every stage and cleans up all streams on failure. Manual pipe() chains in multi-stage pipelines often leak file descriptors on errors and don't propagate backpressure reliably through async Transform stages.

Why do async iterators create backpressure automatically?

When you use for await...of over a Readable and await processing within the loop, the loop doesn't request the next chunk until the current iteration's await resolves. This means the consumer only pulls as fast as it can process, which is exactly what backpressure is: letting the consumer's rate govern the producer's rate.

Does this apply to HTTP response streaming in frameworks like Express or Fastify?

Yes. Both expose a ServerResponse object, which is a Writable stream. You can pipe() a Readable directly to res, and the framework will respect the backpressure signals. The risk is middleware that buffers the response: compression middleware, logging middleware that accumulates the body, or proxy layers that buffer before forwarding. Check whether anything between your stream and the socket absorbs the backpressure signal.

EngineeringJun 9, 20266 min readReviewed Jun 9, 2026

Why your Node.js streaming pipeline crashes under load (and how backpressure fixes it)

The buffer trap, how to escape it, and the three patterns that implement backpressure correctly

By FlowVerify Editorial Team

Key takeaways

When a consumer falls behind its producer, data accumulates in Node.js heap buffers — the crash shows up as OOM, not as a streaming error.
Adding a bigger buffer delays the crash and amplifies it: the backlogged data creates a burst that overwhelms downstream when it finally drains.
Node.js backpressure relies on write() returning false and the drain event — ignoring write()'s return value is the most common implementation mistake.
Multi-stage pipelines need backpressure to propagate upstream through every stage; use stream.pipeline() rather than manual pipe() chains.
Async iteration with for await...of creates natural backpressure without explicit pause/resume logic.
AI token-streaming pipelines are especially vulnerable because LLMs produce fast and mobile clients consume slowly — pipe(response) is the correct fix.

Every Node.js streaming pipeline has a hidden failure mode. When the consumer falls behind the producer, data accumulates in heap buffers. There is no warning. Memory grows silently until the process exceeds V8's heap limit and crashes with an OOM error. The log shows the heap was full. It doesn't say why it filled.

The quiet failure mode: when producers outrun consumers

The OOM crash is the symptom. The cause is a rate differential. The producer (reading a large file, streaming database rows, or forwarding tokens from an LLM API) generates chunks faster than the consumer can process them. The consumer might be writing to a compressed stream, proxying to a slow mobile client, or running an expensive transform. The gap between generation speed and processing speed accumulates in memory.

This failure mode appears in more places than engineers expect: CSV exports with slow database cursors, Kafka consumers whose processing step takes longer than the poll interval, AI token-streaming proxies where the LLM generates at inference speed and the client reads at mobile-network speed, log-tail endpoints that aggregate before forwarding. The crash arrives after minutes or hours of gradual growth. Engineers go looking for a memory leak. They add --max-old-space-size and the crash reappears later, with more memory used.

The root cause, a rate differential between producer and consumer, stays hidden because nothing in the crash log names it. This is the backpressure problem.

Why developers reach for buffers, and why that makes it worse

The intuitive fix is a buffer. If the consumer cannot keep up right now, give it more time by storing more data. Increase highWaterMark. Add an in-process queue. Allocate more heap.

This delays the crash. It doesn't prevent it.

A buffer has a finite capacity. If the producer consistently outpaces the consumer (not just in bursts, but in a sustained way), any buffer will eventually fill. You've bought time at the cost of memory. The crash arrives later with a much larger heap footprint.

The buffer trap is convincing in demos. Test environments usually run on fast machines on the same network where the consumer keeps up, the buffer never fills. The problem only surfaces under realistic conditions: slow clients, high concurrency, or large payloads under sustained load.

How backpressure works in Node.js streaming pipelines

Node.js streams implement backpressure through two signals. A Writable stream's write() method returns false when its internal buffer reaches the highWaterMark. This means: stop sending. The stream emits a drain event when it has flushed its buffer and is ready to accept more. A Readable stream can be paused, stopping chunk delivery until resumed.

The contract is straightforward: if write() returns false, pause the Readable. Wait for drain. Resume. Here is what that looks like without using pipe():

backpressure.js

// Wrong: ignores the backpressure signal
readable.on('data', chunk => {
  writable.write(chunk); // return value silently discarded
});

// Right: honour the signal
readable.on('data', chunk => {
  const ok = writable.write(chunk);
  if (!ok) {
    readable.pause();
    writable.once('drain', () => readable.resume());
  }
});

// Best for a single producer -> consumer: pipe() does the above automatically
readable.pipe(writable);

The pipe() method implements the pause/drain/resume contract automatically. Ignoring write()'s return value while using an event listener is the single most common backpressure mistake in Node.js code. The stream signals correctly; the code just doesn't listen.

The propagation problem: backpressure must travel upstream

A single producer piping to a single consumer is manageable. A multi-stage pipeline is harder.

Consider a realistic pipeline: database cursor to JSON transform to gzip compress to HTTP response. If the HTTP response is slow, the gzip stage needs to slow down. The gzip stage slowing down needs to signal the JSON transform to slow down. The JSON transform slowing down needs to signal the database cursor to slow down. Backpressure has to propagate all the way back to the source.

Manual pipe() chains don't reliably handle this across multiple stages. The correct approach is stream.pipeline(), available since Node.js 10:

pipeline.js

const { pipeline } = require('node:stream');
const { promisify } = require('node:util');
const pipe = promisify(pipeline);

// Backpressure propagates through every stage automatically.
// Error handling and cleanup also propagate -- no fd leaks on failure.
await pipe(
  dbCursor,
  new JsonTransform(),
  zlib.createGzip(),
  response
);

pipeline() wires up error propagation and backpressure across every stage. It also cleans up all streams on error, which is where manual pipe() chains often leak file descriptors or sockets.

Transform streams that do async work introduce a subtlety. The _transform method receives a callback; calling that callback signals readiness for the next chunk. Calling the callback before async work completes (to get out of the way faster) breaks backpressure: the stream requests the next chunk before the current one is processed. Call the callback only after push() and only after all async work for that chunk is done.

Three patterns that implement backpressure correctly

Each pattern fits a different context. Choose based on what you're building, not by what seems simplest.

Pattern	Best for	What it handles automatically
stream.pipeline()	Multi-stage transforms with cleanup requirements	Backpressure propagation, error handling, stream cleanup
Manual pause/resume + drain	Push-based protocols (WebSocket, SSE) reading from a Readable	Gives direct control when pipe() does not fit
Async iteration (for await...of)	Simple async transforms without explicit stream wiring	Natural backpressure via await; no pause/resume logic needed

Backpressure patterns by context

The pattern to avoid: converting a Readable to an array before processing, collecting all EventEmitter events into a queue before handling, or resolving a DB query into memory before writing. These eliminate backpressure entirely by loading the full dataset into heap first.

When backpressure breaks

Backpressure requires both ends of a channel to honour the signals. Several common architectures break the assumption silently.

EventEmitter-based producers. EventEmitter has no concept of highWaterMark or flow control. If you emit events faster than listeners process them, the call stack grows unbounded. The fix: don't use an EventEmitter when you need a Readable. Convert to a stream, or gate emissions with an explicit queue that the listener drains at a controlled rate.

Database drivers that return arrays. If your database client returns all rows as an array (the query() pattern in most drivers), you've bypassed backpressure before the stream even starts. Use cursor-based APIs: pg's query().cursor(), Prisma's .stream(), or Knex's .stream(). These return Readable streams that respect downstream backpressure.

HTTP response buffering middleware. If upstream middleware buffers res.write() calls, response.write() will always return true even when the client is slow. The underlying socket's backpressure is absorbed by the middleware buffer. Check whether compression middleware, a proxy, or your framework's response object is buffering before assuming backpressure reaches the network layer.

Kafka consumers. Kafka doesn't push messages to consumers; consumers pull. There's no built-in backpressure from broker to consumer. Within the consumer, the correct pattern is: process to completion, then commit offsets. Committing before processing completes means a crash after commit loses those messages permanently. The rate of poll() calls naturally limits throughput; don't commit ahead of processing.

What this looks like in an AI token-streaming Node.js pipeline

Anthropic, OpenAI, and most other inference providers stream responses as Server-Sent Events. A backend proxy receives SSE chunks from the upstream API, parses tokens, and forwards them to the browser. This is exactly the scenario where backpressure matters most.

The http.IncomingMessage you get from the upstream HTTPS request is a Readable stream. The client's ServerResponse is a Writable stream. If the client's connection is slow (mobile on a congested network), response.write() starts returning false. If you ignore that signal, you buffer upstream tokens in Node.js heap. The LLM inference API produces at inference speed; the mobile client drains at network speed. Memory grows until the process crashes.

llm-stream-proxy.js

const { pipeline } = require('node:stream');
const { promisify } = require('node:util');
const pipe = promisify(pipeline);

// Single-stage: pipe() handles backpressure automatically.
// response.write() returning false will pause llmApiResponse.
app.get('/stream', async (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  const llmApiResponse = await callLlmApi(req.body);
  await pipe(llmApiResponse, res);
});

// Multi-stage with parse and filter:
app.get('/stream-filtered', async (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  const llmApiResponse = await callLlmApi(req.body);
  await pipe(
    llmApiResponse,
    new SseParseTransform(),    // parse SSE frames
    new TokenRedactTransform(), // strip PII
    res
  );
});

pipe() will pause the upstream read when response.write() returns false and resume on drain. The LLM API's data sits in TCP buffers, where OS-level flow control applies pressure back to the API server, rather than accumulating in Node.js heap. The pipeline slows to the rate of the slowest stage (the client's network), rather than buffering at every stage boundary.

The check to run before you deploy

Before shipping a streaming endpoint, validate backpressure manually. Set highWaterMark to something very small (64 bytes works), then connect a consumer that deliberately delays processing with a short sleep between writes. Observe memory over 60 seconds using process.memoryUsage(). If heap grows linearly, backpressure isn't propagating. If it rises briefly then stabilises, it's working correctly.

The Node.js --inspect flag and Chrome DevTools' heap snapshot tool will show which buffers are growing. Look for Buffer or SlowBuffer objects, or the internal chunk arrays on Readable and Writable stream objects. Backpressure isn't a performance optimisation. It's the mechanism that lets a streaming pipeline run indefinitely within a bounded memory footprint. Without it, the only question is when the crash happens, not if.

Frequently asked questions

Reddit's zero-downtime migration of 500 Kafka brokers wasn't about Kafka. It was three reusable techniques.

Reddit moved 500+ Kafka brokers and a petabyte of live data from EC2 to Kubernetes with zero downtime. The three techniques behind it aren't specific to Kafka.

Jul 8, 2026Read full article →

EngineeringJun 9, 20266 min readReviewed Jun 9, 2026

Why your Node.js streaming pipeline crashes under load (and how backpressure fixes it)

The buffer trap, how to escape it, and the three patterns that implement backpressure correctly

By FlowVerify Editorial Team

Key takeaways

When a consumer falls behind its producer, data accumulates in Node.js heap buffers — the crash shows up as OOM, not as a streaming error.
Adding a bigger buffer delays the crash and amplifies it: the backlogged data creates a burst that overwhelms downstream when it finally drains.
Node.js backpressure relies on write() returning false and the drain event — ignoring write()'s return value is the most common implementation mistake.
Multi-stage pipelines need backpressure to propagate upstream through every stage; use stream.pipeline() rather than manual pipe() chains.
Async iteration with for await...of creates natural backpressure without explicit pause/resume logic.
AI token-streaming pipelines are especially vulnerable because LLMs produce fast and mobile clients consume slowly — pipe(response) is the correct fix.

The quiet failure mode: when producers outrun consumers

The root cause, a rate differential between producer and consumer, stays hidden because nothing in the crash log names it. This is the backpressure problem.

Why developers reach for buffers, and why that makes it worse

The intuitive fix is a buffer. If the consumer cannot keep up right now, give it more time by storing more data. Increase highWaterMark. Add an in-process queue. Allocate more heap.

This delays the crash. It doesn't prevent it.

How backpressure works in Node.js streaming pipelines

The contract is straightforward: if write() returns false, pause the Readable. Wait for drain. Resume. Here is what that looks like without using pipe():

backpressure.js

// Wrong: ignores the backpressure signal
readable.on('data', chunk => {
  writable.write(chunk); // return value silently discarded
});

// Right: honour the signal
readable.on('data', chunk => {
  const ok = writable.write(chunk);
  if (!ok) {
    readable.pause();
    writable.once('drain', () => readable.resume());
  }
});

// Best for a single producer -> consumer: pipe() does the above automatically
readable.pipe(writable);

The propagation problem: backpressure must travel upstream

A single producer piping to a single consumer is manageable. A multi-stage pipeline is harder.

Manual pipe() chains don't reliably handle this across multiple stages. The correct approach is stream.pipeline(), available since Node.js 10:

pipeline.js

const { pipeline } = require('node:stream');
const { promisify } = require('node:util');
const pipe = promisify(pipeline);

// Backpressure propagates through every stage automatically.
// Error handling and cleanup also propagate -- no fd leaks on failure.
await pipe(
  dbCursor,
  new JsonTransform(),
  zlib.createGzip(),
  response
);

pipeline() wires up error propagation and backpressure across every stage. It also cleans up all streams on error, which is where manual pipe() chains often leak file descriptors or sockets.

Three patterns that implement backpressure correctly

Each pattern fits a different context. Choose based on what you're building, not by what seems simplest.

Pattern	Best for	What it handles automatically
stream.pipeline()	Multi-stage transforms with cleanup requirements	Backpressure propagation, error handling, stream cleanup
Manual pause/resume + drain	Push-based protocols (WebSocket, SSE) reading from a Readable	Gives direct control when pipe() does not fit
Async iteration (for await...of)	Simple async transforms without explicit stream wiring	Natural backpressure via await; no pause/resume logic needed

Backpressure patterns by context

When backpressure breaks

Backpressure requires both ends of a channel to honour the signals. Several common architectures break the assumption silently.

What this looks like in an AI token-streaming Node.js pipeline

llm-stream-proxy.js

const { pipeline } = require('node:stream');
const { promisify } = require('node:util');
const pipe = promisify(pipeline);

// Single-stage: pipe() handles backpressure automatically.
// response.write() returning false will pause llmApiResponse.
app.get('/stream', async (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  const llmApiResponse = await callLlmApi(req.body);
  await pipe(llmApiResponse, res);
});

// Multi-stage with parse and filter:
app.get('/stream-filtered', async (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  const llmApiResponse = await callLlmApi(req.body);
  await pipe(
    llmApiResponse,
    new SseParseTransform(),    // parse SSE frames
    new TokenRedactTransform(), // strip PII
    res
  );
});

Why your Node.js streaming pipeline crashes under load (and how backpressure fixes it)

The quiet failure mode: when producers outrun consumers

Why developers reach for buffers, and why that makes it worse

How backpressure works in Node.js streaming pipelines

The propagation problem: backpressure must travel upstream

Three patterns that implement backpressure correctly

When backpressure breaks

What this looks like in an AI token-streaming Node.js pipeline

The check to run before you deploy

Frequently asked questions

Related reading

Reddit's zero-downtime migration of 500 Kafka brokers wasn't about Kafka. It was three reusable techniques.

Railway disconnected a carrier to contain an outage. It cut its last route instead.

pgvector's HNSW index has a memory cliff, and the Postgres defaults walk right into it

Stay ahead on eSignatures, compliance, and document workflows

Reddit's zero-downtime migration of 500 Kafka brokers wasn't about Kafka. It was three reusable techniques.

Why your Node.js streaming pipeline crashes under load (and how backpressure fixes it)

The quiet failure mode: when producers outrun consumers

Why developers reach for buffers, and why that makes it worse

How backpressure works in Node.js streaming pipelines

The propagation problem: backpressure must travel upstream

Three patterns that implement backpressure correctly

When backpressure breaks

What this looks like in an AI token-streaming Node.js pipeline

The check to run before you deploy

Frequently asked questions

Related reading

Reddit's zero-downtime migration of 500 Kafka brokers wasn't about Kafka. It was three reusable techniques.

Railway disconnected a carrier to contain an outage. It cut its last route instead.

pgvector's HNSW index has a memory cliff, and the Postgres defaults walk right into it

Stay ahead on eSignatures, compliance, and document workflows

Reddit's zero-downtime migration of 500 Kafka brokers wasn't about Kafka. It was three reusable techniques.