pgvector's HNSW index has a memory cliff, and the Postgres defaults walk right into it
The default maintenance_work_mem is 64MB. An HNSW build over a few hundred thousand vectors needs far more, or it silently falls back to a 10-50x slower build.
The build that took forty minutes instead of forty seconds
A backend engineer adding a retrieval feature to an existing product will usually reach for pgvector first: the embeddings table sits next to the rest of the application data, migrations already work, and there is no second system to operate. For a few hundred thousand rows this works exactly as expected. Then the table crosses a few million rows, someone runs CREATE INDEX ... USING hnsw during a maintenance window, and the build that took forty seconds on the staging table takes forty minutes in production, sometimes longer. Nothing errors. Nothing logs a warning. The index eventually finishes and works fine once it exists. The only symptom is time, and by the time anyone notices, the maintenance window has usually already been blown.
The cause is a specific, well-defined memory limit, and it has nothing to do with pgvector's query performance. Sub-20ms query latency at 95%+ recall is achievable with HNSW even at a million vectors. The problem sits entirely in how the index gets built in the first place.
What HNSW actually needs in memory, and why Postgres will not tell you
HNSW builds a layered graph where each vector gets connected to its nearest neighbours at multiple levels, and that graph has to be constructed in memory, in one pass, for the build to run at a reasonable speed. Postgres allocates that working space out of a setting called maintenance_work_mem, the same setting that governs CREATE INDEX, VACUUM, and a handful of other maintenance operations. Its default is 64MB, a number settled on roughly a decade ago with B-tree builds on ordinary column data in mind, not a graph structure holding hundreds of thousands of 1536-dimension floating-point vectors.
When the graph does not fit in maintenance_work_mem, Postgres does not fail the build. It falls back to a disk-based construction path, spilling intermediate graph state to temporary files and rebuilding sections from disk as it goes. That path is correct. It is also reported to run anywhere from 10 to 50 times slower than the in-memory path, and the gap widens as the vector count grows, because the disk-based path pays random I/O costs the in-memory path never has to.
-- What most teams have, without ever changing it
SHOW maintenance_work_mem;
-- maintenance_work_mem
-- ----------------------
-- 64MB
-- A build over ~1M vectors at 1536 dimensions with this setting
-- will almost certainly spill to disk, silently.
The formula: sizing maintenance_work_mem for your vector count
The raw storage cost of a vector column is close to 4 bytes per dimension, so a 1536-dimension embedding, the size OpenAI's common embedding models produce, costs roughly 6 KB per row before any indexing overhead. The HNSW graph itself, once built, typically comes to around 1.5 to 2 times the raw column size, because it has to store the multi-level neighbour links for every vector, not just the vector itself. The number that matters for a build is not the final index size, though. It is how much of that graph has to be held in memory simultaneously during construction, which tracks closely with the same multiplier.
| Vector count | Raw column size | HNSW graph (build-time) | maintenance_work_mem to set |
|---|---|---|---|
| 100,000 | ~0.6 GB | ~1 GB | Default (64MB) is already tight; 512MB is safer |
| 1,000,000 | ~6 GB | ~9-12 GB | 2-4 GB minimum |
| 5,000,000 | ~30 GB | ~45-60 GB | 8-16 GB, more if RAM allows |
| 10,000,000 | ~60 GB | ~90-120 GB | As much as the instance can spare; consider partitioning the build |
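The arithmetic behind the table can be reproduced for any row count and dimension. The following is a minimal sketch, not a pgvector API: the function names are hypothetical, the 4 bytes per dimension and the 1.5 to 2 times multiplier are the rough figures quoted above, and sizes use decimal gigabytes to match the table.

```python
BYTES_PER_DIMENSION = 4        # float32 components, the rough figure from the text
GRAPH_MULTIPLIER = (1.5, 2.0)  # rough build-time range from the text, an estimate only


def estimate_hnsw_build(rows: int, dims: int) -> dict:
    """Return rough raw-column and build-time graph sizes, in bytes."""
    raw = rows * dims * BYTES_PER_DIMENSION
    return {
        "raw_bytes": raw,
        "graph_low_bytes": int(raw * GRAPH_MULTIPLIER[0]),
        "graph_high_bytes": int(raw * GRAPH_MULTIPLIER[1]),
    }


def fits_in_memory(estimate: dict, maintenance_work_mem_bytes: int) -> bool:
    """True only if even the high-end graph estimate fits in the setting."""
    return estimate["graph_high_bytes"] <= maintenance_work_mem_bytes


def gb(n: int) -> float:
    """Convert bytes to decimal gigabytes, rounded to one decimal place."""
    return round(n / 1e9, 1)


if __name__ == "__main__":
    est = estimate_hnsw_build(rows=1_000_000, dims=1536)
    print(gb(est["raw_bytes"]), gb(est["graph_low_bytes"]), gb(est["graph_high_bytes"]))
    # Compare against the 64MB default
    print(fits_in_memory(est, 64 * 1024**2))
```

For one million 1536-dimension vectors this gives about 6.1 GB raw and 9.2 to 12.3 GB for the graph, in line with the second row of the table, and confirms that the 64MB default is nowhere near enough to hold it.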
The practical move is to raise maintenance_work_mem for the session running the build, not globally, since a permanently high setting applied to every connection risks starving the instance if several maintenance operations happen to run at once.
-- Set per-session, only for the build itself
SET maintenance_work_mem = '8GB';
CREATE INDEX CONCURRENTLY idx_embeddings_hnsw
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Reset afterwards if the session is reused for anything else
RESET maintenance_work_mem;
Catching the fallback before it costs a maintenance window
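The disk-based fallback doesn't announce itself in the Postgres server logs at default log levels. Recent pgvector releases do send a NOTICE to the client session when the graph stops fitting in maintenance_work_mem, but a build launched from a migration tool or a scheduled job rarely has anyone reading client output. The fallback does leave a trace, though. A build that has spilled to disk generates heavy temporary file activity, which Postgres does record. Checking that activity before and after a build, on a staging table of representative size, is a cheap way to confirm the sizing was right before running the same build against the real table.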
-- Before the build: note the baseline
SELECT temp_files, temp_bytes
FROM pg_stat_database
WHERE datname = current_database();
-- Run the CREATE INDEX ... USING hnsw here
-- After the build: compare
SELECT temp_files, temp_bytes
FROM pg_stat_database
WHERE datname = current_database();
-- A large jump in temp_bytes relative to the table's expected
-- in-memory graph size means the build spilled to disk.
Running this check against a staging table at ten or twenty percent of the production row count, before scheduling the real maintenance window, catches the problem while it's still a five-minute fix rather than an incident. It also turns the sizing exercise from a one-off guess into something repeatable for the next table that needs the same index.
What happens when you get it wrong anyway
A slow build is the obvious symptom, but there's a second, quieter one. If maintenance_work_mem is tight enough that the build has to make compromises rather than fully spilling to disk, the resulting graph can end up with lower connectivity than intended, which shows up later as degraded recall on queries, not as anything visible during the build itself. A team that only measures index build time and query latency, and never checks recall against a held-out set of known-correct matches, can end up shipping a search feature that quietly misses relevant results, with nothing in the logs pointing at the cause.
“The build finishes either way. Only one version of it built the graph you actually asked for.”
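Checking recall against a held-out set comes down to a small calculation. The sketch below assumes two lists of row ids per test query: the ids the HNSW index returned, and the known-correct ids. One possible source for the known-correct list is the same query run as an exact scan with index scans disabled for that transaction. The function names are hypothetical, and the ids in the usage example are made up for illustration.

```python
from typing import Iterable, Sequence, Tuple


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int], k: int) -> float:
    """Fraction of the exact top-k that the approximate top-k also returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact = set(exact_ids[:k])
    if not exact:
        raise ValueError("exact_ids must contain at least one id")
    found = len(exact & set(approx_ids[:k]))
    return found / len(exact)


def mean_recall(pairs: Iterable[Tuple[Sequence[int], Sequence[int]]], k: int) -> float:
    """Average recall@k over (approx_ids, exact_ids) pairs, one pair per query."""
    scores = [recall_at_k(approx, exact, k) for approx, exact in pairs]
    if not scores:
        raise ValueError("at least one query is required")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Illustrative ids only: the first query misses one of four correct rows.
    queries = [
        ([1, 2, 3, 5], [1, 2, 3, 4]),
        ([7, 8, 9, 10], [10, 9, 8, 7]),
    ]
    print(mean_recall(queries, k=4))
```

Running the same set of test queries after every rebuild, and comparing the average against the previous build, is what surfaces a quality regression that build time and query latency would never show.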
Where pgvector's ceiling actually is
None of this is an argument against pgvector as a default choice. For most retrieval-augmented generation workloads under roughly ten to fifty million vectors, with moderate query volume, pgvector is genuinely the right call: one system to operate, one backup strategy, one set of access controls, and no network hop to a separate service for every query. Dedicated vector databases earn their operational cost at a different point: once vector search is the primary workload rather than a feature living inside a larger application, once hybrid search and heavy metadata filtering are load-bearing requirements rather than occasional conveniences, or once the row count is high enough that even a correctly sized in-memory HNSW build no longer fits on any single instance a team is willing to run.
The mistake worth avoiding is migrating to a dedicated system to solve a memory-sizing problem that a single SET statement would have fixed. Plenty of teams have made that move under the pressure of a blown maintenance window, taken on a new service to operate, and then discovered their actual vector count was well within what pgvector handles once maintenance_work_mem was sized correctly in the first place.
Before the next index build
- Calculate the expected graph size for the target vector count and dimension using the rough multiplier above, before scheduling the maintenance window, not after it runs long.
- Set maintenance_work_mem for the build session specifically, sized to that estimate, and reset it afterwards rather than leaving it elevated globally.
- Use CREATE INDEX CONCURRENTLY so a slow build, if it still happens, does not hold a lock that blocks writes to the table in the meantime.
- Check recall against a held-out set of known-correct matches after any build where memory was tight, not just query latency, since a memory-starved build can degrade result quality without degrading response time.
- Re-run the same sizing exercise before any bulk backfill or re-embedding job, since those rebuild the index at the new row count, not the count the settings were last tuned for.