Postgres partial indexes: the read gains, the write trap, and the covering pattern
What most tutorials skip — and what to check before you deploy
Postgres partial indexes can drop a filtered query from 900ms to 3ms on a 100-million-row table. That outcome is real and well-documented. What is less documented is what happens six weeks later when write latency at p99 has climbed and nobody has touched the schema. The partial index is often the reason.
The mechanism is Heap Only Tuple (HOT) updates, a Postgres storage optimisation that lets the engine skip index maintenance entirely on writes where no indexed column has changed. When a partial index references columns that are updated frequently, HOT stops working. The query plan looks the same. EXPLAIN shows the index being used for reads. But update throughput has dropped and none of the usual diagnostics point at the index.
This article covers the complete picture: when the planner actually uses a partial index and when it refuses, how HOT updates interact with partial index predicates, the soft-delete pattern and its write-side traps, and the covering partial index (combining WHERE and INCLUDE) that often eliminates the trade-off. Diagnostic queries are included throughout.
Postgres partial indexes and the planner's selectivity threshold
A partial index is a B-tree built over a subset of table rows. An orders table with 100 million rows and 2 million active ones can carry a partial index that covers only those 2 million. That index is roughly 50 times smaller than a full index on the same column. It fits in shared_buffers more readily, scans faster, and costs less to maintain.
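As a minimal sketch of that scenario (the table, column, and status value are illustrative):

```sql
-- Hypothetical orders table: ~2M open rows out of 100M total.
-- The B-tree covers only rows matching the predicate.
CREATE INDEX idx_orders_open
    ON orders (customer_id)
    WHERE status = 'open';
```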
The planner decides whether to use the index by comparing the cost of an index scan against a sequential scan. Index scans pay per-row random I/O costs. Sequential scans amortise I/O across pages. At low selectivity (fewer than roughly 10 to 30 per cent of rows), the index wins. Past that threshold, Postgres typically prefers the sequential scan even when the index exists. The exact cut-off depends on seq_page_cost and random_page_cost settings; the defaults (1.0 and 4.0) assume random I/O is four times more expensive than sequential, which holds reasonably on spinning disks but overstates the penalty on SSDs. Tuning random_page_cost down to 1.1 for SSD-backed storage nudges the planner toward index scans in borderline cases.
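If you want to test that adjustment before committing to it, the cost settings can be changed per session; a sketch:

```sql
-- Try the SSD-appropriate value in the current session only;
-- persist it via postgresql.conf or ALTER SYSTEM once validated.
SET random_page_cost = 1.1;
-- Re-run EXPLAIN on the borderline query and compare plans.
```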
For partial indexes, the relevant selectivity denominator is the full table row count, not the indexed row count. A query returning 2 million of 100 million rows has 2 per cent selectivity and the planner uses the index. If the active fraction later grows to 35 million rows (35 per cent selectivity), the planner may switch to a sequential scan on the whole table rather than using the partial index, even though the partial index covers exactly those rows. The transition is often silent; the query plan changes without any schema change.
Two other things stop the planner from using a partial index. The first is predicate mismatch. To use a partial index, the planner must prove that the query's WHERE clause implies the index's WHERE clause, meaning every row the query can return is guaranteed to be in the index. An index defined `WHERE status = 'open'` will not serve a query with `WHERE status IN ('open', 'pending')`; the query can return 'pending' rows the index does not contain. And the implication prover handles only simple cases, so predicates written in a different but logically equivalent form can still fail to match. An index defined `WHERE deleted_at IS NULL AND status = 'open'` will reliably match only queries that include both conditions in essentially that form. Test with EXPLAIN before assuming the index applies.
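A quick way to verify, using the hypothetical idx_orders_open sketched earlier:

```sql
-- Query predicate implies the index predicate: idx_orders_open is eligible.
EXPLAIN SELECT * FROM orders
WHERE customer_id = 42 AND status = 'open';

-- Query can return 'pending' rows the index does not contain: disqualified.
EXPLAIN SELECT * FROM orders
WHERE customer_id = 42 AND status IN ('open', 'pending');
```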
The second is stale statistics. Autovacuum runs ANALYZE periodically, but after a large bulk import it may not have run yet. The planner uses column statistics from the last ANALYZE to estimate how many rows the predicate will return. If those statistics describe a distribution that existed before the import, the planner's selectivity estimate is wrong and it may skip the partial index entirely.
The gap between estimated and actual rows in EXPLAIN (ANALYZE, BUFFERS) output is the signal. A large discrepancy means the planner's cost model doesn't match reality. The fix is a manual ANALYZE. First, check when statistics were last refreshed:
```sql
SELECT relname,
       last_autoanalyze,
       last_analyze,
       n_live_tup,
       n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'orders';
```

If last_autoanalyze is weeks old after significant data changes, run `ANALYZE orders;` before concluding the partial index is broken. It takes seconds on most tables and immediately refreshes the planner's model.
The HOT update mechanism — and what breaks it
Every Postgres update writes a new heap tuple. The old tuple is marked dead and eventually reclaimed by vacuum. Any index entries pointing to the old tuple must also be updated, potentially adding multiple index writes per row update if the table has several indexes.
Heap Only Tuple updates take a shorter path. When an update does not change any indexed column and there is free space on the same heap page, Postgres writes the new tuple and chains it to the old one with an in-page pointer. Index entries are not written or updated. The old index entry still points to the heap page; Postgres follows the chain to find the current tuple. This saves index I/O on every qualifying write. The cumulative throughput difference on a write-heavy table is substantial.
Three conditions break HOT. The first is straightforward: if the update changes a column that is in any index key, that index entry must change. HOT is disabled. The second is specific to partial indexes: if the update changes a column referenced in the partial index's WHERE predicate, Postgres disables HOT for that update because the row's membership in the indexed subset may have changed. The row might be entering or leaving the index. Postgres does not verify whether the value change actually affects membership; it conservatively disables HOT whenever a predicate column is touched. The third is physical: no free space on the same heap page forces the new tuple to a different page, breaking the chain regardless of index status.
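To make the predicate-column condition concrete, here is a sketch against the hypothetical idx_orders_open from earlier (it assumes an id primary key and a notes column that appears in no index):

```sql
-- notes is in no index key or predicate: HOT-eligible,
-- assuming free space remains on the row's heap page.
UPDATE orders SET notes = 'expedite shipping' WHERE id = 1001;

-- status appears in idx_orders_open's WHERE predicate: HOT is disabled
-- for this update, whether or not the row's index membership changes.
UPDATE orders SET status = 'shipped' WHERE id = 1001;
```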
The predicate column condition is the one that surprises teams. Research by Nikolay Samokhvalov (PostgresAI) measured update throughput drops of roughly 40 per cent on tables with partial indexes referencing frequently-updated columns, compared to the same workload against the table with no partial index on the predicate column. The penalty scales with write rate. A column that changes on every write pays index maintenance tax on every write.
This does not show up in EXPLAIN. The query plan for an UPDATE looks the same whether HOT updates are happening or not. The diagnostic is in pg_stat_user_tables:
```sql
SELECT relname,
       n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 1) AS hot_pct
FROM pg_stat_user_tables
WHERE relname = 'orders';
```

A table where indexed columns are stable should show a HOT rate of 80 to 95 per cent. A reading of 20 to 40 per cent after adding a partial index indicates the index is breaking HOT on a significant fraction of writes. The counters accumulate since the last stats reset, so compare before and after index creation rather than reading the current value in isolation.
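One way to make that before/after comparison concrete is to snapshot the counters into a scratch table before creating the index (the table name is illustrative):

```sql
-- Capture a baseline now; diff against the live view after a week of writes.
CREATE TABLE hot_rate_baseline AS
SELECT now() AS captured_at, relname, n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'orders';
```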
The soft-delete pattern, looked at carefully
The most-cited partial index example is the soft-delete filter: index only active rows, skip the historical bulk:
```sql
CREATE INDEX idx_orders_active
    ON orders (customer_id)
    WHERE deleted_at IS NULL;
```

For a table where 98 per cent of rows are soft-deleted, this is a strong read-side choice. Active-row queries hit a small, cache-friendly index. The storage footprint is a fraction of a full index on customer_id. For the standard 'show me current records for this customer' query, it works exactly as advertised.
On the write side, the outcome depends on which column is in the index key. When you soft-delete a row by setting deleted_at to a timestamp, the row leaves the partial index. Postgres must remove the index entry. HOT is blocked for that update and one index write occurs. Since soft-deletion happens once per row lifetime, this is acceptable.
The problem appears when the index key column is one you update on every operation:
```sql
-- Every status transition on an active row requires an index write.
-- HOT is disabled for each of these updates.
CREATE INDEX idx_orders_active_status
    ON orders (status)
    WHERE deleted_at IS NULL;
```

Every time an active order moves from 'pending' to 'processing', the index entry for that row must be rewritten. No HOT. At high order volume with frequent status transitions, this is index maintenance on every state change. The 40 per cent throughput figure applies directly here, and the effect scales with how often status changes per second.
One design that avoids this: push the volatile filter column down into the index's WHERE predicate and index a stable identifier instead. If your primary read pattern queries a specific customer's open orders, the frequently-updated column belongs in the index predicate, not the index key:
```sql
-- Index customer_id (stable) with the status filter in the predicate.
-- HOT is preserved for updates that don't touch deleted_at, status, or customer_id.
-- Soft-deleting removes the row from the index (one index write, once per lifetime).
CREATE INDEX idx_orders_customer_open
    ON orders (customer_id)
    WHERE deleted_at IS NULL AND status = 'open';
```

This index is smaller still (scoped to open rows only) and the key column (customer_id) rarely changes. Status transitions out of 'open' remove rows from the index at transition time, but updates to other columns on open rows preserve HOT. The trade-off: this index only covers 'open' queries. If you query by 'pending' and 'in_review' status separately, you need separate narrow indexes for each, which is usually the right answer at scale.
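Those separate narrow indexes would look like this (status values are illustrative):

```sql
-- One small partial index per frequently-queried status value.
CREATE INDEX idx_orders_customer_pending
    ON orders (customer_id)
    WHERE deleted_at IS NULL AND status = 'pending';

CREATE INDEX idx_orders_customer_in_review
    ON orders (customer_id)
    WHERE deleted_at IS NULL AND status = 'in_review';
```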
The general rule: columns in the index key should be queried but rarely updated. Columns you filter on but update frequently belong in the WHERE predicate, not the key. A row whose predicate column changes out of scope costs one index write at transition time, rather than an index write on every update.
Covering partial indexes: combining WHERE and INCLUDE
Postgres 11 added the INCLUDE clause to index definitions. Columns listed in INCLUDE are stored in the index leaf pages but are not part of the B-tree key. They cannot be used for sorting or range lookups, but they allow a query to be answered entirely from the index without reading the heap, enabling an index-only scan.
Combined with a WHERE clause, this creates a covering partial index: small because it is scoped to active rows, and fast because it eliminates heap fetches for covered queries:
```sql
CREATE INDEX idx_orders_active_covering
    ON orders (customer_id, created_at)
    INCLUDE (total, status)
    WHERE deleted_at IS NULL;
```

A query like `SELECT customer_id, created_at, total, status FROM orders WHERE customer_id = $1 AND deleted_at IS NULL ORDER BY created_at DESC LIMIT 20` can now be answered entirely from this index. Postgres reads the index leaf pages and returns the result without touching the orders heap at all. On a large table where heap pages are cold (not resident in shared_buffers), the saving is significant. Each heap fetch is a random I/O operation; index-only scans eliminate it entirely.
The write-side rules are unchanged. Updates to the key columns (customer_id, created_at) break HOT exactly as they would for a regular partial index. Updates to INCLUDE'd columns (total, status) also require writing a new index leaf entry. INCLUDE does not make those columns free to update. The difference from key columns is that INCLUDE'd data lives only in leaf nodes; it does not create extra internal B-tree nodes. The index stays compact despite carrying extra data per row.
The one operational step to verify after creating a covering partial index is the visibility map. Index-only scans use the heap's visibility map to confirm that indexed rows are visible to the current transaction without reading the heap directly. If the visibility map is out of date (vacuum must run to keep it current), Postgres falls back to a heap fetch per row to verify visibility. EXPLAIN (ANALYZE, BUFFERS) will show this as non-zero 'Heap Fetches'. Running `VACUUM orders;` updates the visibility map and restores true index-only scan behaviour.
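A verification pass against the covering index above (the customer value is a placeholder):

```sql
-- Refresh the visibility map, then confirm the plan shape.
VACUUM orders;

EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, created_at, total, status
FROM orders
WHERE customer_id = 42 AND deleted_at IS NULL
ORDER BY created_at DESC
LIMIT 20;
-- Expected: Index Only Scan using idx_orders_active_covering ... Heap Fetches: 0
```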
| Index type | Index size | Read path | Write-side behaviour | Best for |
|---|---|---|---|---|
| Full index on (customer_id) | Large (all rows) | Index scan + heap fetch | HOT blocked for customer_id updates | General lookups; no clear row-subset filter |
| Partial: (customer_id) WHERE deleted_at IS NULL | Small (active rows only) | Index scan + heap fetch | HOT preserved if customer_id is stable | Active-row queries when >70% rows are filtered out |
| Covering partial: (customer_id, created_at) INCLUDE (total, status) WHERE deleted_at IS NULL | Small to medium | Index-only scan, no heap fetch after VACUUM | HOT preserved if key cols are stable; INCLUDE cols require leaf write | High-traffic reads projecting specific columns on active rows |
Diagnosing whether your index is doing its job
Creating an index does not guarantee the planner uses it. Two queries cover most diagnostic needs:
```sql
-- Is the index being used at all?
SELECT indexrelname,
       idx_scan,
       idx_tup_read,
       idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY idx_scan DESC;

-- For a covering partial index: is it being served from shared_buffers?
-- (pg_statio_user_indexes tracks block I/O; it has no idx_scan column.)
SELECT indexrelname,
       idx_blks_read,
       idx_blks_hit
FROM pg_statio_user_indexes
WHERE relname = 'orders';
```

After real query load, idx_scan should be nonzero for an index that is being used. If it stays at zero, the index is not being selected by the planner. Use EXPLAIN on the relevant query and check whether the predicate matches (the implication check described earlier), whether statistics are stale (compare estimated vs. actual row counts), and whether selectivity has shifted above the threshold.
For a covering partial index, EXPLAIN (ANALYZE, BUFFERS) should show 'Index Only Scan' on the index with 'Heap Fetches: 0'. If the scan type is 'Index Only Scan' but Heap Fetches is non-zero, run VACUUM on the table and re-test. If the plan shows 'Index Scan' (not index-only), the INCLUDE columns are not eliminating heap fetches. Either the query projects columns not present in the index, or the visibility map issue applies.
“The index that makes the dashboard fast should not be the reason the write pipeline degrades.”
A checklist before deploying a partial index
Before creating the index:
- Confirm the predicate filters more than 70 per cent of rows (a quick measurement is sketched after this list). Below that, the planner increasingly prefers sequential scans and the size advantage matters less.
- Test with EXPLAIN that the target query's WHERE clause is a syntactic subset of the planned index predicate. Do not assume; verify.
- Check whether the index key columns are updated frequently. Run the HOT rate query on the table to establish a baseline before adding the index.
- If the query projects several columns beyond the key, consider adding INCLUDE for those columns to enable index-only scans. Keep INCLUDE'd columns to those with low update frequency.
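A minimal selectivity check for the first item, assuming the soft-delete predicate (swap in your planned predicate; on very large tables, run it on a sample or consult pg_class.reltuples for an estimate):

```sql
-- What fraction of rows would the partial index actually cover?
SELECT count(*) FILTER (WHERE deleted_at IS NULL) AS indexed_rows,
       count(*) AS total_rows,
       round(100.0 * count(*) FILTER (WHERE deleted_at IS NULL)
             / NULLIF(count(*), 0), 1) AS indexed_pct
FROM orders;
```

If indexed_pct comes back above roughly 30 per cent, the size advantage is marginal and the planner may prefer sequential scans anyway.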
After creating the index:
- Run ANALYZE on the table to refresh the planner's statistics.
- Verify idx_scan is nonzero after representative query load via pg_stat_user_indexes.
- Re-check the HOT rate after a week of write load. A meaningful drop indicates the index is breaking HOT on a significant fraction of updates.
- For covering partial indexes: run VACUUM, then confirm EXPLAIN (ANALYZE) shows 'Index Only Scan' with 'Heap Fetches: 0'.
Partial indexes are one of the more precise tools in Postgres performance work. The read-side story (small, cache-friendly, dramatically faster on filtered queries) is well understood. The write-side story requires checking HOT rates before and after, keeping index key columns stable, and running ANALYZE after bulk data changes. The covering partial index brings the read path as close to optimal as a B-tree allows: scoped to the rows that matter, carrying the columns the query needs, no heap fetch required. Get the key column selection right and the write cost is manageable. Get it wrong and the index that fixed the slow query is the same one quietly degrading write throughput.