You're probably using the wrong Postgres index on your events table
B-tree, BRIN, and partial indexes — which one your events table actually needs
The events table is the most common slow table in SaaS engineering. By any name — audit_log, activity, webhook_deliveries — it has the same shape: rows arrive in a steady stream, updates are rare, and nearly every query asks for what happened in the last N hours or days. This table grows faster than anything else in the schema. By the time a team revisits its Postgres index choices, the table often holds tens of millions of rows and a B-tree on created_at that made sense at launch but is quietly becoming the wrong call.
Here is what B-tree costs at scale, what the alternatives look like, and when to reach for each one.
What a B-tree index costs on a large events table
B-tree is Postgres's general-purpose index. It handles equality lookups and range queries on any column, regardless of whether the table's physical row order matches the column's value order. That generality costs storage: a B-tree stores a pointer to every indexed row.
On a table with 50 million rows, a B-tree index on a timestamptz column takes roughly 1.1 GB. A useful reference point: if each row is about 200 bytes wide (a timestamp, a user_id, an event_type, and a small JSON payload), the table itself is around 10 GB. The B-tree is 11% of the raw table size, before accounting for index bloat from dead rows or any additional indexes on the same table.
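You can measure this directly rather than estimating. A quick check, with illustrative table and index names:

```sql
-- Compare the on-disk size of the table and one of its indexes.
SELECT pg_size_pretty(pg_relation_size('events'))                AS table_size,
       pg_size_pretty(pg_relation_size('events_created_at_idx')) AS index_size;
```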
The subtler problem is what happens as the table grows. Postgres's query planner weighs index scan cost against sequential scan cost. A very large index on a wide-row table can push the planner toward sequential scans for moderate result sets, because the random I/O cost of following the B-tree's structure starts to exceed the benefit. That is not a planner bug; it is correct behaviour given the tradeoff. But it means your B-tree on a giant events table may quietly stop being used by the queries you care about.
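The way to catch this is to read the plan rather than the index list. A quick check, under the same assumed table name:

```sql
-- Does the planner still pick the index for a typical recent-window query?
-- An Index Scan or Bitmap Heap Scan node means yes; a Seq Scan over the
-- whole table means the index has stopped paying for itself here.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM events
WHERE created_at > now() - interval '24 hours';
```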
BRIN: the smallest Postgres index for append-only data
BRIN stands for Block Range INdex. Instead of storing a pointer to every row, it divides the table's physical pages into ranges (128 pages per range by default) and stores only the minimum and maximum value of the indexed column for each range.
The result is dramatic. A BRIN index on a 50M-row timestamptz column typically takes 50 to 200 KB; the equivalent B-tree takes 1.1 GB. For well-ordered, append-only data, BRIN is 500 to 1,000 times smaller, and the ratio holds roughly constant as the table grows.
When a query arrives with a range predicate on the indexed column, Postgres reads the BRIN's compact summary, identifies which block ranges might contain matching rows, then scans only those pages. For time-series data where rows arrive in timestamp order, most block ranges do not overlap, and the scan ends up nearly as selective as a B-tree at a fraction of the storage cost.
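In a query plan this shows up as a bitmap scan, not an ordinary index scan. A sketch, assuming a BRIN index named events_created_at_brin (created below):

```sql
EXPLAIN (ANALYZE)
SELECT count(*)
FROM events
WHERE created_at >= '2025-12-01' AND created_at < '2025-12-02';
-- Expect something like:
--   Bitmap Heap Scan on events
--     Recheck Cond: (created_at >= ... AND created_at < ...)
--     -> Bitmap Index Scan on events_created_at_brin
-- "Rows Removed by Index Recheck" counts rows in candidate block ranges
-- that the predicate then filtered out.
```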
The key constraint: BRIN's efficiency depends entirely on physical correlation. If rows arrive in timestamp order, block ranges are tight and non-overlapping. If they do not, block ranges overlap, BRIN loses selectivity, and query performance degrades toward a full sequential scan.
Two ways BRIN silently breaks
BRIN looks like a free win. Here are the two scenarios where it does not deliver, and how to catch them before production does.
Concurrent writers inserting in non-sequential order
If multiple worker processes write to the events table simultaneously, rows do not necessarily land on disk in timestamp order. A job that started at 10:00 might commit at 10:05 while a job that started at 10:03 commits at 10:04. Physical insertion order diverges from timestamp order, and BRIN's range summaries start overlapping.
Check correlation before choosing BRIN:

```sql
-- pg_stats.correlation estimates how closely physical row order tracks the
-- column's value order: 1.0 is perfectly ordered, 0 is unrelated.
SELECT correlation
FROM pg_stats
WHERE tablename = 'events'
  AND attname = 'created_at';
```

A value near 1.0 means rows are laid out in created_at order. As correlation drops below roughly 0.9, block-range summaries overlap enough that BRIN's selectivity degrades; the decision table below uses the same threshold.

The autosummarize lag
By default, Postgres does not update BRIN range summaries when new rows are inserted. New page ranges are marked unsummarised until the next VACUUM runs. Queries against very recent rows may scan more pages than necessary because the newest ranges have not been summarised yet.
Fix this at index creation:

```sql
CREATE INDEX events_created_at_brin
  ON events USING BRIN (created_at)
  WITH (autosummarize = on);
```

With autosummarize = on, Postgres runs a lightweight background task to summarise new ranges shortly after they fill. It adds a small background cost but keeps the index useful for queries against recent rows.
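If you inherit an index created without the option, you can also summarise pending ranges on demand with the built-in brin_summarize_new_values function:

```sql
-- Summarise any page ranges the index has not yet covered.
-- Returns the number of ranges summarised.
SELECT brin_summarize_new_values('events_created_at_brin');
```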
Partial indexes: the filter your events table is missing
Most event table queries look like this:

```sql
SELECT * FROM events
WHERE user_id = $1
  AND created_at > now() - interval '30 days'
ORDER BY created_at DESC
LIMIT 100;
```

A B-tree on (user_id, created_at) covers this query, but it indexes every row, including data from 18 months ago that this query will never touch. At scale, that is storage and maintenance overhead on rows that serve no live query.
A partial index scopes the index to a subset of rows matching a predicate:

```sql
CREATE INDEX events_recent_user
  ON events (user_id, created_at DESC)
  WHERE created_at > '2025-11-01';
```

The planner picks this index for recent-data queries and skips historical rows entirely. The index is smaller, maintenance cost is lower, and the query plan is tighter. For a table with 100M rows where 90% are older than the cutoff, the index might cover 10M rows instead of 100M.
The planner only uses the partial index when it can prove, at plan time, that the query's WHERE clause implies the index predicate, and that proof works over constants. A query filtering on created_at > '2025-11-20' matches an index defined with WHERE created_at > '2025-11-01' because the query's cutoff is later than the index's. An expression like now() - interval '30 days' is not folded to a constant for this proof, so in practice the application computes the window start and sends it as a literal (a bound parameter matches only when the planner inlines its value into a custom plan).
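A quick way to confirm eligibility (a sketch; the plan should name events_recent_user rather than fall back to a sequential scan):

```sql
EXPLAIN
SELECT * FROM events
WHERE user_id = 42
  AND created_at > '2025-11-20'  -- plan-time constant, implies the index predicate
ORDER BY created_at DESC
LIMIT 100;
```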
The operational consequence: the cutoff date does not update itself. The index keeps accepting every new row, so the share of the table it covers grows and the size advantage erodes. You need to replace the index periodically: create a new one with a fresher cutoff, then drop the old one. A monthly cron job or deploy step handles this. It is index management, not a schema migration.
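A sketch of that rotation, using CONCURRENTLY so neither step blocks writes; index names are illustrative, and note that CONCURRENTLY cannot run inside a transaction block:

```sql
-- Build the replacement first, without blocking inserts.
CREATE INDEX CONCURRENTLY events_recent_user_2025_12
  ON events (user_id, created_at DESC)
  WHERE created_at > '2025-12-01';

-- Once the new index is valid, retire the stale one.
DROP INDEX CONCURRENTLY events_recent_user;
```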
Putting it together: a decision table
| Your situation | Recommended index | Caveat |
|---|---|---|
| Rows arrive in timestamp order, moderate write concurrency | BRIN on created_at | Check correlation >= 0.9 first |
| High write concurrency, timestamps arrive out of order | B-tree on created_at | BRIN unreliable below 0.9 correlation |
| Most queries filter by user AND recent time window | Partial B-tree on (user_id, created_at) | Replace index monthly as cutoff grows stale |
| Table above ~100 GB, regular archival or deletion needed | Range partitioning on created_at | BRIN per partition still useful within each |
| Mix of broad range scans and per-user recent queries | BRIN on created_at + partial B-tree on (user_id, created_at) | Run correlation check before deploying BRIN |
The combination most production teams land on: a BRIN on created_at for broad time-range scans, and a partial B-tree on (user_id, created_at) for per-user recent history. No full B-tree on created_at alone. That removes the 1+ GB index without giving up query performance on the patterns that actually run.
The natural third path, once an events table grows past 100 GB or starts accumulating data you need to delete efficiently, is range partitioning on created_at. Partitioning lets you drop an old partition instantly, index each partition independently, and gives the query planner a smaller surface to reason about. It is not a replacement for thoughtful index selection; it is what comes after you have exhausted the single-table options.
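For orientation, a minimal sketch of what that looks like, assuming monthly partitions and illustrative column names:

```sql
CREATE TABLE events (
    user_id    bigint      NOT NULL,
    event_type text        NOT NULL,
    payload    jsonb,
    created_at timestamptz NOT NULL DEFAULT now()
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2025_12 PARTITION OF events
    FOR VALUES FROM ('2025-12-01') TO ('2026-01-01');

-- Each partition can still carry its own BRIN for time-range scans.
CREATE INDEX ON events_2025_12 USING BRIN (created_at) WITH (autosummarize = on);

-- Retiring a month is a metadata operation, not a mass DELETE:
-- DROP TABLE events_2025_11;
```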
The ten-second query against pg_stats before switching index types can save you from deploying an index that looks correct in development but degrades under the write concurrency of production.