PostgreSQL
Index Maintenance: bloat, reindex, pg_stat
An index on `orders` grew to 40 GB while the table itself takes 8 GB. Status queries got three times slower over six months, and nobody figured out why until someone looked at the 5x bloat ratio.
- **Shopify** regularly finds 300-500% bloat on high-write tables (inventory, orders) after six months without maintenance. REINDEX CONCURRENTLY in a maintenance window drops p99 query latency from 800ms to 200ms.
- **GitLab** found 47 unused indexes totaling 1.2 TB on ci_builds during a DB audit. After dropping them, INSERT throughput rose 23% and autovacuum stopped falling behind.
- **Heroku Postgres** automatically sends an alert when an index has not been used for over 14 days and exceeds 1 GB. Teams save tens of gigabytes per month with no manual audit.
- **Notion** added B-tree deduplication (PG13) on `block_type` and `space_id` indexes and reduced total index size by 35% (800 GB saved across a 12-shard prod cluster).
Index Bloat
**Index bloat** is the inflation of the index file: pages that used to hold live entries are now occupied by dead tuples, but the space is not returned. Every UPDATE or DELETE leaves an old row version in the heap and a matching entry in the index; VACUUM removes them from the heap, but index pages are not automatically compacted.
In practice: a Shopify `orders` table after a year of intensive status updates can carry a 12 GB index on the `status` column, while live data takes 3 GB. A bloat ratio of 4x means every index scan reads three times more pages than necessary.
Bloat is especially destructive for B-tree indexes on frequently updated columns (statuses, counters, timestamps). Indexes on append-only tables (logs, events) barely bloat at all.
Why does VACUUM not eliminate index bloat fully?
Reindex Concurrently
`REINDEX` rebuilds the index from current data, eliminating bloat. The catch: plain `REINDEX INDEX myidx` blocks writes on the table for the entire rebuild, anywhere from minutes to hours on large tables. `REINDEX CONCURRENTLY` (PostgreSQL 12+) works without blocking writes by building a new index alongside the old one.
REINDEX CONCURRENTLY cannot run inside a transaction. On failure it leaves an invalid index with a `_ccnew` suffix, which has to be dropped manually via `DROP INDEX CONCURRENTLY`.
GitLab uses `REINDEX CONCURRENTLY` in dedicated background jobs (monitored via `pg_stat_progress_create_index`) when rolling out migrations on tables with 50M+ rows. Average rebuild time for an index on a 100 GB table: 45 to 90 minutes.
What is the main downside of REINDEX CONCURRENTLY compared to plain REINDEX?
pg_stat_user_indexes
`pg_stat_user_indexes` is a system view that accumulates usage statistics for each index since the last `pg_stat_reset()`. Key fields: `idx_scan` (how many times the index was used in plans), `idx_tup_read` (index entries read), `idx_tup_fetch` (heap rows actually fetched).
Note: stats accumulate from server start or the last `SELECT pg_stat_reset()`. An index with `idx_scan = 0` after a restart is not yet a reason to drop it; you need at least 2 to 4 weeks of representative load.
An index shows idx_scan = 0 in pg_stat_user_indexes. What should you check before dropping it?
Unused Indexes
Unused indexes are hidden resource consumers. Every index slows down INSERT/UPDATE/DELETE (the structure must be updated), eats disk and shared_buffers space, and autovacuum spends time maintaining it. Basecamp found 34 unused indexes on the `events` table totaling 280 GB; after dropping them, INSERT throughput went up 18%.
Before dropping: recreate the index with `CREATE INDEX IF NOT EXISTS ... ;` inside BEGIN/ROLLBACK to validate syntax, check UNIQUE/FK constraints (such indexes are needed even with no scans), and back up the plan via `pg_dump --schema-only`.
Why can't a unique index with idx_scan = 0 be safely dropped based on this stat alone?
B-tree Deduplication
**B-tree deduplication** (PostgreSQL 13+) lets the index store multiple identical keys in a single posting list entry instead of separate entries per row. On low-cardinality indexes (statuses, types, booleans) this shrinks the index 5 to 10x.
Deduplication is automatically disabled for indexes with INCLUDE columns (covering indexes), for indexes on types without a comparison operator, and for unique indexes.
VACUUM FULL removes index bloat with no need for REINDEX
VACUUM FULL rebuilds the heap, but it does not skip indexes either. It recreates them from the heap, which is equivalent to REINDEX
VACUUM FULL does in fact rebuild indexes as part of the operation. That is its hidden behavior. But VACUUM FULL takes an ACCESS EXCLUSIVE lock on the whole table, which is unacceptable in production. REINDEX CONCURRENTLY achieves the same result without the lock.
On which kind of index does B-tree deduplication produce the biggest size win?
Key Ideas
- **Index bloat** accumulates from UPDATE/DELETE: VACUUM marks dead entries but does not compact index pages. A 3-5x bloat ratio after a year on active tables is typical.
- **REINDEX CONCURRENTLY** is the only way to remove bloat without blocking production. It temporarily holds the old and new index together and leaves a `_ccnew` artifact on failure.
- **pg_stat_user_indexes** gives usage data, but needs at least 2 to 4 weeks of typical load. Unique indexes are needed even at idx_scan = 0.
- **B-tree deduplication** (PG13+) shrinks low-cardinality indexes 5 to 10x automatically and is on by default.
Related Topics
Index maintenance overlaps with several key PostgreSQL areas:
- VACUUM and autovacuum — VACUUM is the first line of defense against heap bloat; understanding its limits explains why REINDEX exists
- EXPLAIN ANALYZE — The only reliable way to confirm an index is actually used by the queries you expect before dropping it
- PostgreSQL Index Types — Bloat and deduplication apply to B-tree; GiST/GIN/BRIN maintenance mechanics differ
Вопросы для размышления
- How often should you check bloat on your high-load tables, and do you have automated monitoring of this metric?
- Which indexes in your schema are the most likely drop candidates, and how would you safely validate that hypothesis?
- B-tree deduplication is on by default in PG13+. Is it worth explicitly enabling it on PG12 clusters via REINDEX?