PostgreSQL

VACUUM: Garbage Collection and Autovacuum

The orders table is 200 GB, but only 50 GB is real data. The other 150 GB is garbage from millions of UPDATEs over the past months. Queries are slow, disk is full. VACUUM FULL fixes it in 2 hours of downtime. pg_repack fixes it in the same 2 hours without taking the service offline.

**Slack** (2019) discovered the channels table had grown to 250 GB of bloat after autovacuum fell behind during the COVID growth spike. They had to run pg_repack in production in emergency mode. Today dead_pct > 15% triggers an automatic alert.
**Shopify** sets autovacuum_vacuum_scale_factor = 0.01 on the orders table (10M+ rows). With the default 0.2, autovacuum would fire at 2M dead tuples: too late.
**Gitlab** monitors txid wraparound risk through `age(datfrozenxid)` in pg_database. Approaching 1.5B they force a VACUUM FREEZE so the cluster never hits emergency freeze.

Dead Tuples: Where the Garbage Comes From

Every UPDATE in PG creates a new row version and marks the old one as dead (a dead tuple). DELETE also leaves a dead tuple. The row physically stays in place; only xmax is set to the deleting transaction's txid. Accumulating dead tuples causes table bloat: the table file grows, seq scans get slower, and shared_buffers fills with garbage.

A long-lived transaction (a stuck analytical query or an open cursor) holds the xmin of its snapshot. VACUUM cannot remove dead tuples whose xmax is below that xmin. One hour-long transaction can pile up gigabytes of garbage on active tables.

A table has 1M live rows and 500K dead tuples. Why is Seq Scan slower than on a table with just 1M rows and no dead tuples?

VACUUM vs VACUUM FULL: Two Different Tools

Regular VACUUM marks dead tuples as reusable but does not return space to the OS. The physical file size does not shrink. VACUUM FULL rewrites the table entirely into a new file: space is returned, but it needs an exclusive lock (AccessExclusiveLock) for the entire operation. On a 100 GB table that can take hours.

Property	VACUUM	VACUUM FULL
Lock	ShareUpdateExclusiveLock (weak)	AccessExclusiveLock (exclusive)
Readers/writers	Concurrent work allowed	Everyone waits
Return space to OS	No (reuse only)	Yes
Speed	Fast (incremental)	Slow (full rewrite)
Production use	Routine	Only when absolutely necessary

A table is 50 GB, 80% of which is bloat. You run VACUUM. How much space goes back to the OS?

Autovacuum Tuning: Right Thresholds

Autovacuum fires when `n_dead_tup > autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * n_live_tup`. Defaults: threshold=50, scale_factor=0.2. For a 100M-row table that means 20M dead tuples before autovacuum runs: far too rare. On hot tables you should lower scale_factor to 0.01 or even 0.001.

autovacuum_vacuum_cost_delay controls throttling: vacuum sleeps for the configured delay after accumulating cost_limit work points (read a page = 1 point, write = 20). This keeps vacuum from killing I/O. On SSD servers you can drop it to 0-2 ms.

The events table has 10M live rows. With the default scale_factor=0.2, at how many dead tuples will autovacuum fire?

Table Bloat: Diagnosis and Measurement

Bloat is the gap between physical table size and actual data volume. It happens when dead tuples accumulate faster than VACUUM removes them, or when VACUUM cleared the garbage but the file did not shrink. Index bloat is critical: a bloated index reads more slowly and stops fitting in cache.

Index bloat: `SELECT * FROM pgstatindex('idx_orders_user_id')` shows leaf_fragmentation and avg_leaf_density. Density < 50% signals heavy index bloat. REINDEX CONCURRENTLY rebuilds without locking (PG 12+).

pgstattuple shows dead_tuple_percent=5%, but pg_relation_size is double what you expected. How is that possible?

pg_repack: Defragmentation Without Locks

pg_repack rebuilds a table into a new file without an exclusive lock. It creates a copy, applies changes through a trigger, and then atomically swaps. AccessExclusiveLock is held only for the final swap: milliseconds. It is the production-safe replacement for VACUUM FULL.

pg_repack creates a trigger on the source table to capture changes during the copy. So the table must have a PRIMARY KEY (without one pg_repack does not work). Write load increases slightly during the run because of the trigger overhead.

VACUUM FULL is the standard way to deal with bloat in production

VACUUM FULL locks the whole table for hours. Unacceptable in production. For bloat without downtime use pg_repack. VACUUM FULL is only acceptable during planned maintenance windows.

VACUUM FULL holds AccessExclusiveLock: no SELECT, INSERT, UPDATE, or DELETE goes through. On a 50 GB orders table that is 30-60 minutes of full downtime for everything touching the table.

Why does pg_repack require a PRIMARY KEY on the table?

Key Ideas

Dead tuples = physically live rows with xmax set. VACUUM marks them reusable but does not return space to the OS.
Autovacuum threshold = 50 + scale_factor * live_rows. For large tables drop scale_factor to 0.01 per table.
VACUUM FULL = a full lock for hours. pg_repack = a rebuild without downtime via trigger-captured changes.

Вопросы для размышления

What happens to autovacuum behavior if you set autovacuum_vacuum_scale_factor = 0 and autovacuum_vacuum_threshold = 100?
Why can a table be faster after VACUUM FULL than after pg_repack? Hint: row order on disk.
txid wraparound: pg_database.datfrozenxid hits 2^31. What does PG do if VACUUM FREEZE was not run in time?

Связанные уроки

db-09-indexes-btree