PostgreSQL
Covering Index and Index-Only Scan
A dashboard shows a customer's orders from the last 30 days - a 500-row query. EXPLAIN: Index Scan, 800 buffers, 12 ms. Adding one line `INCLUDE (amount, status)` to the existing index: Index Only Scan, 6 buffers, 0.5 ms. Same query, same server, no SQL rewrite - 24 times faster. That is a covering index in a single line of DDL.
- **Stripe**: merchant activity dashboards use covering indexes on (merchant_id, created_at DESC) INCLUDE (amount, currency, status) - p99 latency dropped from 200ms to 15ms
- **GitHub**: the 'my PRs' page relies on a covering index on pulls with title/state/updated_at included - it sustains millions of requests per hour without touching the heap
- **Notion**: a workspace index on the page table with INCLUDE for title/icon/updated_at renders the sidebar through a single Index Only Scan, avoiding N+1 fetches
What a covering index is
A regular B-Tree index stores the key and a ctid (pointer to the heap row). For `SELECT amount FROM orders WHERE user_id = 42` PostgreSQL reaches the leaf node in 3-4 disk hops and then reads the heap page to retrieve `amount`. This is an **Index Scan**: the index answers the WHERE clause, but the heap is still required for the returned columns. A **covering index** carries every column the query needs, so the heap can be skipped entirely. The query is served by a single pass over the index without random I/O against the main table.
The term 'covering' means that the index **covers** the query: every column the SELECT returns is physically present in the index leaves. Before PostgreSQL 11 covering was achieved only by appending columns to the key (`CREATE INDEX ... (user_id, amount)`), which inflated the index and forced sorting by the second column. Version 11 introduced the `INCLUDE` clause - those columns are stored in leaves only and do not participate in navigation or ordering.
What is the difference between `INDEX (user_id, amount)` and `INDEX (user_id) INCLUDE (amount)` from PostgreSQL's point of view?
INCLUDE syntax and column strategy
Designing a covering index follows a simple rule: key columns are those used for filtering and ordering; INCLUDE columns are those that are merely returned. For a query running `WHERE user_id = ? AND created_at > ?` and returning `(id, amount, status)`, the key is `(user_id, created_at)` and the INCLUDE is `(id, amount, status)`. The advantage over a five-column composite key: fewer bytes per key, higher branching factor, faster navigation. INCLUDE columns sit only in leaves, not at every level of the tree.
Not every query deserves to be covered. The cost of a covering index: writes become more expensive (INCLUDE data must be updated on UPDATE), the index occupies more disk space and shared_buffers. Covering pays off for hot SELECT queries with a stable set of returned columns. Covering SELECT * is almost always wrong - it drags every TOAST attribute into the index.
Query `SELECT id, total FROM invoices WHERE customer_id = ? ORDER BY issued_at DESC LIMIT 50`. Which index structure is optimal for covering?
Index Only Scan: the mechanics of skipping the heap
**Index Only Scan** is the execution plan in which PostgreSQL returns data straight from the index without reading the heap. It is far cheaper than an ordinary Index Scan: one index page instead of two accesses (index + heap). For a query returning 100 rows with random heap I/O the difference can reach 10-20x. There is a subtlety: PostgreSQL stores MVCC visibility information only in the heap, not in the index. To decide whether a row version is visible to the current transaction, the heap page generally has to be consulted. Index Only Scan is possible only when that check can be skipped.
PostgreSQL keeps every row version separately in the heap (MVCC). The index may reference an outdated version or one that is not yet committed. To determine whether a version is valid for the current transaction the system must in general read the heap page and inspect xmin/xmax. That cancels the covering-index advantage if every row still triggers a heap fetch.
EXPLAIN shows `Index Only Scan ... Heap Fetches: 8000` when returning 10000 rows. What does that mean in practice?
Visibility Map and the role of VACUUM
**Visibility Map** (VM) is a bitmap with one bit per heap page (two in practice: all-visible and all-frozen). The all-visible bit set to 1 means that every version of every row on that page is visible to every existing and future transaction - the xmin/xmax check is unnecessary. Index Only Scan consults the VM before accessing the heap: if the bit is set, the heap is skipped. The bit is reset to 0 on any write to the page (INSERT/UPDATE/DELETE) and is set back only by VACUUM.
VM lives in a separate fork of the table file (`relname_vm`). The size is tiny: 1 bit per 8KB page = 1/65536 of the table size. The VM therefore almost always fits entirely in shared_buffers. The effectiveness of Index Only Scan depends directly on how fresh the VM is, which is maintained by autovacuum. High write throughput without VACUUM = constantly reset bits = a covering index that no longer helps.
Table `events` ingests 5000 INSERTs per minute. A covering index exists, but Heap Fetches remain high. Which change addresses the root cause most effectively?
Trade-offs and measuring the gain
A covering index is a tool with a measurable price. Benefit: a read that used to do two I/Os (index + heap) now does one. For queries with low selectivity (thousands of rows) the latency win reaches 10x. Cost: each table write triggers updates to extra INCLUDE columns in the index; HOT updates become impossible when an INCLUDE column is touched by the UPDATE. On write-heavy systems with 80% INSERT/UPDATE traffic, adding a wide covering index can drop throughput by 15-30%. Rule of thumb: covering belongs on hot SELECT paths confirmed by `pg_stat_statements`.
Metrics for deciding whether to add covering: (1) query frequency - from pg_stat_statements; (2) current cost - EXPLAIN ANALYZE with BUFFERS; (3) write profile on the table - pg_stat_user_tables.n_tup_upd; (4) index size - pg_size_pretty(pg_relation_size). If the query runs a million times per day and saves 5ms each time while the index adds 50ms to one UPDATE in a thousand, the trade is worth it. If it runs once an hour, the index will not pay back.
The more columns in INCLUDE, the more efficient the index
Each INCLUDE column inflates the leaf page and the cost of every UPDATE/INSERT. Only columns genuinely needed by specific hot queries are worth carrying.
A wide covering index = fewer rows per leaf page = more pages to read. At the limit the index becomes larger than the heap. Precise design beats greedy inclusion.
When does a covering index with INCLUDE deliver maximum practical benefit?
Key ideas
- **Covering index**: holds every column the query needs - the SELECT is served by a single pass over the index without heap I/O
- **INCLUDE columns** (since PostgreSQL 11): live only in leaves, do not participate in sorting or navigation - covering without bloating the tree
- **Index Only Scan** works only if the page is marked all-visible in the visibility map; the VM is maintained by VACUUM, and when VACUUM lags the plan degrades to Index Scan + heap fetch
- **Trade-off**: covering speeds up reads 5-10x but increases write amplification - apply it deliberately to hot queries surfaced by pg_stat_statements
Related topics
Covering index sits at the intersection of several PostgreSQL mechanisms:
- B-Tree indexes — Covering is an extension of the ordinary B-Tree via INCLUDE; branching factor and composite-key principles apply directly to its design
- MVCC and VACUUM — Index Only Scan efficiency depends on the visibility map; autovacuum tuning is critical for covering on write-active tables
- Query planner — The choice of Index Only Scan vs Index Scan vs Bitmap Heap Scan is made from statistics; EXPLAIN ANALYZE is the only objective tool to verify the result
Вопросы для размышления
- Why does PostgreSQL not switch to Index Only Scan automatically for every index, given that it is always cheaper?
- Under what conditions can even a well-designed covering index turn out slower than an ordinary Index Scan + heap fetch?
- The visibility map is a tiny structure that is critical for covering. What other similar 'derived' structures could PostgreSQL grow for other optimizations?
Связанные уроки
- pg-12-btree — Covering index is built on top of B-tree structure
- pg-14-partial-expr — Partial and expression indexes complement covering strategy
- pg-18-explain — EXPLAIN ANALYZE shows Index Only Scan for covering indexes
- ds-12-balanced-trees — B-tree is a balanced tree - covering is a layer on top
- db-09-indexes-btree