PostgreSQL

EXPLAIN ANALYZE: Reading the Query Plan

The query runs for 8 seconds. You add an index, and now it runs for 12 seconds. Why? The answer hides in three lines of EXPLAIN ANALYZE, but only if you know what to look at.

**Notion**, during a PG14 migration, discovered via EXPLAIN ANALYZE that 3 critical dashboard queries were picking Hash Join over Nested Loop because of stale stats. After ANALYZE and a random_page_cost review, total dashboard time dropped from 4.2s to 0.8s.
**Shopify** embedded an EXPLAIN ANALYZE parser into CI/CD: any PR that changes a critical query plan (Index Scan to Seq Scan) is blocked pending DBA review. This catches production regressions early.
**GitLab** publicly documents in its migration guidelines that EXPLAIN ANALYZE output is required for any migration touching tables > 1M rows. It became the standard after several incidents with bad plans.
**Heroku** added automatic query analysis to Dataclips in 2019: if EXPLAIN shows a Seq Scan on a > 100k row table without a LIMIT, the user gets a warning with an index recommendation.

Structure of EXPLAIN Output

`EXPLAIN` shows the query plan: the tree of operations the planner chose for execution. `EXPLAIN ANALYZE` additionally runs the query and reports real metrics. Reading this output is a baseline skill for any backend developer working with PostgreSQL.

Every plan node reports: `cost=startup..total` (estimated cost), `rows` (expected row count), `width` (average row size in bytes). After `ANALYZE` you also get `actual time=first..total` (real time in ms) and `loops` (how many times the node ran).

EXPLAIN without ANALYZE does not run the query, which is safe for SELECT and for DML alike. EXPLAIN ANALYZE does modify data, so wrap UPDATE/DELETE checks in a transaction with ROLLBACK.

What does `loops=5` on an Index Scan node mean in EXPLAIN ANALYZE output?

Scan Types

PostgreSQL picks between three main data access methods. The choice depends on **selectivity** (what fraction of the table needs to be returned) and which indexes exist.

Type	When it wins	Characteristic
Seq Scan	Returns > 5-15% of rows	Reads pages sequentially, I/O is efficient
Index Scan	Highly selective filter (< 1-5%)	Random heap access, expensive at large result sizes
Index Only Scan	All needed columns are in the index	Does not read the heap at all, the fastest

Seq Scan is not always bad. On small tables (< 1000 rows) PostgreSQL almost always picks Seq Scan even when an index exists: the random I/O overhead of the index outweighs the benefit.

The query `SELECT * FROM users WHERE country = 'US'` uses Seq Scan even though an index on country exists. Table: 10M rows, 40% are US. Planner bug?

Bitmap Scan

**Bitmap Heap Scan** is a hybrid method: first a bitmap of pages is built (Bitmap Index Scan), then the heap is read sequentially using that bitmap. It is a compromise between Index Scan (random I/O, one row at a time) and Seq Scan (reads everything).

The key advantage of Bitmap Scan is combining multiple indexes via `BitmapAnd`/`BitmapOr`. Without it PostgreSQL could only use one index to filter. When matches are too many, the bitmap drops to a lossy mode: `Heap Blocks: lossy=320` means `Recheck Cond` has to run row-by-row.

If the plan shows `Recheck Cond` with `Heap Blocks: lossy=N`, that is a sign `work_mem` was not enough for an exact bitmap. Increasing work_mem can flip it to exact mode and cut out the rechecks.

The plan shows `Heap Blocks: lossy=540`. What does it mean and how do you fix it?

Cost Model

PostgreSQL estimates the cost of every plan in abstract units (cost units). The base unit is reading one page sequentially (seq_page_cost = 1.0). All other operations are normalized against it.

Parameter	Default	Meaning
seq_page_cost	1.0	Reading a page during sequential access
random_page_cost	4.0	Random page read (set to 1.1-2.0 for SSD)
cpu_tuple_cost	0.01	Processing one tuple in CPU
cpu_index_tuple_cost	0.005	Processing one index entry
cpu_operator_cost	0.0025	Executing one operator/function

On AWS Aurora and RDS with io1/gp3 storage, `random_page_cost = 1.1-2.0` is the recommendation. At the default 4.0 the planner over-prefers Seq Scan, which is a common cause of "why isn't my index being used?" on cloud installations.

On an SSD cluster the planner keeps picking Seq Scan over Index Scan. What's the first thing to check?

Actual vs Estimated Rows

The single most important signal in EXPLAIN ANALYZE is the gap between `rows=N` (estimate) and `actual rows=M`. If the estimate is off by an order of magnitude, the planner can pick a fundamentally wrong plan: Hash Join instead of Nested Loop or vice versa.

Tools like `explain.depesz.com` or `explain.tensor.ru` accept EXPLAIN ANALYZE output and produce a visualization with highlights on nodes where estimate diverges sharply from actual. That saves time during diagnosis.

High cost in EXPLAIN means the query is slow

Cost is the planner's estimation unit, not milliseconds. Only actual time in EXPLAIN ANALYZE reflects real time

Cost is used only to compare alternative plans. A query with cost=100000 may run in 10ms, and a query with cost=500 may take 5 seconds if estimates were wrong or the cost model is not calibrated to the hardware.

EXPLAIN ANALYZE shows `rows=5` (estimate) vs `actual rows=50000` on an Index Scan. Likely consequences?

Key Ideas

**EXPLAIN ANALYZE** runs the query and reports real time (actual time) vs estimated cost (cost). A gap between them points to a stats problem.
**Seq Scan** wins on large selections (>5-15%); **Index Scan** on high selectivity (<1-5%); **Bitmap Scan** combines multiple indexes via BitmapAnd/BitmapOr.
**random_page_cost** should be lowered to 1.1-2.0 on SSD, otherwise the planner over-prefers Seq Scan.
Main red flag: `rows=5` vs `actual rows=50000`. Such a gap means the planner picked a plan from wrong data.

Вопросы для размышления

When was the last time EXPLAIN ANALYZE ran on your app's most frequent queries, and do the real plans match expectations?
How is random_page_cost configured on your cluster, and does it match the storage type (HDD/SSD/NVMe)?
Does your project have a process to review query plans when schema changes ship?

Связанные уроки

db-11-query-optimization

PostgreSQL

EXPLAIN ANALYZE: Reading the Query Plan

The query runs for 8 seconds. You add an index, and now it runs for 12 seconds. Why? The answer hides in three lines of EXPLAIN ANALYZE, but only if you know what to look at.

**Notion**, during a PG14 migration, discovered via EXPLAIN ANALYZE that 3 critical dashboard queries were picking Hash Join over Nested Loop because of stale stats. After ANALYZE and a random_page_cost review, total dashboard time dropped from 4.2s to 0.8s.
**Shopify** embedded an EXPLAIN ANALYZE parser into CI/CD: any PR that changes a critical query plan (Index Scan to Seq Scan) is blocked pending DBA review. This catches production regressions early.
**GitLab** publicly documents in its migration guidelines that EXPLAIN ANALYZE output is required for any migration touching tables > 1M rows. It became the standard after several incidents with bad plans.
**Heroku** added automatic query analysis to Dataclips in 2019: if EXPLAIN shows a Seq Scan on a > 100k row table without a LIMIT, the user gets a warning with an index recommendation.

Structure of EXPLAIN Output

EXPLAIN without ANALYZE does not run the query, which is safe for SELECT and for DML alike. EXPLAIN ANALYZE does modify data, so wrap UPDATE/DELETE checks in a transaction with ROLLBACK.

What does `loops=5` on an Index Scan node mean in EXPLAIN ANALYZE output?

Scan Types

PostgreSQL picks between three main data access methods. The choice depends on **selectivity** (what fraction of the table needs to be returned) and which indexes exist.

Type	When it wins	Characteristic
Seq Scan	Returns > 5-15% of rows	Reads pages sequentially, I/O is efficient
Index Scan	Highly selective filter (< 1-5%)	Random heap access, expensive at large result sizes
Index Only Scan	All needed columns are in the index	Does not read the heap at all, the fastest

Seq Scan is not always bad. On small tables (< 1000 rows) PostgreSQL almost always picks Seq Scan even when an index exists: the random I/O overhead of the index outweighs the benefit.

The query `SELECT * FROM users WHERE country = 'US'` uses Seq Scan even though an index on country exists. Table: 10M rows, 40% are US. Planner bug?

Bitmap Scan

If the plan shows `Recheck Cond` with `Heap Blocks: lossy=N`, that is a sign `work_mem` was not enough for an exact bitmap. Increasing work_mem can flip it to exact mode and cut out the rechecks.

The plan shows `Heap Blocks: lossy=540`. What does it mean and how do you fix it?

Cost Model

PostgreSQL estimates the cost of every plan in abstract units (cost units). The base unit is reading one page sequentially (seq_page_cost = 1.0). All other operations are normalized against it.

Parameter	Default	Meaning
seq_page_cost	1.0	Reading a page during sequential access
random_page_cost	4.0	Random page read (set to 1.1-2.0 for SSD)
cpu_tuple_cost	0.01	Processing one tuple in CPU
cpu_index_tuple_cost	0.005	Processing one index entry
cpu_operator_cost	0.0025	Executing one operator/function

On an SSD cluster the planner keeps picking Seq Scan over Index Scan. What's the first thing to check?

Actual vs Estimated Rows

High cost in EXPLAIN means the query is slow

Cost is the planner's estimation unit, not milliseconds. Only actual time in EXPLAIN ANALYZE reflects real time

EXPLAIN ANALYZE shows `rows=5` (estimate) vs `actual rows=50000` on an Index Scan. Likely consequences?

Key Ideas

**EXPLAIN ANALYZE** runs the query and reports real time (actual time) vs estimated cost (cost). A gap between them points to a stats problem.
**Seq Scan** wins on large selections (>5-15%); **Index Scan** on high selectivity (<1-5%); **Bitmap Scan** combines multiple indexes via BitmapAnd/BitmapOr.
**random_page_cost** should be lowered to 1.1-2.0 on SSD, otherwise the planner over-prefers Seq Scan.
Main red flag: `rows=5` vs `actual rows=50000`. Such a gap means the planner picked a plan from wrong data.

Вопросы для размышления

When was the last time EXPLAIN ANALYZE ran on your app's most frequent queries, and do the real plans match expectations?
How is random_page_cost configured on your cluster, and does it match the storage type (HDD/SSD/NVMe)?
Does your project have a process to review query plans when schema changes ship?

Связанные уроки

db-11-query-optimization

EXPLAIN ANALYZE: Reading the Query Plan

Structure of EXPLAIN Output

Scan Types

Bitmap Scan

Cost Model

Actual vs Estimated Rows

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки

EXPLAIN ANALYZE: Reading the Query Plan

Structure of EXPLAIN Output

Scan Types

Bitmap Scan

Cost Model

Actual vs Estimated Rows

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки