PostgreSQL
EXPLAIN ANALYZE: Reading the Query Plan
The query runs for 8 seconds. You add an index, and now it runs for 12 seconds. Why? The answer hides in three lines of EXPLAIN ANALYZE, but only if you know what to look at.
- **Notion**, during a PG14 migration, discovered via EXPLAIN ANALYZE that 3 critical dashboard queries were picking Hash Join over Nested Loop because of stale stats. After ANALYZE and a random_page_cost review, total dashboard time dropped from 4.2s to 0.8s.
- **Shopify** embedded an EXPLAIN ANALYZE parser into CI/CD: any PR that changes a critical query plan (Index Scan to Seq Scan) is blocked pending DBA review. This catches production regressions early.
- **GitLab** publicly documents in its migration guidelines that EXPLAIN ANALYZE output is required for any migration touching tables > 1M rows. It became the standard after several incidents with bad plans.
- **Heroku** added automatic query analysis to Dataclips in 2019: if EXPLAIN shows a Seq Scan on a > 100k row table without a LIMIT, the user gets a warning with an index recommendation.
Structure of EXPLAIN Output
`EXPLAIN` shows the query plan: the tree of operations the planner chose for execution. `EXPLAIN ANALYZE` additionally runs the query and reports real metrics. Reading this output is a baseline skill for any backend developer working with PostgreSQL.
Every plan node reports: `cost=startup..total` (estimated cost), `rows` (expected row count), `width` (average row size in bytes). After `ANALYZE` you also get `actual time=first..total` (real time in ms) and `loops` (how many times the node ran).
EXPLAIN without ANALYZE does not run the query, which is safe for SELECT and for DML alike. EXPLAIN ANALYZE does modify data, so wrap UPDATE/DELETE checks in a transaction with ROLLBACK.
What does `loops=5` on an Index Scan node mean in EXPLAIN ANALYZE output?
Scan Types
PostgreSQL picks between three main data access methods. The choice depends on **selectivity** (what fraction of the table needs to be returned) and which indexes exist.
| Type | When it wins | Characteristic |
|---|---|---|
| Seq Scan | Returns > 5-15% of rows | Reads pages sequentially, I/O is efficient |
| Index Scan | Highly selective filter (< 1-5%) | Random heap access, expensive at large result sizes |
| Index Only Scan | All needed columns are in the index | Does not read the heap at all, the fastest |
Seq Scan is not always bad. On small tables (< 1000 rows) PostgreSQL almost always picks Seq Scan even when an index exists: the random I/O overhead of the index outweighs the benefit.
The query `SELECT * FROM users WHERE country = 'US'` uses Seq Scan even though an index on country exists. Table: 10M rows, 40% are US. Planner bug?
Bitmap Scan
**Bitmap Heap Scan** is a hybrid method: first a bitmap of pages is built (Bitmap Index Scan), then the heap is read sequentially using that bitmap. It is a compromise between Index Scan (random I/O, one row at a time) and Seq Scan (reads everything).
The key advantage of Bitmap Scan is combining multiple indexes via `BitmapAnd`/`BitmapOr`. Without it PostgreSQL could only use one index to filter. When matches are too many, the bitmap drops to a lossy mode: `Heap Blocks: lossy=320` means `Recheck Cond` has to run row-by-row.
If the plan shows `Recheck Cond` with `Heap Blocks: lossy=N`, that is a sign `work_mem` was not enough for an exact bitmap. Increasing work_mem can flip it to exact mode and cut out the rechecks.
The plan shows `Heap Blocks: lossy=540`. What does it mean and how do you fix it?
Cost Model
PostgreSQL estimates the cost of every plan in abstract units (cost units). The base unit is reading one page sequentially (seq_page_cost = 1.0). All other operations are normalized against it.
| Parameter | Default | Meaning |
|---|---|---|
| seq_page_cost | 1.0 | Reading a page during sequential access |
| random_page_cost | 4.0 | Random page read (set to 1.1-2.0 for SSD) |
| cpu_tuple_cost | 0.01 | Processing one tuple in CPU |
| cpu_index_tuple_cost | 0.005 | Processing one index entry |
| cpu_operator_cost | 0.0025 | Executing one operator/function |
On AWS Aurora and RDS with io1/gp3 storage, `random_page_cost = 1.1-2.0` is the recommendation. At the default 4.0 the planner over-prefers Seq Scan, which is a common cause of "why isn't my index being used?" on cloud installations.
On an SSD cluster the planner keeps picking Seq Scan over Index Scan. What's the first thing to check?
Actual vs Estimated Rows
The single most important signal in EXPLAIN ANALYZE is the gap between `rows=N` (estimate) and `actual rows=M`. If the estimate is off by an order of magnitude, the planner can pick a fundamentally wrong plan: Hash Join instead of Nested Loop or vice versa.
Tools like `explain.depesz.com` or `explain.tensor.ru` accept EXPLAIN ANALYZE output and produce a visualization with highlights on nodes where estimate diverges sharply from actual. That saves time during diagnosis.
High cost in EXPLAIN means the query is slow
Cost is the planner's estimation unit, not milliseconds. Only actual time in EXPLAIN ANALYZE reflects real time
Cost is used only to compare alternative plans. A query with cost=100000 may run in 10ms, and a query with cost=500 may take 5 seconds if estimates were wrong or the cost model is not calibrated to the hardware.
EXPLAIN ANALYZE shows `rows=5` (estimate) vs `actual rows=50000` on an Index Scan. Likely consequences?
Key Ideas
- **EXPLAIN ANALYZE** runs the query and reports real time (actual time) vs estimated cost (cost). A gap between them points to a stats problem.
- **Seq Scan** wins on large selections (>5-15%); **Index Scan** on high selectivity (<1-5%); **Bitmap Scan** combines multiple indexes via BitmapAnd/BitmapOr.
- **random_page_cost** should be lowered to 1.1-2.0 on SSD, otherwise the planner over-prefers Seq Scan.
- Main red flag: `rows=5` vs `actual rows=50000`. Such a gap means the planner picked a plan from wrong data.
Related Topics
EXPLAIN ANALYZE sits at the intersection of several core PostgreSQL topics:
- Planner and Statistics — The actual vs estimated rows gap is explained by how the planner derives estimates from pg_statistic
- JOIN Algorithms — The Hash Join vs Nested Loop vs Merge Join choice is visible in EXPLAIN and driven by the same cost estimates
- Index Maintenance — EXPLAIN ANALYZE is the only reliable way to verify a specific index is actually used by the queries you care about before dropping it
Вопросы для размышления
- When was the last time EXPLAIN ANALYZE ran on your app's most frequent queries, and do the real plans match expectations?
- How is random_page_cost configured on your cluster, and does it match the storage type (HDD/SSD/NVMe)?
- Does your project have a process to review query plans when schema changes ship?