Databases

EXPLAIN ANALYZE: Reading the Query Plan

A query that took one second on 10,000 test rows crawls for 45 seconds in production with 50 million rows. The problem is not the code - the planner chose the wrong plan due to stale statistics. One `EXPLAIN ANALYZE` reveals the cause in 30 seconds.

  • **Shopify** runs automatic plan analysis for every slow query - EXPLAIN ANALYZE fires automatically for queries over 100 ms and the result is stored in their monitoring system.
  • **GitHub** found that their largest JOIN degraded after a data migration: statistics were stale and the planner chose Nested Loop instead of Hash Join. One ANALYZE restored performance.
  • **pg_stat_statements** extension is standard in any serious production setup - it surfaces the top-N queries by total_time and helps prioritize optimization work.

Anatomy of EXPLAIN ANALYZE

PostgreSQL does not hide how it thinks: `EXPLAIN ANALYZE` returns the full execution plan tree with estimated and actual costs, row counts, and per-node timing. It is an X-ray of the query, readable by anyone who knows the format.

The difference between `EXPLAIN` and `EXPLAIN ANALYZE`: the former builds the plan without executing (cost = planner estimate), the latter actually runs the query and reports actual rows/time. For SELECT this is safe; for INSERT/UPDATE/DELETE use `BEGIN; EXPLAIN ANALYZE ...; ROLLBACK;`.

The planner shows `cost=145..892 rows=1200` but `actual rows=45`. What does this indicate?

Seq Scan, Index Scan, Bitmap Scan

Three scan types represent three different data access strategies. PostgreSQL chooses among them based on selectivity - what fraction of the table the filter touches. A counterintuitive fact: Seq Scan is sometimes faster than Index Scan - especially when 30-40% of rows are needed, because sequential disk reads outperform random reads.

0

1

Sign In

**Seq Scan** - reads the table page by page, best for large selections (>5-10% of rows). **Index Scan** - traverses the B-tree then does a heap fetch per row (random I/O), ideal for <1-2% of rows. **Bitmap Index Scan** - builds a bitmap of matching pages in memory, then reads the heap in chunks - a compromise for 1-10% of rows or multiple indexes combined via AND/OR.

Table `events` has 10M rows. A query selects events from the last hour (0.01% of rows). The planner chose Seq Scan. What should be done?

Nested Loop, Hash Join, Merge Join

JOIN is the most expensive operation in relational queries. PostgreSQL implements three algorithms, and the choice depends on table sizes, index availability, and `work_mem`. Choosing the wrong algorithm is a common cause of slow JOINs in production.

**Nested Loop** - O(N*M), good when the inner relation is small or has an index. **Hash Join** - O(N+M), builds a hash table from the smaller relation in `work_mem`, then scans the larger. If memory runs out - spills to disk (Batches>1 in the plan). **Merge Join** - O(N log N + M log M), requires sorted inputs, efficient for large already-sorted datasets.

The plan shows `Hash Join` with `Batches: 16`. What does this mean and how to fix it?

Cost Model: How the Planner Calculates Cost

The PostgreSQL planner does not guess - it calculates. Each plan node gets a `cost` value based on seq_page_cost, random_page_cost, and cpu_tuple_cost weights. On SSD the difference between sequential and random I/O is minimal, but PostgreSQL defaults are tuned for spinning disks. This is the first parameter to change when moving to SSD.

Key cost model GUC parameters: `seq_page_cost=1.0` (baseline), `random_page_cost=4.0` (for SSD set to 1.1-1.5), `cpu_tuple_cost=0.01`, `effective_cache_size` - OS page cache estimate (affects Index Scan cost). Table statistics live in `pg_statistic`, updated by `ANALYZE`, controlled by `default_statistics_target` (default 100, for skewed data up to 500).

After migrating to NVMe SSD the planner still prefers Seq Scan over Index Scan. First step?

Practical Techniques: Buffers, Loops, Parallelism

Reading `EXPLAIN ANALYZE` means asking the right questions: where are the most expensive nodes, where do actual vs estimated rows diverge, are there Seq Scans on large tables, how many loops does Nested Loop perform. Each answer points to a concrete action.

**Plan analysis checklist:** (1) Find the most expensive node by actual time. (2) Check estimated rows vs actual rows - gap >10x requires ANALYZE. (3) Find Seq Scan on tables >100k rows - may need an index. (4) Find Nested Loop with large loops count - may benefit from Hash Join (check work_mem). (5) Check Buffers read vs hit - many reads mean data is not in cache. (6) Check parallel plan Workers Planned - is max_parallel_workers_per_gather configured?

If an index exists, the planner will always use it

The planner uses an index only if it is cheaper according to the cost model. With high random_page_cost or low selectivity, Seq Scan may win.

Index Scan cost = B-tree traversal + N random heap fetches. When N exceeds roughly 5% of table rows, seq_page_cost * total_pages becomes lower than the accumulated random I/O cost.

The plan shows `Nested Loop (loops=50000, actual time=0.03..180000 ms)`. How to speed it up?

Key Ideas

  • **EXPLAIN ANALYZE** shows actual rows and time - a discrepancy >10x with estimates signals stale statistics (run `ANALYZE table`).
  • **Scan type** depends on selectivity: Index Scan for <2% of rows, Bitmap Scan for 2-10%, Seq Scan for >10% - and on `random_page_cost` (lower to 1.1-1.5 on SSD).
  • **Hash Join Batches>1** means the hash spilled to disk; increase `work_mem` to fix. **Nested Loop with high loops** is a candidate for Hash Join replacement.

Related Topics

EXPLAIN ANALYZE builds on mechanisms covered in earlier lessons:

  • Indexes: B-tree and Structures — Index Scan in the plan is B-tree navigation from the previous lesson
  • Advanced Indexes — Partial and composite indexes expand the paths available to the planner

Вопросы для размышления

  • If `actual rows` is 100x smaller than `estimated rows`, what is the next step?
  • When is it useful to disable seq_scan via `SET enable_seqscan=off` for diagnostics?
  • What will change in query plans after migrating from HDD to NVMe SSD if the PostgreSQL configuration is left at defaults?

Связанные уроки

  • alg-21-dp
EXPLAIN ANALYZE: Reading the Query Plan