Databases

EXPLAIN ANALYZE: Reading the Query Plan

A query that took one second on 10,000 test rows crawls for 45 seconds in production with 50 million rows. The problem is not the code - the planner chose the wrong plan due to stale statistics. One `EXPLAIN ANALYZE` reveals the cause in 30 seconds.

**Shopify** runs automatic plan analysis for every slow query - EXPLAIN ANALYZE fires automatically for queries over 100 ms and the result is stored in their monitoring system.
**GitHub** found that their largest JOIN degraded after a data migration: statistics were stale and the planner chose Nested Loop instead of Hash Join. One ANALYZE restored performance.
**pg_stat_statements** extension is standard in any serious production setup - it surfaces the top-N queries by total_time and helps prioritize optimization work.

Anatomy of EXPLAIN ANALYZE

PostgreSQL does not hide how it thinks: `EXPLAIN ANALYZE` returns the full execution plan tree with estimated and actual costs, row counts, and per-node timing. It is an X-ray of the query, readable by anyone who knows the format.

The difference between `EXPLAIN` and `EXPLAIN ANALYZE`: the former builds the plan without executing (cost = planner estimate), the latter actually runs the query and reports actual rows/time. For SELECT this is safe; for INSERT/UPDATE/DELETE use `BEGIN; EXPLAIN ANALYZE ...; ROLLBACK;`.

The planner shows `cost=145..892 rows=1200` but `actual rows=45`. What does this indicate?

Seq Scan, Index Scan, Bitmap Scan

Three scan types represent three different data access strategies. PostgreSQL chooses among them based on selectivity - what fraction of the table the filter touches. A counterintuitive fact: Seq Scan is sometimes faster than Index Scan - especially when 30-40% of rows are needed, because sequential disk reads outperform random reads.

Databases

EXPLAIN ANALYZE: Reading the Query Plan

**Shopify** runs automatic plan analysis for every slow query - EXPLAIN ANALYZE fires automatically for queries over 100 ms and the result is stored in their monitoring system.
**GitHub** found that their largest JOIN degraded after a data migration: statistics were stale and the planner chose Nested Loop instead of Hash Join. One ANALYZE restored performance.
**pg_stat_statements** extension is standard in any serious production setup - it surfaces the top-N queries by total_time and helps prioritize optimization work.

Anatomy of EXPLAIN ANALYZE

The planner shows `cost=145..892 rows=1200` but `actual rows=45`. What does this indicate?

EXPLAIN ANALYZE: Reading the Query Plan

Anatomy of EXPLAIN ANALYZE

Seq Scan, Index Scan, Bitmap Scan

EXPLAIN ANALYZE: Reading the Query Plan

Anatomy of EXPLAIN ANALYZE

Seq Scan, Index Scan, Bitmap Scan

Nested Loop, Hash Join, Merge Join

Cost Model: How the Planner Calculates Cost

Practical Techniques: Buffers, Loops, Parallelism

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки