PostgreSQL

CTEs and WITH RECURSIVE

A categories table with a parent_id column. The task: find all subcategories of "Electronics" at any depth. Without a recursive CTE, this requires N queries in application code, a stored procedure with a loop, or denormalization. One recursive CTE solves it in standard SQL. For graphs with cycles - the same tool with a small addition.

  • **Org charts (HR systems):** who reports to manager X at all levels of hierarchy - WITH RECURSIVE in a single query
  • **Routing (logistics):** find all possible delivery paths from A to B - graph traversal via CTE
  • **Finance (accounts):** transitive closure of account dependencies for detecting circular references

CTEs: Syntax and When to Apply

A **CTE (Common Table Expression)** is a named subquery defined at the top of a query with `WITH`. The CTE result can be referenced multiple times in the main query, as if it were a temporary table. The primary value is readability: a complex query splits into named, sequential steps.

**CTE vs subquery in FROM:** syntactically interchangeable for simple cases. CTE wins when: (1) the same subquery is needed multiple times, (2) the query should be read top-to-bottom as an algorithm, (3) recursion is required.

The main advantage of a CTE over an equivalent subquery in FROM:

Recursive CTEs

With the `RECURSIVE` keyword, a CTE can reference itself. This is the only way in standard SQL to perform iterative computations without procedural code. The structure is always the same: a base case (anchor) plus a recursive step, joined with `UNION ALL`.

**Preventing infinite recursion:** always add a terminating condition in WHERE or use `LIMIT`. For graphs with cycles, track the path via an array of visited nodes. PostgreSQL has no built-in protection against infinite loops in recursive CTEs.

0

1

Sign In

In a recursive CTE, what is the role of the anchor (the part before UNION ALL)?

Hierarchical Data with RECURSIVE

An organizational chart, category trees, comment threads - all of these are stored as a table with a `parent_id`. Traversing such a tree without RECURSIVE CTE required N queries or denormalization. RECURSIVE handles it in a single query.

**Alternatives for large hierarchies:** the Ltree extension stores the path as `'1.5.23.104'` and supports indexed subtree queries. For read-heavy hierarchies, Nested Sets (left/right numbers) provide O(1) subtree queries without recursion.

In a recursive CTE for hierarchy traversal, a `path || current_id` array is tracked. What is its purpose?

Graph Traversal with CTEs

A graph (unlike a tree) can contain cycles: A -> B -> C -> A. Without protection, a recursive CTE will loop infinitely. The standard technique is tracking visited nodes in an array and filtering out already-seen ones.

**UNION vs UNION ALL in RECURSIVE CTEs:** `UNION ALL` is faster and adds all rows. `UNION` removes duplicates - it can serve as cycle protection instead of a visited array, but is slower on large graphs.

A recursive CTE traverses a graph without cycle protection. What happens when a cycle A->B->C->A exists?

MATERIALIZED vs NOT MATERIALIZED

Before PostgreSQL 12, all CTEs were **optimization fences**: the planner could not "look inside" a CTE to push external WHERE conditions in. The CTE ran fully and its result was cached. From PostgreSQL 12, the default changed - a CTE is not materialized if it is referenced exactly once.

Practical rule: if a CTE contains a `VOLATILE` function (`random()`, `now()`, `clock_timestamp()`) and is referenced more than once - `MATERIALIZED` is mandatory. Otherwise each reference calls the function again, producing different results.

**EXPLAIN check:** `EXPLAIN (ANALYZE)` shows `CTE Scan` for a materialized CTE or an inlined subquery node. `CTE Scan` always means materialization. If a `CTE Scan` appears for a CTE used only once, add `NOT MATERIALIZED` or rewrite it as a subquery.

A CTE is always an optimization fence: the planner cannot optimize it

Before PG12 - true. From PG12+, CTEs are inlined (NOT MATERIALIZED) by default when referenced once. The MATERIALIZED keyword explicitly restores the old behavior.

Much advice about "CTEs as optimization fences" was written before PG12. In modern PostgreSQL, write MATERIALIZED explicitly when caching or planner isolation is needed.

A CTE contains `random()` and is referenced in two places. Without `MATERIALIZED` in PostgreSQL 12+, what happens?

CTEs and WITH RECURSIVE

  • CTE: named subquery in WITH - improves readability, allows reusing results multiple times
  • RECURSIVE CTE: anchor UNION ALL recursive step - the only iteration mechanism in standard SQL
  • Hierarchies: parent_id table + RECURSIVE = tree traversal at any depth in one query
  • Graphs with cycles: visited array + WHERE id != ALL(visited) - mandatory cycle protection
  • MATERIALIZED: required for VOLATILE functions and when CTE is referenced 2+ times; PG12+ inlines by default

Related Topics

CTEs build on subqueries and are often replaced by window functions.

  • Subqueries: EXISTS, IN, ANY, ALL — CTEs are the evolution of subqueries: named, reusable, recursive
  • Window Functions — Many tasks solved by recursive CTEs (rank, running total) are simpler and faster with window functions
  • EXPLAIN and the Query Planner — CTE Scan in EXPLAIN reveals materialization - key to understanding CTE performance

Вопросы для размышления

  • A recursive CTE iterates in BFS order. Can DFS be implemented through a standard RECURSIVE CTE in PostgreSQL?
  • MATERIALIZED in PG12+ must be specified explicitly. In which production scenarios does forgetting MATERIALIZED lead to incorrect data rather than just a slowdown?
  • Ltree extension vs RECURSIVE CTE for hierarchies: at what tree size and query frequency is switching to Ltree worthwhile?

Связанные уроки

  • pg-09-subqueries — CTEs are named subqueries; subquery mechanics must be understood first
  • pg-11-window — CTEs and window functions often appear together in analytics queries
  • pg-08-aggregation — Recursive CTEs for hierarchical aggregation combine both concepts
  • pg-07-joins — Inline views (subqueries in FROM) vs CTEs: same power, different readability tradeoffs
  • db-06-sql-advanced
CTEs and WITH RECURSIVE