Databases

GIN, GiST, BRIN, and Specialized Indexes

Цели урока

  • Create GIN indexes for full-text search and JSONB
  • Apply GiST for geo queries and range types
  • Use BRIN for append-only tables with time series
  • Optimize indexes with partial and expression techniques

B-Tree covers 80% of use cases. Full-text search, geo queries, and billion-row time series require specialized structures. GIN, GiST, and BRIN are all built into PostgreSQL - no additional services needed.

  • **Airbnb:** GiST for radius-based accommodation search
  • **Instead of Elasticsearch:** GIN + tsvector for full-text search directly in PostgreSQL
  • **IoT / Metrics:** BRIN on timestamp for hundreds-of-gigabyte tables
  • **SaaS:** partial index on active records reduces index size by 10-100x

GIN for Full-Text Search

Finding all articles that contain the words "machine learning" is not a task B-Tree can handle - it compares whole strings, not words within them. **GIN (Generalized Inverted Index)** works like a book's back-of-index: word -> list of documents containing it. A search for "machine" returns one list, "learning" returns another - GIN produces the intersection.

GIN is also used for **JSONB queries** and arrays. The operators `@>` (containment), `?` (key exists), and `&&` (array overlap) are all supported by GIN indexes, making PostgreSQL a capable document store.

**GIN vs B-Tree for text:** B-Tree can find strings matching LIKE 'prefix%', but not '%suffix' or individual words inside a string. GIN indexes each token separately - searching by any word is equally fast.

Which operator checks that a JSONB document contains a given sub-document and is supported by a GIN index?

GiST for Geospatial Data

Finding all cafes within 2 km of point (55.75, 37.62) is not solvable with B-Tree alone - it stores numbers in linear order and can filter by latitude or longitude separately, but not by distance. **GiST (Generalized Search Tree)** is an extensible framework for spatial structures. PostGIS builds an R-Tree on top of GiST for geo queries.

GiST also supports **range types**. The `&&` operator on ranges checks period overlap and uses a GiST index - useful for booking constraints that prevent conflicting reservations.

A query finds the 10 nearest points to given coordinates. Which operator enables GiST-based KNN lookup?

BRIN for Time Series

A `metrics` table with 10 billion telemetry rows. A B-Tree index on `timestamp` would consume roughly 200 GB. **BRIN (Block Range Index)** stores only min/max values for each range of physical pages - the entire index occupies ~50 MB. It works because data is inserted chronologically: newer rows land on newer pages.

**When BRIN fails:** if data is inserted out of chronological order, the min/max ranges overlap and the planner cannot exclude any pages. BRIN is effective only when there is high **correlation** between insert order and column value.

BRIN is ineffective for a user_id column with random values. Why?

Partial Indexes

An `orders` table has 100 million rows. 99% have status 'completed' and are never queried. The 1% with status 'pending' need fast lookups. A full index covers 100 million rows; what is actually needed is 1 million. A **Partial Index** indexes only rows satisfying a WHERE condition - 100x smaller, faster to update, fits in cache.

Partial indexes also solve the **NULL in unique index** problem. By SQL standard, NULL != NULL, so a plain UNIQUE index permits multiple NULL values. A partial index `WHERE col IS NOT NULL` enforces uniqueness only among non-NULL values - typically the intended behavior.

A partial index is defined with WHERE status = 'active'. A query with WHERE email = 'x@y.com' (no status filter) - does it use the index?

Expression Indexes

A query `WHERE LOWER(email) = 'user@example.com'` does not use a plain index on `email` - the LOWER function transforms the value, and the B-Tree has no knowledge of the result. An **Expression Index** indexes the result of a function or expression rather than the raw column.

**Exact match required:** the expression in the query must match the expression in CREATE INDEX exactly. `LOWER(email)` and `LOWER(TRIM(email))` produce different indexes. PostgreSQL compares expression trees, so even argument order matters.

**Expression indexes** can index any IMMUTABLE expression: arithmetic, string functions, JSON operators, or custom functions. The expression in WHERE must match the expression in CREATE INDEX exactly - PostgreSQL compares the parse trees.

An expression index is defined on LOWER(email). Which query uses it?

Specialized Indexes

  • GIN: inverted index for tokens - full-text (@@), jsonb (@>), arrays (&&)
  • GiST: extensible framework - geo data (PostGIS R-Tree), range types, KNN (<->)
  • BRIN: min/max per page range - for chronological data, thousands of times smaller than B-Tree
  • Partial index: WHERE in CREATE INDEX - indexes only the needed rows
  • Expression index: indexes LOWER(col), jsonb->>'key' - query expression must match exactly

Related Topics

Specialized indexes complement B-Tree where data is not linear.

  • B-Tree Indexes — The baseline type, starting point
  • Query Optimization — How the planner selects index type
  • Full-Text Search — Deeper dive into tsvector, tsquery, ranking

Вопросы для размышления

  • When should PostgreSQL + GIN be chosen over a dedicated Elasticsearch cluster?
  • How can a table be identified as a good BRIN candidate without running queries?
  • Can partial, expression, and GIN be combined in a single index?

Связанные уроки

  • ds-09-trees-intro
GIN, GiST, BRIN, and Specialized Indexes

0

1

Sign In