Databases
GIN, GiST, BRIN, and Specialized Indexes
Цели урока
- Create GIN indexes for full-text search and JSONB
- Apply GiST for geo queries and range types
- Use BRIN for append-only tables with time series
- Optimize indexes with partial and expression techniques
B-Tree covers 80% of use cases. Full-text search, geo queries, and billion-row time series require specialized structures. GIN, GiST, and BRIN are all built into PostgreSQL - no additional services needed.
- **Airbnb:** GiST for radius-based accommodation search
- **Instead of Elasticsearch:** GIN + tsvector for full-text search directly in PostgreSQL
- **IoT / Metrics:** BRIN on timestamp for hundreds-of-gigabyte tables
- **SaaS:** partial index on active records reduces index size by 10-100x
GIN for Full-Text Search
Finding all articles that contain the words "machine learning" is not a task B-Tree can handle - it compares whole strings, not words within them. **GIN (Generalized Inverted Index)** works like a book's back-of-index: word -> list of documents containing it. A search for "machine" returns one list, "learning" returns another - GIN produces the intersection.
GIN is also used for **JSONB queries** and arrays. The operators `@>` (containment), `?` (key exists), and `&&` (array overlap) are all supported by GIN indexes, making PostgreSQL a capable document store.
**GIN vs B-Tree for text:** B-Tree can find strings matching LIKE 'prefix%', but not '%suffix' or individual words inside a string. GIN indexes each token separately - searching by any word is equally fast.
Which operator checks that a JSONB document contains a given sub-document and is supported by a GIN index?
GiST for Geospatial Data
Finding all cafes within 2 km of point (55.75, 37.62) is not solvable with B-Tree alone - it stores numbers in linear order and can filter by latitude or longitude separately, but not by distance. **GiST (Generalized Search Tree)** is an extensible framework for spatial structures. PostGIS builds an R-Tree on top of GiST for geo queries.
GiST also supports **range types**. The `&&` operator on ranges checks period overlap and uses a GiST index - useful for booking constraints that prevent conflicting reservations.
A query finds the 10 nearest points to given coordinates. Which operator enables GiST-based KNN lookup?
BRIN for Time Series
A `metrics` table with 10 billion telemetry rows. A B-Tree index on `timestamp` would consume roughly 200 GB. **BRIN (Block Range Index)** stores only min/max values for each range of physical pages - the entire index occupies ~50 MB. It works because data is inserted chronologically: newer rows land on newer pages.
**When BRIN fails:** if data is inserted out of chronological order, the min/max ranges overlap and the planner cannot exclude any pages. BRIN is effective only when there is high **correlation** between insert order and column value.
BRIN is ineffective for a user_id column with random values. Why?
Partial Indexes
An `orders` table has 100 million rows. 99% have status 'completed' and are never queried. The 1% with status 'pending' need fast lookups. A full index covers 100 million rows; what is actually needed is 1 million. A **Partial Index** indexes only rows satisfying a WHERE condition - 100x smaller, faster to update, fits in cache.
Partial indexes also solve the **NULL in unique index** problem. By SQL standard, NULL != NULL, so a plain UNIQUE index permits multiple NULL values. A partial index `WHERE col IS NOT NULL` enforces uniqueness only among non-NULL values - typically the intended behavior.
A partial index is defined with WHERE status = 'active'. A query with WHERE email = 'x@y.com' (no status filter) - does it use the index?
Expression Indexes
A query `WHERE LOWER(email) = 'user@example.com'` does not use a plain index on `email` - the LOWER function transforms the value, and the B-Tree has no knowledge of the result. An **Expression Index** indexes the result of a function or expression rather than the raw column.
**Exact match required:** the expression in the query must match the expression in CREATE INDEX exactly. `LOWER(email)` and `LOWER(TRIM(email))` produce different indexes. PostgreSQL compares expression trees, so even argument order matters.
**Expression indexes** can index any IMMUTABLE expression: arithmetic, string functions, JSON operators, or custom functions. The expression in WHERE must match the expression in CREATE INDEX exactly - PostgreSQL compares the parse trees.
An expression index is defined on LOWER(email). Which query uses it?
Specialized Indexes
- GIN: inverted index for tokens - full-text (@@), jsonb (@>), arrays (&&)
- GiST: extensible framework - geo data (PostGIS R-Tree), range types, KNN (<->)
- BRIN: min/max per page range - for chronological data, thousands of times smaller than B-Tree
- Partial index: WHERE in CREATE INDEX - indexes only the needed rows
- Expression index: indexes LOWER(col), jsonb->>'key' - query expression must match exactly
Related Topics
Specialized indexes complement B-Tree where data is not linear.
- B-Tree Indexes — The baseline type, starting point
- Query Optimization — How the planner selects index type
- Full-Text Search — Deeper dive into tsvector, tsquery, ranking
Вопросы для размышления
- When should PostgreSQL + GIN be chosen over a dedicated Elasticsearch cluster?
- How can a table be identified as a good BRIN candidate without running queries?
- Can partial, expression, and GIN be combined in a single index?