PostgreSQL
JSONB: Document Model in PostgreSQL
What do you do when every user has a different set of attributes and the schema changes every week? MongoDB? No. PostgreSQL with JSONB gives you document flexibility without giving up transactions, JOINs, and ACID. Shopify, GitLab, Notion all run JSONB in production.
- **Shopify:** product metadata in JSONB. 80M+ products with arbitrary attributes (customs codes, materials, country-specific size charts) without an EAV table
- **GitLab:** JSONB stores CI/CD pipeline metadata and variables. GIN indexes filter 50M+ records in tens of milliseconds
- **Notion:** page blocks live in JSONB. Each block (text, table, database) has a different structure. PostgreSQL replaced a document database without losing ACID guarantees
JSONB Operators
JSONB stores data in binary form: on write the JSON is parsed and saved as a tree, which makes every operation faster than over plain JSON text. Keys are sorted and duplicates dropped, so field lookups don't reparse a string each time.
Shopify stores product metadata in JSONB: tens of millions of SKUs with arbitrary attributes (color, size, material, customs code). A fixed schema would require hundreds of columns or an EAV table. Both options are worse in performance and maintenance.
The `->` operator returns JSONB, while `->>` returns text. When comparing against a text literal, always use `->>`. Otherwise you need a cast: `(data -> 'price')::numeric`.
Which operator returns the text value of the `name` field from a JSONB column `data`?
GIN Indexes for JSONB
GIN (Generalized Inverted Index) indexes every key and value inside a JSONB document separately. Think of an inverted index of words in a book: GIN does the same for JSON trees. Without it, lookups on nested fields fall back to a Seq Scan over the whole table.
`jsonb_path_ops` produces an index roughly 3x smaller than standard GIN and is faster for containment queries (`@>`). Use standard GIN only when you need the `?` (key existence) operator or `?|`, `?&`.
GitLab uses GIN indexes on JSONB columns of the issues table for metadata and filter storage. With 10M+ issues, that's the difference between 50 ms and 15 seconds for attribute-based filtering.
Which GIN operator class fits when only containment (`@>`) is needed and index size matters?
JSONPath: XPath for JSON
JSONPath (SQL/JSON, SQL:2016 standard) supports complex queries over JSON trees with filters, recursive descent, and arithmetic. PostgreSQL supports JSONPath starting with version 12.
JSONPath works directly with a GIN index (under `jsonb_path_ops`). Functions: `jsonb_path_exists` returns boolean, `jsonb_path_query` returns a set of rows, `jsonb_path_query_array` returns an array, `jsonb_path_query_first` returns the first element.
Which JSONPath expression finds every element of the `tags` array with the value `"premium"`?
Containment and Existence Operators
Containment (`@>`) checks whether the left JSONB contains the right one as a subset. Existence (`?`) checks whether a key exists at the top level. These two operators are the most important for filtering on JSONB, and both can use a GIN index.
| Operator | Meaning | Index |
|---|---|---|
| @> | left contains right | GIN (both classes) |
| <@ | right contains left | GIN (both classes) |
| ? | key exists | GIN (default) |
| ?| | any key exists | GIN (default) |
| ?& | all keys exist | GIN (default) |
Which query finds every row where the JSONB field `data` has an `email` key at the top level?
JSONB vs Fixed Columns
JSONB and relational columns solve different problems. Notion stores block content in JSONB (block structure varies), but page metadata (title, created_at, owner_id) lives in regular columns. Mixing the two without understanding the trade-offs leads to slow queries and painful migrations.
- Use JSONB when... — Schema is unstable or unique per row. Attributes are sparse (most rows only have some of them). Data arrives from external APIs without a fixed structure. You need to store hierarchical data without normalizing.
- Use columns when... — Every row has the field. The field is frequently used in JOIN, GROUP BY, ORDER BY. You need strict types and constraints (NOT NULL, CHECK, FK). The field participates in a B-tree index for range queries.
JSONB replaces relational tables. You can fit everything into one table with one JSONB column.
JSONB complements the relational model for cases without a fixed schema. Fields with fixed semantics and types are always better off as columns.
JSONB doesn't support FK, doesn't enforce types, and doesn't support B-tree for range queries. A JOIN on `(data ->> 'user_id')::int` runs 5-10x slower than a JOIN on a regular indexed INT column.
Instagram stores the filters applied to a photo (users pick different sets). What fits best?
Key Ideas
- **JSONB vs JSON:** JSONB stores a binary tree. Queries are faster, indexes work. JSON stores text verbatim and is only useful for audit logs and output
- **GIN indexes:** `jsonb_path_ops` for containment/jsonpath (smaller size), default GIN for existence operators (`?`, `?|`, `?&`)
- **Choice rule:** fixed fields go in columns (type, FK, B-tree). Unstable/sparse attributes go in JSONB (flexibility + GIN)
Related Topics
JSONB ties closely to other PostgreSQL mechanisms:
- PostgreSQL Indexes — GIN is one of the PostgreSQL index types. Understanding B-tree vs GIN vs GiST is critical for choosing a JSONB indexing strategy
- Full-Text Search — FTS also relies on GIN indexes for tsvector. Architecturally similar to JSONB indexing
- PostgreSQL Extensions — pg_trgm extends string search inside JSONB. pgvector keeps embeddings in columns alongside JSONB metadata
Вопросы для размышления
- The users table has a `settings JSONB` field with notification settings. How do you add an index that quickly finds users with `email_notifications` enabled?
- Why does `WHERE data ->> 'user_id' = '42'` run slower than `WHERE user_id = 42`, even with an index on `(data ->> 'user_id')`?
- Shopify adds a new product attribute `carbon_footprint`. Does it need a schema migration, or is writing into the JSONB column enough?