PostgreSQL

SELECT: filtering, sorting, LIMIT

`SELECT * FROM orders WHERE status NOT IN ('cancelled', NULL)` - this query returns 0 rows regardless of what the table contains. NULL inside an IN list expands to `status != NULL`, which evaluates to UNKNOWN for every row. SQL's three-valued boolean algebra silently destroys results, without raising an error.

**Pagination API** - switching from OFFSET to keyset pagination cuts page-500 load time from 2 seconds to 5 ms
**Analytics dashboard** - `DISTINCT ON` replaces a subquery with ROW_NUMBER for the 'latest event per user' pattern, simplifying the query significantly
**Financial reports** - NULLIF prevents `division by zero` when aggregating over periods with zero activity

WHERE: operators and types

WHERE filters rows before aggregation and grouping. The choice of operator affects not just correctness but also whether the planner can use an index.

**A leading wildcard in LIKE bypasses the B-tree index.** `LIKE '%gmail.com'` causes a full Seq Scan. For suffix and substring search, use `pg_trgm` with a GIN index: `CREATE INDEX ON users USING GIN (email gin_trgm_ops);`

Implicit type casting can break index usage. When a column is `int4` but the query passes a string literal:

**EXISTS vs IN** for large subqueries: `IN` materializes the entire subquery in memory. `EXISTS` short-circuits on the first match and typically produces a more efficient plan.

**ALL** is the counterpart to ANY: `amount > ALL(ARRAY[10, 20, 50])` means greater than every element. Rarely needed, but useful for defensive range checks.

Column `user_id` is `int4` with a B-tree index. Which query is guaranteed to use the index?

ORDER BY, LIMIT, OFFSET and keyset pagination

ORDER BY determines the row order in the result. Without it, order is undefined - PostgreSQL returns rows in whatever order is convenient for the planner (often a heap scan order, but not always).

**OFFSET scales poorly.** To return page 100 with OFFSET 9900 LIMIT 100, PostgreSQL reads and discards 9,900 rows - the work is done but the result is thrown away.

Method	Speed at page 1	Speed at page 500	Random access	Stable with new rows
OFFSET/LIMIT	fast	slow (O(n))	yes	no (duplicates)
Keyset (cursor)	fast	fast (O(1))	no	yes (stable)

Keyset pagination requires a stable unique sort key (typically `id` or `(created_at, id)`). For APIs without random page jumps it is almost always the better choice.

The query `SELECT * FROM events ORDER BY id LIMIT 20 OFFSET 50000` is slow. What is happening internally?

DISTINCT ON: PostgreSQL's way to pick the first row per group

`DISTINCT ON` is a PostgreSQL extension to standard SQL. It returns exactly one row for each unique combination of the specified columns, selecting the "first" row according to ORDER BY.

**Rule:** columns in `DISTINCT ON (...)` must match the leading columns in `ORDER BY`. PostgreSQL enforces this - violating it raises: `SELECT DISTINCT ON expressions must match initial ORDER BY expressions`.

**Performance:** PostgreSQL can often use an index on `(user_id, created_at DESC)` for DISTINCT ON without a separate sort step - the plan shows `Index Scan` instead of `Sort`.

The goal is to retrieve the most recent order per user (all columns). Which approach is correct in PostgreSQL?

CASE, COALESCE, NULLIF and guarding against division by zero

`CASE` is the only conditional construct in standard SQL. It works in SELECT (computed columns), WHERE (complex conditions), and ORDER BY (custom sort order).

`COALESCE(a, b, c)` returns the first non-NULL argument. It is the standard tool for default values and column fallback chains.

**COALESCE is lazy:** it evaluates arguments left to right and stops at the first non-null. This matters when arguments contain subqueries or expensive expressions.

A query contains `total_revenue / total_orders`. When `total_orders = 0`, what happens and how should it be fixed?

NULL: three-valued logic and traps

NULL in SQL is not a value - it is the absence of a value. Any comparison involving NULL returns UNKNOWN (the third value in SQL's boolean logic, alongside TRUE and FALSE), and WHERE only passes rows where the result is TRUE.

**The NOT IN trap with NULLs in the list.** If a subquery or list contains even one NULL, `NOT IN` returns an empty result for every row - mathematically correct, but almost never the intended behavior.

**Rule of thumb:** always handle NULL explicitly via `IS NULL` / `IS NOT NULL` / `COALESCE`. Never rely on `!=` or `NOT IN` when the data may contain NULLs.

WHERE col != 'x' returns all rows except those where col = 'x'

WHERE col != 'x' returns rows where col = 'x' is false AND col is not NULL. Rows where col IS NULL are silently excluded.

NULL participates in comparisons as UNKNOWN. UNKNOWN != TRUE, so WHERE does not pass those rows. This behavior follows the SQL standard but consistently surprises even experienced developers.

The `payments` table has 1,000 rows. The query `SELECT * FROM payments WHERE refunded_at != '2024-01-01'` returns 800 rows. Why not 1,000?