Databases
ACID: The Four Pillars of Reliability
2012. Knight Capital Group. A trading algorithm fired 4 million market orders in 45 minutes - partial, no rollback. The firm lost USD 440 million in half an hour of trading - more than it had earned in the entire previous decade. ACID is not academic theory. It is the line between a programmer's mistake and a company-killing event.
- **Knight Capital, 2012** - missing atomic rollback in a trading system cost USD 440 million in 45 minutes
- **Booking.com and airlines** - Isolation prevents double-booking: two users can't buy the last seat at the same time
- **PostgreSQL (WAL + fsync)** - Durability guarantees that a confirmed booking survives a power outage
Предварительные знания
Jim Gray and the Birth of Transactions
In 1976, Jim Gray of IBM Research published foundational work on transactional systems. He was the first to formalize the concepts that would become ACID - the acronym itself arrived later, in 1983, coined by Theo Harder and Andreas Reuter. Gray won the Turing Award in 1998 for his work on transaction processing. In 2007 he vanished without trace in the Pacific Ocean during a solo sailing trip and is still listed as missing.
Atomicity: All or Nothing
2012. Knight Capital Group. A trading algorithm fired 4 million market orders in 45 minutes - partial, no rollback. The firm lost USD 440 million in half an hour. Not a virus, not a hack - just a transaction without atomicity. **Atomicity** guarantees one outcome: both operations succeed, or neither happens.
The word **"atom"** comes from the Greek for "indivisible". A transaction is atomic: externally it behaves as one operation. There is no halfway state where money has been debited but not yet credited.
**What if the server crashes mid-COMMIT?** The DBMS leans on a **Write-Ahead Log (WAL)**: every operation hits the log before it touches the data files. On restart, the DBMS replays the WAL and either finishes an interrupted COMMIT or rolls it back.
**Long-running transactions are poison.** An open transaction can hold row locks and freeze every other transaction touching those rows. BEGIN → ... heavy logic for 10 seconds ... → COMMIT = 10 seconds of blocking. Keep transactions as short as possible.
Transaction: BEGIN → UPDATE (debit 500 from Alice) → server crashes → UPDATE (credit 500 to Bob) not executed → COMMIT never called. What happens to Alice's balance after the server restarts?
Consistency: Data Is Always Valid
**Consistency** in ACID means the database always moves from one valid state to another. Data never lands in a partial or invalid state. Any transaction that breaks an integrity rule rolls back as a whole.
Consistency lives in **constraints** - rules the database checks automatically on every write.
A **trigger** is a function that fires automatically on INSERT, UPDATE, or DELETE. Triggers express business rules too complex for a CHECK constraint.
| Constraint | What it checks | When to use |
|---|---|---|
| NOT NULL | Value is required | Name, email, creation date |
| CHECK | Arbitrary condition | Balance >= 0, age > 0, status IN (...) |
| UNIQUE | No duplicates | Email, username, phone number |
| FOREIGN KEY | Reference to an existing record | user_id → users.id |
| TRIGGER | Any logic (a function) | Daily limits, auditing, cascading updates |
**Consistency in ACID vs Consistency in CAP** - same word, different concepts. ACID Consistency means every transaction leaves data in a valid state (constraints). CAP Consistency means every node in a cluster sees the same data at the same time (replication). Do not mix them up.
The accounts table has CHECK (balance >= 0). Transaction: BEGIN → UPDATE (balance = balance - 1000, current balance = 500) → COMMIT. What happens?
Isolation: Transaction Isolation
When thousands of users hammer a database at once, their transactions must not step on each other. **Isolation** defines how much one transaction can "see" of another transaction's uncommitted changes.
Skip isolation and anomalies appear - situations where concurrent transactions return wrong results.
| Isolation Level | Dirty Read | Non-repeatable Read | Phantom Read | Performance |
|---|---|---|---|---|
| READ UNCOMMITTED | Possible | Possible | Possible | Maximum |
| READ COMMITTED | Protected | Possible | Possible | High |
| REPEATABLE READ | Protected | Protected | Possible | Medium |
| SERIALIZABLE | Protected | Protected | Protected | Low |
**PostgreSQL default: READ COMMITTED.** Each SELECT inside a transaction sees only committed data, but two SELECTs in the same transaction can return different results (non-repeatable read). Financial operations usually call for SERIALIZABLE.
**Practical rule:** 95% of web apps run fine on READ COMMITTED. SERIALIZABLE earns its keep on financial transactions, reservations (where double-booking is a disaster), and inventory. Higher isolation = more locks = lower throughput.
**Deadlock**: transaction A waits for a row locked by B while B waits for a row locked by A. The DBMS detects the cycle and force-rolls-back one of the transactions. Application code must be ready to retry the loser.
Isolation level READ COMMITTED. Transaction A: BEGIN → SELECT balance (500) → ... waits ... → SELECT balance (?). Between the two SELECTs, another transaction changed the balance to 300 and COMMITted. What does the second SELECT return?
Durability: Data Survives Any Failure
COMMIT fires. The server confirms "transaction complete". A second later, the power dies. **Durability** guarantees that once COMMIT returns successfully, data is safe forever, even if the server burns down a millisecond later.
How does that work? Disk writes are not instantaneous. Data passes through stacked caches: app cache → DBMS buffer pool → filesystem page cache → disk controller cache → physical disk. A failure at any layer can lose data. **Write-Ahead Logging (WAL)** plugs the hole.
**fsync** is the system call that forces the OS to flush cached data to physical disk. Without fsync, data can linger in the page cache and vanish on power loss. PostgreSQL calls fsync on every WAL write at COMMIT.
**fsync = off** never belongs in production. The database does run noticeably faster without disk waits, but a sudden crash can corrupt data beyond recovery. Reserve it for test environments and bulk data imports.
**Replication** adds another durability layer. Even if a disk physically dies, data survives on a replica. PostgreSQL supports **synchronous replication**: COMMIT does not return until the replica has written the data. This protects against losing an entire server.
PostgreSQL: synchronous_commit = on. The client receives a COMMIT confirmation. 1 ms later the server loses power. What happens to this transaction's data?
ACID Trade-offs: CAP and BASE
ACID is the gold standard for reliability. Reliability has a price: locks, synchronous disk writes, waiting for replicas. In distributed systems handling millions of requests per second, **full ACID can become too expensive**.
The **CAP theorem** (Eric Brewer, 2000) is brutal: a distributed system cannot simultaneously guarantee all three properties. Pick two of three.
| Property | What it means | Example |
|---|---|---|
| Consistency (C) | All nodes see the same data | Balance = 500 on all servers |
| Availability (A) | Every request gets a response | Site works even if a node is down |
| Partition Tolerance (P) | System works during a network split | Server in US and EU lost connectivity |
In practice **P is mandatory** - networks always fail eventually - so the real choice sits between **CP** (consistency at the cost of dropping some requests) and **AP** (availability at the cost of stale data).
**Decision rule:** if wrong or lost data costs money (finance, healthcare, reservations), use ACID. If a few seconds of staleness are harmless (likes, views, logs, cache), use BASE. Most production systems run both, picking the right tool per data type.
**NewSQL** databases chase the holy grail: ACID + horizontal scaling. Google Spanner, CockroachDB, YugabyteDB deliver ACID transactions across hundreds of servers, paying for it with latency, because cross-node consensus takes time.
ACID is always required - data must be perfectly consistent at all times
For analytics, logs, counters, news feeds, and caches, BASE (eventual consistency) is often preferable. ACID guarantees have a cost: locks, synchronous writes, reduced throughput. Choosing between ACID and BASE is an engineering decision, not a question of correctness
Instagram doesn't use ACID transactions to count likes - at 100,000+ likes/sec, locks would kill performance. But for payments, reservations, and bank transfers, ACID is mandatory. A good architect uses ACID where precision is needed and BASE where speed matters.
A video view counter system is being designed (YouTube scale: millions of views per second). What's the optimal approach?
Key Takeaways
- **Atomicity** - a transaction is indivisible: all operations apply or all are rolled back. Knight Capital 2012 is a vivid example of the cost of missing atomicity
- **Consistency** - constraints (CHECK, FK, NOT NULL, UNIQUE, triggers) guarantee data is always in a valid state
- **Isolation** - four levels (READ UNCOMMITTED → SERIALIZABLE) define how much concurrent transactions see each other's changes. Higher isolation = fewer anomalies, but more locks
- **Durability** - WAL + fsync guarantee: after COMMIT, data is on disk even if the server shuts down
- **ACID vs BASE** - not a binary choice. Use ACID for critical data (money, reservations) and BASE for scalable non-critical data (likes, views, logs)
Related Topics
ACID is the foundation on which advanced database mechanisms are built:
- The Relational Model — Constraints from the relational model (PK, FK, CHECK) are the foundation of the Consistency property
- Why Databases Exist — Introductory lesson: files vs. databases, CRUD, client-server model - prerequisites for understanding ACID
- CAP Theorem — CAP explains why full ACID is unachievable in distributed systems
Вопросы для размышления
- In what situations would a project call for the SERIALIZABLE isolation level, and where would READ COMMITTED be sufficient?
- If a server is in a data center with a UPS (uninterruptible power supply) and a RAID array, is WAL still necessary? Why?
- Give a real-world example where eventual consistency (BASE) is a better choice than strict consistency (ACID).
Связанные уроки
- db-04-cap — ACID-Consistency and CAP-Consistency are different terms - one lesson clarifies both
- db-14-mvcc — MVCC implements Isolation without read locks - the mechanism behind the guarantee
- db-13-transactions — Advanced transaction patterns build on top of ACID guarantees
- db-16-distributed-tx — 2PC and Saga are attempts to preserve ACID atomicity across distributed systems
- db-41-newsql — CockroachDB and Spanner deliver ACID over a horizontally scaled cluster
- ds-02-cap-theorem — CAP theorem explains why full ACID is unachievable in distributed systems
- bt-18-saga