Distributed Systems
Consistency Models
Цели урока
- Distinguish linearizable, sequential, causal and eventual consistency
- Apply the W+R > N formula to choose levels in Cassandra/DynamoDB
- Explain the difference between C in CAP and C in ACID
- Select a consistency model based on business requirements
Предварительные знания
- CAP theorem: understanding the Consistency vs Availability trade-off
- Replication: leader/follower and multi-leader approaches
- Basic understanding of vector clocks (for causal consistency)
DynamoDB shows a cart with 3 items while another replica already holds 4 - and that is the correct decision for a service targeting 99.999% uptime. Google Spanner stores financial data with linearizable consistency backed by atomic clocks in every datacenter. One letter C in CAP hides five different levels of strictness.
- **Amazon DynamoDB** - eventual by default, strongly consistent reads at 2x cost: a deliberate availability vs cost trade-off
- **Google Spanner** - the only production-grade linearizable globally distributed database, atomic clocks in every region
- **Cassandra** - tunable consistency per request: different tables with different policies in one cluster
- **Facebook TAO** - causal consistency for the social graph: a reply is only visible if the original post is visible
- **Redis Cluster** - eventual by default, WAIT command for synchronous replication when needed
Werner Vogels and eventually consistent
In 2008 Amazon CTO Werner Vogels published 'Eventually Consistent' in ACM Queue. He systematized consistency models from the experience of building DynamoDB and S3. The core message: eventual consistency is not a weakness but a deliberate engineering choice for scalability. The article changed industry discourse - engineers stopped apologizing for eventual consistency and began choosing it explicitly.
The Consistency Spectrum
**Amazon DynamoDB, 2012. Prime Day. Hundreds of millions of requests per second. A shopper's cart shows 3 items - while another replica holds 4.** The shopper notices nothing: within a second both replicas converge. This is not a bug - it is a deliberate choice of consistency model in exchange for 99.999% availability.
**Term confusion:** C in CAP is NOT the same as C in ACID. CAP-consistency = linearizability (all nodes see the same data). ACID-consistency = business invariants (a balance cannot go negative). Two different concepts, one letter.
A **consistency model** is a contract between the system and its clients: what is guaranteed when reading data after a write. From the strictest to the weakest - this is not a binary choice but a continuous spectrum.
| Model | Guarantee | Cost |
|---|---|---|
| **Linearizable** | Operations appear instantaneous, real-time order | Consensus, high latency |
| **Sequential** | Single global order for all nodes, not necessarily real-time | Coordination, medium latency |
| **Causal** | Causally related operations are ordered | Vector clocks, no consensus needed |
| **Read-your-writes** | Own write visible immediately within a session | Sticky sessions or version tokens |
| **Eventual** | All replicas converge eventually | Stale reads possible, conflict resolution required |
**Practical rule:** the stronger the model, the higher the latency and the lower the availability under partition. Business requirements drive the choice: financial data needs linearizable, a shopping cart can tolerate eventual.
Consistency is a binary choice: either present or absent
Consistency is a spectrum of at least 5 levels, each with a different latency/availability trade-off
In practice a single system can use different levels for different operations. Cassandra allows choosing a consistency level per request. DynamoDB offers strongly consistent reads as an option at double the latency cost.
How does C in CAP differ from C in ACID?
Linearizability: operations as if on one machine
**Google Spanner, 2012. The first globally distributed database with linearizable consistency.** A transaction committed in an Iowa datacenter is visible to any node in Tokyo or London within a few milliseconds. The price: atomic clocks and GPS receivers in every datacenter.
**Linearizability** is the strictest point on the consistency spectrum. Each operation appears to take effect instantaneously at some point between its invocation and completion. After that point all readers see the result.
How Spanner achieves global order
**Cost of linearizability:** every Spanner commit waits ~7ms for the TrueTime interval to close. Throughput is thousands of tx/sec per node. DynamoDB (eventual) handles millions. Global order comes at a speed cost.
| System | Model | How it is implemented |
|---|---|---|
| Google Spanner | Linearizable | TrueTime API (atomic clocks + GPS) |
| etcd / ZooKeeper | Linearizable | Raft/ZAB consensus |
| PostgreSQL (single) | Linearizable | Single node - linearizable by construction |
| CockroachDB | Serializable | HLC (Hybrid Logical Clocks) + Raft |
Client A wrote x=1 and received an acknowledgment. Client B started reading after that. What does linearizability guarantee?
Causal and Eventual: when strictness is unnecessary
**Facebook, 2009. 300 million users. Requirement: if a comment is visible, its parent post must also be visible.** This requires causal consistency. But the order of two independent posts from different users is irrelevant. That is the idea: enforce ordering only where there is a causal dependency.
**Causal consistency** - causally related operations are seen by all nodes in the same order. Concurrent (independent) operations may appear in different orders on different nodes.
**Causal consistency is achievable without consensus** - vector clocks or causal tokens are sufficient. This makes it attractive for geo-distributed systems: COPS, Cassandra LWT, MongoDB causal consistency.
Eventual consistency: maximum availability
**Eventual consistency** - if no new writes occur, all replicas will sooner or later converge to the same state. "Sooner or later" is the official definition: no timing guarantees whatsoever.
| Eventual problem | Example | Solution |
|---|---|---|
| **Stale read** | Posted a message, refreshed the page - message is gone | Read-your-writes guarantee (sticky session) |
| **Monotonic violation** | Read balance 100, then 50, then 100 again | Monotonic reads (sticky session or versioning) |
| **Write conflicts** | Two replicas received different values for x | LWW (Last Write Wins) or merge function |
**Session guarantees** - a middle ground: within a single user session additional guarantees are provided (read-your-writes, monotonic reads), across sessions - eventual. Implemented via sticky sessions or session tokens with version numbers.
Eventual consistency means the system can return any garbage data
Eventual consistency guarantees convergence - replicas will reach the same value, just without a time bound
Eventual guarantees convergence: all replicas will reach the same state in the absence of new writes. It does not mean arbitrary data. The problem is stale reads during replication propagation, not random corruption.
In a social network Bob replies to Alice's post. What is the minimum consistency model that guarantees a reader never sees the reply without the original post?
Tunable consistency: configure per use case
**Cassandra, 2010. The insight: do not choose a model once for the entire database - choose per request.** One service writes with QUORUM (strict) and reads with ONE (fast). Another writes with ANY (maximum speed) and reads with QUORUM (strict). This is tunable consistency - Cassandra's key innovation.
The formula: W + R > N
With N replicas, W write acknowledgments and R replicas consulted on read: if W + R > N then the sets of nodes that confirmed the write and the nodes that respond to the read overlap. That overlap guarantees at least one of the R nodes holds the latest data.
| System | Default model | Tunable options |
|---|---|---|
| PostgreSQL (single) | Linearizable | None (single node) |
| CockroachDB | Serializable | Stale reads optionally |
| DynamoDB | Eventual | Strongly consistent reads (2x cost) |
| Cassandra | Eventual (ONE) | ANY/ONE/QUORUM/ALL per request |
| MongoDB | Eventual | Linearizable reads optionally |
| Redis Cluster | Eventual | WAIT N TIMEOUT for sync replication |
Practical choice in e-commerce
An online store uses Cassandra. Product prices: QUORUM write + QUORUM read (a stale price shown to a customer is a reputational risk). View counter: ONE write + ONE read (small inaccuracy acceptable, speed matters). Shopping cart: ONE write + QUORUM read (fast writes, strict reads - the customer must see their own changes). Three different consistency policies in one database for three different business requirements.
**Pitfall with tunable consistency:** mixing levels inconsistently produces subtle bugs. QUORUM write + ONE read does not guarantee freshness: ONE read may hit a replica that has not yet received the write. Document the consistency policy for each table.
QUORUM always means majority (N/2 + 1)
In Cassandra QUORUM = N/2+1, but what matters is the formula W+R > N, not the word 'QUORUM' itself
Strong consistency can be achieved without QUORUM: W=1, R=ALL or W=ALL, R=1 also satisfy W+R > N. 'QUORUM' is just a convenient preset for balanced performance. Understanding the intersection math is what matters.
Cassandra with N=5 replicas, W=3 (write ack), R=2 (read from replicas). Does this guarantee strong consistency?
Вопросы для размышления
- Which operations in a typical production system require linearizable consistency, and which could safely move to eventual? How does the W+R > N formula change the approach to configuration choices?
Связанные уроки
- ds-02-cap-theorem — CAP theorem is the foundation of consistency tradeoffs
- dist-09-raft — Raft provides linearisability as a reference implementation
- ds-10-crdts — CRDTs represent the eventual consistency end of the spectrum
- dist-07-transactions — Isolation levels in transactions mirror consistency models
- db-03-acid