Distributed Systems

Consistency Models

Цели урока

Distinguish linearizable, sequential, causal and eventual consistency
Apply the W+R > N formula to choose levels in Cassandra/DynamoDB
Explain the difference between C in CAP and C in ACID
Select a consistency model based on business requirements

Предварительные знания

CAP theorem: understanding the Consistency vs Availability trade-off
Replication: leader/follower and multi-leader approaches
Basic understanding of vector clocks (for causal consistency)

DynamoDB shows a cart with 3 items while another replica already holds 4 - and that is the correct decision for a service targeting 99.999% uptime. Google Spanner stores financial data with linearizable consistency backed by atomic clocks in every datacenter. One letter C in CAP hides five different levels of strictness.

**Amazon DynamoDB** - eventual by default, strongly consistent reads at 2x cost: a deliberate availability vs cost trade-off
**Google Spanner** - the only production-grade linearizable globally distributed database, atomic clocks in every region
**Cassandra** - tunable consistency per request: different tables with different policies in one cluster
**Facebook TAO** - causal consistency for the social graph: a reply is only visible if the original post is visible
**Redis Cluster** - eventual by default, WAIT command for synchronous replication when needed

Werner Vogels and eventually consistent

In 2008 Amazon CTO Werner Vogels published 'Eventually Consistent' in ACM Queue. He systematized consistency models from the experience of building DynamoDB and S3. The core message: eventual consistency is not a weakness but a deliberate engineering choice for scalability. The article changed industry discourse - engineers stopped apologizing for eventual consistency and began choosing it explicitly.

The Consistency Spectrum

**Amazon DynamoDB, 2012. Prime Day. Hundreds of millions of requests per second. A shopper's cart shows 3 items - while another replica holds 4.** The shopper notices nothing: within a second both replicas converge. This is not a bug - it is a deliberate choice of consistency model in exchange for 99.999% availability.

**Term confusion:** C in CAP is NOT the same as C in ACID. CAP-consistency = linearizability (all nodes see the same data). ACID-consistency = business invariants (a balance cannot go negative). Two different concepts, one letter.

A **consistency model** is a contract between the system and its clients: what is guaranteed when reading data after a write. From the strictest to the weakest - this is not a binary choice but a continuous spectrum.

Model	Guarantee	Cost
Linearizable	Operations appear instantaneous, real-time order	Consensus, high latency
Sequential	Single global order for all nodes, not necessarily real-time	Coordination, medium latency
Causal	Causally related operations are ordered	Vector clocks, no consensus needed
Read-your-writes	Own write visible immediately within a session	Sticky sessions or version tokens
Eventual	All replicas converge eventually	Stale reads possible, conflict resolution required

**Practical rule:** the stronger the model, the higher the latency and the lower the availability under partition. Business requirements drive the choice: financial data needs linearizable, a shopping cart can tolerate eventual.

Consistency is a binary choice: either present or absent

Consistency is a spectrum of at least 5 levels, each with a different latency/availability trade-off

In practice a single system can use different levels for different operations. Cassandra allows choosing a consistency level per request. DynamoDB offers strongly consistent reads as an option at double the latency cost.

How does C in CAP differ from C in ACID?

Linearizability: operations as if on one machine

**Google Spanner, 2012. The first globally distributed database with linearizable consistency.** A transaction committed in an Iowa datacenter is visible to any node in Tokyo or London within a few milliseconds. The price: atomic clocks and GPS receivers in every datacenter.

**Linearizability** is the strictest point on the consistency spectrum. Each operation appears to take effect instantaneously at some point between its invocation and completion. After that point all readers see the result.

How Spanner achieves global order

**Cost of linearizability:** every Spanner commit waits ~7ms for the TrueTime interval to close. Throughput is thousands of tx/sec per node. DynamoDB (eventual) handles millions. Global order comes at a speed cost.

System	Model	How it is implemented
Google Spanner	Linearizable	TrueTime API (atomic clocks + GPS)
etcd / ZooKeeper	Linearizable	Raft/ZAB consensus
PostgreSQL (single)	Linearizable	Single node - linearizable by construction
CockroachDB	Serializable	HLC (Hybrid Logical Clocks) + Raft

Client A wrote x=1 and received an acknowledgment. Client B started reading after that. What does linearizability guarantee?

Causal and Eventual: when strictness is unnecessary

**Facebook, 2009. 300 million users. Requirement: if a comment is visible, its parent post must also be visible.** This requires causal consistency. But the order of two independent posts from different users is irrelevant. That is the idea: enforce ordering only where there is a causal dependency.

**Causal consistency** - causally related operations are seen by all nodes in the same order. Concurrent (independent) operations may appear in different orders on different nodes.

**Causal consistency is achievable without consensus** - vector clocks or causal tokens are sufficient. This makes it attractive for geo-distributed systems: COPS, Cassandra LWT, MongoDB causal consistency.

Eventual consistency: maximum availability

**Eventual consistency** - if no new writes occur, all replicas will sooner or later converge to the same state. "Sooner or later" is the official definition: no timing guarantees whatsoever.

Eventual problem	Example	Solution
Stale read	Posted a message, refreshed the page - message is gone	Read-your-writes guarantee (sticky session)
Monotonic violation	Read balance 100, then 50, then 100 again	Monotonic reads (sticky session or versioning)
Write conflicts	Two replicas received different values for x	LWW (Last Write Wins) or merge function

**Session guarantees** - a middle ground: within a single user session additional guarantees are provided (read-your-writes, monotonic reads), across sessions - eventual. Implemented via sticky sessions or session tokens with version numbers.

Eventual consistency means the system can return any garbage data

Eventual consistency guarantees convergence - replicas will reach the same value, just without a time bound

Eventual guarantees convergence: all replicas will reach the same state in the absence of new writes. It does not mean arbitrary data. The problem is stale reads during replication propagation, not random corruption.

In a social network Bob replies to Alice's post. What is the minimum consistency model that guarantees a reader never sees the reply without the original post?

Tunable consistency: configure per use case

**Cassandra, 2010. The insight: do not choose a model once for the entire database - choose per request.** One service writes with QUORUM (strict) and reads with ONE (fast). Another writes with ANY (maximum speed) and reads with QUORUM (strict). This is tunable consistency - Cassandra's key innovation.

The formula: W + R > N

With N replicas, W write acknowledgments and R replicas consulted on read: if W + R > N then the sets of nodes that confirmed the write and the nodes that respond to the read overlap. That overlap guarantees at least one of the R nodes holds the latest data.

System	Default model	Tunable options
PostgreSQL (single)	Linearizable	None (single node)
CockroachDB	Serializable	Stale reads optionally
DynamoDB	Eventual	Strongly consistent reads (2x cost)
Cassandra	Eventual (ONE)	ANY/ONE/QUORUM/ALL per request
MongoDB	Eventual	Linearizable reads optionally
Redis Cluster	Eventual	WAIT N TIMEOUT for sync replication

Practical choice in e-commerce

An online store uses Cassandra. Product prices: QUORUM write + QUORUM read (a stale price shown to a customer is a reputational risk). View counter: ONE write + ONE read (small inaccuracy acceptable, speed matters). Shopping cart: ONE write + QUORUM read (fast writes, strict reads - the customer must see their own changes). Three different consistency policies in one database for three different business requirements.

**Pitfall with tunable consistency:** mixing levels inconsistently produces subtle bugs. QUORUM write + ONE read does not guarantee freshness: ONE read may hit a replica that has not yet received the write. Document the consistency policy for each table.

QUORUM always means majority (N/2 + 1)

In Cassandra QUORUM = N/2+1, but what matters is the formula W+R > N, not the word 'QUORUM' itself

Strong consistency can be achieved without QUORUM: W=1, R=ALL or W=ALL, R=1 also satisfy W+R > N. 'QUORUM' is just a convenient preset for balanced performance. Understanding the intersection math is what matters.

Cassandra with N=5 replicas, W=3 (write ack), R=2 (read from replicas). Does this guarantee strong consistency?

Вопросы для размышления

Which operations in a typical production system require linearizable consistency, and which could safely move to eventual? How does the W+R > N formula change the approach to configuration choices?

Связанные уроки

ds-02-cap-theorem — CAP theorem is the foundation of consistency tradeoffs
dist-09-raft — Raft provides linearisability as a reference implementation
ds-10-crdts — CRDTs represent the eventual consistency end of the spectrum
dist-07-transactions — Isolation levels in transactions mirror consistency models
db-03-acid