Distributed Systems
Event Ordering
Цели урока
- Distinguish between partial, causal, and total order and know what each guarantees
- Understand why physical clocks are insufficient for determining event order
- Know the mechanisms for implementing causal order (vector clocks) and total order (consensus, sequencer)
- Choose the minimum sufficient ordering level for a given problem
Предварительные знания
- Understanding the CAP theorem and the consistency vs availability trade-off
- Basic knowledge of Lamport timestamps and the happens-before relation
- Familiarity with vector clocks at the conceptual level
Amazon S3, 2008: two servers simultaneously received different versions of an object. Whose data is correct? Without a formal ordering model, the question has no answer. See vector clocks and CRDTs.
- **Kafka** guarantees total order within a partition - this is why event sourcing and CQRS work reliably
- **Google Spanner** uses GPS and atomic clocks in every datacenter to provide global total order without a centralized sequencer
- **CRDTs** (Conflict-free Replicated Data Types) operate on causal order - enabling Google Docs to work without locks
- **ZooKeeper** uses Zab (ZooKeeper Atomic Broadcast) for total order - coordinating Hadoop clusters (Kafka moved to KRaft in 4.0, March 2025)
- **DynamoDB** deliberately chose eventual consistency for the shopping cart - 99.99% uptime matters more than perfect operation ordering
Leslie Lamport and happens-before
In his 1978 paper "Time, Clocks, and the Ordering of Events in a Distributed System", Lamport introduced the happens-before relation (->) - the foundation of all modern ordering models. The insight: instead of physical time (which is unreliable), use causal connections between events. The paper has over 15,000 citations and became the basis for vector clocks, causal consistency, and CRDTs.
Partial Order: not everything is comparable
**Amazon S3, 2008. Two users simultaneously edit the same document on different servers. Server A received version 1 at 12:00:00.001, server B received version 2 at 12:00:00.002. But whose clock is more accurate? Server clocks differ by 50 milliseconds. The conflict is unresolvable without additional agreement.** This is the core of the ordering problem: concurrent events in a distributed system have no objective order.
Partial order is a formal model: some pairs of events are comparable (one happened before the other), and some are not. Incomparable events are called **concurrent** (independent, parallel).
| Event pair | Relation | Basis |
|---|---|---|
| write(x=1) on A, then read(x) on A | a -> b (a precedes b) | Local order on a single node |
| write(x=1) on A, then read(x) on B | a -> b (causal) | A sent a message to B with this value |
| write(x=3) on A and write(y=2) on B | a || b (concurrent) | No connection between events |
| read(x) on A and read(x) on B | a || b (concurrent) | Independent operations |
Partial order does not say which of two concurrent events happened first. This is not a bug - it is a fundamental property: in a distributed system, that concept does not objectively exist.
Physical server clocks can be synchronized precisely enough to determine event order
NTP gives accuracy of 1-100 ms, PTP gives a few ms. For sub-millisecond operations this is not enough, and clocks drift over time
Google Spanner solved this with GPS receivers and atomic clocks in every datacenter, achieving 7 ms uncertainty. But that costs millions of dollars in infrastructure. Ordinary systems cannot afford such precision.
Node A wrote x=1, then sent a message to Node B. Node B read x. Node C wrote y=2 independently. Which is a concurrent pair?
Causal Order: preserve causality
Causal order is a guarantee: if event A influenced event B (directly or through a chain), then every node in the system will see A before B. Concurrent events (with no causal link) may be seen in any order.
Chat without causal order - the classic problem
Alice writes: "Who wants pizza?" Bob sees the question, replies: "I do!" Charlie sees Bob's reply BEFORE Alice's question: Charlie sees: "I do!" ... "Who wants pizza?" Result: the conversation loses all meaning. With causal order: any node that received Bob's reply has already seen Alice's question - because Bob's reply causally depends on the question.
Vector Clocks are the primary tool for implementing causal order. Each node maintains a vector of counters `[t1, t2, ..., tN]` - one per node. When sending a message, the node increments its own counter and attaches the full vector. When receiving, it takes the element-wise maximum.
Causal order does not order all events - only causally linked ones. Bob's and Charlie's replies to Alice's question are ordered relative to the question, but not relative to each other. This is the trade-off: less coordination than total order, but stronger guarantees than partial order.
Alice (node A) wrote a post. Bob (node B) read it and liked it. Charlie (node C) liked it independently, without seeing Alice's post. What does causal order guarantee?
Total Order and FIFO: everyone sees the same thing
**Total order** is the strongest guarantee: a single global sequence of all events exists, and every node sees it identically. Without total order, three nodes receiving writes `x=1`, `x=2`, `x=3` in different orders will reach different final values. Implemented via Paxos or Raft.
| Method | Mechanism | Cost |
|---|---|---|
| Single Leader (sequencer) | One node numbers all operations | SPOF, limited throughput |
| Lamport + Process ID | Timestamp + node ID as tiebreaker | Artificial order, does not reflect real time |
| Consensus (Paxos/Raft) | Nodes agree on every write | High latency (2 round-trips minimum) |
| Atomic Broadcast (ZAB) | Protocol guarantees identical order on all nodes | Implementation complexity, latency same as consensus |
**FIFO order** is a weaker but often sufficient guarantee: messages from one sender arrive in sending order. FIFO does not order messages from different senders relative to each other.
Total order requires coordination on every write. Kafka solves this elegantly: total order is guaranteed within a single partition (one leader), but not across partitions. This allows horizontal scaling at the cost of global order.
Total order is always the best choice since it is the strongest guarantee
Total order is expensive: every write requires a coordination round-trip. Always choose the minimum sufficient level
DynamoDB processes millions of requests per second with eventual consistency. With total order this would be physically impossible - every operation would wait for 2+ round-trips from all nodes. Amazon deliberately chose AP for the shopping cart: a missing item in the cart is a smaller problem than a store outage.
Three nodes simultaneously write different values to x. Without an additional protocol, nodes can end up with different final values. Which guarantee resolves this?
Choosing an ordering level in real systems
Real systems do not choose between ordering levels once and for all - they apply different levels for different operations. Kafka provides total order within a partition but eventual consistency across partitions. DynamoDB allows selecting a consistency level per request. See also clocks foundations.
| System | Ordering | Why this choice |
|---|---|---|
| Kafka (within partition) | Total (single leader) | Log-driven architecture requires strict order for replay |
| DynamoDB | Eventual (partial) | AP system, LWW for conflict resolution |
| Google Spanner | Total (TrueTime) | GPS and atomic clocks give global order without a centralized sequencer |
| Redis CRDT (CRDB) | Causal | Conflict-free merge without coordination |
| ZooKeeper | Total (Zab) | Cluster coordination requires strong consistency |
| Cassandra | Tunable (partial to serial) | Flexibility: per-query choice from eventual to linearizable |
Practical rule: start with the weakest ordering that solves the problem. Move to a stricter level only when the weaker one does not address a specific issue (conflicts are unresolvable, the user sees an incorrect state).
How to choose ordering for a new feature
Post likes: - Is a conflict (double like) possible? -> no, idempotent -> eventual OK Thread comments: - Does order matter? -> yes, readers must see replies after questions - Causal order is sufficient -> vector clocks or causal broadcast Account balances: - Can a conflict be resolved automatically? -> no - Total order is required -> consensus or single leader
Guarantee hierarchy: FIFO - Causal - Total. Each level is stricter and more expensive than the previous one. Total order always includes causal order. Causal order always includes FIFO.
A movie rating service (like IMDB). Users rate films 1-10. The system averages ratings for display. What is the minimum sufficient ordering?
Вопросы для размышления
- In a messenger like WhatsApp, messages sometimes arrive out of order. What is the minimum ordering level needed for conversations to always read logically - and what trade-off does that require?
Связанные уроки
- dist-05-vector-clocks — Vector clocks provide causal order tracking
- ds-04-clocks — Logical clocks underpin all ordering guarantees
- dist-12-consistency — Message ordering directly determines consistency model
- dist-09-raft — Raft implements total order broadcast via the log
- alg-18-topological