Distributed Systems
Logical Clocks and Event Ordering
Цели урока
- Understand why physical clocks (NTP) are insufficient for event ordering
- Explain the happened-before relation and its three rules
- Implement Lamport timestamps and understand their limitation
- Use Vector Clocks to detect concurrent events
- Distinguish Lamport, Vector Clocks, and HLC by use case
Предварительные знания
- Basic understanding of distributed systems (what a node and message are)
- CAP theorem: difference between CP and AP systems
- Understanding of data replication
NTP is accurate to 100ms. A database transaction takes microseconds. Two AWS servers in the same AZ can drift enough that the order of events becomes fundamentally undeterminable without logical clocks. See clocks basics and replication.
- **Amazon Dynamo (2007)** - vector clocks for conflict detection in shopping cart with eventual consistency
- **Apache Cassandra** - Lamport timestamps for ordering operations within cells during concurrent writes
- **CockroachDB** - Hybrid Logical Clocks in every transaction for global ordering without atomic clocks
- **Riak and CouchDB** - vector clocks for multi-master replication without a coordinator
- **Google Spanner TrueTime** - atomic clocks + GPS + 7ms commit-wait as an alternative to logical clocks at hardware cost
Lamport 1978: 8 pages that changed Computer Science
The paper "Time, Clocks, and the Ordering of Events in a Distributed System" was published in Communications of the ACM in 1978. Lamport was working at SRI International on program verification when he formulated a simple observation: in a distributed system there is no way to say that "event A happened before event B" by physical time - but causal order can be determined. This idea, initially seeming abstract, became the foundation for consensus algorithms, replication, and distributed databases. Turing Award 2013.
The Time Problem in Distributed Systems
**Amazon DynamoDB, 2012: a conflict in a shopping cart. A user added an item simultaneously from mobile and browser. Both servers accepted the write. Which timestamp is larger - unknown: clocks drifted 80ms apart via NTP. Dynamo solved this not by synchronizing clocks, but with vector clocks - the approach used across the modern internet.**
NTP (Network Time Protocol) synchronizes clocks with ~1-100ms precision depending on the network. For events within a single datacenter (microseconds), this is unacceptable. Clocks on different servers inevitably drift: CPU frequencies vary, temperature affects crystals, virtualization adds jitter.
In 1978 Leslie Lamport proposed a different approach: instead of synchronizing clocks, introduce **causal ordering**. If event A causes event B (A sent a message that B received), then A precedes B by definition. Not by time - by causality.
| Method | Precision | Problem |
|---|---|---|
| NTP | 1-100ms | Too coarse for microsecond events |
| GPS/atomic clocks | ~7ms interval (Spanner) | Expensive, not universally available |
| Lamport timestamps | Partial order | Concurrent events indistinguishable |
| Vector clocks | Full causal order | Size grows with number of nodes |
With good clock synchronization (GPS, atomic clocks), the ordering problem is solved
Even perfect synchronization does not give precise ordering of concurrent events - causal ordering is needed
Google Spanner uses atomic clocks and GPS but still must commit-wait ~7ms. Physical time is a probabilistic tool; logical clocks are deterministic.
Why does NTP synchronization not solve the event ordering problem in distributed systems?
Lamport Timestamps and the Happened-Before Relation
**Lamport's 1978 paper "Time, Clocks, and the Ordering of Events in a Distributed System" - 8 pages that created the field of distributed computing. Over 15,000 citations. The idea: introduce logical clocks as a monotonically increasing counter tied to messages between processes.**
The Happened-Before Relation (--->)
- **Local order:** if a and b are events in the same process, and a is before b, then a ---> b
- **Messages:** if a is sending a message and b is receiving that same message, then a ---> b
- **Transitivity:** if a ---> b and b ---> c, then a ---> c
If neither a ---> b nor b ---> a - the events are **concurrent** (a || b). This does not mean "simultaneous" - it means there is no causal connection between them. Concurrent events can occur in any order without affecting system correctness.
Lamport Clock Rules
- Each process maintains a counter L, starting at 0
- On a local event: L = L + 1
- On sending a message: L = L + 1, send L with the message
- On receiving a message with timestamp T: L = max(L, T) + 1
**Critical limitation of Lamport timestamps:** if L(a) < L(b), this does NOT mean a ---> b. Only if a ---> b is it guaranteed that L(a) < L(b). The relation is one-directional. This makes Lamport timestamps unsuitable for detecting concurrent writes.
P1 has Lamport counter = 3. It receives a message with timestamp = 7. What is P1's new counter?
Vector Clocks: full causal order
**Amazon Dynamo, 2007: the shopping cart is an AP system (availability over consistency). On conflict Dynamo does not pick a winner - it shows both versions to the user and asks to resolve. Vector Clocks are used to detect conflicts. Without them Dynamo could not distinguish a conflict from a normal sequential write.**
A Vector Clock is an array of counters, one per process in the system. Instead of a single number (Lamport), each process stores a vector `[L1, L2, L3, ...]` where Li is how many events occurred at the i-th process from the current node's perspective.
Example: conflict detection during replication
| Operation | Vector Clock | Status |
|---|---|---|
| A: write(x=1) | [A:1, B:0] | OK - first write |
| B: write(x=2) independently | [A:0, B:1] | OK - but concurrent with A |
| A syncs with B | A sees [A:0, B:1] | CONFLICT: [A:1,B:0] || [A:0,B:1] |
| Merge + resolve | [A:2, B:1] | x = merge(1, 2) - application decides |
Dynamo showed conflicting cart versions to the user - showing all variants to choose from. This is called "shopping cart semantics": better to show an extra item than to lose one. Riak and CouchDB use the same approach - the same trick underpins CRDTs.
Vector clocks automatically resolve conflicts
Vector clocks only detect conflicts - resolution is the application's responsibility
Dynamo, upon detecting [A:1,B:0] || [A:0,B:1], does not know which value is correct. It returns both versions to the client. Resolution depends on data semantics: for a cart - union, for a counter - sum, for a name - ask the user.
Vector clock A = [A:2, B:1], Vector clock B = [A:1, B:3]. What is their order?
Hybrid Logical Clocks and practical applications
**CockroachDB, 2015: the goal was to build an open-source Spanner equivalent without atomic clocks. The solution - Hybrid Logical Clocks (HLC). The idea: combine physical time (NTP) with a logical counter. Physical time provides anchoring to reality, the counter guarantees strict order when timestamps match. CockroachDB runs in production at thousands of companies - HLC is at the core of every transaction.**
HLC stores a pair `(l, c)`, where `l` is the maximum observed physical time and `c` is a counter for resolving collisions when `l` is equal. This allows HLC to be read as ordinary time (for debugging, monitoring) while maintaining strict ordering like a Lamport clock.
Comparing approaches
| Mechanism | Order | Human-readable | Size | Used by |
|---|---|---|---|---|
| Lamport | Partial (causal) | No (just a counter) | O(1) | Basic algorithms, Cassandra |
| Vector Clock | Full (causal + concurrent) | No | O(N nodes) | Dynamo, Riak, CouchDB |
| HLC | Partial + physically anchored | Yes (reads as time) | O(1) | CockroachDB, YugabyteDB |
| TrueTime (Spanner) | Global strong | Yes | O(1) | Google Spanner |
Vector Clocks have a scaling problem: the vector grows with the number of nodes. With 1,000 nodes, every message carries 1,000 counters. Dynamo addressed this through version pruning - discarding old entries in the vector by time or LRU. For the full survey see event ordering.
What is the main practical advantage of HLC over pure Lamport timestamps?
Вопросы для размышления
- Dynamo shows the user conflicting cart versions when vector clocks reveal concurrent writes. Which data in real applications can be safely resolved automatically (without user intervention), and which cannot - and why?
Связанные уроки
- ds-04-clocks — Logical clocks are the basis for vector clocks
- dist-06-ordering — Vector clocks implement causal ordering
- ds-08-vector-clocks — Advanced patterns build on this introduction
- ds-10-crdts — CRDTs use vector clocks for conflict detection
- alg-18-topological