Distributed Systems
Lamport Logical Clocks
Цели урока
- Understand why physical time is unreliable in distributed systems
- Know the happens-before relation and its three rules
- Implement the Lamport clock algorithm (tick + receive)
- Distinguish partial order from total order and know when each is needed
- Understand the limits of Lamport clocks and when vector clocks are required
Предварительные знания
- Basic understanding of processes and inter-process communication
- Familiarity with distributed systems at the ds-01-intro level
- The notion of 'event ordering' in logs
In a single-machine program, two events happen in a clear order: one statement runs before the next. Across machines, that guarantee disappears. Clocks drift, NTP corrects them by jumping backward, and packets arrive out of order. Lamport's 1978 paper showed that physical time is not just inconvenient - it is the wrong abstraction. Causality is what matters, and it can be tracked with nothing more than an integer counter.
- **Cassandra Last Write Wins**: client-supplied timestamps determine the winner of concurrent writes; getting the timestamp source wrong (server clock vs logical clock) silently loses data.
- **CockroachDB**: uses Hybrid Logical Clocks - physical time bounded by uncertainty plus a logical counter - to provide serializable transactions across continents without atomic clocks.
- **Git commit graph**: a vector clock encoded as a DAG of parent pointers; merge conflicts are exactly the case where two branches are concurrent in the happens-before sense.
Historical context
In 1978, Leslie Lamport published 'Time, Clocks, and the Ordering of Events in a Distributed System' in Communications of the ACM. Most engineers at the time assumed physical clocks could coordinate distributed systems. Lamport showed that physical time is fundamentally unreliable across machines and that a simple counter rule - increment on send, take max+1 on receive - produces a causally consistent logical clock. The paper became one of the most cited in computer science, and Lamport received the Turing Award in 2013 partly for this work.
Why physical time fails
In a distributed system, each machine has its own clock. Even with NTP synchronization, clocks can differ by **1-10 milliseconds** and can **jump backward** during corrections. When event ordering matters - for example, two trades executing at 'the same millisecond' on different servers - physical timestamps are not reliable enough to determine which came first.
**Google Spanner's True Time** API is the one exception: Google uses atomic clocks and GPS receivers in each datacenter to keep clock uncertainty below 7ms, then explicitly waits out that uncertainty before committing. This costs latency but buys linearizability. For most systems without atomic clocks, logical clocks are the practical solution.
Synchronize clocks tightly enough (GPS, PTP) and the problem disappears.
Even with atomic clocks (Google Spanner) the uncertainty stays, and systems are designed around that fact by adding a commit-wait.
Google Spanner uses GPS plus atomic clocks with ~7 ms uncertainty, but does not pretend the uncertainty is zero. Every commit waits out the interval before becoming visible. The cost is real: thousands of tx/sec instead of millions. Lamport clocks solve event ordering without paying that price.
Why is NTP insufficient for ordering events in a distributed trading system?
The Happens-Before relation
Lamport defined the **happens-before** relation (written `→`) without relying on physical time. Event `a → b` (a happened before b) holds if: (1) `a` and `b` are in the same process and `a` comes first, (2) `a` is the send of a message and `b` is the receive of that message, or (3) there exists `c` such that `a → c` and `c → b` (transitivity). If neither `a → b` nor `b → a`, the events are **concurrent** (written `a ∥ b`) - neither causally influenced the other.
**Concurrency is not a bug** - it is information. When two events are concurrent (`∥`), it means they genuinely could not have influenced each other. Distributed databases like Cassandra and DynamoDB use this distinction to detect conflicts: concurrent writes to the same key require a resolution strategy (LWW, CRDTs, or user-defined merge).
Process A sends a message (event e1). Process B receives it (event e2). Process C performs a local operation (event e3) with no message exchange with A or B. What is the relationship between e2 and e3?
The Lamport clock algorithm
A **Lamport logical clock** is a counter maintained by each process following two rules: (1) **Tick rule** - before any event (local or send), increment the counter. (2) **Receive rule** - on receiving a message with timestamp `T`, set counter to `max(local, T) + 1`. This guarantees: if `a → b`, then `C(a) < C(b)`. The converse is not guaranteed - a higher timestamp does not mean happened-after.
**Total order via tie-breaking**: Lamport timestamps alone give a partial order. To get a total order (needed for replicated state machines), break ties by process ID. Since process IDs are unique and known to all nodes, every node independently arrives at the same ordering - no coordination required.
Process A (counter=5) receives a message with Lamport timestamp=3. What is the counter after receive?
Guarantees, limits, and what comes next
Lamport clocks provide **causal consistency**: if `a → b` then `C(a) < C(b)`. But they cannot detect concurrency from timestamps alone - two concurrent events may have different timestamps. **Vector clocks** solve this: each process tracks a vector of counters, one per process, enabling detection of both causality and concurrency. Lamport clocks are simpler but less expressive.
**Real-world usage**: Cassandra uses Lamport timestamps (client-provided) for LWW conflict resolution. Riak uses vector clocks to detect concurrent writes. DynamoDB uses a hybrid approach. Git uses a DAG (directed acyclic graph) of commits - effectively a vector clock encoded as a graph. CockroachDB uses Hybrid Logical Clocks to combine physical clock proximity with logical ordering guarantees.
Lamport clocks solve the problem of picking a winner among conflicting writes.
Lamport clocks give total ordering, not causal ordering: concurrent events receive an artificial order, not a real one.
Cassandra uses Lamport timestamps for LWW and documents this as a known limitation: concurrent writes can silently lose data. Amazon Dynamo chose vector clocks for its shopping cart precisely because LWW with Lamport timestamps was unacceptable. A customer would lose items added to the cart when two devices wrote concurrently.
A system uses Lamport timestamps to determine the 'last write wins' winner on conflict. Node X writes with ts=42, Node Y writes with ts=43. The write with ts=43 wins. Is this correct?
Key ideas
- **Physical time fails** in distributed systems: NTP drift is 1-10 ms, clocks can jump backward, and network delays are unbounded.
- **Happens-before** (a → b) is defined by program order, message send/receive, and transitivity - independent of wall-clock time.
- **Lamport clocks**: tick on every event, take max+1 on receive. Guarantees a → b implies C(a) < C(b); the converse is not guaranteed.
- **Total order via tie-breaking**: append process ID to break Lamport timestamp ties so every node agrees on the same ordering.
- **Vector clocks** detect concurrency that Lamport clocks cannot; Hybrid Logical Clocks combine wall-clock proximity with logical guarantees.
Вопросы для размышления
- Cassandra's LWW with Lamport timestamps silently drops one of two concurrent writes. When is that acceptable, and when must a system use vector clocks or CRDTs to preserve both writes?
- Google Spanner achieves linearizability with TrueTime by waiting out clock uncertainty. What workloads can absorb that 7 ms commit-wait, and which cannot?
- Vector clocks cost O(N) space per timestamp, where N is the number of processes. How do production systems (Riak, Voldemort) bound this growth in long-running clusters?
Связанные уроки
- ds-04-consistent-hashing — Cassandra uses Lamport timestamps for Last Write Wins conflict resolution across hash ring nodes
- sd-04-database — Distributed databases rely on logical clocks to order concurrent writes without a global coordinator
- bt-04-dns-tls — Both DNS TTL and Lamport clocks solve the same problem: agreeing on a consistent view without central control
- se-04 — Happens-before relation maps to LSP: substituting a causally later event must not violate the ordering contract
- ds-03-consensus — Logical clocks are a prerequisite for understanding Paxos and Raft consensus algorithms
- ds-01-intro
- ds-05-replication
- isd-14-consistency-models
- alg-18-topological