Distributed Systems

Vector Clocks: Precise Event Ordering

Цели урока

  • Understand the limitation of Lamport Timestamps: partial order without concurrency detection
  • Know the three Vector Clock rules: tick, send, receive
  • Compare vectors to determine causality vs concurrency
  • Distinguish Vector Clocks, HLC, and TrueTime by their use cases

Предварительные знания

  • Lamport Timestamps and the happened-before relation
  • Replication and eventual consistency basics
  • Basic data structures (Map, array)

Every time an item is added to an Amazon cart from a phone and a laptop simultaneously, a Vector Clock ensures neither item is silently lost.

  • **Amazon Dynamo (2007)** - Vector Clocks for conflict detection during shopping cart replication
  • **Riak** - Dotted Version Vectors (evolution of Vector Clocks) for clusters with thousands of clients
  • **CockroachDB / YugabyteDB** - HLC for globally consistent SQL transactions without atomic clocks
  • **Apache Cassandra** - LWW as a simpler alternative, cost: possible data loss on concurrent writes
  • **Google Spanner** - TrueTime as an alternative via atomic clocks and GPS receivers

Colin Fidge and Friedemann Mattern: parallel discovery in 1988

In 1988 two researchers independently published the same idea: Colin Fidge in January (University of Queensland) and Friedemann Mattern in September (Darmstadt). Fidge in 'Timestamps in Message-Passing Systems That Preserve the Partial Ordering', Mattern in 'Virtual Time and Global States of Distributed Systems'. Both cited Lamport 1978 as the starting point. Today the structure is called Vector Clocks or Fidge/Mattern timestamps.

Why Lamport Timestamps Fall Short

**Amazon S3, 2006. First launch. Engineers discover: two replicas of the same object hold different versions - both timestamped 14:32:00.412. NTP precision was 100 ms. Which replica wrote first - the system has no idea.** This sparked the demand for Vector Clocks in Amazon Dynamo - the mechanism now tracking the shopping carts of 200 million customers.

Lamport Timestamps (1978) - see the logical clocks lesson - provide a partial order: if a causally precedes b, then L(a) < L(b). The converse does NOT hold: L(a) < L(b) does not imply a happened before b. Two independent events on separate nodes can receive identical timestamps.

PropertyLamport TimestampsVector Clocks
Data size1 integerN integers (one per node)
Is a causally before b?L(a) < L(b) - necessary but not sufficientVC(a) < VC(b) - exact criterion
Detect concurrencyNo - cannot distinguish from causalityYes - a || b when VCs are incomparable
Use caseTotal ordering (with tiebreaking rules)Conflict detection and causality tracking

Lamport's implication is one-directional: a -> b implies L(a) < L(b), but L(a) < L(b) does NOT imply a -> b. Vector Clocks give a bidirectional criterion: a -> b if and only if VC(a) < VC(b).

Lamport Timestamps totally order all events in a distributed system

Lamport Timestamps give only a partial order: a -> b implies L(a) < L(b), but not the other way around

The Lamport counter grows monotonically on message receipt, but two independent nodes can reach identical values with no causal link. Vector Clocks store N counters and provide a bidirectional exact criterion for causality.

Two events on different nodes both have Lamport Timestamp L=7. What can be concluded?

How Vector Clocks Work

**Colin Fidge and Friedemann Mattern (1988, independently): each node maintains a vector of counters - one slot per node. Each node increments its own slot on every event, attaches the full vector to outgoing messages, and on receipt takes the component-wise maximum.** Three rules that capture full causal information.

Three nodes, three events

P1=[1,0,0] sends msg to P2. P2=[0,1,0] receives: merge([1,0,0],[0,1,0])=[1,1,0], increment -> P2=[1,2,0]. P3=[0,0,1] does a local event -> P3=[0,0,2]. Now: P2=[1,2,0] and P3=[0,0,2] are incomparable - their events are concurrent. P1=[1,0,0] and P2=[1,2,0]: P1 is causally before P2 (1<=1, 0<=2, 0<=0 with at least one strict inequality).

Vector Clock size is O(N) where N is the number of nodes. In Amazon Dynamo N=9 (standard ring). For systems with thousands of nodes, Dotted Version Vectors or other compact representations are used instead.

Node P2 receives a message with VC=[3,0,2]. P2 currently has VC=[1,4,1]. What will P2's VC be after receiving?

Comparing Vectors and Detecting Conflicts

**Amazon DynamoDB, 2007: a user adds an item to the cart from two devices simultaneously - which version wins? DynamoDB uses Vector Clocks for a precise answer: if the vectors are concurrent, both versions are returned to the client for manual merge. This is exactly how amazon.com shopping cart works.**

VC(a)VC(b)ResultMeaning
[1,0,0][2,1,0]before (a->b)a is causally before b; b has seen a's event
[2,1,0][1,0,0]after (b->a)b is causally after a
[1,2,0][1,0,3]concurrent (||)conflict - both versions need merge
[2,1,3][2,1,3]equalidentical state on both nodes

Conflict in Amazon shopping cart

User adds a book on mobile: VC=[mobile:1, web:0]. Simultaneously adds headphones on the web: VC=[mobile:0, web:1]. Server receives both. compareVC([1,0],[0,1]) = 'concurrent'. Dynamo does not pick a winner - returns both versions to the client marked 'conflict'. The shopping cart service takes the union: cart contains both book and headphones. Resulting VC=[mobile:1, web:2].

Last-Write-Wins (LWW) is simpler than Vector Clocks - pick the version with the higher timestamp. But LWW silently drops data on concurrent writes. Vector Clocks detect the conflict and preserve all versions for merge. Cassandra defaults to LWW. Riak and Dynamo use Vector Clocks. Conflicts in replicas can also be resolved via CRDTs.

Vector Clocks automatically resolve conflicts on concurrent writes

Vector Clocks detect conflicts and preserve all versions - resolution is left to the application layer or CRDTs

Vector Clocks are a causality detection tool, not conflict resolution. Once concurrent versions are identified, the system must either return both to the client (Dynamo), use CRDTs for automatic merge (Riak with CRDT types), or apply LWW as a fallback.

A server receives two versions of an object: VC1=[3,1,0] and VC2=[2,0,2]. What should the system do?

Hybrid Logical Clocks and Real-World Usage

**CockroachDB, 2015. Goal: globally distributed SQL with serializable isolation. Problem: Vector Clocks scale poorly for SQL (global transactions need total order), physical clocks drift 100-500 ms. Solution: Hybrid Logical Clocks (HLC) - physical time as the base, logical counter for collision resolution. CockroachDB uses HLC in every transaction since 2015.**

MechanismSizeAccuracyUsed by
Lamport Clock1 intPartial order (one-directional)Total event ordering with tiebreaking
Vector ClockN intsExact causality + concurrencyConflict detection (Dynamo, Riak, Voldemort)
HLC2 intsCausality + closeness to physical timeGlobal transactions (CockroachDB, YugabyteDB)
TrueTime (Spanner)Interval [earliest, latest]Strong global consistencyGoogle Spanner (atomic clocks + GPS)

Dotted Version Vectors address the Vector Clock growth problem with many clients (used in multi-master replication): instead of a counter per node, only the pair (node, counter) for the latest write is stored. Riak 2.0 switched to DVV in 2014, reducing metadata size by 10x on large clusters.

Why does CockroachDB use HLC instead of pure Vector Clocks?

Вопросы для размышления

  • A system stores data on 100 nodes. A Vector Clock of 100 integers per object - is this acceptable overhead? At what node count do Vector Clocks become impractical, and what alternatives are worth considering?

Связанные уроки

  • dist-05-vector-clocks — Builds on basic vector clocks
  • dist-11-replication — Version vectors enable multi-master conflict detection
  • ds-10-crdts — Dotted version vectors track CRDT state precisely
  • alg-18-topological
Vector Clocks: Precise Event Ordering

0

1

Sign In