Distributed Systems
CRDT - Conflict-free Replicated Data Types
Цели урока
- Understand why ordinary data structures fail under concurrent updates
- Implement G-Counter, PN-Counter, LWW-Register, OR-Set
- Distinguish CvRDT from CmRDT and choose the right type
- Know real systems that use CRDTs (Redis Enterprise, Notion (Yjs), Linear (Automerge), Apple Notes)
Предварительные знания
- Eventual consistency and CAP theorem (lesson ds-02-cap-theorem)
- Logical clocks and Lamport timestamps (lesson ds-04-clocks)
- Basic understanding of replication and multi-master conflicts
Notion lets entire teams edit one page simultaneously without conflicts. Apple Notes syncs changes between iPhone and Mac without a server arbiter. Not magic - mathematics of CRDTs over AP systems.
- **Redis Enterprise** - G-Counter and PN-Counter for Active-Active replication between data centers on different continents, zero write latency, no locks
- **Notion (Yjs)** - 100M+ users, ships Yjs CRDT in production for collaborative page editing
- **Linear (Automerge)** - mobile-first project tracker, powers offline-first sync with Automerge
- **Apple iCloud / Notes** - CvRDT-based sync between devices offline and online
- **Yjs ecosystem** - JupyterLab, Zed, Logseq, Anytype all build on Yjs CRDTs for collaborative editing
- **Amazon Shopping Cart** - historically one of the first examples of AP semantics similar to CRDT (Dynamo Paper 2007)
Birth of CRDT: Shapiro and Preguica, 2011
The term CRDT appeared in a paper by Marc Shapiro and Nuno Preguica (INRIA, 2011): 'Conflict-free Replicated Data Types'. The authors formalized the conditions under which a data structure guarantees eventual consistency without coordination. Before that, similar ideas existed in Dynamo (Amazon, 2007) and Bayou (Xerox PARC, 1994) but without a unified mathematical foundation. Today CRDTs underpin collaborative tools used by hundreds of millions of people every day.
The problem of concurrent updates
**Notion, 2024. Two editors edit the same page simultaneously: one adds a block at the top, the other at the bottom. Both are offline in a coffee shop with flaky Wi-Fi. A minute later both sessions reconnect 3 ms apart. Whose edits survive?** The traditional answer - last writer wins - means one editor loses their work. Notion ships Yjs CRDT - a structure for replication: data structures that merge without conflicts by mathematical construction.
**CRDT (Conflict-free Replicated Data Type)** is a data structure with a guarantee: any two nodes that have received the same set of operations (in any order) will converge to the same state - without a coordinating server, without locks.
An ordinary counter breaks under concurrent updates: node A reads 5, adds 1, writes 6. Node B reads the same 5, adds 1, writes 6. After synchronization the counter is 6 instead of 7. One increment is lost.
Result without CRDT: counter equals 6, even though two increments were performed from 5. One operation is lost permanently.
| Conflict strategy | Example | Problem |
|---|---|---|
| Last-Writer-Wins | DynamoDB without vector clocks | Data loss on concurrent writes |
| Serializable transactions | PostgreSQL SERIALIZABLE | High latency, requires coordination |
| Manual conflict resolution | Git merge conflicts | Requires human involvement |
| CRDT | Notion (Yjs), Automerge, Redis Enterprise | No loss, no coordination |
CRDTs are only needed for offline-first applications
CRDTs solve concurrent update problems in any AP system where nodes write independently
Redis Enterprise uses CRDTs for Active-Active geo-replication between data centers with zero write latency. The application is not offline - the nearest data center simply does not wait for confirmation from remote replicas.
Two nodes both read counter = 10. Both increment by 1 and save. After synchronization without CRDT the value is:
G-Counter and PN-Counter: conflict-free counters
**G-Counter (Grow-only Counter)** is the first CRDT that is easy to grasp intuitively. Instead of a single value, each node maintains its own counter. The global value is the sum of all. On merge, the max per node is taken.
G-Counter has three mathematical properties: **commutativity** (merge(A,B) = merge(B,A)), **associativity** (merge(merge(A,B),C) = merge(A,merge(B,C))), **idempotency** (merge(A,A) = A). These three together imply eventual consistency.
G-Counter can only grow. For a full counter with decrement, **PN-Counter (Positive-Negative Counter)** uses two G-Counters: one for increments, one for decrements.
Redis Enterprise: Active-Active replication
Redis Enterprise uses PN-Counter for replication between geographically distributed clusters. A client writes to the nearest node (0 ms additional latency). Nodes exchange state asynchronously. Like counters, view counts, ratings - all without coordination. During a network split both data centers continue operating independently and merge correctly when connectivity is restored.
Node A: counters={A:3, B:2}. Node B: counters={A:1, B:5}. After merge the G-Counter value is:
LWW-Register and OR-Set: register and set
**LWW-Register (Last-Writer-Wins Register)** is a CRDT for a single value. On concurrent writes, the one with the higher timestamp wins. The simplicity is attractive - but this is where mistakes happen most often.
LWW with Date.now(): if two nodes' clocks drift by 100 ms (realistic in AWS multi-AZ), a later write with lagging clock will be lost. Google Spanner solves this with GPS and atomic clocks. In ordinary systems use HLC (Hybrid Logical Clock) from the vector clocks lesson.
**OR-Set (Observed-Remove Set)** solves the classic problem: what happens when add and remove of the same element occur concurrently on different nodes? Each add operation receives a unique tag. Remove only deletes the tags the node had already observed at the time of the remove call.
**OR-Set conflict semantics:** Node A does add('item') with tag T1. Concurrently Node B does remove('item'), knowing only tag T0 (an earlier one). After merge - 'item' is present. Add wins. This is intuitive: someone added the element without knowing it was being removed - their intent is preserved.
LWW always loses data on concurrent writes
LWW works correctly when writes target independent fields - the problem only arises when two writes target the exact same field
In document CRDTs (Automerge, Yjs) each field is a separate LWW-Register. Concurrent edits to 'color' and 'position' fields do not conflict at all. Only concurrent edits to the same field require more complex semantics.
Node A does add('item') with tag T1. Concurrently Node B does remove('item'), knowing only tag T0. After merge, is 'item' in the set?
CvRDT, CmRDT and real-world usage
**CRDT theory divides all structures into two classes.** CvRDT (state-based) transmit the full state during sync - merge guarantees convergence. CmRDT (operation-based) transmit only operations - they need reliable broadcast. All the examples above are CvRDT.
| Type | What is sent | Network requirements | Strengths |
|---|---|---|---|
| CvRDT (State-based) | Full state | Any network, packet loss OK | Simple merge implementation |
| CmRDT (Op-based) | Operations (delta) | Exactly-once delivery | Less bandwidth |
| Delta-CRDT | State delta only | At-least-once + idempotency | Balance between bandwidth and simplicity |
| System | CRDT | Usage |
|---|---|---|
| Redis Enterprise | G-Counter, PN-Counter, LWW-Register | Active-Active Geo-Replication |
| Riak | OR-Set, LWW-Map, G-Counter | Key-value store without coordination |
| Notion (Yjs) | Yjs CRDT in production | Proven on 100M+ users for collaborative page editing |
| Apple Notes / iCloud | Operation-based CRDT | Sync across devices without conflicts |
| Automerge / Yjs | JSON CRDT (OR-Map + RGA for text) | Offline-first apps, Linear mobile sync |
| Figma / Google Docs (OT, NOT CRDT) | Custom OT with central server | OT, not CRDT: requires server-side ordering |
**CRDT vs OT (Operational Transformation):** Google Docs and Figma use OT - an older approach that requires a central server to order operations. CRDT works without a center. Apple Notes, Linear, Notion and the entire Yjs ecosystem ship CRDTs in production. Figma and Google Docs stay on OT with a central server.
Automerge: JSON document as CRDT
Automerge turns any JSON object into a CRDT automatically. Each field is an LWW-Register. Arrays use RGA (Replicated Growable Array). Text uses a specialized CRDT with positional identifiers. Two users edit offline - after synchronization they get a deterministic result with no data loss. Used in Linear, Actual Budget, and several LSP tools.
CRDTs guarantee that all operations will be executed
CRDTs guarantee state convergence, but not the absence of value loss with LWW semantics
Вопросы для размышления
- A system stores a product rating as the sum of user scores. Millions of users, scores arriving from 5 data centers in parallel. Which CRDT fits, and what tradeoffs does it carry?
Связанные уроки
- ds-08-vector-clocks — CRDTs rely on vector clocks for conflict tracking
- ds-09-gossip-protocols — Gossip disseminates CRDT state across replicas
- dist-12-consistency — CRDTs provably achieve eventual consistency
- dist-11-replication — CRDT-based replication avoids coordination overhead
- crypto-04-algebraic-structures