Real-Time Backend

CRDTs for text

Google Docs is open in two tabs, both editing the same paragraph. How do changes not get lost or collide without a single arbiter server?

  • **Figma** moved from OT to a custom CRDT in 2019, removing the merge server from the critical path and cutting latency from 80ms to 12ms for European users on a US server.
  • **Linear** runs real-time collaboration on Yjs in issues and docs. In their 2022 benchmark Yjs handles 150,000 operations per second on one document, covering any realistic scenario.
  • **Notion** built their own block-level CRDT in 2021 after neither Yjs nor Automerge supported nested blocks with rich text. The migration covered 30M+ documents without data loss.
  • **Gitpod** integrated Yjs into the Monaco Editor for pair programming. Two developers edit one file with cursor awareness, and merge works peer-to-peer even when the server is briefly unreachable.

Text CRDT

Collaborative text editing is the problem of two users editing the same document at once, where both changes must merge without loss. Operational Transform (OT), which Google Docs has used since 2006, solves this through an arbiter server: each operation is transformed against operations already applied. The downside is that the server becomes the single point of failure and a bottleneck above 100 simultaneous editors.

Text CRDT removes the server from the critical path. Instead of operations ('insert x at position 5'), every character gets a globally unique identifier and stores references to its neighbors. Merging two states is deterministic: the same character order is guaranteed by the algorithm, not by a server. Figma, Linear, and Notion all moved from OT to CRDT in 2019-2022.

The key property of a text CRDT is **convergence**: any two replicas that received the same operations in any order end up in the same state. Proven mathematically, not by tests.

Why doesn't a Text CRDT need a central server to merge operations?

RGA (Replicated Growable Array)

RGA (Replicated Growable Array) is the first practical text CRDT, proposed by Roh in 2011. The idea: the document is a linked list of nodes, each holding a character, a unique timestamp (clock, site_id), and a reference to its left neighbor. Inserting 'x' after node P means creating a node (x, my_clock, P). On conflict (two nodes reference the same predecessor), the node with the higher timestamp wins. This is deterministic on every replica.

**Tombstone problem**: deleted nodes can't be dropped immediately. Another client might still insert a character after a deleted one. Yjs handles this with garbage collection once all replicas have acknowledged the deletion. Long-lived Notion documents accumulate around 30% tombstone overhead.

RGA is used in Atom Teletype, ShareDB (alternative backend), and as the foundation of Y.Array in Yjs. Memory: O(n) for n characters plus O(t) for tombstones. Merge complexity: O(n*m) worst case, where n and m are the sizes of the two operation sets.

Why are deleted characters in RGA kept as tombstones instead of being removed right away?

LSEQ

RGA uses timestamps to identify positions, which works but the identifiers grow linearly. LSEQ (Logarithmic SEQuence), proposed in 2013, is an alternative: a tree of positions with fractional addresses. A character between positions 0 and 1 gets address 0.5, between 0 and 0.5 gets 0.25, and so on. Key insight: if you allocate addresses randomly within the free interval, the tree stays balanced with high probability.

LSEQ's problem: with sequential prepend (always inserting at the start), the tree degenerates. Every new address lands in [0, min_existing] and depth grows linearly. PaperTrail ran a benchmark in 2021: for 10k characters, LSEQ IDs averaged 40 bytes versus 16 bytes for RGA timestamps. In practice Yjs picked a hybrid: RGA with a runs optimization (sequential inserts collapse into a single node).

**Runs optimization in Yjs**: if a user types 'hello' sequentially, Yjs stores it as a single Item with length=5 instead of 5 nodes. That's a 5x memory reduction and 3x encode speedup for typical typing patterns.

Which editing pattern makes LSEQ degenerate to O(n) tree depth?

Yjs and Automerge

Yjs (Kevin Jahns, 2015) and Automerge (Martin Kleppmann, 2017) are two production-ready CRDT frameworks with different trade-offs. Yjs: speed-optimized, written in JS, 11k stars on GitHub. Automerge: correctness-optimized with full change history, Rust core with WASM bindings.

  • **Yjs**: Y.Text uses RGA with runs optimization. encode/decode via Uint8Array delta updates. Integrations: Prosemirror, CodeMirror, Monaco, TipTap. Linear runs real-time collaboration on Yjs - 150k ops/sec on one document in their 2022 benchmark.
  • **Automerge**: each document is an immutable snapshot plus a list of changes. Full git-like history support. Ink & Switch uses Automerge in Pushpin for local collaboration with no server at all.
  • **y-websocket** vs **Automerge Repo**: Yjs assumes a central awareness server (not for merge, but for presence). Automerge Repo runs its sync protocol peer-to-peer, closer to the CRDT ideal.
  • **Encode size**: a Yjs v2 update for 10k characters is ~15KB. Automerge is ~40KB (it stores full change history). For mobile clients Figma picked a custom CRDT close to Yjs in size.

Picking between Yjs and Automerge is mostly a choice between throughput and feature completeness. Notion switched to a custom CRDT in 2021 because neither Yjs nor Automerge supported their block-level structure with nested rich text. For 90% of tasks Yjs + y-websocket is enough: 30 minutes to set up, production-proven at Linear, Gitpod, and Liveblocks.

CRDT guarantees that users will see meaningful text after merge. If two people edit the same sentence at once, the result will be readable

CRDT guarantees only convergence (all replicas reach the same state), not semantic correctness. If user A writes 'cat' and user B writes 'dog' at the same position, the result might be 'cdaotg' - that's a mathematically correct merge.

Semantic merge (understanding user intent) is an unsolved problem. Google Docs handles it partially through OT plus UX (showing the neighbor's cursor), not algorithmically. Intent-based CRDT is an active research area in 2023-2025.

What is the main difference between Automerge and Yjs in their approach to data?

Key takeaways

  • Text CRDT replaces positional indices with stable per-character IDs. Merge is deterministic without an arbiter server, local apply latency is 0ms.
  • RGA uses a linked list with timestamp IDs and tombstones for deleted characters; LSEQ uses a tree of fractional positions. Both converge, with different trade-offs in ID size and speed.
  • Yjs (RGA + runs optimization) wins on throughput and size for realistic typing; Automerge (immutable history) wins for time-travel and offline-first with no server at all.

Related topics

Text CRDT builds on the general CRDT principles and integrates with several real-time patterns:

  • CRDT basics — Text CRDT specializes general CRDT for character sequences, taking insert order into account
  • Vector Clocks — RGA uses a hybrid logical clock to generate unique timestamp IDs. Without it there'd be no deterministic conflict tie-breaker
  • WebSocket and real-time transport — Yjs y-websocket and Automerge Repo sync protocol ship delta updates over WebSocket. The transport determines operation delivery latency
  • Eventual Consistency — Text CRDT is the practical implementation of eventual consistency for documents: convergence is guaranteed algorithmically, not through coordination

Вопросы для размышления

  • Tombstones accumulate memory. How would you design garbage collection for an RGA document edited by 1000 users on flaky connections?
  • Yjs handles 150k ops/sec, but a semantic merge can still produce unreadable text. What UX pattern would reduce the chance of conflicting merges without giving up real-time collaboration?
  • Automerge stores full git-like history. How could that be used to automatically resolve conflicts based on user intent analysis?

Связанные уроки

  • rt-47 — Text CRDTs specialize the general CRDT principles for character sequences
  • rt-50 — Yjs is the production-ready implementation built on the RGA algorithm covered here
  • ds-10-crdts
CRDTs for text

0

1

Sign In