Real-Time Backend

Automerge

Two users edit the same document offline on a plane. They land, and their changes need to merge. Without a CRDT, that's a manual conflict. With Automerge, the merge is automatic and takes milliseconds.

  • Logseq, a note graph with 300k+ active users, uses Automerge to sync between devices without a central server. A user edits notes on their laptop and phone while offline, and on reconnect the conflicts resolve automatically
  • Actual Budget stores financial transactions as an Automerge document. The sync server only sees encrypted binary blobs and can't read the data: a privacy-first architecture built on a CRDT
  • Ink & Switch Research Lab (the creators of Automerge) published the local-first software manifesto (2019, ~50k reads). Their PushPin collaborative canvas showed that P2P apps built on CRDTs work more reliably than centralized ones on unstable connections
  • Figma uses its own CRDT-inspired system (coordinated by a central server rather than a pure peer-to-peer CRDT) for real-time collaborative editing with <100ms latency for 50+ concurrent users - commercial proof that the approach scales

What Automerge is

**Automerge** is a CRDT library that turns a regular JavaScript object into a data structure that auto-merges conflicting changes. Ink & Switch Research Lab built it as the foundation for local-first apps: data lives on the device and syncs between peers without a central server.

Core idea: every document change gets a unique identifier (actor ID + sequence counter). When merging two versions, Automerge applies deterministic rules that guarantee the same result regardless of the order in which changes arrive.
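The sketch below shows why unique IDs make the merge order-independent. It assumes a simplified model: an operation ID is a (counter, actor) pair, and for a single key the greatest ID wins. This is a toy written for illustration, not Automerge's internal code. `OpId`, `compareOpId`, and `applyOps` are hypothetical names.

```typescript
// Hypothetical toy model: an operation ID is a Lamport-style counter plus an actor ID.
type OpId = { counter: number; actor: string };

// Total order over op IDs: the higher counter wins; the actor ID breaks ties.
function compareOpId(a: OpId, b: OpId): number {
  if (a.counter !== b.counter) return a.counter - b.counter;
  if (a.actor === b.actor) return 0;
  return a.actor < b.actor ? -1 : 1;
}

type SetOp = { id: OpId; key: string; value: string };

// Apply ops in whatever order they arrived. For each key the op with the
// greatest ID is kept, so the result does not depend on arrival order.
function applyOps(ops: SetOp[]): Record<string, string> {
  const winners = new Map<string, SetOp>();
  ops.forEach((op) => {
    const current = winners.get(op.key);
    if (current === undefined || compareOpId(op.id, current.id) > 0) {
      winners.set(op.key, op);
    }
  });
  const state: Record<string, string> = {};
  winners.forEach((op, key) => {
    state[key] = op.value;
  });
  return state;
}

// Two concurrent edits with the same counter: only the actor ID differs.
const fromAlice: SetOp = { id: { counter: 3, actor: "alice" }, key: "title", value: "Trip plan" };
const fromBob: SetOp = { id: { counter: 3, actor: "bob" }, key: "title", value: "Flight notes" };

const replicaOne = applyOps([fromAlice, fromBob]);
const replicaTwo = applyOps([fromBob, fromAlice]);
console.log(replicaOne.title, replicaTwo.title); // Flight notes Flight notes
```

Real actor IDs are random, so which actor "wins" is arbitrary. What matters is that every replica makes the same choice, and convergence needs nothing more.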

Automerge ships in projects like PushPin (collaborative canvas from Ink & Switch), Logseq (local knowledge base with 300k+ users), and Actual Budget (a finance app with local data). Version 2.0 was rewritten in Rust with WebAssembly bindings for JavaScript, and its binary storage format is about 10x smaller than the equivalent JSON.

What guarantees a deterministic merge in Automerge for concurrent changes on two clients?

JSON CRDT inside Automerge

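Automerge implements a **JSON CRDT**: an extension of operation-based CRDTs to arbitrary JSON structures. Each data type has its own merge strategy. When a map key is written concurrently, Automerge keeps every value as a multi-value register. It shows one winner, chosen deterministically by operation ID, so every replica displays the same value, and the other values stay available via `getConflicts`. Lists use RGA (Replicated Growable Array) to preserve the relative order of inserts.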

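| Data type | CRDT algorithm | Behavior on conflict |
|---|---|---|
| Map (object) | Multi-Value Register | All concurrent values are kept; one deterministic winner is visible, the rest are available via `getConflicts` |
| List (array) | RGA (Replicated Growable Array) | Concurrent inserts at the same position are ordered by operation ID (counter, then actor ID) |
| Text | RGA with character granularity | Characters are placed by logical position (the ID of the preceding character), not by numeric index |
| Counter | Counter CRDT (increments and decrements) | All concurrent increments are summed |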
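The Map and Counter rows of the table can be sketched as two toy types. The register sketch assumes that each write lists the op IDs it overwrote (`pred`). Writes that nobody overwrote are the concurrent survivors, and the one with the greatest op ID is shown by default. The counter sums every distinct increment. `MultiValueRegister`, `conflicts`, and `counterValue` are hypothetical names that mimic the behavior described above. They are not Automerge's API.

```typescript
// Hypothetical toy models of two rows of the table; not Automerge's internals.
type OpId = { counter: number; actor: string };
const idKey = (id: OpId): string => `${id.counter}@${id.actor}`;

function compareOpId(a: OpId, b: OpId): number {
  if (a.counter !== b.counter) return a.counter - b.counter;
  if (a.actor === b.actor) return 0;
  return a.actor < b.actor ? -1 : 1;
}

// Multi-value register: each write lists the op IDs it overwrote ("pred").
type Write = { id: OpId; value: string; pred: string[] };

class MultiValueRegister {
  private writes = new Map<string, Write>();

  apply(w: Write): void {
    this.writes.set(idKey(w.id), w);
  }

  // Writes that no other known write has overwritten, sorted by op ID.
  private survivors(): Write[] {
    const overwritten = new Set<string>();
    this.writes.forEach((w) => w.pred.forEach((p) => overwritten.add(p)));
    const alive: Write[] = [];
    this.writes.forEach((w, k) => {
      if (!overwritten.has(k)) alive.push(w);
    });
    return alive.sort((a, b) => compareOpId(a.id, b.id));
  }

  // Visible value: the surviving write with the greatest op ID.
  get(): string | undefined {
    const alive = this.survivors();
    return alive.length > 0 ? alive[alive.length - 1].value : undefined;
  }

  // Every concurrent value, in the spirit of Automerge's getConflicts.
  conflicts(): string[] {
    return this.survivors().map((w) => w.value);
  }
}

// Op-based counter: the value is the sum of every distinct op (deltas may be negative).
type IncrementOp = { id: string; delta: number };

function counterValue(ops: IncrementOp[]): number {
  const seen = new Set<string>();
  let total = 0;
  ops.forEach((op) => {
    if (seen.has(op.id)) return; // a redelivered op is counted once
    seen.add(op.id);
    total += op.delta;
  });
  return total;
}

// Alice and Bob both overwrite the same initial title while offline.
const reg = new MultiValueRegister();
reg.apply({ id: { counter: 1, actor: "alice" }, value: "draft", pred: [] });
reg.apply({ id: { counter: 2, actor: "bob" }, value: "v2", pred: ["1@alice"] });
reg.apply({ id: { counter: 2, actor: "alice" }, value: "final", pred: ["1@alice"] });
console.log(reg.get()); // v2
console.log(reg.conflicts().join(", ")); // final, v2

// Both start from 5 and add 1 concurrently; Bob's op is delivered twice.
const likes = counterValue([
  { id: "1@alice", delta: 5 },
  { id: "2@alice", delta: 1 },
  { id: "2@bob", delta: 1 },
  { id: "2@bob", delta: 1 },
]);
console.log(likes); // 7
```

A last-write-wins register holding the same number would end at 6, because both users wrote "5 + 1". This is why counting data should use a counter type rather than a plain number field.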

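RGA (Replicated Growable Array) was proposed by Hyun-Gul Roh et al. in 2011. Each list element gets a unique ID, and inserting a new element references the predecessor's ID rather than a numeric index. When two elements are inserted after the same predecessor concurrently, the one with the greater ID is placed first, so every replica ends up with the same order without any coordination.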
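Here is a minimal RGA sketch. It assumes (counter, actor) IDs, causal delivery (a predecessor always arrives before its successors), and no deletions. An insert goes right after its predecessor, then skips any elements with a greater ID. `Rga`, `InsertOp`, and `replay` are hypothetical names for this toy, not Automerge code.

```typescript
// Hypothetical toy RGA (no deletions), written for illustration only.
type OpId = { counter: number; actor: string };

function compareOpId(a: OpId, b: OpId): number {
  if (a.counter !== b.counter) return a.counter - b.counter;
  if (a.actor === b.actor) return 0;
  return a.actor < b.actor ? -1 : 1;
}

const sameId = (a: OpId, b: OpId): boolean => a.counter === b.counter && a.actor === b.actor;

// An insert names its own ID and the ID of the element it goes after (null = list head).
type InsertOp = { id: OpId; after: OpId | null; value: string };

class Rga {
  private nodes: { id: OpId; value: string }[] = [];

  insert(op: InsertOp): void {
    let i = 0;
    if (op.after !== null) {
      const ref = op.after;
      i = this.nodes.findIndex((n) => sameId(n.id, ref)) + 1;
      if (i === 0) throw new Error("predecessor not received yet");
    }
    // Skip elements with a greater ID: concurrent inserts after the same
    // predecessor (and their descendants) that sort ahead of this one.
    while (i < this.nodes.length && compareOpId(this.nodes[i].id, op.id) > 0) i++;
    this.nodes.splice(i, 0, { id: op.id, value: op.value });
  }

  toArray(): string[] {
    return this.nodes.map((n) => n.value);
  }
}

// Shared starting list ['a', 'c'], both elements created by Alice.
const a: InsertOp = { id: { counter: 1, actor: "alice" }, after: null, value: "a" };
const c: InsertOp = { id: { counter: 2, actor: "alice" }, after: a.id, value: "c" };
// Concurrent inserts after 'a' on two devices.
const x: InsertOp = { id: { counter: 3, actor: "alice" }, after: a.id, value: "x" };
const y: InsertOp = { id: { counter: 3, actor: "bob" }, after: a.id, value: "y" };

function replay(ops: InsertOp[]): string[] {
  const list = new Rga();
  ops.forEach((op) => list.insert(op));
  return list.toArray();
}

console.log(replay([a, c, x, y]).join("")); // ayxc
console.log(replay([a, c, y, x]).join("")); // ayxc
```

Both delivery orders converge on `a, y, x, c`. Bob's insert has the greater ID (same counter, "bob" > "alice"), so it lands closest to `a`.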

Two users insert an element after 'a' in the list ['a', 'c'] at the same time. How does Automerge determine the final order?

Automerge sync protocol

Automerge v2 includes a built-in **sync protocol** based on bloom filters and the exchange of missing changes. Two peers trade a few small messages and send only the changes the other side lacks, never the whole document.

The protocol uses `SyncState`: an object that tracks what each peer already knows. Each round generates a `SyncMessage` with a bloom filter of known changes. On receipt, the other peer computes which changes it needs to send.

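The bloom filter lets a peer encode the set of changes it knows compactly. At roughly 10 bits per change, a filter for 100k changes is about 125 KB. In practice it only covers changes added since the peers last agreed, so it is usually far smaller. This is the critical optimization: without it you'd ship the full list of change hashes (32 bytes each, so about 3.2 MB for 100k changes). Bloom filter false positives are recoverable rather than free. If the filter wrongly claims a change is known, the sender skips it. The receiver then notices the missing dependency and requests that change explicitly in the next round, at the cost of an extra round trip.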
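The filter idea can be illustrated with a toy Bloom filter over change hashes. It assumes 10 bits per entry and 7 probes. FNV-1a double hashing stands in for the real probe scheme, and `BloomFilter`, `fnv1a`, and `mightContain` are hypothetical names. This sketch only shows the size arithmetic and the "no false negatives" property; it is not Automerge's implementation.

```typescript
// Toy Bloom filter over change hashes. Assumptions for illustration: 10 bits per
// entry and 7 probes, with FNV-1a double hashing standing in for the real probe scheme.
const BITS_PER_ENTRY = 10;
const NUM_PROBES = 7;

// 32-bit FNV-1a hash of a string, with a seed mixed into the offset basis.
function fnv1a(s: string, seed: number): number {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

class BloomFilter {
  readonly bits: Uint8Array;
  private readonly numBits: number;

  constructor(entries: string[]) {
    this.numBits = Math.max(8, entries.length * BITS_PER_ENTRY);
    this.bits = new Uint8Array(Math.ceil(this.numBits / 8));
    entries.forEach((e) => this.add(e));
  }

  // Double hashing: probe i lands at (h1 + i * h2) mod numBits.
  private probes(item: string): number[] {
    const h1 = fnv1a(item, 0);
    const h2 = fnv1a(item, 0x9e3779b9);
    const out: number[] = [];
    for (let i = 0; i < NUM_PROBES; i++) out.push((h1 + i * h2) % this.numBits);
    return out;
  }

  add(item: string): void {
    this.probes(item).forEach((p) => {
      this.bits[p >> 3] |= 1 << (p & 7);
    });
  }

  // Can return a false positive, never a false negative.
  mightContain(item: string): boolean {
    return this.probes(item).every((p) => (this.bits[p >> 3] & (1 << (p & 7))) !== 0);
  }
}

// Peer B summarizes the 1,000 changes it already has.
const known = Array.from({ length: 1000 }, (_, i) => `change-${i}`);
const filter = new BloomFilter(known);
console.log(filter.bits.length); // 1250 (bytes), vs 32,000 bytes for 1,000 SHA-256 hashes

// Peer A has those changes plus two new ones, and sends whatever the filter does not claim.
const peerA = [...known, "change-new-1", "change-new-2"];
const toSend = peerA.filter((h) => !filter.mightContain(h));
```

If one of the new changes happens to hit a false positive, peer A wrongly skips it. The protocol then needs another round in which B asks for that change explicitly.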

  • SyncState is created per peer connection
  • generateSyncMessage returns a null message once there is nothing left to send
  • receiveSyncMessage is atomic: it either applies changes or doesn't
  • The protocol runs on any transport: WebSocket, WebRTC, HTTP
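The loop structure behind these bullets can be sketched as a toy. The sketch keeps one state object per peer connection, returns a null message when there is nothing left to say, and uses a plain function call as the transport. To keep it short, peers advertise explicit "have" lists and change dependencies are ignored, whereas the document describes Automerge sending bloom filters. `newSyncState`, `generateMessage`, `receiveMessage`, and `syncUntilQuiet` are hypothetical names. They only mirror the shape of `SyncState`, `generateSyncMessage`, and `receiveSyncMessage`.

```typescript
// Hypothetical toy sync loop; explicit "have" lists stand in for Automerge's bloom filters.
type Peer = { changes: Set<string> };
type SyncState = { theirHave: Set<string> | null; advertisedSize: number; sent: Set<string> };
type SyncMessage = { have: string[] | null; changes: string[] };

// One state object per peer connection.
const newSyncState = (): SyncState => ({ theirHave: null, advertisedSize: -1, sent: new Set<string>() });

function generateMessage(peer: Peer, state: SyncState): SyncMessage | null {
  const haveChanged = peer.changes.size !== state.advertisedSize;
  const theirs = state.theirHave;
  const missing: string[] =
    theirs === null ? [] : Array.from(peer.changes).filter((h) => !theirs.has(h) && !state.sent.has(h));
  if (!haveChanged && missing.length === 0) return null; // nothing left to say
  state.advertisedSize = peer.changes.size;
  missing.forEach((h) => state.sent.add(h));
  return { have: haveChanged ? Array.from(peer.changes) : null, changes: missing };
}

function receiveMessage(peer: Peer, state: SyncState, msg: SyncMessage): void {
  msg.changes.forEach((h) => peer.changes.add(h));
  if (msg.have !== null) state.theirHave = new Set(msg.have);
  const known = state.theirHave;
  if (known !== null) msg.changes.forEach((h) => known.add(h)); // the sender holds what it sent
}

// Drive both sides until neither has anything to send; the "transport" is a function call.
function syncUntilQuiet(a: Peer, b: Peer): number {
  const sa = newSyncState();
  const sb = newSyncState();
  let rounds = 0;
  for (;;) {
    const fromA = generateMessage(a, sa);
    if (fromA !== null) receiveMessage(b, sb, fromA);
    const fromB = generateMessage(b, sb);
    if (fromB !== null) receiveMessage(a, sa, fromB);
    if (fromA === null && fromB === null) return rounds;
    rounds++;
  }
}

const laptop: Peer = { changes: new Set(["c1", "c2", "c3"]) };
const phone: Peer = { changes: new Set(["c1", "c4"]) };
console.log(syncUntilQuiet(laptop, phone)); // 2
```

Swapping the direct calls for a WebSocket or WebRTC channel changes nothing in the logic, which is why the protocol is transport-agnostic.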

Why does the Automerge sync protocol use a bloom filter instead of a full list of change IDs?

Storing Automerge documents

Automerge v2 uses a **binary format** based on columnar encoding (similar to Apache Arrow). The document stores not only the current state but the full change history, which is what allows syncing with peers that missed some updates.
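The sketch below illustrates why column-by-column storage compresses history so well. It splits operations into columns and then applies run-length and delta encoding, two of the kinds of encodings columnar formats rely on. It is a toy assumption for illustration, not Automerge's byte-level format. `rle`, `delta`, and `Op` are hypothetical names.

```typescript
// Toy illustration of columnar encoding, not Automerge's byte-level format:
// operations are split into columns, then each column gets a cheap encoding.
type Op = { actor: string; counter: number; action: string };

// Run-length encoding: consecutive equal values collapse into [value, count].
function rle<T>(column: T[]): [T, number][] {
  const runs: [T, number][] = [];
  column.forEach((v) => {
    const last = runs.length > 0 ? runs[runs.length - 1] : undefined;
    if (last !== undefined && last[0] === v) last[1]++;
    else runs.push([v, 1]);
  });
  return runs;
}

// Delta encoding: store each value as the difference from its predecessor.
function delta(column: number[]): number[] {
  return column.map((v, i) => (i === 0 ? v : v - column[i - 1]));
}

// One user typing 100 characters produces very repetitive rows.
const ops: Op[] = Array.from({ length: 100 }, (_, i) => ({ actor: "alice", counter: i + 1, action: "insert" }));

const actorColumn = rle(ops.map((o) => o.actor));
const counterColumn = rle(delta(ops.map((o) => o.counter)));
const actionColumn = rle(ops.map((o) => o.action));

console.log(JSON.stringify([actorColumn, counterColumn, actionColumn]));
// [[["alice",100]],[[1,100]],[["insert",100]]]
console.log(JSON.stringify(ops).length); // row-oriented JSON, for comparison
```

A hundred rows collapse into three tiny runs. Real documents are less uniform, but typing sessions are long runs by a single actor with consecutive counters, which is where the large savings over JSON come from.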

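For production apps, managing history size is critical. Automerge does not discard history on its own. `save()` writes a compressed columnar snapshot, but that snapshot still contains every change. Dropping history requires **compaction** in the strict sense: creating a new document that holds only the current state, with no history. After compaction you can no longer sync incrementally with peers that have older versions; you'd need to ship them a full snapshot.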

automerge-repo (the official high-level library) handles storage, sync, and document management. Storage adapters: IndexedDB (browser), SQLite (Node.js), filesystem. Network adapters: WebSocket, WebRTC, BroadcastChannel. Real-world case: Actual Budget stores user financial data as an Automerge document, syncing between devices through its own sync server that can't see the data.

Myth: Automerge automatically drops old history and always keeps the document compact.

Reality: Change history accumulates indefinitely. Without explicit compaction, the document grows in proportion to the number of operations.

History is needed to sync with peers that might have been offline. Automerge doesn't know which changes every peer has received, so it stores everything. Compaction is an explicit developer decision with a trade-off in sync compatibility.
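This trade-off can be shown with a toy document that keeps a change log next to its current state. Compaction folds the log into the state, after which an incremental request from a lagging peer can no longer be answered. `ToyDoc`, `changesSince`, and `compact` are hypothetical names for illustration, not Automerge's API.

```typescript
// Hypothetical toy document (not Automerge's API): a change log plus materialized state.
type Change = { seq: number; key: string; value: number };

class ToyDoc {
  private history: Change[] = [];
  private baseSeq = 0; // last sequence number folded into the state by compaction
  readonly state: Record<string, number> = {};

  get seq(): number {
    return this.baseSeq + this.history.length;
  }

  set(key: string, value: number): void {
    this.history.push({ seq: this.seq + 1, key, value });
    this.state[key] = value;
  }

  // Incremental sync: the changes after `since`, or null if compaction discarded them.
  changesSince(since: number): Change[] | null {
    if (since < this.baseSeq) return null;
    return this.history.filter((c) => c.seq > since);
  }

  // Keep only the current state and drop the log.
  compact(): void {
    this.baseSeq = this.seq;
    this.history = [];
  }

  get historyLength(): number {
    return this.history.length;
  }
}

const budget = new ToyDoc();
for (let i = 1; i <= 1000; i++) budget.set("balance", i);

console.log(budget.changesSince(990)?.length); // 10: a peer at seq 990 gets a small delta
budget.compact();
console.log(budget.changesSince(990)); // null: that peer now needs a full snapshot
console.log(budget.historyLength, budget.state.balance); // 0 1000
```

Before compaction, the peer that was offline catches up with 10 changes. Afterwards it has to replace its copy with the whole current state.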

An Automerge document grew to 50MB because of accumulated change history. What happens after you compact it into a new document that holds only the current state?

Key takeaways

  • Automerge is a JSON CRDT library: any JavaScript object can become an auto-merging document with full change history
  • Map uses Multi-Value Register (conflicts available via getConflicts); List uses RGA with a deterministic order for concurrent inserts, by operation ID (counter, then actor ID)
  • The sync protocol uses bloom filters to minimize traffic: instead of full change ID lists you ship compact filters. 2-3 rounds is usually enough
  • Change history accumulates, and `save()` compresses it without removing it. Dropping history means compacting into a new document built from the current state, after which lagging peers can only sync via a full snapshot

Related topics

Automerge builds on core distributed systems ideas and lives in local-first architectures.

  • CRDT (Conflict-free Replicated Data Types) - Automerge is a concrete implementation of operation-based CRDTs for JSON documents
  • Operational Transformation - An alternative approach to collaborative editing. Practical OT systems usually rely on a central server to order operations; CRDTs do not need one
  • Vector Clocks - Automerge uses logical clocks (actor ID + sequence) to order operations
  • Local-first Software - Automerge was built as the technical foundation of the local-first architecture: data on device, sync when possible

Questions for reflection

  • Actual Budget uses Automerge to store financial data offline. What trade-offs arise as change history grows over years of usage?
  • Figma built its own CRDT-inspired system instead of using Automerge. When does building a custom CRDT make sense over an off-the-shelf library?
  • Automerge exposes conflicting values through getConflicts and leaves resolution to the app. How would a UI for a text editor look when showing conflicts between several authors?

Related lessons

  • rt-50 — Yjs is the alternative production CRDT framework with different trade-offs (throughput-first vs history-first)
  • rt-47 — Automerge is a concrete implementation of the CRDT principles introduced earlier
  • ds-10-crdts