Real-Time Backend

Design: Figma Multiplayer

In 2022 Adobe agreed to pay USD 20 billion for Figma (the deal was called off in December 2023 under regulatory pressure). Not for the pixels, not for the UI kit: for the multiplayer architecture that nobody had managed to copy in six years. How did Figma pull it off?

  • Figma handles millions of concurrent sessions. A single Airbnb Design System document holds 15,000+ components and is edited by 50 designers at the same time without conflicts.
  • Google Docs picked a thin server + smart client for text. Figma picked a fat WASM client + stateful server for canvas. Different problems call for different architectures.
  • Linear, Miro, FigJam and Canva build real-time collaboration on similar principles: OT or CRDT, WebSocket, and (for canvases) spatial indexing. Understanding Figma means understanding a whole class of systems.
  • Notion was valued at USD 2B, Miro raised USD 400M. Multiplayer collaborative tools are one of the most expensive segments of SaaS, and the architecture directly drives the price of the product.

Figma Multiplayer architecture

In 2016 Figma raised USD 14M and bet on the browser. Competitors laughed: 'you cannot build Photoshop in a browser'. Six years later Adobe agreed to buy Figma for USD 20B. The key was not WebGL, and not React: it was the multiplayer architecture that competitors failed to copy.

Figma is built on three layers. The **client** (browser) holds the whole document in memory through a WebAssembly engine written in C++. The **sync server** is a set of stateful processes, each one owning a single document end to end. The **persistence layer** is PostgreSQL plus S3 for snapshots. This is not microservices. It is intentional monolithic ownership.

Every Figma document lives in the memory of exactly one server. Routing goes through consistent hashing by document_id. If that server fails, the document moves to another node and clients reconnect. Stateful single-owner means a simple concurrency model.
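The routing rule in the paragraph above can be sketched as a consistent-hash ring. This is a generic, minimal Python sketch assuming SHA-256 as the hash and 100 virtual nodes per server; `HashRing`, `owner()` and the server names are hypothetical, not Figma's code. It shows the property that matters for failover: when a server leaves the ring, only the documents it owned move.

```python
import bisect
import hashlib

# Hypothetical sketch: HashRing and owner() are illustrative names, not Figma's API.


def stable_hash(key: str) -> int:
    # Python's built-in hash() is randomized per process, so use SHA-256.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes   # virtual nodes per server smooth out the load
        self.ring = []         # sorted (point, server) pairs
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (stable_hash(f"{server}#{i}"), server))

    def remove(self, server):
        # A failed server's points disappear; its documents fall to the next point.
        self.ring = [(p, s) for p, s in self.ring if s != server]

    def owner(self, document_id):
        # First ring point clockwise from the document's hash, wrapping at the end.
        idx = bisect.bisect(self.ring, (stable_hash(document_id), ""))
        return self.ring[idx % len(self.ring)][1]


ring = HashRing(["sync-1", "sync-2", "sync-3"])
docs = [f"doc-{n}" for n in range(1000)]
before = {d: ring.owner(d) for d in docs}
ring.remove("sync-2")                      # simulate a node failure
after = {d: ring.owner(d) for d in docs}
moved = {d for d in docs if before[d] != after[d]}
print(f"{len(moved)} of {len(docs)} documents moved")
```

Documents owned by `sync-1` and `sync-3` keep their owner, so their in-memory state and client connections are untouched; only the failed server's documents are reloaded elsewhere.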

Compare with Google Docs: there the server is a thin relay and the client does not hold the whole document. Figma picked fat client + smart server, because canvas documents weigh 50 to 500 MB and rendering needs the GPU. A network round trip for every pixel update would make the editor unusable.

  • Single-owner per document: no distributed locks, no split-brain
  • WASM engine on the client: offline render, no RTT on every action
  • Persistent WebSocket: push from the server with no polling
  • Ops-based sync: operations travel over the wire, not snapshots

Why does Figma keep each document in the memory of a single server instead of distributing it across the cluster?

Canvas rendering in the browser

Figma renders the canvas through WebGL, and the document engine is written in C++ and compiled to WebAssembly. This is not a performance hack, it is the foundation. C++ lets the app hold 10,000+ layers without GC pauses, which would kill animation.

The document is stored as a **tree of nodes** (SceneGraph). Each node is a frame, component, vector or text. When a node changes, only that node and the path from it up to the root are recomputed; untouched sibling subtrees are skipped. This is classic **dirty-flag propagation**: mark the node dirty (and its ancestors as containing a dirty descendant), then on the next frame traversal redraw only the dirty nodes.
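Dirty-flag propagation fits in a few lines. In this minimal Python sketch (the `Node` class, `mark_dirty()` and `render()` are hypothetical names, and appending a name to a list stands in for real draw calls), each node carries two flags: `dirty` for its own pixels and `subtree_dirty` so a frame traversal can skip clean branches without visiting them.

```python
# Hypothetical sketch: Node, mark_dirty() and render() are illustrative names.


class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.dirty = False          # this node's own pixels are stale
        self.subtree_dirty = False  # this node or a descendant is stale
        if parent is not None:
            parent.children.append(self)
        self.mark_dirty()           # a new node must be drawn once

    def mark_dirty(self):
        self.dirty = True
        node = self
        # Flag the path to the root; stop at the first ancestor already
        # flagged, because everything above it is flagged too.
        while node is not None and not node.subtree_dirty:
            node.subtree_dirty = True
            node = node.parent


def render(node, drawn):
    """One frame: redraw dirty nodes, skip clean subtrees without visiting them."""
    if not node.subtree_dirty:
        return
    if node.dirty:
        drawn.append(node.name)     # stand-in for real GPU draw calls
        node.dirty = False
    for child in node.children:
        render(child, drawn)
    node.subtree_dirty = False


page = Node("page")
frame = Node("frame", page)
rect = Node("rect", frame)
text = Node("text", page)

first, second = [], []
render(page, first)    # first frame: everything is new
rect.mark_dirty()      # the user edits the rectangle
render(page, second)   # only "rect" is redrawn; "text" is never visited
```

The early stop in `mark_dirty()` keeps repeated edits cheap: editing the same node twice before a frame touches only that node, not the whole path again.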

Figma uses two canvases: one for the main content (WebGL), the other for the UI overlay (2D canvas API): cursors of other users, selection handles, comments. The split lets cursor positions update at 60fps without re-rendering the scene.

Text is rendered through a custom engine, not through the DOM. The reason: CSS text rendering does not give pixel-perfect agreement across operating systems. Figma needs the design to look identical for every participant in a session. That requires full control over glyph shaping.

  • DOM / SVG rendering — Simple API, but slow with 1000+ elements. The browser does not know about the app's layout and optimizes blindly.
  • WebGL + WASM (Figma) — Full control over the GPU pipeline. Batched draw calls. No GC pauses. You have to write your own text/bezier renderer.

Why does Figma keep two separate canvas layers (WebGL + 2D canvas API)?

Sync protocol: resolving concurrent edits in real time

The central multiplayer problem: two users edit the same object at the same time. User A drags a rectangle right by 50px. User B drags it down by 30px at the same moment. Which result is correct? Both at the same time. These are **concurrent operations**.

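Every user action is an operation with a type, a target node, and parameters. All operations go through the server, the only arbiter of order: it assigns each operation a global sequence number and broadcasts it to all clients. The scheme is often described as **Operational Transformation (OT)**, but Figma's engineering blog says the team deliberately avoided full OT: because the server is a central authority, each property of each object can act as a last-writer-wins register, an idea borrowed from CRDTs, with the server deciding which write came last.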

Figma applies optimistic updates: the client applies its operation locally right away without waiting for the server's reply. If the server rejects it (conflict), the client rolls back and applies the server's version. In practice conflicts are rare: users usually work in different parts of the canvas.
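A minimal sketch of optimistic updates with rollback, assuming the client keeps the last server-confirmed state plus a list of unacknowledged local ops; rolling back is then just dropping the rejected op and recomputing what the user sees. `Op`, `OptimisticClient` and the `on_*` handlers are hypothetical names, and the document is reduced to a map of node positions.

```python
from dataclasses import dataclass

# Hypothetical sketch: Op and OptimisticClient are illustrative, not Figma's code.


@dataclass
class Op:
    op_id: int
    kind: str          # "MOVE" or "DELETE"
    node: str
    dx: float = 0.0
    dy: float = 0.0


def apply(state, op):
    """Apply op to a {node: (x, y)} map in place; return False if it no longer applies."""
    if op.node not in state:
        return False
    if op.kind == "DELETE":
        del state[op.node]
    else:
        x, y = state[op.node]
        state[op.node] = (x + op.dx, y + op.dy)
    return True


class OptimisticClient:
    def __init__(self, snapshot):
        self.confirmed = dict(snapshot)  # last state the server agreed to
        self.pending = []                # local ops still waiting for the server

    @property
    def view(self):
        # What the user sees: confirmed state with local guesses replayed on top.
        state = dict(self.confirmed)
        for op in self.pending:
            apply(state, op)
        return state

    def local(self, op):
        self.pending.append(op)          # visible immediately, no round trip

    def on_ack(self, op_id):
        op = next(o for o in self.pending if o.op_id == op_id)
        self.pending.remove(op)
        apply(self.confirmed, op)

    def on_reject(self, op_id):
        # Roll back: forget the op; the view is rebuilt from server state.
        self.pending = [o for o in self.pending if o.op_id != op_id]

    def on_remote(self, op):
        apply(self.confirmed, op)        # another user's op, already ordered


client = OptimisticClient({"rect": (0, 0)})
client.local(Op(1, "MOVE", "rect", dx=50))
client.on_ack(1)                            # accepted: becomes confirmed state
client.local(Op(2, "MOVE", "rect", dy=30))
client.on_remote(Op(7, "DELETE", "rect"))   # someone deleted it first
client.on_reject(2)                         # server rejects the stale move
```

Keeping confirmed and pending state apart means a rollback never has to "un-apply" anything: the rejected op simply stops being replayed.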

For undo Figma does not store a history of states (that would be gigabytes). Each operation carries an **inverse operation**, the operation that cancels its effect. Undo means applying inverses in reverse order. This is the **command pattern** with reversibility.
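The inverse-operation idea fits in a small command-pattern sketch. Assumptions: each op is a hypothetical `SetProp` that records the old and new value of one property, so its inverse simply swaps them, and a value of `None` means the property is absent. History then costs one small record per action, not one copy of the document.

```python
from dataclasses import dataclass

# Hypothetical sketch of the command pattern with inverse ops; not Figma's code.


@dataclass(frozen=True)
class SetProp:
    node: str
    prop: str
    old: object   # value before the change (None = property was absent)
    new: object   # value after the change (None = delete the property)

    def inverse(self):
        # Swapping old and new gives the op that cancels this one.
        return SetProp(self.node, self.prop, self.new, self.old)


class Editor:
    def __init__(self):
        self.doc = {}       # node -> {prop: value}
        self.history = []   # applied ops, newest last

    def _apply(self, op):
        props = self.doc.setdefault(op.node, {})
        if op.new is None:
            props.pop(op.prop, None)
        else:
            props[op.prop] = op.new

    def set(self, node, prop, value):
        op = SetProp(node, prop, self.doc.get(node, {}).get(prop), value)
        self._apply(op)
        self.history.append(op)

    def undo(self):
        # Undo = apply inverses in reverse order of the original ops.
        if self.history:
            self._apply(self.history.pop().inverse())


ed = Editor()
ed.set("rect", "x", 0)
ed.set("rect", "x", 50)
ed.set("rect", "fill", "red")
ed.undo()   # removes fill
ed.undo()   # x goes back from 50 to 0
```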

Concurrent operations: how OT resolves them

  • User A (seqNo 1): MOVE node #42, dx=+50, dy=0
  • User B (seqNo 2): MOVE node #42, dx=0, dy=+30
  • The server sees op1 first and broadcasts it as seqNo=1
  • Client B has already applied its own op locally (dx=0, dy=+30)
  • On receiving seqNo=1, B applies it on top of its pending op; two relative moves commute, so no adjustment is needed and the result is dx=+50, dy=+30
  • Both clients reach the same state
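The example above can be replayed with a toy OT loop. In this sketch (hypothetical `Move`, `Server` and `Client` classes, tracking only node #42's position), the server stamps each op with a sequence number and broadcasts it to everyone, including the author, who treats the echo as an acknowledgement. The `transform` function is the identity because relative moves commute; ops that set absolute values would need a real rule there.

```python
from dataclasses import dataclass, replace

# Hypothetical toy OT loop for the example above; not Figma's implementation.


@dataclass(frozen=True)
class Move:
    node: int
    dx: int
    dy: int
    author: str
    seq: int = 0          # global sequence number, assigned by the server


def transform(op, against):
    """Inclusion transform: rewrite op so it can apply after `against`.

    Relative moves commute, so MOVE vs MOVE is unchanged. Ops that set
    absolute values (e.g. x = 10) would need a real tie-break rule here.
    """
    return op


def apply(pos, op):
    return (pos[0] + op.dx, pos[1] + op.dy)


class Server:
    def __init__(self):
        self.seq = 0

    def receive(self, op):
        self.seq += 1                       # the server alone decides order
        return replace(op, seq=self.seq)    # this copy is broadcast to all


class Client:
    def __init__(self, name):
        self.name = name
        self.pos = (0, 0)                   # position of node #42 only
        self.pending = []                   # own ops not yet echoed back

    def local(self, op):
        self.pos = apply(self.pos, op)      # optimistic: show it right away
        self.pending.append(op)

    def on_broadcast(self, op):
        if op.author == self.name:
            self.pending.pop(0)             # our op came back: acknowledged
            return
        # Transform the remote op past our pending ops (and vice versa).
        rebased = []
        for mine in self.pending:
            op, mine = transform(op, mine), transform(mine, op)
            rebased.append(mine)
        self.pending = rebased
        self.pos = apply(self.pos, op)


server, a, b = Server(), Client("A"), Client("B")
op1, op2 = Move(42, 50, 0, "A"), Move(42, 0, 30, "B")
a.local(op1)
b.local(op2)
for ordered in (server.receive(op1), server.receive(op2)):  # op1 arrived first
    a.on_broadcast(ordered)
    b.on_broadcast(ordered)
print(a.pos, b.pos)  # prints (50, 30) (50, 30)
```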

Why does Figma store an inverse operation inside every operation instead of full state snapshots?

Viewport management: see only what you need

A Figma document can hold 100 pages and 50,000 objects. Loading everything on open would take minutes. Figma loads only the **visible viewport** plus a small buffer around it. The rest is fetched as the user scrolls and zooms.

The implementation uses a **quadtree spatial index**: the whole canvas is recursively split into quadrants. On render, the engine traverses the quadtree and includes only nodes intersecting the current viewport. When zoomed out, nodes become small, so the engine switches to LOD (level of detail) and shows simplified placeholders instead of detailed vectors.
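Quadtree culling can be sketched as a generic region quadtree over bounding boxes. Assumptions: axis-aligned boxes `(x, y, width, height)` and a split threshold of four items per quadrant; `QuadTree`, `insert()` and `query()` are hypothetical names. Items that straddle a split line stay in the parent, and `query()` skips any quadrant that does not touch the viewport.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical region quadtree over bounding boxes; not Figma's engine code.
Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] < b[0] + b[2] and b[0] < a[0] + a[2] and a[1] < b[1] + b[3] and b[1] < a[1] + a[3]


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[0] + inner[2] <= outer[0] + outer[2]
            and inner[1] + inner[3] <= outer[1] + outer[3])


@dataclass
class QuadTree:
    bounds: Rect
    capacity: int = 4                 # items per quadrant before splitting
    depth: int = 0
    max_depth: int = 8
    items: List[Tuple[str, Rect]] = field(default_factory=list)
    children: List["QuadTree"] = field(default_factory=list)

    def insert(self, name: str, rect: Rect) -> None:
        for child in self.children:
            if contains(child.bounds, rect):
                child.insert(name, rect)   # fits entirely in one quadrant
                return
        self.items.append((name, rect))    # leaf, or straddles a split line
        if not self.children and len(self.items) > self.capacity and self.depth < self.max_depth:
            self._split()

    def _split(self) -> None:
        x, y, w, h = self.bounds
        hw, hh = w / 2, h / 2
        self.children = [QuadTree((qx, qy, hw, hh), self.capacity, self.depth + 1, self.max_depth)
                         for qx, qy in ((x, y), (x + hw, y), (x, y + hh), (x + hw, y + hh))]
        items, self.items = self.items, []
        for name, rect in items:
            self.insert(name, rect)        # push items down where they fit

    def query(self, viewport: Rect, out: Optional[List[str]] = None) -> List[str]:
        out = [] if out is None else out
        if not intersects(self.bounds, viewport):
            return out                     # whole quadrant off-screen: skip it
        out.extend(name for name, rect in self.items if intersects(rect, viewport))
        for child in self.children:
            child.query(viewport, out)
        return out


tree = QuadTree((0, 0, 1000, 1000))
for i in range(10):
    for j in range(10):
        tree.insert(f"n{i}{j}", (i * 100 + 10, j * 100 + 10, 50, 50))
visible = tree.query((0, 0, 250, 150))    # current viewport in canvas units
print(sorted(visible))  # ['n00', 'n01', 'n10', 'n11', 'n20', 'n21']
```

Widening the query rectangle by a margin gives the "small buffer around the viewport" mentioned earlier.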

Multiplayer viewport: the server knows the viewport of every participant. If User B is looking at page 3 while User A is editing page 1, B does not get ops from page 1. The broadcast is filtered by viewport subscription. This cuts traffic by 10 to 100 times on a large document.
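Viewport subscription on the server can be sketched at page granularity, matching the page 1 / page 3 example. Assumptions: a hypothetical `Broadcaster` records which page each client is viewing, fans out ops only to clients on that page, and, as one possible way to avoid stale pages, sends the page's current state when a client switches to it. Messages go into an in-memory `outbox` instead of real WebSockets.

```python
import copy
from collections import defaultdict

# Hypothetical sketch of server-side viewport (page) subscriptions; not Figma's code.


class Broadcaster:
    def __init__(self):
        self.viewing = {}                  # client_id -> page the client is on
        self.pages = defaultdict(dict)     # page_id -> {node_id: {prop: value}}
        self.outbox = defaultdict(list)    # client_id -> messages "sent" to it

    def view_page(self, client, page):
        self.viewing[client] = page
        # The client received no ops for this page while elsewhere, so send
        # the page's current state (one possible catch-up strategy).
        self.outbox[client].append(("snapshot", page, copy.deepcopy(self.pages[page])))

    def submit(self, page, node, props):
        self.pages[page].setdefault(node, {}).update(props)
        for client, viewed in self.viewing.items():
            if viewed == page:             # filter: only subscribers of this page
                self.outbox[client].append(("op", page, node, dict(props)))


srv = Broadcaster()
srv.view_page("A", 1)
srv.view_page("B", 3)
srv.submit(1, "rect", {"x": 50})           # A edits page 1
ops_to_b = [m for m in srv.outbox["B"] if m[0] == "op"]   # empty: B is on page 3
srv.view_page("B", 1)                      # B navigates to page 1
```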

Cursors of other users are a separate data stream. Cursor coordinates are sent over WebSocket with a 60ms throttle (not per pixel). On the client the position is interpolated between received points, smooth motion without flooding the channel.
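Throttling on the sender and interpolation on the receiver can be sketched with two small pieces. Assumptions: a hypothetical `CursorThrottle` driven by explicit millisecond timestamps (a real client would use a timer and WebSocket sends), with a `tick()` call so the final resting position is not lost, and a hypothetical `interpolate()` doing linear blending between two received samples. A receiver typically renders slightly behind the newest sample so it always has two points to blend between.

```python
# Hypothetical sketch: CursorThrottle and interpolate() are illustrative names.


class CursorThrottle:
    """Sender side: emit at most one cursor position per interval."""

    def __init__(self, interval_ms=60):
        self.interval_ms = interval_ms
        self.last_sent_ms = float("-inf")
        self.pending = None                # newest position not yet sent

    def _maybe_send(self, now_ms):
        if self.pending is not None and now_ms - self.last_sent_ms >= self.interval_ms:
            pos, self.pending = self.pending, None
            self.last_sent_ms = now_ms
            return pos                     # caller would send this over WebSocket
        return None

    def on_mouse_move(self, now_ms, pos):
        self.pending = pos                 # older unsent positions are dropped
        return self._maybe_send(now_ms)

    def tick(self, now_ms):
        # Called from a timer so the final resting position is not lost.
        return self._maybe_send(now_ms)


def interpolate(p0, t0_ms, p1, t1_ms, now_ms):
    """Receiver side: linear blend between two received samples, clamped."""
    if t1_ms == t0_ms:
        return p1
    a = min(max((now_ms - t0_ms) / (t1_ms - t0_ms), 0.0), 1.0)
    return (p0[0] + (p1[0] - p0[0]) * a, p0[1] + (p1[1] - p0[1]) * a)


throttle = CursorThrottle(interval_ms=60)
sent = []
for t in range(0, 200, 10):               # a mouse event every 10 ms
    pos = throttle.on_mouse_move(t, (t, 0))
    if pos is not None:
        sent.append((t, pos))
# 20 mouse events become 4 messages, at t = 0, 60, 120 and 180 ms.
```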

  • Quadtree culling: roughly O(log N + k) lookup of the k visible nodes (for evenly spread content) instead of an O(N) scan
  • LOD: simplified shapes at low zoom, details at high zoom
  • Viewport subscription: the server sends only ops for the visible area
  • Cursor interpolation: smoothness with a 60ms throttle

Misconception: Figma uses peer-to-peer sync between clients for speed

Reality: All sync goes through an arbiter server; there is no P2P

P2P removes the central arbiter. Without a server there is no single authority to fix a total order of operations, so peers would need a consensus protocol or CRDT merge rules to converge. Figma trades latency (a round trip to the server) for a simpler path to correctness, and masks that latency with optimistic updates on the client.

Why does the server filter broadcast operations by each client's viewport?

Takeaways

  • Single-owner per document: every document lives in one server's RAM, no distributed locks, simple consistency
  • WASM + WebGL client: a C++ engine in the browser delivers 60fps rendering without GC pauses, and every action renders locally with no server round trip
  • Server as arbiter: the server fixes a single global order of operations, optimistic updates on the client mask latency
  • Quadtree viewport culling: load and sync only the visible area, cuts traffic and memory by 10 to 100 times

Related topics

Figma Multiplayer brings together several core patterns of distributed systems

  • Operational Transformation / CRDT — The base algorithm for resolving concurrent operations in collaborative editors
  • WebSocket and Server-Sent Events — Transport layer for real-time push from the server to clients
  • Consistent Hashing — Routing a document_id to a specific server in the cluster
  • Spatial Indexing (Quadtree, R-tree) — Efficient search for objects in 2D space for viewport culling

Questions for reflection

  • Figma chose a stateful single-owner server. What trade-offs does this create on server failure: what happens to the document and to the users in a session?
  • Classic OT in practice relies on a central server, while a CRDT allows P2P. Why did Figma keep a central arbiter even though a CRDT would not need a server?
  • Viewport subscription cuts traffic but creates an edge case: User A makes a change, User B does not receive it (not in viewport), then B zooms over to that area. How should the system handle this?

Related lessons

  • sd-01-intro