Real-Time Backend

Conflict resolution

Two people pressed Save at the same time. Whose data is right? The answer determines whether the user loses their work. This happens millions of times a day in Google Docs, Figma, and Notion.

  • **Amazon DynamoDB** uses LWW by default and recommends conditional writes for critical data. Misconfigured LWW can cause silent data loss in production
  • **CockroachDB** orders writes by HLC timestamps instead of raw wall clock time: clock skew within the configured maximum offset (500ms by default) doesn't break correctness
  • **Notion**, on a block-delete vs. text-edit conflict, always picks the block delete and notifies the user: a deliberate, predictable behavior
  • **Figma**, on conflicting object properties, uses field-level LWW: object color and position resolve independently, so you can lose only one property, not the whole object

Conflict strategies

A conflict in a distributed system isn't an error; it's a normal event. Two users changed the same thing at the same time - who's right? The answer depends on the resolution strategy. Real systems use several, and the choice defines the user experience.

Three core strategies: **Last Write Wins (LWW)** - the latest write by timestamp wins. **First Write Wins (FWW)** - whoever committed first wins. **Merge** - both changes combine into a result that contains both.
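As a rough illustration, the three strategies can be sketched as functions over two competing writes. The `Write` type, its field names, and the tie-break by client id are assumptions made for this example, not part of any particular database API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    client: str        # hypothetical client id, used only as a tie-breaker
    timestamp_ms: int  # commit time according to some clock
    fields: dict       # field name -> new value


def _order(w: Write):
    # Sort key: timestamp first, then client id so every replica picks the same winner.
    return (w.timestamp_ms, w.client)


def last_write_wins(a: Write, b: Write) -> dict:
    return dict(max(a, b, key=_order).fields)


def first_write_wins(a: Write, b: Write) -> dict:
    return dict(min(a, b, key=_order).fields)


def merge(a: Write, b: Write) -> dict:
    # Keep fields from both writes; fall back to LWW only where both touched a field.
    newer = max(a, b, key=_order)
    merged = {}
    for key in a.fields.keys() | b.fields.keys():
        if key in a.fields and key in b.fields:
            merged[key] = newer.fields[key]
        else:
            merged[key] = a.fields[key] if key in a.fields else b.fields[key]
    return merged


alice = Write("alice", 1000, {"title": "Q3 report", "color": "red"})
bob = Write("bob", 1005, {"title": "Q3 results"})

lww_result = last_write_wins(alice, bob)    # Bob's write replaces Alice's entirely
fww_result = first_write_wins(alice, bob)   # Alice's write is kept, Bob's is dropped
merge_result = merge(alice, bob)            # Bob's title plus Alice's color
```

Note what each strategy discards: LWW drops Alice's color, FWW drops Bob's title, and merge loses only Alice's title, the one field both writes touched.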

**The LWW problem:** wall clock timestamps are unreliable in distributed systems: clocks on different servers drift, so the write with the larger timestamp isn't necessarily the one that happened last. Amazon DynamoDB uses LWW by default and explicitly warns: if exact ordering matters, use conditional writes or versioning. Cassandra is also LWW, applied per cell, and the client can supply the write timestamp itself with `USING TIMESTAMP`.
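A minimal sketch of the versioning alternative, assuming an in-memory store that stands in for a real database: a write succeeds only if the version the client read is still current, so a stale writer is rejected instead of silently overwriting. `VersionedStore`, `conditional_write`, and `VersionConflict` are hypothetical names for this example.

```python
class VersionConflict(Exception):
    """Raised when the item changed since the client last read it."""


class VersionedStore:
    def __init__(self):
        self._items = {}  # key -> (version, value)

    def read(self, key):
        # Items that were never written report version 0 and no value.
        return self._items.get(key, (0, None))

    def conditional_write(self, key, value, expected_version):
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise VersionConflict(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        self._items[key] = (current_version + 1, value)
        return current_version + 1


store = VersionedStore()
version_seen_by_a, _ = store.read("field")  # both clients read version 0
version_seen_by_b, _ = store.read("field")

store.conditional_write("field", 42, version_seen_by_a)  # A commits, version becomes 1

try:
    store.conditional_write("field", 99, version_seen_by_b)  # B is now stale
    b_result = "committed"
except VersionConflict:
    b_result = "rejected"  # B must re-read and decide what to do
```

The ordering here comes from the store's own version counter, so client clock drift cannot change the outcome.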

Two clients set a field at the same time: A=42 (timestamp=1000ms) and B=99 (timestamp=999ms). LWW with wall clock picks A. What's the risk?

Merge policies

A merge policy is the rule for how conflicting versions combine. In text, merge is natural: inserts from two users both end up in the document (CRDT does this automatically). For structured data, merge is harder.

The key principle of a good merge policy is **semantic correctness**: the result must be a valid system state, not just something that technically doesn't conflict. If Alice renamed a file to 'report.pdf' and Bob moved it to a different folder, a correct merge keeps both changes.

**MongoDB** updates are field-level (`$set`): if two writers change different fields of the same document, there's no conflict and both changes apply. **CouchDB** does document-level versioning with explicit conflicts: the app resolves the conflict and writes back the winner. **Automerge** (CRDT) does field-level merge automatically for JSON documents.
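One way to picture a field-level merge policy is a three-way merge against the common ancestor. The sketch below is illustrative only and is not how MongoDB, CouchDB, or Automerge implement merging; `three_way_merge` and the file record are assumptions for the example.

```python
_MISSING = object()  # sentinel for "field absent in this version"


def three_way_merge(base: dict, ours: dict, theirs: dict):
    """Merge two edited copies of `base`, field by field.

    Returns (merged, conflicts). `conflicts` maps each field that both sides
    changed to different values onto the pair (ours, theirs); such fields
    keep their base value in `merged` until something resolves them.
    """
    merged, conflicts = {}, {}
    for key in base.keys() | ours.keys() | theirs.keys():
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:
            result = o          # both sides agree (or neither changed it)
        elif o == b:
            result = t          # only their side changed the field
        elif t == b:
            result = o          # only our side changed the field
        else:
            conflicts[key] = (o, t)
            result = b          # true conflict: leave the base value in place
        if result is not _MISSING:
            merged[key] = result
    return merged, conflicts


base = {"name": "draft.pdf", "folder": "/inbox"}
alice = {**base, "name": "report.pdf"}   # Alice renames the file
bob = {**base, "folder": "/archive"}     # Bob moves it

merged, conflicts = three_way_merge(base, alice, bob)

carol = {**base, "name": "final.pdf"}    # Carol also renames: a real conflict
_, rename_conflict = three_way_merge(base, alice, carol)
```

Alice's rename and Bob's move land in `merged` together, while the two competing renames are reported in `rename_conflict` for a higher-level policy to settle.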

Alice changed `price: 100`, Bob changed `description: 'new text'` in the same JSON object. How should a correct merge policy behave?

User intent

Technical merge correctness is necessary but not sufficient. Alice deleted a paragraph because it was outdated; Bob fixed a typo in that paragraph. A CRDT merge that keeps the paragraph with the typo fix is technically correct, but **Alice's intent is lost**. This is called an intent conflict.

Intent-aware resolution tries to understand the user's intent, not just the technical diff. It's a hard problem: intent isn't transmitted explicitly in the protocol. Solutions: semantic types on operations, user annotations, ML intent classification.

**Notion** uses block-level granularity: if Alice deletes a block entirely while Bob edits text inside it, the block delete wins (structural-delete). Notion surfaces 'Your changes were lost because the block was deleted'. That's an intentional UX decision, not a bug.
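The block-delete rule described above can be sketched as a small resolver. This is a guess at the shape of such logic, not Notion's code: the `Op` type, the intent labels, and the `OVERRIDING_INTENTS` set are assumptions; only the notification text comes from the description above.

```python
from dataclasses import dataclass

# Assumption: intents that discard every concurrent non-delete edit on the same block.
OVERRIDING_INTENTS = {"structural-delete"}

LOST_MESSAGE = "Your changes were lost because the block was deleted"


@dataclass(frozen=True)
class Op:
    user: str
    block_id: str
    intent: str  # e.g. "structural-delete", "text-edit", "format"


def resolve_by_intent(a: Op, b: Op):
    """Return (ops_to_apply, notifications) for two concurrent ops on one block."""
    if a.block_id != b.block_id:
        raise ValueError("ops target different blocks; nothing to resolve")
    for winner, loser in ((a, b), (b, a)):
        if winner.intent in OVERRIDING_INTENTS and loser.intent not in OVERRIDING_INTENTS:
            # The delete wins; the other user is told explicitly what happened.
            return [winner], [(loser.user, LOST_MESSAGE)]
    # No override applies (for example edit + format, or two deletes): keep both.
    return [a, b], []


alice_delete = Op("alice", "block-1", "structural-delete")
bob_edit = Op("bob", "block-1", "text-edit")

applied, notifications = resolve_by_intent(alice_delete, bob_edit)
```

Keeping the rule in a table rather than in scattered conditionals makes the behavior predictable and easy to audit.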

Alice deleted a document section (intent: structural-delete). Bob concurrently formatted that section's heading (intent: format). How should intent-aware resolution behave?

Resolution implementation

A practical conflict resolution implementation is a pipeline of several layers. First, conflict detection (concurrent operations on overlapping data). Then classification (LWW / merge / intent-based). Then resolution and optional user notification.
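The layers above might fit together roughly as follows. This is a sketch under stated assumptions: concurrency is detected with version vectors, the `POLICY` table and `Edit` type are hypothetical, and LWW is the fallback for fields without a merge rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    user: str
    clock: dict        # version vector: replica id -> counter
    timestamp_ms: int
    fields: dict       # field name -> new value


# Hypothetical per-field policy table; unlisted fields fall back to LWW.
POLICY = {"tags": "merge"}


def happened_before(a: dict, b: dict) -> bool:
    """True if version vector `a` is causally before `b`."""
    keys = a.keys() | b.keys()
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))


def detect(a: Edit, b: Edit) -> set:
    """Step 1: fields touched by both edits, but only if the edits are concurrent."""
    if happened_before(a.clock, b.clock) or happened_before(b.clock, a.clock):
        return set()
    return a.fields.keys() & b.fields.keys()


def classify(field: str) -> str:
    """Step 2: choose a strategy for a conflicting field."""
    return POLICY.get(field, "lww")


def resolve(a: Edit, b: Edit):
    """Step 3: build the resolved fields and a list of (user, lost_field) notices."""
    if happened_before(b.clock, a.clock):
        first, second = b, a
    elif happened_before(a.clock, b.clock):
        first, second = a, b
    else:
        # Concurrent: order by timestamp, with user id as a deterministic tie-break.
        first, second = sorted([a, b], key=lambda e: (e.timestamp_ms, e.user))
    result = {**first.fields, **second.fields}  # later edit overwrites earlier
    notifications = []
    for field in sorted(detect(a, b)):
        if classify(field) == "merge":
            result[field] = sorted(set(first.fields[field]) | set(second.fields[field]))
        else:
            notifications.append((first.user, field))  # LWW loser gets told
    return result, notifications


alice = Edit("alice", {"A": 1}, 1000, {"title": "Plan v2", "tags": ["draft"]})
bob = Edit("bob", {"B": 1}, 1005, {"title": "Roadmap", "tags": ["q3"], "owner": "bob"})

state, notices = resolve(alice, bob)
```

In this run the tags are unioned, Bob's later title wins, and Alice is notified that her title change was overwritten.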

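**Hybrid Logical Clock (HLC)** replaces wall clock for LWW. An HLC timestamp is a pair (l, c): l is the maximum of the local physical time and every timestamp the node has seen so far, and c is a counter that increments whenever l doesn't advance. The pair increases monotonically and captures causality. Used in CockroachDB and YugabyteDB. With HLC, every replica picks the same LWW winner, and a write is never ordered before one it causally depends on, even under hundreds of milliseconds of clock skew.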
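A compact sketch of the HLC update rules, following the commonly published algorithm. The class and method names are chosen for this example, and the physical clock is injected as a callable so the skew scenario is reproducible; production implementations such as CockroachDB's differ in detail.

```python
class HybridLogicalClock:
    def __init__(self, physical_clock):
        self._now = physical_clock  # callable returning physical time in ms
        self.l = 0                  # highest physical time seen so far
        self.c = 0                  # logical counter for events sharing the same l

    def tick(self):
        """Timestamp a local event or an outgoing message."""
        prev = self.l
        self.l = max(prev, self._now())
        self.c = self.c + 1 if self.l == prev else 0
        return (self.l, self.c)

    def update(self, remote):
        """Merge a timestamp received from another node, then timestamp the receipt."""
        remote_l, remote_c = remote
        prev = self.l
        self.l = max(prev, remote_l, self._now())
        if self.l == prev and self.l == remote_l:
            self.c = max(self.c, remote_c) + 1
        elif self.l == prev:
            self.c += 1
        elif self.l == remote_l:
            self.c = remote_c + 1
        else:
            self.c = 0  # physical time moved past both; counter restarts
        return (self.l, self.c)


# Node B's wall clock is 100 ms behind node A's.
node_a = HybridLogicalClock(lambda: 1000)
node_b = HybridLogicalClock(lambda: 900)

sent = node_a.tick()             # A writes and sends its timestamp to B
received = node_b.update(sent)   # B observes A's write
reply = node_b.tick()            # B then writes on top of it
```

Timestamps compare as plain tuples, so `sent < received < reply` holds even though B's wall clock reads 900 while A's reads 1000. Plain wall clock LWW would have ordered B's later write before A's.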

Misconception: conflicts in collaborative systems are a sign of bad design and should be eliminated through pessimistic locking.

Reality: conflicts are unavoidable with concurrent work; the goal is to design the right resolution strategy, not to forbid parallel editing.

Pessimistic locking kills collaborative UX: you can't edit while someone else holds a lock. Google Docs, Figma, and Notion work optimistically: they accept conflicts and resolve them. Users get a smooth experience and conflicts are resolved transparently.

Two insert operations at the same document position - is this a conflict that requires LWW or intent resolution?

Key takeaways

  • LWW/FWW/Merge are the three base strategies; the choice depends on data semantics and acceptable loss
  • Field-level merge (MongoDB, Automerge) beats document-level: independent field changes don't conflict
  • Intent-aware resolution adds semantics: structural-delete outranks format; notifying the user about lost changes is mandatory

Related topics

Conflict resolution is a central theme in distributed systems with several overlapping areas:

  • CRDT structures — CRDT auto-resolves insert/delete conflicts via a mathematically proven merge; conflict resolution is needed for LWW semantics layered on top of CRDT
  • Undo/Redo — Undo creates a special class of conflicts: the inverse op races against later remote changes
  • Eventual Consistency — Conflict resolution is the mechanism for achieving eventual consistency: all replicas eventually reach the same state

Questions for reflection

  • In a spreadsheet Alice sets a formula `=A1+B1` in cell C1, Bob writes `42` to the same cell at the same time. Which resolution strategy is correct, and why?
  • How do you notify users about lost changes without burying them in alerts during active collaboration?
  • Can you trust wall clock timestamps for LWW in a system where clients are mobile devices with possible clock drift?

Related lessons

  • rt-45 — Builds on the concurrent edits problem and LWW basics introduced earlier
  • rt-53 — Undo creates the same family of concurrent-write conflicts that need explicit resolution
  • dist-12-consistency