Skip to content

CRDTs — Conflict-Free Replicated Data Types

TL;DR

CRDTs are data structures that can be replicated across multiple nodes, updated independently and concurrently without coordination, and merged automatically with a guarantee of eventual convergence. No locks. No consensus. No coordination. The merge function is mathematically guaranteed to produce the same result regardless of the order in which updates arrive. G-Counter, PN-Counter, LWW-Register, OR-Set -- these are the building blocks. Figma uses CRDTs for real-time multiplayer editing. Redis CRDB uses them for active-active geo-replication. The trade-off: larger metadata and weaker consistency guarantees in exchange for zero coordination overhead.


The Problem

You have an application running in three datacenters: US-East, EU-West, and AP-Southeast. Users in each region edit the same document (or counter, or shopping cart) simultaneously. You cannot route all writes through a single leader because the latency from Singapore to Virginia is 200ms round-trip, making the user experience unacceptable.

You want multi-leader (active-active) replication: each datacenter accepts writes locally and replicates asynchronously to others. But now you have concurrent, conflicting updates. User A in Virginia adds item X to the cart. User B in Singapore adds item Y. User C in Frankfurt removes item X. These updates happen simultaneously, arrive at each datacenter in different orders, and must somehow converge to the same state.

Traditional approaches: consensus (too slow across datacenters), last-writer-wins (loses data), manual conflict resolution (pushes complexity to application developers). CRDTs provide a mathematical guarantee: no matter what order the updates arrive, all replicas converge to the same state. The data structure's merge function handles conflicts automatically.


The Algorithm

Two Flavors

CmRDT (Commutative Replicated Data Type): Operation-based. Replicas broadcast operations. Operations must be commutative (order does not matter). Requires reliable broadcast (every operation must reach every replica exactly once).

CvRDT (Convergent Replicated Data Type): State-based. Replicas periodically send their full state to other replicas. The merge function must be commutative, associative, and idempotent (a mathematical join-semilattice). More robust: handles message loss, duplication, and reordering.

In practice, state-based CRDTs are more common because they tolerate unreliable networks.

G-Counter (Grow-Only Counter)

The simplest CRDT. Each node maintains its own local counter. The global count is the sum of all local counters.

State: vector of counters, one per node.
  {A: 0, B: 0, C: 0}

Increment at node A:
  {A: 1, B: 0, C: 0}

Increment at node A again:
  {A: 2, B: 0, C: 0}

Increment at node B:
  {A: 0, B: 1, C: 0}  (B has its own view)

Merge(stateA, stateB):
  For each node i: result[i] = max(stateA[i], stateB[i])
  {A: 2, B: 1, C: 0}

Value = sum of all counters = 3

Why max? If node A reports A:2 and node B reports A:1 (because B has not yet received A's second increment), taking the max gives the correct answer (A:2). The max operation is commutative, associative, and idempotent -- the merge function satisfies all CRDT requirements.

Limitation: Can only count up. You cannot decrement a G-Counter.

PN-Counter (Positive-Negative Counter)

Two G-Counters: one for increments (P), one for decrements (N). The value is P - N.

State: {P: {A: 0, B: 0}, N: {A: 0, B: 0}}

Increment at A: P = {A: 1, B: 0}. Value = 1 - 0 = 1.
Decrement at B: N = {A: 0, B: 1}. Value = 1 - 1 = 0.
Increment at A: P = {A: 2, B: 0}. Value = 2 - 1 = 1.

Merge: take max per node in both P and N vectors.

This handles the "like/unlike" pattern: each like increments P, each unlike increments N. The net count is always correct.

LWW-Register (Last-Writer-Wins Register)

A single value with a timestamp. The value with the highest timestamp wins.

State: (value, timestamp)

Node A: SET("Alice", t=100)  → ("Alice", 100)
Node B: SET("Bob", t=105)    → ("Bob", 105)

Merge: compare timestamps. ("Bob", 105) wins.

But what if clocks are skewed?
Node A: SET("Alice", t=110)  (A's clock is ahead)
Node B: SET("Bob", t=105)    (B's clock is behind, but wrote later)

Merge: ("Alice", 110) wins. Bob's write is silently lost.

The problem: LWW depends on timestamps, and timestamps in a distributed system are unreliable (as covered in the clocks lesson). LWW is simple but can lose data when clocks skew. Cassandra uses LWW for its columns, and this is a deliberate trade-off: simplicity over perfect conflict resolution.

Mitigation: Use Hybrid Logical Clocks (HLC) instead of wall-clock timestamps. HLCs are causally consistent -- if write B causally follows write A, HLC(B) > HLC(A) is guaranteed. This prevents the most egregious clock-skew data loss.

OR-Set (Observed-Remove Set)

The most practically useful CRDT. Add and remove operations on a set, where add wins over concurrent remove.

The key idea: each add operation generates a unique tag. A remove operation only removes the specific tags it has observed, not future adds.

State: set of (element, unique_tag) pairs.

Node A: add("milk")  → {("milk", tagA1)}
Node B: add("eggs")  → {("eggs", tagB1)}

Merge: union of all (element, tag) pairs.
  {("milk", tagA1), ("eggs", tagB1)}

Node A: remove("milk")
  A has observed {("milk", tagA1)}. Remove tagA1.
  A's state: {("eggs", tagB1)}

Concurrent: Node C adds "milk" (has not seen the remove):
  C's state: {("milk", tagC1), ...}

Merge A and C:
  {("eggs", tagB1), ("milk", tagC1)}
  "milk" is in the set because tagC1 was never removed (A only removed tagA1).
  Add wins over concurrent remove. This is the desired behavior.

Why this is the right semantics: If user A removes "milk" from a shared grocery list while user C simultaneously adds "milk," the intent is ambiguous. Having "milk" in the final list (add wins) is a safer default than silently losing C's addition. Applications can always add explicit UI for "this item was removed by someone else."

Worked Example: Shopping Cart CRDT

Three replicas: US, EU, AP
Initial state: empty cart

1. User in US adds "Book":
   US: {("Book", us1)}

2. User in EU adds "Pen" (concurrent):
   EU: {("Pen", eu1)}

3. US and EU merge:
   US = EU = {("Book", us1), ("Pen", eu1)}

4. User in AP adds "Book" again (before seeing US's add):
   AP: {("Book", ap1)}

5. User in US removes "Book" (sees tag us1):
   US: {("Pen", eu1)}  — removed us1

6. All replicas merge:
   US tags removed: {us1}
   AP has: {("Book", ap1)}   — ap1 was NOT in the removed set
   Final: {("Pen", eu1), ("Book", ap1)}

   "Book" survives because AP's independent add created a new tag (ap1)
   that the US remove did not observe. This is correct: the AP user
   intended to have "Book" in the cart.

Proof/Correctness Intuition

Why CvRDTs Converge

A CvRDT's merge function forms a join-semilattice: a partially ordered set where every two elements have a least upper bound (join). The three properties ensure convergence:

  1. Commutative: merge(A, B) = merge(B, A). Order of merging does not matter.
  2. Associative: merge(merge(A, B), C) = merge(A, merge(B, C)). Grouping does not matter.
  3. Idempotent: merge(A, A) = A. Duplicate messages are harmless.

For the G-Counter, the merge function is element-wise max. Max is commutative (max(a,b) = max(b,a)), associative, and idempotent (max(a,a) = a). Therefore, regardless of how many times replicas exchange state, in what order, or with what duplication, all replicas will converge to the same state.

Why the Metadata Grows

OR-Set must track every unique tag ever created for an element. If "milk" is added and removed 1000 times, there are 1000 tags in the history. Garbage collection is possible (tombstone pruning after all replicas have observed the remove) but requires coordination -- which partially defeats the purpose.

This metadata overhead is the fundamental cost of coordination-free conflict resolution. You are trading space for the absence of locks.


Real-World Usage

System CRDT Used What For
Figma Custom CRDTs Multiplayer design editing
Redis CRDB Counters, sets, registers Active-active geo-replication
Riak Maps, sets, counters, flags Multi-datacenter replication
Apple Notes CRDTs (undisclosed details) Cross-device sync
SoundCloud Counters Play count aggregation
Phoenix LiveView CRDTs for presence Real-time user presence tracking

Figma's approach is worth studying. They use a custom CRDT for their design canvas. Each object on the canvas (rectangle, text, frame) has properties that are LWW-Registers (last write wins for position, color, size). The canvas itself is a tree-structured CRDT (parent-child relationships). When two users move the same rectangle simultaneously, the last write wins for position. When one user moves an object and another changes its color, both changes are preserved (different registers).

This is a common pattern: use different CRDTs for different parts of the data model. Not everything needs an OR-Set. Many properties are fine with LWW.


Interview Application

When to mention CRDTs:

  • "How would you build a collaborative editing feature?" -- CRDTs for conflict-free concurrent edits. Figma, Google Docs (uses OT, but CRDTs are an alternative).
  • "How does active-active replication handle conflicts?" -- CRDTs for automatic convergence. Redis CRDB, Riak.
  • "Design a distributed counter (like counts, view counts)." -- G-Counter or PN-Counter CRDT. No coordination needed.
  • "How do you handle shopping cart conflicts in a multi-region setup?" -- OR-Set CRDT. Add wins over concurrent remove.

What interviewers want to hear:

  1. You understand the difference between CRDTs and consensus (CRDTs sacrifice strong consistency for availability and partition tolerance).
  2. You can describe at least one CRDT (G-Counter is the simplest to explain).
  3. You know the metadata trade-off (OR-Set tags grow).
  4. You know when NOT to use CRDTs (bank balances, inventory counts that must never go negative).

Trade-offs

CRDT G Counter

Advantage Disadvantage
Zero coordination (no locks, no consensus) Metadata overhead (tags, vectors)
Always available for writes Eventual consistency only
Automatic conflict resolution Semantics may surprise users (add wins)
Works offline (merge when reconnected) Cannot express invariants (e.g., non-negative balance)
Mathematically proven convergence Complex to implement correctly
Linear scalability Garbage collection requires coordination

The invariant problem: CRDTs cannot enforce global invariants. A PN-Counter can go negative: if node A increments and node B decrements before seeing A's increment, the local value at B is -1. After merge, the value is 0 (correct), but B temporarily showed -1. For an inventory system where selling more than available stock is catastrophic, CRDTs alone are insufficient. You need coordination (consensus) for invariants.

CRDTs vs OT (Operational Transformation)

Google Docs uses OT. Figma uses CRDTs. Both solve collaborative editing, but differently:

  • OT: Transform operations against concurrent operations. Requires a central server to determine operation order. Proven correct for text editing but complex for richer data models.
  • CRDTs: No central server needed. Operations commute by construction. Easier to reason about, but metadata overhead is higher.

For text editing, modern CRDTs (Yjs, Automerge) are competitive with OT in both performance and correctness.


Common Mistakes

CRDTs provide strong consistency

No. CRDTs provide eventual consistency. At any given moment, different replicas may have different values. They are guaranteed to converge to the same value once all updates have been delivered to all replicas. This is weaker than linearizability or sequential consistency.

CRDTs can replace consensus

CRDTs complement consensus, they do not replace it. Use CRDTs for data that can tolerate temporary divergence (counters, sets, text). Use consensus for data that requires immediate agreement (leader election, configuration changes, financial transactions).

LWW-Register is safe for all use cases

LWW silently drops concurrent writes. If two users simultaneously set a user's email address, one write is lost without notification. This is acceptable for "last editor wins" scenarios (like a text field in a form) but dangerous for fields where both writes matter.

OR-Set has no overhead

OR-Set tracks a unique tag for every add operation. In a high-churn scenario (add/remove the same element repeatedly), the tag set grows linearly. Garbage collection (pruning tags that all replicas have observed as removed) requires a protocol that is itself a form of coordination.

You need CRDTs for multi-region replication

Many multi-region systems use single-leader replication (all writes go to one region, reads can be served from any region). This avoids conflicts entirely. CRDTs are needed only when you want multi-leader writes (active-active), which is a choice driven by write latency requirements.

All CRDTs are equally practical

Some CRDTs are research curiosities. The practical set is small: G-Counter, PN-Counter, LWW-Register, OR-Set, and composites (maps of registers, sequences for text). More exotic CRDTs (grow-only DAGs, replicated trees) are harder to implement correctly and have higher metadata costs.