CRDT Complete Guide 2025: Conflict-Free Replicated Data Types, Local-First Collaboration, Yjs/Automerge

TL;DR

The magic of CRDTs: multiple users can modify different replicas concurrently and automatically converge to the same state. No locks, no central server required
Two approaches: State-based (send full state) vs Operation-based (send operations). Each with pros and cons
Local-first movement: the term comes from the Ink & Switch research lab. Products such as Figma, Linear, and Notion apply related ideas to varying degrees
Yjs vs Automerge: Yjs excels at performance and text collaboration, Automerge excels at JSON data models
When to use: collaborative editors, offline-first apps, P2P sync, distributed note-taking

1. What is a CRDT?

1.1 The Problem — Conflicts in Distributed Environments

Scenario: Two users edit the same document simultaneously.

Initial: "Hello"

User A adds:           "Hello World"
User B adds concurrently: "Hello Friend"

How to merge? Which one wins?

Traditional solutions:

Last Write Wins — one person's change disappears
Manual merge — user resolves directly
Locking — only one can edit, collaboration impossible

CRDT's answer: Merge both changes automatically and meaningfully. Both are preserved.

1.2 Mathematical Definition of CRDTs

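In a state-based CRDT, the merge operation (written + below) must have three properties:

Associativity: (a + b) + c = a + (b + c)
Commutativity: a + b = b + a
Idempotence: a + a = a

If all three hold, replicas can merge states in any order, any number of times, and still reach the same result. These are called the ACI properties; mathematically, the states form a semilattice with merge as the join.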
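As a quick sanity check of these laws, the sketch below tests them on a handful of sample values for three candidate merge functions: set union, max, and integer addition. The helper name satisfies_aci is made up for this illustration, and checking samples is of course not a proof.

```python
from itertools import product

def satisfies_aci(merge, samples):
    """Check associativity, commutativity and idempotence on sample values."""
    for a, b, c in product(samples, repeat=3):
        if merge(merge(a, b), c) != merge(a, merge(b, c)):
            return False  # not associative
        if merge(a, b) != merge(b, a):
            return False  # not commutative
    # Idempotence: merging a value with itself must change nothing
    return all(merge(a, a) == a for a in samples)

set_samples = [frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({3})]
int_samples = [0, 1, 2, 5]

union_ok = satisfies_aci(lambda a, b: a | b, set_samples)
max_ok = satisfies_aci(max, int_samples)
add_ok = satisfies_aci(lambda a, b: a + b, int_samples)

print(union_ok, max_ok, add_ok)  # True True False
```

Addition fails only the idempotence check (1 + 1 is not 1), which is why a counter CRDT cannot simply add replica states together.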

1.3 Two Approaches

State-based CRDT (CvRDT)

Send the full state
Merge function: merge(a, b) -> c
Pros: simple, message order doesn't matter
Cons: large state = high network cost

Operation-based CRDT (CmRDT)

Send only operations
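Every node must apply every operation exactly once (or the operations themselves must be idempotent)
Pros: small messages
Cons: requires reliable delivery: no operation may be lost, duplicates must be filtered or harmless, and many designs also need causal ordering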

Practical choice: libraries such as Yjs and Automerge take a hybrid approach, exchanging compact incremental updates that can still be merged safely like state.
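The delivery requirement of the operation-based approach is easiest to see with a counter. In the sketch below, the class OpCounter and the operation IDs are hypothetical; the point is only that a duplicated operation is counted twice unless the receiver deduplicates it, whereas a state-based merge is idempotent by construction.

```python
class OpCounter:
    """Operation-based counter; op_id is a hypothetical unique operation ID."""

    def __init__(self, dedupe):
        self.dedupe = dedupe
        self.total = 0
        self.seen = set()

    def apply(self, op_id, amount):
        if self.dedupe and op_id in self.seen:
            return  # duplicate delivery: ignore it
        self.seen.add(op_id)
        self.total += amount

# The network delivers the same "+2" operation twice.
deliveries = [("op-1", 2), ("op-1", 2), ("op-2", 1)]

naive = OpCounter(dedupe=False)
safe = OpCounter(dedupe=True)
for op_id, amount in deliveries:
    naive.apply(op_id, amount)
    safe.apply(op_id, amount)

print(naive.total, safe.total)  # 5 3
```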

2. Basic CRDT Types

2.1 G-Counter (Grow-only Counter)

A counter that can only increment.

class GCounter:
    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # {node_id: count}

    def increment(self):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + 1

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Merge each node's count with max
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

Why max?: Each node only ever increases its own entry, so for any given node the larger of two observed counts is the more recent one. Taking the per-node maximum never loses an increment and never counts one twice.

Use cases: page view counts, likes.
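To see why the merge takes a per-node maximum rather than a sum, here is a small sketch that represents each replica as a plain dict (an assumption made for brevity; the function names are illustrative). Merging with max converges and can be repeated safely, while merging with sum double-counts as soon as the same state is exchanged twice.

```python
def merge_max(a, b):
    """G-Counter merge: per-node maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

def merge_sum(a, b):
    """Incorrect merge, shown for contrast: per-node sum."""
    return {n: a.get(n, 0) + b.get(n, 0) for n in a.keys() | b.keys()}

# Replica A incremented three times, replica B twice; they have not synced yet.
a = {"A": 3}
b = {"B": 2}

# Sync in both directions, then sync once more.
a1 = merge_max(a, b)
b1 = merge_max(b, a)
a2 = merge_max(a1, b1)

# The same exchange with sum counts every increment twice.
wrong = merge_sum(merge_sum(a, b), merge_sum(b, a))

print(sum(a2.values()), sum(wrong.values()))  # 5 10
```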

2.2 PN-Counter (Positive-Negative Counter)

Supports both increment and decrement.

class PNCounter:
    def __init__(self, node_id):
        self.positive = GCounter(node_id)  # for increments
        self.negative = GCounter(node_id)  # for decrements

    def increment(self):
        self.positive.increment()

    def decrement(self):
        self.negative.increment()

    def value(self):
        return self.positive.value() - self.negative.value()

    def merge(self, other):
        self.positive.merge(other.positive)
        self.negative.merge(other.negative)

Key trick: split into two G-Counters. Decrement is expressed as "incrementing the negative side."

2.3 G-Set (Grow-only Set)

A set that supports only additions.

class GSet:
    def __init__(self):
        self.elements = set()

    def add(self, e):
        self.elements.add(e)

    def merge(self, other):
        self.elements |= other.elements  # union

Limitation: no removals. The set grows forever.

2.4 2P-Set (Two-Phase Set)

Supports both add and remove. However, once removed, an element cannot be re-added.

class TwoPhaseSet:
    def __init__(self):
        self.added = set()
        self.removed = set()

    def add(self, e):
        if e not in self.removed:
            self.added.add(e)

    def remove(self, e):
        self.removed.add(e)

    def contains(self, e):
        return e in self.added and e not in self.removed

    def merge(self, other):
        self.added |= other.added
        self.removed |= other.removed

Limitation: tombstones (removal markers) accumulate forever.
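A consequence worth seeing concretely is that removal always wins in a 2P-Set, even against a replica that still considers the element present. The sketch below models each replica as an (added, removed) pair of frozensets, which is a simplification chosen for this example.

```python
def merge(a, b):
    """Merge two 2P-Sets given as (added, removed) pairs of frozensets."""
    return (a[0] | b[0], a[1] | b[1])

def visible(state):
    """Elements that were added and never removed."""
    added, removed = state
    return added - removed

# Replica 1 added "x" and later removed it.
r1 = (frozenset({"x"}), frozenset({"x"}))
# Replica 2 still sees "x" and has also added "y".
r2 = (frozenset({"x", "y"}), frozenset())

merged = merge(r1, r2)

print(sorted(visible(merged)))  # ['y']
```

Because the removed set only grows, no later merge can bring "x" back.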

2.5 LWW-Register (Last-Write-Wins Register)

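A register holding a single value; the write with the latest timestamp wins.

class LWWRegister:
    def __init__(self, node_id):
        self.node_id = node_id  # assumed to be a string
        self.value = None
        # (timestamp, node_id) pairs compare lexicographically, so two
        # writes with equal timestamps are ordered by node_id. Without this
        # tie-break, replicas could disagree after merging.
        self.stamp = (0, "")

    def write(self, value, timestamp):
        self._apply(value, (timestamp, self.node_id))

    def merge(self, other):
        self._apply(other.value, other.stamp)

    def _apply(self, value, stamp):
        # Keep whichever write carries the larger stamp
        if stamp > self.stamp:
            self.value = value
            self.stamp = stamp

Problem: physical clocks drift, so a write from a node with a slow clock can lose to an older write. A Hybrid Logical Clock (HLC) mitigates this by combining physical time with a logical counter.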
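A minimal sketch of a Hybrid Logical Clock follows, based on the update rules from the HLC paper as I understand them. The class and method names are illustrative, and the physical clock is injected as a callable so the example can simulate a fast and a slow node.

```python
class HybridLogicalClock:
    def __init__(self, node_id, physical_clock):
        self.node_id = node_id
        self.now = physical_clock  # callable returning integer physical time
        self.l = 0  # largest physical time seen so far
        self.c = 0  # logical counter for events sharing the same l

    def tick(self):
        """Timestamp a local write or an outgoing message."""
        prev = self.l
        self.l = max(prev, self.now())
        self.c = self.c + 1 if self.l == prev else 0
        return (self.l, self.c, self.node_id)

    def receive(self, remote):
        """Advance the clock on receiving a remote timestamp."""
        rl, rc, _ = remote
        prev = self.l
        self.l = max(prev, rl, self.now())
        if self.l == prev and self.l == rl:
            self.c = max(self.c, rc) + 1
        elif self.l == prev:
            self.c += 1
        elif self.l == rl:
            self.c = rc + 1
        else:
            self.c = 0
        return (self.l, self.c, self.node_id)

# Node A's wall clock reads 100, node B's clock is behind at 90.
a = HybridLogicalClock("A", lambda: 100)
b = HybridLogicalClock("B", lambda: 90)

t_a = a.tick()     # A writes first
b.receive(t_a)     # B learns about A's write
t_b = b.tick()     # B writes afterwards

print(t_a, t_b)  # (100, 0, 'A') (100, 2, 'B')
```

Although B's physical clock is behind, its later write gets the larger timestamp, so a last-write-wins comparison on these tuples respects the real order of the two writes.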

2.6 OR-Set (Observed-Remove Set)

The most practical set CRDT: elements can be added, removed, and re-added freely.

import uuid
from collections import defaultdict

class ORSet:
    def __init__(self):
        # {element: set of unique tags}
        self.elements = defaultdict(set)
        self.tombstones = defaultdict(set)

    def add(self, e):
        # Every add gets a fresh tag, so old tombstones never cover it
        tag = uuid.uuid4()
        self.elements[e].add(tag)

    def remove(self, e):
        # Tombstone only the tags this replica has observed so far
        self.tombstones[e] |= self.elements[e]

    def contains(self, e):
        # Visible if at least one tag has not been tombstoned
        return bool(self.elements.get(e, set()) - self.tombstones.get(e, set()))

    def merge(self, other):
        for e, tags in other.elements.items():
            self.elements[e] |= tags
        for e, tags in other.tombstones.items():
            self.tombstones[e] |= tags

Core idea: each add gets a unique tag. Remove adds "currently observed tags" to tombstones. New adds get new tags, so they survive.
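The effect of the tags shows up when an add and a remove happen concurrently. The sketch below traces that case with readable, hand-written tags ("A:1", "B:1") instead of UUIDs and plain dicts instead of a class; both choices are simplifications for illustration.

```python
def merge(a, b):
    """Union the add-tags and tombstone-tags of two OR-Set states."""
    merged = {"adds": {}, "tombs": {}}
    for part in ("adds", "tombs"):
        for state in (a, b):
            for elem, tags in state[part].items():
                merged[part].setdefault(elem, set()).update(tags)
    return merged

def contains(state, elem):
    live = state["adds"].get(elem, set()) - state["tombs"].get(elem, set())
    return bool(live)

# Both replicas start with "milk" added under tag "A:1".
replica_a = {"adds": {"milk": {"A:1"}}, "tombs": {}}
replica_b = {"adds": {"milk": {"A:1"}}, "tombs": {}}

# Replica A removes "milk": it tombstones only the tag it has observed.
replica_a["tombs"]["milk"] = {"A:1"}

# Concurrently, replica B adds "milk" again under a fresh tag.
replica_b["adds"]["milk"].add("B:1")

merged_ab = merge(replica_a, replica_b)
merged_ba = merge(replica_b, replica_a)

print(contains(replica_a, "milk"), contains(merged_ab, "milk"))  # False True
```

The tag "B:1" was never observed by the remove, so the concurrent add survives the merge regardless of merge order.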

3. Text Collaboration — The Hardest CRDT

3.1 Why Is Text So Hard?

Scenario: two users insert at the same position simultaneously.

Initial: "ABCDE"

User A: insert "X" at position 2 -> "ABXCDE"
User B: insert "Y" at position 2 -> "ABYCDE"

Merge: "ABXYCDE" or "ABYXCDE"?

Problem: positions are relative. One user's insert invalidates another user's positions.
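The divergence is easy to reproduce. In this sketch each replica applies its own insert first and then the other user's insert at the original index, with no adjustment; the helper insert is just string slicing.

```python
def insert(text, index, s):
    """Insert s into text at a numeric index."""
    return text[:index] + s + text[index:]

initial = "ABCDE"
op_a = (2, "X")  # User A: insert "X" at position 2
op_b = (2, "Y")  # User B: insert "Y" at position 2

# Replica A applies its own op first, then B's.
replica_a = insert(insert(initial, *op_a), *op_b)
# Replica B applies its own op first, then A's.
replica_b = insert(insert(initial, *op_b), *op_a)

print(replica_a, replica_b)  # ABYXCDE ABXYCDE
```

Both replicas received the same two operations, yet they end in different states, which is exactly what a text CRDT has to prevent.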

3.2 RGA (Replicated Growable Array)

Each character gets a unique ID, and an insert references the ID of the character it follows instead of a numeric position.

Initial document:
  [start] - A(id1) - B(id2) - C(id3) - [end]

User A: insert X after B (id4)
  [start] - A(id1) - B(id2) - X(id4) - C(id3) - [end]

User B: insert Y after B (id5)
  [start] - A(id1) - B(id2) - Y(id5) - C(id3) - [end]

Merge (concurrent inserts after the same element are ordered by id, larger id first):
  [start] - A(id1) - B(id2) - Y(id5) - X(id4) - C(id3) - [end]

Key: unique IDs + ordering rules enable deterministic merging.
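A minimal sketch of this integration rule follows. It assumes IDs are (counter, site) tuples that grow with causality (Lamport-style), that operations arrive in causal order, and that among concurrent inserts after the same element the larger ID is placed first, as in the published RGA algorithm. Deletions are left out, and the class and helper names are illustrative.

```python
START = (0, "")  # ID of the start sentinel

class RGA:
    def __init__(self):
        # Each entry is (id, char); an id is a (counter, site) tuple.
        self.items = [(START, "")]

    def integrate(self, new_id, char, after_id):
        """Place char (with new_id) after the element whose ID is after_id."""
        pos = [eid for eid, _ in self.items].index(after_id) + 1
        # Elements with a larger ID keep their place closer to the reference
        while pos < len(self.items) and self.items[pos][0] > new_id:
            pos += 1
        self.items.insert(pos, (new_id, char))

    def text(self):
        return "".join(char for _, char in self.items)

def base_document():
    doc = RGA()
    doc.integrate((1, "s"), "A", START)
    doc.integrate((2, "s"), "B", (1, "s"))
    doc.integrate((3, "s"), "C", (2, "s"))
    return doc

# Two concurrent inserts after B, from sites "a" and "b".
insert_x = ((4, "a"), "X", (2, "s"))
insert_y = ((4, "b"), "Y", (2, "s"))

replica_1 = base_document()
replica_1.integrate(*insert_x)
replica_1.integrate(*insert_y)

replica_2 = base_document()
replica_2.integrate(*insert_y)
replica_2.integrate(*insert_x)

print(replica_1.text(), replica_2.text())  # ABYXC ABYXC
```

Both delivery orders produce the same text because the position of each insert depends only on IDs, never on arrival order.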

3.3 Yjs's YATA Algorithm

Yjs uses a related algorithm called YATA (Yet Another Transformation Approach). Its implementation in Yjs is heavily optimized for text editing.

Each character carries:

Unique ID (client_id, clock)
origin_left: the ID of the left neighbor at insertion time
origin_right: the ID of the right neighbor at insertion time

Merge rules:

Inserts with the same origin are sorted by client_id
When origins differ, the positions of those origins in the sequence decide the order

Efficiency: Yjs stores a run of consecutively typed characters from one client as a single object instead of one object per character, which greatly reduces memory use and document size.
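The run compression can be sketched as follows. This is a simplified illustration, not Yjs's actual code: it assumes each item is a (client_id, clock, char) tuple already in document order and merges neighbors only when the client matches and the clocks are contiguous. The real library checks further conditions before merging.

```python
def compress_runs(items):
    """Merge adjacent single-character items typed in sequence by one client."""
    runs = []
    for client, clock, char in items:
        if runs:
            r_client, r_start, r_text = runs[-1]
            # Contiguous: same client and the clock continues the previous run
            if r_client == client and r_start + len(r_text) == clock:
                runs[-1] = (r_client, r_start, r_text + char)
                continue
        runs.append((client, clock, char))
    return runs

# Client 1 types "Hey", client 2 inserts "!", then client 1 types "?".
items = [(1, 0, "H"), (1, 1, "e"), (1, 2, "y"), (2, 0, "!"), (1, 3, "?")]
runs = compress_runs(items)

print(runs)  # [(1, 0, 'Hey'), (2, 0, '!'), (1, 3, '?')]
```

Five per-character objects collapse into three runs; for ordinary typing, where long stretches come from one client, the saving is much larger.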

4. Yjs — The Standard JavaScript CRDT

4.1 Basic Usage

import * as Y from 'yjs'
import { WebrtcProvider } from 'y-webrtc'

// Create a shared document
const doc = new Y.Doc()

// Shared text
const ytext = doc.getText('shared-text')
ytext.insert(0, 'Hello, ')
ytext.insert(7, 'World!')
console.log(ytext.toString())  // "Hello, World!"

// P2P sync (WebRTC)
const provider = new WebrtcProvider('my-room', doc)

4.2 Various Data Types

// Text
const ytext = doc.getText('text')
ytext.insert(0, 'Hello')

// Array
const yarray = doc.getArray('list')
yarray.push(['item1', 'item2'])

// Map (object)
const ymap = doc.getMap('config')
ymap.set('theme', 'dark')

// XML fragment (used by rich-text editor bindings)
const yxml = doc.getXmlFragment('content')

4.3 Change Detection

ytext.observe((event) => {
  console.log('Changes:', event.changes.delta)
  // [{ retain: 7 }, { insert: 'World!' }]
})
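The delta in the comment above uses the retain/insert/delete format familiar from rich-text editors. To make its meaning concrete, here is a small Python sketch that applies such a delta to a plain string. The function apply_delta is written for this illustration and handles only string inserts, not formatting attributes or embeds.

```python
def apply_delta(text, delta):
    """Apply a list of retain/insert/delete operations to a string."""
    out = []
    pos = 0  # read position in the original text
    for op in delta:
        if "retain" in op:
            out.append(text[pos:pos + op["retain"]])  # keep these characters
            pos += op["retain"]
        elif "insert" in op:
            out.append(op["insert"])  # new text, consumes no input
        elif "delete" in op:
            pos += op["delete"]  # skip these characters
    out.append(text[pos:])  # everything after the last op is kept
    return "".join(out)

first = apply_delta("Hello, ", [{"retain": 7}, {"insert": "World!"}])
second = apply_delta(first, [{"retain": 7}, {"delete": 5}, {"insert": "There"}])

print(first)   # Hello, World!
print(second)  # Hello, There!
```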

4.4 Persistence — IndexedDB

import { IndexeddbPersistence } from 'y-indexeddb'

const persistence = new IndexeddbPersistence('my-doc', doc)
persistence.on('synced', () => {
  console.log('Loaded from IndexedDB')
})

State is preserved across browser reloads — truly local-first.

4.5 Multiple Providers

Yjs is transport-agnostic. Network sync and local persistence are both handled by pluggable providers:

y-websocket: sync through a central server
y-webrtc: P2P sync
y-indexeddb: local persistence in the browser
y-leveldb: LevelDB persistence for Node.js servers
Custom providers are easy to write

5. Automerge — JSON Data Model

5.1 JSON-Friendly Interface

import * as Automerge from '@automerge/automerge'

let doc = Automerge.init()
doc = Automerge.change(doc, 'Initial', d => {
  d.todos = []
  d.todos.push({ text: 'Buy milk', done: false })
})

// Another device
let doc2 = Automerge.merge(Automerge.init(), doc)
doc2 = Automerge.change(doc2, 'Add task', d => {
  d.todos.push({ text: 'Walk dog', done: false })
})

// Merge
const merged = Automerge.merge(doc, doc2)
console.log(merged.todos)  // [Buy milk, Walk dog]

5.2 Yjs vs Automerge

	Yjs	Automerge
Implementation	JavaScript	Rust core compiled to WASM, JavaScript/TypeScript API
Data model	Shared types (Y.Text, Y.Map, Y.Array)	JSON-like document
Text performance	Very fast	Good
Memory efficiency	Excellent	Moderate
Learning curve	Medium	Low
Typical use cases	Collaborative editors	General data sync
Document size	Small	Moderate
Used by	Affine and other collaborative editors	Local-first apps

Selection guide:

Text collaboration (editors, notes): Yjs
JSON data (app state, forms): Automerge
Both under consideration: prototype faster with Automerge

6. The Local-First Software Movement

6.1 Seven Ideals

The Local-First Software manifesto by Ink & Switch:

Fast — no network round trips
Multi-device — sync in the background
Network-optional — fully works offline
Collaborate with others — conflict resolution via CRDT
Long-term preservation — data survives even if the cloud disappears
Security and privacy by default — data stays on user devices
Ultimate user control — data ownership

6.2 Cloud-First vs Local-First

	Cloud-First	Local-First
Data location	Server	Device
Offline	Limited or unavailable	Fully works
Response time	Network-dependent	Instant
Collaboration	Server-mediated	P2P or server
Company shutdown	Data at risk	Data retained
Examples	Google Docs	Obsidian, Logseq, Linear

6.3 Local-First Adoption Cases

Linear — project management:

Full data in local IndexedDB
Reads and writes hit the local store first, so the UI responds without a network round trip
Background sync via WebSocket
Offline changes queued, sent on reconnect

Figma — design collaboration:

Custom, CRDT-inspired sync with a central server; concurrent changes to the same object property are resolved last-writer-wins
Real-time multi-cursor
Offline editing then sync

Affine — notes:

Uses Yjs
Fully local-first
Cloud sync is optional

Notion — notes/wiki:

Reportedly handles conflicts per block rather than with a full-document CRDT
Reportedly moving text editing from OT toward CRDTs

7. CRDT vs OT (Operational Transformation)

7.1 What is OT?

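Operational Transformation (OT) is the traditional approach to real-time collaboration, used by Google Docs. It rewrites (transforms) each concurrent operation against the operations applied before it, so that every replica ends in the same state.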

Initial: "ABC"

User A: insert(1, "X") -> "AXBC"
User B: delete(2) -> "AB"

Transform User B's op after User A's change:
  delete(2) -> delete(3) (position adjusted)

Result: "AXB"
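The position adjustment in this example can be written as a tiny transform function. The sketch covers only this one case, a delete transformed against a concurrent insert; a real OT system needs a transform for every pair of operation types. The function name is illustrative.

```python
def transform_delete(delete_index, insert_index, insert_len=1):
    """Shift a concurrent delete so it still targets the same character
    once an insert has been applied first."""
    if insert_index <= delete_index:
        return delete_index + insert_len  # target moved right
    return delete_index  # insert was after the target: unchanged

doc = "ABC"
after_a = doc[:1] + "X" + doc[1:]   # User A: insert(1, "X")
b_index = transform_delete(2, 1)    # User B: delete(2), transformed
result = after_a[:b_index] + after_a[b_index + 1:]

print(after_a, b_index, result)  # AXBC 3 AXB
```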

7.2 Comparison

	OT	CRDT
Central server	Usually required	Optional
Offline support	Hard	Natural
Algorithmic complexity	Very complex	Complex (but verifiable)
P2P	Hard	Natural
Google Docs	Yes (current)	No
Figma, Linear	No	CRDT-inspired sync
Academic research	since the late 1980s	since the mid-2000s

Trend: newer collaborative tools increasingly choose CRDTs over OT, because CRDTs fit offline and decentralized environments better.

8. CRDT Limitations and Pitfalls

8.1 Metadata Explosion

Problem: tombstones, tags, clocks, and similar metadata accumulate, so the stored document can grow far larger than its visible content.

Solutions:

Compaction: clean up metadata no longer needed
Delta compression: used by Yjs. Consecutive ops merged into a single object
Checkpoints: periodic baselines

8.2 Semantic Conflicts

CRDTs resolve syntactic conflicts (concurrent edits at the same location). Semantic conflicts are not resolved.

Example: a calendar app. Two users simultaneously book the same meeting room for different meetings. CRDT treats both bookings as successful — a business-rule violation.

Solution: add a business-logic layer on top of the CRDT. Or, for parts needing strong consistency, use a different mechanism.
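A sketch of such a business-logic layer for the meeting-room example is shown below. Bookings are modeled as a grow-only set of (room, slot, meeting) tuples, which is an assumption made for this illustration, and find_double_bookings is a hypothetical validation step that runs after the CRDT merge.

```python
def find_double_bookings(bookings):
    """Return every (room, slot) claimed by more than one meeting."""
    by_slot = {}
    for room, slot, meeting in bookings:
        by_slot.setdefault((room, slot), set()).add(meeting)
    return {key: sorted(m) for key, m in by_slot.items() if len(m) > 1}

# Two users book the same room and slot on different replicas.
replica_a = {("Room 1", "10:00", "Design review")}
replica_b = {("Room 1", "10:00", "Sprint planning")}

# The set merge succeeds: both bookings survive.
merged = replica_a | replica_b

# The rule "one meeting per room and slot" is checked separately.
conflicts = find_double_bookings(merged)

print(conflicts)  # {('Room 1', '10:00'): ['Design review', 'Sprint planning']}
```

The CRDT guarantees that every replica sees the same two bookings; deciding which one keeps the room is left to the application, for example by asking a user or applying a deterministic rule.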

8.3 Partial CRDTs Are More Practical

Notion's approach, as commonly described: conflict handling at the block level, ordinary text inside blocks. Not everything needs to be a CRDT.

Linear's approach, as commonly described: only certain fields are CRDTs, others use LWW. A balance of simplicity and performance.

9. Hands-On — Building a Collaborative Text Editor

9.1 Basic Structure

import * as Y from 'yjs'
import { WebsocketProvider } from 'y-websocket'
import { EditorView, basicSetup } from 'codemirror'
import { yCollab } from 'y-codemirror.next'

// 1. Create Yjs document
const ydoc = new Y.Doc()
const ytext = ydoc.getText('codemirror')

// 2. Sync provider
const provider = new WebsocketProvider(
  'wss://my-server.com',
  'document-id-123',
  ydoc
)

// 3. CodeMirror editor + Yjs integration
const view = new EditorView({
  doc: ytext.toString(),
  extensions: [
    basicSetup,
    yCollab(ytext, provider.awareness)  // collaboration extension
  ],
  parent: document.body
})

With under 100 lines of client code, plus a y-websocket server running at the URL above, you get a Google Docs-style collaborative editor.

9.2 User Awareness

provider.awareness.setLocalStateField('user', {
  name: 'Alice',
  color: '#ff0000'
})

provider.awareness.on('change', () => {
  const users = Array.from(provider.awareness.getStates().values())
  console.log('Online users:', users)
})

With the yCollab extension from section 9.1, other users' cursors and selections are then rendered in the editor automatically.

Quiz

1. What are the three mathematical properties of a CRDT?

Answer: The merge operation must satisfy (1) Associativity: (a + b) + c = a + (b + c), (2) Commutativity: a + b = b + a, (3) Idempotence: a + a = a. If all three hold, merging in any order, any number of times, yields the same result. These are the ACI properties of a semilattice, the mathematical foundation of CRDTs' automatic conflict resolution.

2. What is the difference between State-based and Operation-based CRDTs?

Answer: State-based (CvRDT) transmits full state and combines via a merge function. Message order doesn't matter; simple. Downside: large state equals high network cost. Operation-based (CmRDT) transmits only operations. Smaller messages, but requires reliable delivery (at-least-once + idempotence). Practical libraries (Yjs, Automerge) use a hybrid approach.

3. Should you choose Yjs or Automerge?

Answer: Text collaboration (editors, notes, IDEs) -> Yjs (superior performance and memory efficiency; used by Affine, for example). JSON data sync (app state, forms, settings) -> Automerge (JSON-like API, low learning curve). If both are on the table, a common pattern is to prototype quickly with Automerge and optimize for production with Yjs.

4. What are the core values of Local-First Software?

Answer: Ink & Switch's seven ideals: (1) fast (no network dependency), (2) multi-device, (3) network-optional (fully offline-capable), (4) collaboration, (5) long-term preservation (survives cloud shutdown), (6) security/privacy (data on device), (7) user control of data. Cloud-first: data is at risk if the company shuts down. Local-first: data remains in user hands. Products such as Linear, Figma, and Notion apply these ideas to varying degrees.

5. What conflicts can CRDTs not resolve?

Answer: Semantic conflicts. CRDTs automatically merge concurrent changes to the same data structure, but cannot prevent business-rule violations. Example: two users book the same meeting room for different meetings at the same time -> CRDT treats both as successful. Solution: add a business-logic layer on top of the CRDT, or use a separate mechanism for parts needing strong consistency (central validation, distributed locks, etc.).

References

CRDT.tech — comprehensive CRDT resources
Local-First Software — Ink & Switch manifesto
Yjs — official docs
Automerge — official docs
Conflict-Free Replicated Data Types — the original paper (Shapiro 2011)
A Conflict-Free Replicated JSON Datatype — Automerge paper
YATA: Yet Another Transformation Approach — Yjs algorithm
Designing Data-Intensive Applications by Martin Kleppmann (CRDTs are discussed in the replication chapter)
Linear's offline mode — Linear sync engine
Figma's multiplayer technology — Figma collaboration
Diamond Types — high-performance CRDT written in Rust