
Design a Collaborative Text Editor

TL;DR

You are designing Google Docs -- specifically, the real-time collaboration piece where multiple people type in the same document simultaneously and everyone's cursor, edits, and formatting appear on everyone else's screen in near-real-time.

The core algorithmic challenge is conflict resolution: what happens when User A inserts "hello" at position 5 and User B deletes the character at position 5 at the same time? The answer is either Operational Transformation (OT) -- what Google Docs actually uses -- or Conflict-Free Replicated Data Types (CRDTs) -- what Figma uses. You need to understand both, pick one, and defend it.

The infrastructure challenge is comparatively simple once you solve the algorithm. Each active document lives on exactly one server. That server holds the document in memory, accepts operations from connected clients, transforms them, and broadcasts the results. The hard part is not scaling to many simultaneous editors (Google Docs caps at ~100 per document). The hard part is scaling to hundreds of thousands of simultaneously active documents.


The System

A collaborative text editor allows multiple users to edit the same document concurrently. Each user sees their own edits applied instantly (optimistic local apply). Other users' edits appear within 100-300ms. All users converge to the same final document state regardless of the order edits arrive.

The system supports:

  • Real-time text insertion and deletion across concurrent editors.
  • Cursor and selection presence (see where others are typing).
  • Document persistence with full edit history.
  • Offline tolerance for brief disconnections (seconds to minutes).
  • Rich text formatting (bold, italic, headings, lists -- discuss complexity but scope down for the interview).

Real-world references:

  • Google Docs: Uses OT (Jupiter protocol, originally from Xerox PARC 1995). Limits to ~100 concurrent editors per document. The server is the source of truth.
  • Figma: Uses CRDT-inspired techniques with a central server. LWW (last-writer-wins) registers per property. Not a text editor but proves CRDTs work at production scale.
  • Yjs: Open-source CRDT library. Benchmarks show CRDTs matching or beating OT performance, contradicting the old "CRDTs are too slow" conventional wisdom.

Requirements

Functional Requirements

  1. Real-time collaborative editing -- multiple users type simultaneously; all converge to the same state.
  2. Cursor presence -- see other users' cursor positions and selections.
  3. Document persistence -- documents are saved durably and survive server restarts.
  4. Edit history -- view previous versions, undo/redo in a collaborative context.
  5. Access control -- owner, editor, viewer, link-sharing permissions.

Non-Functional Requirements

  • Latency (local edit): 0ms -- edits appear instantly in the editor's own view (optimistic local apply).
  • Latency (remote edit): < 300ms -- other users' edits appear within 300ms.
  • Consistency: Strong eventual consistency -- all editors who have received the same set of operations are in the same document state, regardless of the order they arrived.
  • Concurrent editors per document: Up to 100 (Google Docs' real limit).
  • Scale: Billions of documents stored, millions accessed daily, hundreds of thousands concurrently active.

Back-of-Envelope Math

Metric | Value | Derivation
Total documents | 5B | Google Workspace scale
Documents accessed daily | 50M | ~1% of total
Concurrently active documents | 500K | ~1% of daily accessed
Average concurrent editors per document | 2-3 | Most docs have 1-3 editors
Operations per second per document | 10-30 | 2-3 editors x 5-10 ops/sec each
Total operations per second | 5M-15M | 500K docs x 10-30 ops/sec
WebSocket connections per server | 50K-100K | Standard
Maximum editors per document | ~100 | Google Docs' cap; all connect to the same server
Active documents per server (worst case) | 500-1,000 | 50K-100K connections / 100 editors per doc
Document servers needed (worst case) | 500-1,000 | 500K active docs / 500-1,000 per server
Avg document size | 50 KB | Typical Google Doc
Memory per document | 200 KB - 1 MB | Document + OT/CRDT state + op buffer
Memory per server for documents (worst case) | ~500 MB - 1 GB | 1,000 docs x 500 KB-1 MB

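The memory requirement per server is tiny even in the worst case, where every active document sits at the 100-editor cap. At the 2-3 editor average, far fewer servers are needed (see Scaling Math). The bottleneck is not memory or CPU. It is routing each editor to the correct document server.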
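The sizing arithmetic can be checked with a short function. This is a sketch: the function name and parameters are illustrative. The worst-case inputs (100 editors, 1 MB per document, 1,000 ops/sec per document) come from this section. The average inputs (2.5 editors, 500 KB, 20 ops/sec) are midpoints of the ranges also used under Scaling Math.

```python
import math


def size_fleet(active_docs: int, editors_per_doc: float, conns_per_server: int,
               mem_per_doc_kb: float, ops_per_doc: float) -> dict:
    """Back-of-envelope Document Server sizing (decimal units: 1 GB = 10**6 KB)."""
    docs_per_server = conns_per_server / editors_per_doc
    servers = math.ceil(active_docs / docs_per_server)
    return {
        "docs_per_server": docs_per_server,
        "servers": servers,
        "mem_per_server_gb": docs_per_server * mem_per_doc_kb / 1e6,
        "ops_per_server": active_docs * ops_per_doc / servers,
    }


# Worst case: every active document at the ~100-editor cap, 1 MB of state each.
worst = size_fleet(500_000, 100, 50_000, 1_000, ops_per_doc=1_000)
# Average case: 2.5 editors/doc, 500 KB of state, 20 ops/sec per document.
avg = size_fleet(500_000, 2.5, 50_000, 500, ops_per_doc=20)
print(worst["docs_per_server"], worst["servers"], worst["mem_per_server_gb"])  # 500.0 1000 0.5
print(avg["docs_per_server"], avg["servers"], avg["mem_per_server_gb"])        # 20000.0 25 10.0
print(avg["ops_per_server"])                                                    # 400000.0
```

The same formula yields both the 500-1,000 server worst case and the ~25 server average case. The only thing that changes is the editors-per-document assumption.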


Naive Design

User A types "hello" at position 5
  -> Client sends {type: "insert", position: 5, text: "hello"} to server
  -> Server applies to stored document
  -> Server broadcasts to User B
  -> User B applies at position 5

User B simultaneously deletes character at position 5
  -> Client sends {type: "delete", position: 5} to server
  -> Server applies to stored document
  -> Server broadcasts to User A
  -> User A applies delete at position 5

Operations are applied in the order they arrive at the server. No transformation. First come, first served.


Where It Breaks

Position invalidation. User A inserts "hello" (5 characters) at position 5. User B has not yet received A's edit and deletes the character at position 5 in their local view. If A's insert reaches the server first, it shifts everything from position 5 onward right by 5. The character B intended to delete is now at position 10, not position 5, so B's delete removes the "h" of A's "hello" instead.

This is not an edge case. Whenever two users type in the same document, every operation that lands after the other user's not-yet-seen edit carries a stale position by the time it reaches the server. The document diverges immediately and never recovers.
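A few lines of Python reproduce the failure. The `("ins", pos, text)` / `("del", pos)` tuple encoding is an illustrative assumption. Each replica applies its own edit first and then applies the other user's edit exactly as received, with no transformation.

```python
def apply_naive(doc: str, op: tuple) -> str:
    """Apply ("ins", pos, text) or ("del", pos) exactly as received -- no transformation."""
    if op[0] == "ins":
        _, pos, text = op
        return doc[:pos] + text + doc[pos:]
    _, pos = op
    return doc[:pos] + doc[pos + 1:]


doc = "0123456789"
op_a = ("ins", 5, "hello")  # User A's insert
op_b = ("del", 5)           # User B's concurrent delete of the character "5"

replica_a = apply_naive(apply_naive(doc, op_a), op_b)  # A applies its own op first
replica_b = apply_naive(apply_naive(doc, op_b), op_a)  # B applies its own op first
print(replica_a)  # 01234ello56789  (B's delete removed A's "h")
print(replica_b)  # 01234hello6789
```

The two replicas now hold different text, and no further edit will bring them back together.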

No local responsiveness. If the client waits for the server to confirm each keystroke before displaying it, typing feels laggy (100-300ms delay per character). Users expect zero-latency local editing. But if the client applies locally and sends to the server, the client's state diverges from the server's state. Reconciling the two is the entire problem.

No convergence guarantee. Without a formal algorithm, there is no proof that all clients end up with the same document. Testing is insufficient -- collaborative editing bugs are notoriously hard to reproduce because they depend on precise timing of concurrent operations.


Real Design

[Figure: Collaborative Editor -- High-Level Design]

The Algorithm: Operational Transformation (OT)

OT is the proven choice for server-based collaborative editing. Google Docs uses it. Here is how it works.

Operations for plain text:

  • insert(position, text) -- insert text at a position.
  • delete(position, count) -- delete count characters starting at position.

The transform function: Given two concurrent operations op1 and op2 generated against the same document state S, the transform function produces modified operations op1' and op2' such that:

apply(apply(S, op1), op2') = apply(apply(S, op2), op1')

Both paths produce the same final state. This is called Transformation Property 1 (TP1).

The four transform cases:

Case 1: insert vs insert.

  • op1 = insert(3, "a") and op2 = insert(5, "b") (both against same state).
  • op1' = insert(3, "a") (unchanged -- op2's insert is after op1's position).
  • op2' = insert(6, "b") (shifted right by 1 because op1 inserted before position 5).

Case 2: insert vs delete.

  • op1 = insert(3, "a") and op2 = delete(5, 1).
  • op1' = insert(3, "a") (unchanged).
  • op2' = delete(6, 1) (shifted right by 1 because op1 inserted before position 5).

Case 3: delete vs insert.

  • op1 = delete(3, 1) and op2 = insert(5, "b").
  • op1' = delete(3, 1) (unchanged).
  • op2' = insert(4, "b") (shifted left by 1 because op1 deleted before position 5).

Case 4: delete vs delete at same position.

  • op1 = delete(3, 1) and op2 = delete(3, 1).
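  • Both delete the same character, so each transformed operation becomes a no-op: op1' = op2' = no-op. Whichever delete is applied second finds the character already gone.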

Each transform is simple position arithmetic. The tricky part is not any individual transform -- it is ensuring that the transform function is applied correctly in the right order across the full history of concurrent operations.
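The four cases fit in one small transform function. This is a minimal Python sketch, not Google's implementation, and it makes a few simplifications that are assumptions. Deletes remove exactly one character, matching the delete(pos, 1) examples above. `op_wins_tie` is a hypothetical tie-breaker for two inserts at the same position, a case the examples above do not cover. `None` represents a no-op.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


@dataclass(frozen=True)
class Delete:
    pos: int  # simplification: deletes exactly one character


Op = Union[Insert, Delete]


def apply(doc: str, op: Optional[Op]) -> str:
    """Apply an operation to plain text; None is a no-op."""
    if op is None:
        return doc
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]


def transform(op: Optional[Op], other: Optional[Op], op_wins_tie: bool) -> Optional[Op]:
    """Rewrite `op` so it applies after `other` (both made against the same state)."""
    if op is None or other is None:
        return op
    if isinstance(op, Insert) and isinstance(other, Insert):
        # Equal positions: the tie winner stays left, the loser shifts right.
        if other.pos < op.pos or (other.pos == op.pos and not op_wins_tie):
            return Insert(op.pos + len(other.text), op.text)
        return op
    if isinstance(op, Insert) and isinstance(other, Delete):
        # A character deleted before the insert point pulls the insert left.
        return Insert(op.pos - 1, op.text) if other.pos < op.pos else op
    if isinstance(op, Delete) and isinstance(other, Insert):
        # Text inserted at or before the target character pushes it right.
        return Delete(op.pos + len(other.text)) if other.pos <= op.pos else op
    # Delete vs delete
    if other.pos == op.pos:
        return None  # both deleted the same character
    return Delete(op.pos - 1) if other.pos < op.pos else op


S = "abcdefgh"
for op1, op2 in [(Insert(3, "a"), Insert(5, "b")),  # case 1
                 (Insert(3, "a"), Delete(5)),       # case 2
                 (Delete(3), Insert(5, "b")),       # case 3
                 (Delete(3), Delete(3))]:           # case 4
    op1p = transform(op1, op2, op_wins_tie=True)
    op2p = transform(op2, op1, op_wins_tie=False)
    assert apply(apply(S, op1), op2p) == apply(apply(S, op2), op1p)  # TP1 holds
```

The two calls in the loop must use opposite `op_wins_tie` values. Otherwise two inserts at the same position would both shift, or both stay put, and TP1 would fail.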

Server-Based OT: The Jupiter Protocol

Google Docs uses the Jupiter protocol (from Xerox PARC, 1995). The central server simplifies OT dramatically.

How it works:

  1. Each client tracks the latest server revision it has seen. (In the original Jupiter design, each side of a client-server link counts the messages it has generated and the messages it has received, and tags every operation with that pair.)
  2. Client applies its own edits optimistically -- immediately to local state, no waiting for the server.
  3. Client sends the operation to the server, tagged with the revision it was generated against.
  4. Server maintains a total-ordered operation log. When the server receives a client's operation, it transforms it against every operation the client has not yet seen: operations from other clients logged after the revision the operation was generated against.
  5. Server applies the transformed operation to its document state and appends it to the operation log.
  6. Server broadcasts the (possibly transformed) operation to all other clients and acknowledges it to the sender.
  7. When a client receives a remote operation from the server, it transforms the remote operation against any local operations that have not yet been acknowledged by the server.
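The server side of steps 3-6 can be sketched in a few lines. The sketch rests on several assumptions. Operations are inserts only, to keep the transform short. The revision is simply the length of the log. Each client keeps at most one operation in flight and buffers further edits until the ACK arrives, so an operation never has to be transformed against its own sender's earlier operations. `DocumentServer` and `receive` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    client: str  # client id, used only to break ties at equal positions


def transform(op: Insert, other: Insert) -> Insert:
    """Shift `op` so it applies after `other`; the lower client id stays left on ties."""
    if other.pos < op.pos or (other.pos == op.pos and other.client < op.client):
        return Insert(op.pos + len(other.text), op.text, op.client)
    return op


class DocumentServer:
    def __init__(self, text: str = "") -> None:
        self.text = text
        self.log: List[Insert] = []  # total order; len(self.log) is the current revision

    def receive(self, op: Insert, base_rev: int) -> Tuple[Insert, int]:
        """Accept an op the client generated against revision `base_rev`."""
        for unseen in self.log[base_rev:]:  # ops this client had not seen
            op = transform(op, unseen)
        self.text = self.text[:op.pos] + op.text + self.text[op.pos:]
        self.log.append(op)  # a real server persists durably before the ACK
        return op, len(self.log)  # broadcast `op` to others; ACK the sender with the revision


server = DocumentServer("abc")
server.receive(Insert(1, "X", "A"), base_rev=0)     # A edited revision 0
op_b, rev = server.receive(Insert(3, "Y", "B"), 0)  # B also edited revision 0
print(op_b.pos, rev, server.text)  # 4 2 aXbcY
```

B's insert was generated against "abc". A's "X" landed first, so the server shifts B's insert from 3 to 4 before logging it.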

Why TP1 is sufficient: The server provides a single total ordering of operations. This reduces the problem to a 2D state space (client ops vs server ops) at each client. Only pairs of operations need to be transformed, never triples or more. The general peer-to-peer case also requires TP2: transforming an operation against two different but equivalent sequences of concurrent operations must yield the same result. TP2 is mathematically much harder, and published algorithms that claimed to satisfy it have repeatedly been proven incorrect.

Key insight for interviews: TP1 with a central server is a solved problem. TP2 for peer-to-peer OT is an unsolved problem in practice -- at least 8 published algorithms have been proven buggy after publication. This is why practical OT systems always use a central server.

Client-Side OT: Why It Matters

A common source of confusion: why does the client need to run OT, not just the server?

Scenario:

  1. User A types "X" at position 10. Client A applies locally. Document on Client A now has "X" at position 10.
  2. Client A sends insert(10, "X") to server. The operation is in flight.
  3. Meanwhile, the server broadcasts User B's operation insert(5, "Y") to Client A.
  4. Client A receives insert(5, "Y") from the server. But Client A's local state already has "X" at position 10 (which the server does not know about yet).
  5. If Client A naively applies insert(5, "Y"), it would insert "Y" at position 5 in a document that has "X" at position 10 from the local unacknowledged edit. Is position 5 correct?

No. Client A must transform the remote operation against its unacknowledged local operation. Here the remote insert(5, "Y") lands before the local insert(10, "X"), so the remote operation is unchanged. Only the pending local operation shifts, to insert(11, "X"). If the positions were reversed (local insert(5, "X") pending, remote insert(10, "Y") arriving), the remote operation would have to shift right to insert(11, "Y").

Without client-side OT, clients see incorrect intermediate states and may not converge.
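The client side can be sketched as a buffer of pending operations. The sketch assumes insert-only operations and the same client-id tie rule as the server. It also assumes the server sends an ACK (not an echo) for the client's own operations. `Client`, `local_insert`, `on_remote`, and `on_ack` are hypothetical names. Each incoming server operation is transformed past every pending local operation, and each pending operation is transformed past the incoming one. This keeps the buffer valid for the moment it reaches the server.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    client: str


def transform(op: Insert, other: Insert) -> Insert:
    """Shift `op` past `other`; the lower client id stays left on ties."""
    if other.pos < op.pos or (other.pos == op.pos and other.client < op.client):
        return Insert(op.pos + len(other.text), op.text, op.client)
    return op


def apply(text: str, op: Insert) -> str:
    return text[:op.pos] + op.text + text[op.pos:]


class Client:
    def __init__(self, name: str, text: str) -> None:
        self.name = name
        self.text = text
        self.pending: List[Insert] = []  # applied locally, not yet ACKed

    def local_insert(self, pos: int, text: str) -> Insert:
        op = Insert(pos, text, self.name)
        self.text = apply(self.text, op)  # optimistic local apply
        self.pending.append(op)
        return op  # send to the server

    def on_remote(self, remote: Insert) -> None:
        """Transform a server op past every unACKed local op, and vice versa."""
        still_pending = []
        for local in self.pending:
            remote, local = transform(remote, local), transform(local, remote)
            still_pending.append(local)
        self.pending = still_pending
        self.text = apply(self.text, remote)

    def on_ack(self) -> None:
        self.pending.pop(0)  # the oldest in-flight op is now in the server log


a = Client("A", "0123456789abc")
a.local_insert(5, "X")             # pending, not yet ACKed
a.on_remote(Insert(10, "Y", "B"))  # B's op, logged by the server before A's
print(a.text)  # 01234X56789Yabc
```

This is the reversed-positions case from the scenario. The remote insert shifts from 10 to 11 because the pending local "X" sits before it.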

Architecture

Clients (Web / Mobile)
    |
    | WebSocket (persistent)
    v
API Gateway / Load Balancer
    |
    | Route based on document_id (consistent hash)
    v
Document Server (sole owner of each active document; hosts many documents)
    |
    +-> Holds document in memory
    +-> Maintains operation log (recent ops for transformation)
    +-> WebSocket connections to all editors of this document
    +-> Cursor/selection presence (ephemeral, in memory)
    |
    +-> Writes to durable storage:
    |     +-> Operation Log (Cassandra / DynamoDB)
    |     +-> Periodic Snapshots (S3 / blob storage)
    |
    v
Storage Layer
    +-> Operation Log DB: append-only log of all operations
    +-> Snapshot Store: periodic full-document snapshots
    +-> Metadata DB: document metadata, permissions, sharing settings

Document Server ownership: Each active document lives on exactly one Document Server, determined by consistent hashing on document_id. All editors of the same document connect to the same server. This is not optional -- server-based OT requires a single serialization point.

Why Same-Server Affinity Is Non-Negotiable

Unlike chat systems (where users on the same channel can connect to different servers and a pub/sub layer bridges them), collaborative editing with OT requires all users on the same document to connect to the SAME server instance. OT transform ordering must be centralized -- if two servers both accept operations and apply transforms independently, their document states diverge permanently. There is no "merge later" with OT (that would require TP2, which is unsolved). Use consistent hashing on document_id to route all WebSocket connections for a document to one server. If that server fails, a new server takes over the hash range, loads the snapshot + operation log from storage, and replays to reconstruct state. Clients reconnect and retransmit unacknowledged operations. The critical invariant: at any given moment, exactly one server is the authority for a document's operation ordering.

Routing flow:

  1. Client hits the API Gateway with document_id.
  2. Gateway computes the hash and routes to the correct Document Server.
  3. If the Document Server does not currently have this document loaded, it loads the latest snapshot and the operations logged after it from storage, reconstructs the document state, and accepts the WebSocket connection.
  4. Subsequent edits go directly over the WebSocket to this Document Server.
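A consistent-hash ring keyed on document_id can be sketched with the standard library. The sketch makes several assumptions. Server names are hypothetical. The choice of 100 virtual nodes per server is arbitrary. MD5 is used only for an even, process-stable spread, not for security. Failure detection and agreement on ring membership are not modeled.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, servers: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes  # virtual nodes per server even out the load
        self.ring: List[Tuple[int, str]] = []
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        """Drop a failed server; only the documents it owned change owner."""
        self.ring = [(h, s) for h, s in self.ring if s != server]

    def owner(self, document_id: str) -> str:
        """First virtual node clockwise from the document's hash."""
        i = bisect.bisect(self.ring, (_hash(document_id), ""))
        return self.ring[i % len(self.ring)][1]


ring = HashRing(["doc-server-1", "doc-server-2", "doc-server-3"])
owner = ring.owner("doc-42")  # every editor of doc-42 is routed to this server
```

When a server is removed, its hash ranges fall to the next virtual node clockwise. Documents on the surviving servers keep their owner, so only the failed server's documents need to be reloaded.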

Why one server per document works: A fast typist generates ~5-10 ops/second. With 100 editors, that is ~500-1,000 ops/second per document. A single server handles this trivially. The challenge is not per-document throughput -- it is routing millions of documents to the right servers.

Persistence: Operation Log + Snapshots

There is no "document file" stored anywhere. This is a point of confusion for many candidates. The document is the result of replaying an ordered log of operations.

Pattern: This mirrors a database's write-ahead log (WAL) + checkpoints.

  1. Every operation is appended to the operation log in durable storage (Cassandra, DynamoDB) before being acknowledged to the client. This ensures no acknowledged operation is ever lost.
  2. Periodically (every N operations or every T minutes), the Document Server writes a full snapshot of the current document state to blob storage (S3).
  3. To reconstruct a document: load the most recent snapshot, then replay all operations logged after that snapshot.

Why not just store the current document? Because you need edit history. The operation log IS the history. You can reconstruct the document at any point in time by replaying operations up to that timestamp. This enables "version history," "see who changed what," and "revert to a previous version."

Compaction: Over time, the operation log grows. A 50 KB document that has been heavily edited for a year might have a 50 MB operation log. Compaction collapses the entire log into a single "insert the whole document" operation plus a snapshot. Old operations are archived or deleted. Deleting them gives up fine-grained version history before the compaction point. Compaction runs as a background process.
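Snapshots, replay, version history, and compaction can be sketched together. The sketch assumes an in-memory `DocumentStore` (a hypothetical stand-in for the op log DB plus the snapshot store), with operations encoded as `("ins", pos, text)` / `("del", pos, count)` tuples. It represents the compacted state as a snapshot rather than as a literal "insert the whole document" operation.

```python
from typing import Dict, Optional, Tuple

Op = Tuple  # ("ins", pos, text) or ("del", pos, count)


def apply(text: str, op: Op) -> str:
    if op[0] == "ins":
        _, pos, s = op
        return text[:pos] + s + text[pos:]
    _, pos, count = op
    return text[:pos] + text[pos + count:]


class DocumentStore:
    """In-memory stand-in for the operation log DB plus the snapshot store."""

    def __init__(self) -> None:
        self.ops: Dict[int, Op] = {}              # version v -> op turning v into v + 1
        self.snapshots: Dict[int, str] = {0: ""}  # version -> full document text
        self.head = 0                             # current version

    def append(self, op: Op) -> None:
        """Append to the op log (in production: durably, before ACKing the client)."""
        self.ops[self.head] = op
        self.head += 1

    def load(self, version: Optional[int] = None) -> str:
        """Latest snapshot at or before `version`, then replay the ops after it."""
        target = self.head if version is None else version
        base = max(v for v in self.snapshots if v <= target)  # ValueError if compacted away
        text = self.snapshots[base]
        for v in range(base, target):
            text = apply(text, self.ops[v])
        return text

    def snapshot(self) -> None:
        self.snapshots[self.head] = self.load()

    def compact(self) -> None:
        """Keep one snapshot at head and drop older ops (and the history they held)."""
        self.snapshots = {self.head: self.load()}
        self.ops = {v: op for v, op in self.ops.items() if v >= self.head}


store = DocumentStore()
for op in [("ins", 0, "Hello world"), ("del", 5, 6), ("ins", 5, ", docs")]:
    store.append(op)
store.snapshot()                 # checkpoint at version 3
store.append(("ins", 0, ">> "))
print(store.load())    # >> Hello, docs   (snapshot at v3 + one replayed op)
print(store.load(2))   # Hello            (version history: state after two ops)
```

Loading the current document replays only the ops after the newest snapshot. Loading an old version falls back to an earlier snapshot. After `compact()`, versions before the head can no longer be loaded.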

Cursor and Selection Presence

Cursor positions are ephemeral state -- never persisted, stored in memory on the Document Server, and broadcast via the same WebSocket connection.

Implementation:

  1. Client sends {type: "cursor", position: 42, selectionEnd: 50, userId: "abc"} over WebSocket.
  2. Document Server stores this in an in-memory map: {userId -> cursorState}.
  3. Document Server broadcasts cursor updates to all other connected clients.
  4. Client renders colored cursors and selection highlights.

Cursor positions must be transformed too. When User A's insert shifts text, User B's cursor position (which is a position in the document) may need to shift. The Document Server transforms cursor positions using the same position-shifting logic as operations.
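Cursor transformation reuses the same shifting rules. The sketch below assumes `("ins", pos, text)` / `("del", pos, count)` tuples and hypothetical function names. It also adopts one tie convention as an assumption: a remote insert exactly at another user's cursor leaves that cursor in front of the new text. A selection is just two cursors.

```python
from typing import Tuple


def transform_cursor(cursor: int, op: Tuple) -> int:
    """Shift a cursor index past a remote ("ins", pos, text) or ("del", pos, count)."""
    if op[0] == "ins":
        _, pos, text = op
        # Assumed tie rule: an insert exactly at the cursor does not move it.
        return cursor + len(text) if pos < cursor else cursor
    _, pos, count = op
    if cursor <= pos:
        return cursor          # deleted range is entirely after the cursor
    if cursor >= pos + count:
        return cursor - count  # deleted range is entirely before the cursor
    return pos                 # cursor was inside the deleted range


def transform_selection(sel: Tuple[int, int], op: Tuple) -> Tuple[int, int]:
    start, end = sel
    return transform_cursor(start, op), transform_cursor(end, op)


print(transform_selection((42, 50), ("ins", 10, "hello")))  # (47, 55)
print(transform_selection((42, 50), ("del", 45, 10)))       # (42, 45)
```

In the second call, the delete covers positions 45-54. The selection shrinks to the surviving part, 42-45, instead of pointing at text that no longer exists.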

Optimization: Cursor updates are sent at most every 50-100ms (debounced), not on every keystroke. This reduces traffic significantly. Clients interpolate cursor positions between updates for smooth visual movement.


Deep Dives

[Figure: Collaborative Editor -- OT vs CRDT]

Deep Dive 1: OT vs CRDTs -- The Real Tradeoff

This is the question interviewers love, and the nuanced answer is better than dogmatically picking one.

OT (Google Docs' choice):

Strength | Detail
Proven at massive scale | Google Docs has billions of documents, ~100 editors/doc
Lower memory overhead | Operations are lightweight (position + content). No per-character metadata.
Compactable | Operation log can be collapsed to a snapshot.
Rich text support | Mature. Google has handled ~20+ operation types with hundreds of transform function pairs.

Weakness | Detail
Requires a central server | Cannot do true peer-to-peer
Correctness is hard | At least 8 published OT algorithms proven incorrect after publication
Offline support is poor | Extended offline causes massive divergence; transform backlog on reconnect
TP2 is unsolved in practice | Only server-based (TP1) works reliably

CRDTs (Figma's choice, Yjs):

Strength | Detail
Formally proven convergence | Mathematical guarantee, not testing-dependent
Offline support is native | Operations are commutative; buffer locally, replay on reconnect
No central server required | Can work peer-to-peer (though production systems still use a server)
Implementation correctness | No TP2 puzzle. Convergence is a property of the data structure.

Weakness | Detail
Tombstone accumulation | Deleted characters are marked, not removed. Document metadata only grows.
Per-character metadata overhead | Each character carries a unique ID (8-16 bytes). A 100 KB document might have 500 KB-2 MB of CRDT metadata.
Rich text is less mature | Peritext (Ink & Switch) is research-stage. Production rich text CRDTs are emerging but not as battle-tested.
Garbage collection is hard | Reclaiming tombstones requires coordination, partially defeating the "no coordination" advantage.

The Yjs benchmark that challenges conventional wisdom:

Kevin Jahns (Yjs creator) benchmarked against a real-world editing trace of 260,000 operations producing a 100,000-character document:

Metric | Yjs (CRDT) | Automerge (old CRDT)
Apply 260K ops | ~0.5 sec | ~300 sec
Encoded document size | ~160 KB | ~80 MB
Memory usage | ~3 MB | ~800 MB

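These numbers compare two CRDT implementations, not a CRDT against OT. What they show is that the old "CRDTs are too slow" argument rests on naive implementations. A well-engineered CRDT like Yjs is fast and compact enough to be competitive with OT in practice.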

Figma's pragmatic approach:

Figma's CTO Evan Wallace: "CRDTs are designed for decentralized systems where there is no single source of truth. We actually have a central server. We just need the property that operations are commutative."

Figma uses CRDT-inspired techniques with a central server. Their data model is objects with properties (not linear text), so each property is a simple LWW (last-writer-wins) register. When two users simultaneously change a rectangle's X position, the latest value wins. This is the simplest possible CRDT.
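An LWW register is small enough to show whole. This is a generic sketch, not Figma's code. The assumption here is that each write carries a `(counter, client_id)` stamp, with the client id breaking ties between equal counters. In a central-server design, the counter could come from the server's arrival order. Because the highest stamp always wins, replicas converge regardless of the order in which they see the writes.

```python
from dataclasses import dataclass
from typing import Any, Tuple

Stamp = Tuple[int, str]  # (logical timestamp, client_id); client_id breaks ties


@dataclass
class LWWRegister:
    value: Any = None
    stamp: Stamp = (0, "")

    def set(self, value: Any, stamp: Stamp) -> None:
        if stamp > self.stamp:  # keep only the write with the highest stamp
            self.value, self.stamp = value, stamp

    def merge(self, other: "LWWRegister") -> None:
        self.set(other.value, other.stamp)


# Three concurrent writes to a rectangle's x property, seen in different orders.
updates = [(100, (1, "alice")), (250, (2, "bob")), (180, (2, "alice"))]
r1, r2 = LWWRegister(), LWWRegister()
for value, stamp in updates:
    r1.set(value, stamp)
for value, stamp in reversed(updates):
    r2.set(value, stamp)
print(r1.value, r2.value)  # 250 250  ((2, "bob") beats (2, "alice"))
```

The "bob" write wins on both replicas because its stamp is the highest. Arrival order does not matter.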

The real decision axis is the data model, not OT vs CRDT:

  • Linear text sequence: OT (proven, Google Docs) or sequence CRDT (Yjs/YATA, increasingly viable).
  • Object properties: LWW CRDT (Figma).
  • Tree structure: Tree CRDT (emerging research).

For a Google Docs interview: Pick OT. It matches what Google actually uses. Mention Figma and Yjs as evidence that CRDTs work at scale, but explain why OT is the right choice for a centralized text editing service.

Deep Dive 2: Document Server Failure and Recovery

The Document Server is stateful -- it holds the document in memory plus WebSocket connections to all editors. If it crashes, everyone editing that document loses their connection. Recovery must be seamless.

Recovery flow:

  1. Document Server crashes. All connected clients lose their WebSocket connections.
  2. Consistent hash ring manager detects the failure and assigns the document's hash range to a replacement server.
  3. Replacement server loads the most recent snapshot from S3 and replays the operation log from Cassandra to reconstruct the document state.
  4. Clients detect disconnect, reconnect through the API Gateway, which routes them to the new Document Server.
  5. Clients send any unacknowledged local operations to the new server. The server transforms and applies them.

What is safe and what is not:

  • Acknowledged operations are safe. They were written to the durable operation log before the ACK was sent to the client. The new server will replay them.
  • Unacknowledged operations need retransmission. The client maintains a buffer of operations it sent but has not received an ACK for. On reconnect, it retransmits these. The server transforms them against any operations it has from other clients.
  • The critical invariant: Write to durable storage BEFORE acknowledging. This is the same principle as Lesson 1's store-and-forward model.

Recovery time: Loading a snapshot and replaying recent ops typically takes 1-5 seconds per document. Clients experience a brief "reconnecting..." state. No edits are lost.

Deep Dive 3: Geo-Distribution and Latency

All editors of a document connect to the same server. If User A is in Tokyo and User B is in London and the Document Server is in Virginia, User A has ~200ms round-trip latency.

Why this is acceptable:

  • Local edits are instant. User A sees their own keystrokes immediately (optimistic local apply). The 200ms latency only affects how quickly User A sees User B's edits and vice versa.
  • Collaborative editing is human-speed. A 200ms delay in seeing someone else's cursor move is barely perceptible. Compare to video calling, where 200ms is noticeable.
  • Google Docs enforces this in production. All editors connect to the same server. Geographic latency is a conscious tradeoff for architectural simplicity.

Mitigation strategies:

  1. Document placement heuristic: Place the document on a server geographically close to the majority of active editors. If most editors are in Europe, use the EU region. With consistent hashing, this requires per-document overrides or region-aware hashing.
  2. Read-only viewers via CDN: Users with view-only access do not need real-time OT. Serve them a periodically-updated snapshot via CDN. Only active editors need the WebSocket connection to the Document Server.
  3. Operation batching: Instead of sending each keystroke individually, batch operations client-side (e.g., every 50-100ms) and send as a group. This reduces the number of round-trips without noticeably increasing perceived latency.
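Client-side batching can coalesce contiguous typing before each send. The sketch assumes `("ins", pos, text)` / `("del", pos, count)` tuples and a hypothetical `batch` function that runs on whatever was buffered during one 50-100ms window. The timer itself is not modeled. Only inserts that continue exactly where the previous insert ended are merged, so the batched list has the same effect as the raw keystrokes.

```python
from typing import List, Tuple

Op = Tuple  # ("ins", pos, text) or ("del", pos, count)


def apply(text: str, op: Op) -> str:
    if op[0] == "ins":
        return text[:op[1]] + op[2] + text[op[1]:]
    return text[:op[1]] + text[op[1] + op[2]:]


def batch(ops: List[Op]) -> List[Op]:
    """Coalesce ops buffered during one send window; contiguous typing becomes one insert."""
    out: List[Op] = []
    for op in ops:
        prev = out[-1] if out else None
        if (prev is not None and prev[0] == "ins" and op[0] == "ins"
                and op[1] == prev[1] + len(prev[2])):  # starts where prev's text ended
            out[-1] = ("ins", prev[1], prev[2] + op[2])
        else:
            out.append(op)
    return out


keystrokes = [("ins", 5, "h"), ("ins", 6, "e"), ("ins", 7, "y"), ("del", 2, 1)]
print(batch(keystrokes))  # [('ins', 5, 'hey'), ('del', 2, 1)]
```

Four keystrokes become two operations on the wire, and the server transforms two operations instead of four.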

What about multi-region OT? Theoretically, you could run OT servers in multiple regions with cross-region synchronization. In practice, this requires TP2 (peer-to-peer OT), which is unsolved. Google chose the simpler approach: one server, accept the latency.


Alternative Designs

CRDTs Instead of OT

If the interviewer pushes for CRDTs, here is the architecture:

  • Use a sequence CRDT (Yjs/YATA or RGA) for the document data structure.
  • Each character has a unique ID: (lamport_timestamp, client_id).
  • Insertions reference the ID of the character they are inserted after.
  • Deletions mark characters as tombstones.
  • The server is a relay and persistence layer -- it does not run transform logic.
  • Convergence is guaranteed by the mathematical properties of the data structure.
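The bullets above can be illustrated with a minimal RGA-style sequence CRDT. This is a simplified sketch, not Yjs's YATA algorithm. It assumes exactly-once, causally ordered delivery from the relay server, so a character's insert always arrives before anything that references or deletes it. It also uses O(n) lookups. Each character's ID is `(lamport_timestamp, client_id)`. Inserts name the character they follow. Concurrent inserts after the same character are ordered by descending ID, and deletes only set a tombstone.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

CharId = Tuple[int, str]  # (lamport_timestamp, client_id), unique per character


@dataclass
class Elem:
    id: CharId
    value: str
    deleted: bool = False  # tombstone flag; elements are never physically removed


@dataclass
class RGA:
    client_id: str
    clock: int = 0
    elems: List[Elem] = field(default_factory=list)

    def _index(self, cid: Optional[CharId]) -> int:
        """Position of `cid` in elems; -1 stands for the start of the document."""
        if cid is None:
            return -1
        return next(i for i, e in enumerate(self.elems) if e.id == cid)

    def _visible_id(self, pos: int) -> CharId:
        return [e.id for e in self.elems if not e.deleted][pos]

    def integrate_insert(self, cid: CharId, value: str, after: Optional[CharId]) -> None:
        self.clock = max(self.clock, cid[0])  # Lamport clock update
        i = self._index(after) + 1
        # Skip elements with larger ids: concurrent siblings that sort first,
        # plus everything inserted after them (which always has even larger ids).
        while i < len(self.elems) and self.elems[i].id > cid:
            i += 1
        self.elems.insert(i, Elem(cid, value))

    def integrate_delete(self, cid: CharId) -> None:
        self.elems[self._index(cid)].deleted = True

    def local_insert(self, pos: int, value: str) -> tuple:
        after = self._visible_id(pos - 1) if pos > 0 else None
        self.clock += 1
        cid = (self.clock, self.client_id)
        self.integrate_insert(cid, value, after)
        return ("ins", cid, value, after)  # relayed to other replicas by the server

    def local_delete(self, pos: int) -> tuple:
        cid = self._visible_id(pos)
        self.integrate_delete(cid)
        return ("del", cid)

    def apply_remote(self, op: tuple) -> None:
        if op[0] == "ins":
            self.integrate_insert(op[1], op[2], op[3])
        else:
            self.integrate_delete(op[1])

    def text(self) -> str:
        return "".join(e.value for e in self.elems if not e.deleted)


a, b = RGA("A"), RGA("B")
for op in (a.local_insert(0, "a"), a.local_insert(1, "b")):
    b.apply_remote(op)
op_a = a.local_insert(1, "X")  # concurrent: both insert between "a" and "b"
op_b = b.local_insert(1, "Y")
a.apply_remote(op_b)
b.apply_remote(op_a)
print(a.text(), b.text())  # aYXb aYXb
```

No operation is ever rewritten. Both replicas place "Y" before "X" because `(3, "B")` sorts above `(3, "A")`, and that rule alone produces convergence. The cost shows up in the tombstones that `local_delete` leaves behind.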

Trade-offs vs OT for this specific problem:

  • You gain offline editing support (operations are commutative, can be buffered and replayed).
  • You gain simpler server logic (relay, not transformer).
  • You lose compactability (tombstones accumulate unless garbage collected).
  • You gain a richer identity model (each character has a globally unique ID, enabling precise conflict resolution).

For a Google Docs clone, OT is the pragmatic choice. For a system that needs strong offline support (mobile-first document editor, P2P collaboration), CRDTs are the better foundation.

Peer-to-Peer Architecture (No Central Server)

Use WebRTC data channels for direct client-to-client communication. A lightweight signaling server handles initial peer discovery.

Problems:

  • Scales to ~5-10 concurrent editors before the full mesh (O(N^2) connections) becomes expensive.
  • No centralized access control. Any client can forge operations.
  • No centralized persistence. Clients must coordinate who saves the document.
  • No total ordering without a central server (requires TP2, which is unreliable).

Where it works: Small-team tools like Excalidraw or Conclave that prioritize privacy and simplicity over scale. Not suitable for Google Docs scale.

Event Sourcing as the Persistence Model

The operation-log approach is essentially event sourcing:

  • The document state is derived from an ordered log of events (operations).
  • The current state can be reconstructed by replaying all events from the beginning.
  • Snapshots are periodic checkpoints to avoid replaying the entire history.
  • The operation log is the source of truth; the current document is a derived view.

This is exactly how database WALs work. If the interviewer is familiar with event sourcing, draw the parallel explicitly -- it shows breadth.


Scaling Math

Document Server Fleet

Metric | Calculation | Result
Concurrently active documents | -- | 500K
Avg editors per document | -- | 2-3
Connections per server | -- | 50K
Active documents per server | 50K / (2-3) | ~17K-25K
Document Servers needed | 500K / 20K | ~25

25 Document Servers for 500K concurrently active documents. The fleet is small because each document is lightweight in memory and generates minimal CPU load.

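However, this assumes the average. Hotspots matter. A viral shared document with 100 editors creates more load than 50 documents with 2 editors each, even though both hold 100 connections, because every operation in the viral document fans out to ~99 other editors. Size the fleet for peak, not average.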

Operations Throughput

Metric | Calculation | Result
Ops per second per document | 2-3 editors x 5-10 ops/sec | 10-30 ops/sec
Transform cost per op | O(number of concurrent unacknowledged ops) | Typically 1-5 transforms
CPU per document | -- | Negligible
Total ops per second | 500K docs x 20 ops/sec | 10M ops/sec total
Per server | 10M / 25 | 400K ops/sec

400K transforms per second per server is achievable. Each transform is simple position arithmetic -- a few microseconds. The bottleneck is network I/O (WebSocket management), not CPU.

Storage

Metric | Calculation | Result
Avg document size | -- | 50 KB
Operation log before compaction | ~10x document size | ~500 KB
Operation log after compaction | ~1.5x document size | ~75 KB
Total documents | -- | 5B
Cold storage (compacted) | 5B x 75 KB | 375 TB
Hot storage (active docs) | 500K x 500 KB | 250 GB

375 TB of cold storage is manageable with S3 or equivalent blob storage. 250 GB of hot storage fits in a small database cluster.
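Both storage tiers come from a single formula: documents x average size x log-size multiplier. The helper below is hypothetical and uses decimal units (1 TB = 10**9 KB). It makes the 1.5x (compacted) and 10x (uncompacted) multipliers from the table explicit.

```python
def log_storage_kb(docs: int, doc_kb: int, log_multiplier: float) -> float:
    """Total op-log size in KB when each log is `log_multiplier` x the document size."""
    return docs * doc_kb * log_multiplier


cold_tb = log_storage_kb(5_000_000_000, 50, 1.5) / 1e9  # all docs, compacted
hot_gb = log_storage_kb(500_000, 50, 10) / 1e6          # active docs, uncompacted
print(cold_tb, hot_gb)  # 375.0 250.0
```

The multiplier dominates. If compaction is skipped and every log sits at 10x instead of 1.5x, cold storage grows by roughly 6.7x.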

Memory Per Server

Metric | Calculation | Result
Documents per server | -- | ~20K
Memory per document | Document + OT state + op buffer + cursors | ~200 KB - 1 MB
Total memory per server | 20K x 500 KB (avg) | ~10 GB

10 GB of document state per server. Modern servers have 64-256 GB RAM. Plenty of headroom.


Failure Analysis

Document Server Crashes

Impact: All editors of documents hosted on that server lose their WebSocket connections. ~20K documents affected.

Recovery:

  1. Hash ring manager reassigns the document range to a replacement server.
  2. Replacement loads snapshots + replays operation logs. Time: 1-5 seconds per document (parallelized).
  3. Clients reconnect through API Gateway, routed to replacement.
  4. Unacknowledged operations are retransmitted by clients and transformed by the new server.

Data loss: None, assuming the invariant "write to durable storage before ACK" is maintained. Unacknowledged operations are retransmitted by the client.

Operation Log Database Failure

Impact: New operations cannot be persisted. The Document Server can continue operating in memory (accepting and transforming operations) but cannot acknowledge them as durable.

Mitigation:

  • Buffer operations in memory on the Document Server. Accept edits but do not send durable ACKs.
  • When the database recovers, flush the buffer.
  • If the Document Server also crashes during the database outage, buffered operations are lost. This is the catastrophic scenario -- mitigate by writing operations to a local WAL on disk as a backup.
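A local disk WAL for the outage case can be as simple as an append-only file with an fsync per record. `LocalWAL`, the JSON-lines format, and the file name are hypothetical. A production WAL would also checksum records so that a torn final write can be detected and discarded.

```python
import json
import os
import tempfile
from typing import List


class LocalWAL:
    """Append-only on-disk buffer for ops accepted while the op-log DB is down."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, op: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(op) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force to stable storage before returning

    def replay(self) -> List[dict]:
        """Ops to re-flush to the op-log DB after a restart, in order."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def truncate(self) -> None:
        """Call only after every buffered op is durable in the op-log DB."""
        with open(self.path, "w", encoding="utf-8"):
            pass


path = os.path.join(tempfile.mkdtemp(), "doc-123.wal")  # hypothetical location
wal = LocalWAL(path)
wal.append({"type": "insert", "position": 5, "text": "hello"})
wal.append({"type": "delete", "position": 5, "count": 1})
recovered = LocalWAL(path).replay()  # e.g. after the Document Server restarts
```

The ordering rule is the same as for the main operation log: an operation is in the WAL before the server relies on it. The WAL is truncated only after the database has accepted every buffered operation.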

Network Partition Between Client and Server

Impact: Client cannot send operations to the server. Local edits accumulate as unacknowledged operations.

OT handling:

  • Client continues accepting local edits (optimistic local apply).
  • Operations are buffered for transmission.
  • On reconnect, all buffered operations are sent to the server.
  • Server transforms them against operations from other clients that arrived during the partition.
  • For short partitions (seconds to minutes), this works well. For long partitions (hours), the transform backlog can be large and the user may see surprising document changes on reconnect.

CRDT advantage: CRDTs handle this more gracefully because concurrent operations commute. There is no transform backlog. Buffered operations can be applied in any order that respects causality (for example, a character's insert before its delete).

Two Servers Believe They Own the Same Document (Split Brain)

Scenario: Network partition causes the hash ring to assign the same document to two different Document Servers. Both accept edits.

Recovery:

  1. When the partition heals, one server is chosen as authoritative (the one with the higher operation count, or an arbitrary tiebreaker).
  2. The other server's operations are replayed against the authoritative server's state using OT transforms.
  3. Clients connected to the non-authoritative server receive the reconciled state.

This is rare (requires hash ring inconsistency during a network partition) but must be handled. The operation log in durable storage provides the ground truth for reconciliation.


Level Expectations

Level | What You Must Cover | What Sets You Apart
Mid-Level | WebSocket for real-time communication, identify the core problem (concurrent edits with position invalidation), mention OT or CRDTs as the solution, document persistence, cursor presence as ephemeral state | Explains optimistic local apply (edit instantly, send to server, transform remote ops). Clean architecture separating API Gateway from Document Server.
Senior | OT transform function with at least 2 of the 4 cases worked through, server-based OT (Jupiter protocol) with TP1, consistent hashing to route editors to the same server, operation log + snapshots for persistence, cursor position transformation | Explains why TP1 is sufficient with a central server but TP2 is needed for P2P. Discusses the "no document file" insight (document = replay of operation log). Calculates that 100 editors at 10 ops/sec = 1,000 ops/sec per document, trivially handled by one server.
Staff+ | All 4 transform cases, OT vs CRDT tradeoff with specifics (Figma's LWW registers, Yjs benchmarks), document server failure and recovery (snapshot + op log replay + client retransmission), geo-distributed latency analysis (all editors on one server, accept 200ms cross-region latency), rich text complexity acknowledgment | Knows Google Docs limits to ~100 concurrent editors and explains why (OT transform backlog grows with more concurrent unacked ops). Mentions that at least 8 published OT algorithms were proven buggy, so TP2 is unsolved. Cites Yjs benchmarks showing CRDTs are competitive. Discusses undo in collaborative editing as a hard open problem.
