Design a Cloud File Sync Service
This question separates candidates who have actually thought about distributed systems from candidates who memorized a CRUD API template. The URL shortener is a database problem. This is a synchronization problem -- and synchronization is where things get genuinely hard.
The System
A cloud file sync service (think Dropbox, Google Drive, OneDrive) stores files in the cloud and keeps them synchronized across all of a user's devices. Edit a spreadsheet on your laptop, and within seconds it appears on your phone. Two people sharing a folder both see changes in near-real-time.
Dropbox has over 700 million registered users and stores exabytes of data. They migrated off AWS S3 to their own custom storage system (Magic Pocket) because at their scale, building and operating custom storage was cheaper by tens of millions of dollars per year. Before that migration, they were one of S3's largest customers. The engineering team that built the initial system that served 30 million users? Four people. The system that powered the $10B IPO was fundamentally the same architecture scaled up -- chunking, content-addressable storage, and metadata separation. These decisions made at the start carried the company for a decade.
Requirements
Functional
- Upload files: Users can upload files of 1 KB to 50 GB from any device
- Download files: Users can download any file they have access to
- Sync across devices: Changes on one device propagate to all other devices automatically
- File sharing: Users can share files/folders with other users with view or edit permissions
- Version history: Users can view and restore previous versions of a file
Non-Functional
- Sync latency: Changes appear on other devices within 5 seconds for files under 10 MB on a stable connection
- Upload efficiency: Editing a single paragraph in a 100 MB document should not re-upload 100 MB (delta sync)
- Durability: 99.999999999% (eleven nines) -- losing a user's file is an extinction-level event for a storage company
- Availability: 99.9% uptime -- users tolerate brief outages but not data loss
- Scale: 500 million users, 100 million daily active, average 2 GB stored per user
Back-of-Envelope Math
Storage:
500M users * 2 GB average = 1 exabyte total storage
With deduplication (20-30% savings): ~700-800 PB effective
S3 cost at $0.023/GB/month: $23M/month = $276M/year
This is why Dropbox built Magic Pocket.
Upload traffic:
100M DAU, assume 10% modify files daily = 10M file changes/day
Average file size: 5 MB
10M * 5 MB = 50 TB of new/changed data per day
With delta sync (60% savings): ~20 TB actual upload per day
Metadata:
500M users * 200 files average = 100 billion file metadata records
Each record: ~500 bytes (path, size, timestamps, blocklist hash, permissions)
Total metadata: ~50 TB
This fits in a sharded database cluster.
Sync notifications:
10M file changes/day = ~116 changes/sec average
Each change notifies an average of 3 devices per user = 348 notifications/sec
Plus shared folder notifications: ~500 notifications/sec total
With burst (10x): ~5,000 notifications/sec
Chunk storage:
Average file: 5 MB = ~1-2 chunks at 4 MB target chunk size
100 billion files * 1.5 chunks = 150 billion chunks
Block index entry: ~100 bytes per chunk
Block index size: 15 TB
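The estimates above are easy to re-derive in a few lines, which is worth doing before you build a design on top of them. This is a minimal sketch using decimal units and only the inputs stated above; the variable names are illustrative.

```python
# Decimal units, matching the estimates above.
GB, TB, PB = 10**9, 10**12, 10**15

users = 500_000_000
daily_active = 100_000_000

total_storage = users * 2 * GB                      # 2 GB average per user
s3_monthly_usd = (total_storage // GB) * 0.023      # $0.023 per GB-month

changes_per_day = daily_active // 10                # 10% modify a file daily
raw_upload = changes_per_day * 5 * 10**6            # 5 MB average file
delta_upload = raw_upload * 4 // 10                 # delta sync saves 60%

metadata_records = users * 200                      # 200 files per user
metadata_bytes = metadata_records * 500             # ~500 bytes per record

changes_per_sec = changes_per_day / 86_400
chunk_count = metadata_records * 3 // 2             # ~1.5 chunks per file
block_index_bytes = chunk_count * 100               # ~100 bytes per entry

print(f"total storage : {total_storage / PB:,.0f} PB")
print(f"S3 list price : ${s3_monthly_usd / 1e6:,.0f}M/month")
print(f"daily upload  : {raw_upload / TB:.0f} TB raw, {delta_upload / TB:.0f} TB after delta sync")
print(f"metadata      : {metadata_bytes / TB:.0f} TB")
print(f"change rate   : {changes_per_sec:.0f}/sec")
print(f"block index   : {block_index_bytes / TB:.0f} TB")
```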
The Naive Design
┌──────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────┐
│  Client  │────>│  API Server  │────>│  PostgreSQL  │     │    S3    │
│ (laptop) │<────│              │<────│  (metadata)  │     │ (files)  │
└──────────┘     └──────────────┘     └──────────────┘     └──────────┘
Upload:
1. Client uploads entire file to API server
2. Server stores file in S3
3. Server writes metadata (name, size, S3 key) to Postgres
4. Server returns success
Download:
1. Client requests file
2. Server generates presigned S3 URL
3. Client downloads directly from S3
Sync:
1. Client polls every 30 seconds: "anything new since timestamp X?"
2. Server queries: SELECT * FROM files WHERE user_id = ? AND updated_at > ?
3. Client downloads changed files
This works for a homework project. It does not work for Dropbox.
Where Does This Break First?
Editing a single cell in a 500 MB Excel file re-uploads the entire 500 MB. A user on a train with spotty WiFi uploads 400 MB, loses connection, and has to start the entire upload over. Two people editing the same shared document create a race condition where the last upload silently overwrites the other person's changes. And polling every 30 seconds means your coworker waits half a minute to see your changes.
Where It Breaks
Three things break simultaneously, and each one is a hard problem on its own:
1. Bandwidth waste. Without chunking and delta sync, every file modification re-uploads the entire file. A 1 KB edit in a 1 GB video re-uploads the full gigabyte. Multiply this by millions of users and you are burning bandwidth (and money) at an absurd rate.
2. No resumability. Without chunking, an interrupted upload of a large file means starting over from byte zero. On unreliable networks (mobile, public WiFi), large file uploads may never complete.
3. Conflict blindness. Without versioning and conflict detection, concurrent edits to the same file result in silent data loss. The last writer wins, and nobody knows the first writer's changes existed.
The Real Design
┌────────────────────────────────────────────────────────────────────┐
│                        Client (Desktop App)                        │
│ ┌──────────────┐  ┌──────────────┐  ┌───────────────────────────┐  │
│ │ File Watcher │  │ Chunking     │  │ Local Metadata DB         │  │
│ │ (FSEvents/   │──│ Engine       │  │ (SQLite: file states,     │  │
│ │  inotify)    │  │ (Rabin CDC)  │  │  chunk hashes, cursors)   │  │
│ └──────────────┘  └──────────────┘  └───────────────────────────┘  │
└───────────┬──────────────────────────────────────┬─────────────────┘
            │ upload new chunks                    │ sync notifications
            v                                      v
     ┌──────────────┐                     ┌──────────────────┐
     │ Block Server │                     │ Notification     │
     │ (upload/     │                     │ Server           │
     │  download    │                     │ (WebSocket +     │
     │  chunks)     │                     │  long-poll       │
     └──────┬───────┘                     │  fallback)       │
            │                             └────────┬─────────┘
            v                                      │
     ┌──────────────┐     ┌──────────────┐         │
     │ S3 / Blob    │     │ Metadata     │<────────┘
     │ Store        │     │ Service      │
     │ (chunks by   │     │              │
     │  SHA-256)    │     └──────┬───────┘
     └──────────────┘            │
                                 v
                          ┌──────────────┐     ┌──────────────┐
                          │ MySQL        │     │ Redis        │
                          │ (sharded     │     │ (cursors,    │
                          │  metadata)   │     │  sessions)   │
                          └──────────────┘     └──────────────┘
Upload Path (The Interesting Part)
- File watcher detects a change. The OS-level file watcher (FSEvents on macOS, inotify on Linux, ReadDirectoryChangesW on Windows) fires when a user saves a file.
- Client chunks the file using content-defined chunking (CDC). This is not fixed-size chunking. A Rabin fingerprint rolls over the file's bytes, and chunk boundaries are placed where the fingerprint meets a condition (e.g., lowest 22 bits are zero). Target chunk size: ~4 MB. This produces variable-size chunks whose boundaries are determined by content, not position.
- Client computes SHA-256 of each chunk. These hashes are the chunks' identities.
- Client sends the new blocklist (list of chunk hashes) to the metadata service. "Here are the hashes for the new version of report.docx."
- Metadata service compares against the stored blocklist. "You have 12 chunks. I already have 10 of them. Upload chunks 3 and 7."
- Client uploads only the new/changed chunks to the block server, which stores them in S3 (or equivalent blob store) keyed by their SHA-256 hash.
- Metadata service commits the new version. Updates the file's blocklist in the database, increments the file version, writes a journal entry with a new cursor.
- Notification server pushes the change to all other devices subscribed to this namespace (the user's files or the shared folder).
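The hash-first negotiation in the middle of this path can be sketched in a few lines. This is a minimal illustration, not Dropbox's protocol: BlockStore and upload_version are hypothetical names, the store is an in-memory dict standing in for the blob store, and the chunks are tiny byte strings rather than ~4 MB CDC chunks.

```python
import hashlib


class BlockStore:
    """Hypothetical in-memory stand-in for the blob store, keyed by SHA-256."""

    def __init__(self):
        self.blocks = {}

    def missing(self, hashes):
        # "Which of these hashes do you not have yet?"
        return [h for h in hashes if h not in self.blocks]

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blocks[digest] = data
        return digest


def upload_version(store: BlockStore, chunks):
    """Send the blocklist first, then upload only the chunks the server lacks."""
    blocklist = [hashlib.sha256(c).hexdigest() for c in chunks]
    needed = set(store.missing(blocklist))
    uploaded = 0
    for chunk, digest in zip(chunks, blocklist):
        if digest in needed:
            store.put(chunk)
            needed.discard(digest)   # a repeated chunk is uploaded only once
            uploaded += 1
    return blocklist, uploaded


store = BlockStore()
v1 = [b"intro", b"body", b"appendix"]
v2 = [b"intro", b"body (edited)", b"appendix"]

_, first_upload = upload_version(store, v1)
_, second_upload = upload_version(store, v2)
print(first_upload, second_upload)
```

The second call transfers a single chunk because the other two hashes are already known to the store.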
Download/Sync Path
- Other devices receive a notification: "file X changed, new cursor is 4827."
- Device fetches journal entries since its last cursor from the metadata service.
- For each changed file, device receives the new blocklist.
- Device checks which chunks it already has locally (from previous versions of this file, or from other files).
- Device downloads only the missing chunks from the block server.
- Device reconstructs the file locally from the blocklist of chunks.
- Device updates its local cursor to 4827.
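The reconstruction step is mostly a cache lookup. Here is a minimal sketch, assuming a local chunk cache keyed by SHA-256 and a hypothetical fetch_chunk callable that stands in for the block server's download call; both names are illustrative.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def rebuild_file(blocklist, local_chunks, fetch_chunk):
    """Rebuild a file from its blocklist, downloading only chunks missing locally."""
    fetched = []
    for digest in blocklist:
        if digest in local_chunks:
            continue                      # already on disk from an older version
        data = fetch_chunk(digest)
        if sha256_hex(data) != digest:    # content addressing doubles as an integrity check
            raise ValueError(f"chunk {digest[:8]} failed its hash check")
        local_chunks[digest] = data
        fetched.append(digest)
    content = b"".join(local_chunks[d] for d in blocklist)
    return content, fetched


# The device holds the previous version; only the edited chunk is fetched.
remote = {sha256_hex(c): c for c in (b"intro", b"body v2", b"appendix")}
local = {sha256_hex(c): c for c in (b"intro", b"body v1", b"appendix")}
new_blocklist = [sha256_hex(c) for c in (b"intro", b"body v2", b"appendix")]

content, fetched = rebuild_file(new_blocklist, local, remote.__getitem__)
print(len(fetched), content)
```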
Why Content-Defined Chunking Matters
This is the technical detail that separates a good answer from a great one.
Fixed-size chunking divides a file into equal blocks (e.g., every 4 MB). The problem: inserting a single byte at the beginning of the file shifts every subsequent chunk boundary. Every chunk looks different. The entire file must be re-uploaded.
Content-defined chunking uses a rolling hash (Rabin fingerprint) to determine boundaries based on the content itself. Inserting a byte at position X changes only the chunk containing position X (and, if the edit happens to create or destroy a boundary, its immediate neighbor). All other chunk boundaries remain stable because they depend on a small window of local content, not on global position.
Result: editing one paragraph in a 100 MB document changes 1-2 chunks out of ~25. You upload ~8 MB instead of 100 MB. That is a 92% bandwidth saving for a typical edit.
Deep Dives

Deep Dive 1: Content-Addressable Storage and Deduplication
Every chunk is stored by its SHA-256 hash. This creates automatic deduplication as a free side effect.
Same file, different users: If 1,000 users all upload the same company-wide PDF, only one copy of each chunk is stored. All 1,000 users' metadata points to the same physical chunks. At large scale, cross-user deduplication typically saves 20-30% of raw storage.
Same file, different versions: When a user edits a file, most chunks remain identical to the previous version. Only changed chunks are new. Version history costs very little additional storage.
The dedup check flow:
Client: "I want to upload chunk with hash abc123"
Server: "I already have abc123. Skip."
Client: "I want to upload chunk with hash def456"
Server: "That's new. Upload it."
This is why the upload conversation starts with hashes, not data. The client never uploads a chunk the server already has, regardless of which user or file originally created it.
The garbage collection problem: When a file is deleted or updated, old chunks may become orphaned (no file references them). You need either reference counting (increment on file create/update, decrement on delete) or periodic mark-and-sweep garbage collection to reclaim storage. Reference counting is simpler but breaks if a counter update is lost. Mark-and-sweep is more reliable but requires scanning all file metadata periodically.
Dropbox uses reference counting with periodic reconciliation sweeps as a safety net.
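A toy version of the store shows both dedup and the two cleanup strategies side by side. This is a sketch under stated assumptions: ChunkStore and its methods are hypothetical names, everything lives in memory, and a real system would make the reference updates transactional with the metadata commit.

```python
import hashlib
from collections import Counter


class ChunkStore:
    """Toy content-addressable store with reference counts and a reconciliation sweep."""

    def __init__(self):
        self.chunks = {}            # sha256 hex -> bytes
        self.refcount = Counter()   # sha256 hex -> number of blocklist references

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.chunks.setdefault(digest, data)   # identical content is stored once
        return digest

    def add_refs(self, blocklist):
        self.refcount.update(blocklist)

    def drop_refs(self, blocklist):
        for digest in blocklist:
            self.refcount[digest] -= 1
            if self.refcount[digest] <= 0:
                del self.refcount[digest]
                self.chunks.pop(digest, None)

    def sweep(self, live_blocklists) -> int:
        """Mark-and-sweep safety net: delete chunks that no live blocklist references."""
        live = {d for blocklist in live_blocklists for d in blocklist}
        orphans = [d for d in self.chunks if d not in live]
        for digest in orphans:
            del self.chunks[digest]
            self.refcount.pop(digest, None)
        return len(orphans)


store = ChunkStore()
pdf = [b"page-1", b"page-2"]

alice = [store.put(c) for c in pdf]
store.add_refs(alice)
bob = [store.put(c) for c in pdf]          # same bytes, no new storage
store.add_refs(bob)
stored_after_both = len(store.chunks)

store.drop_refs(alice)                     # Bob still references both chunks
stored_after_delete = len(store.chunks)

store.put(b"never committed")              # simulates a chunk whose reference was lost
swept = store.sweep([bob])

store.drop_refs(bob)
stored_at_end = len(store.chunks)
print(stored_after_both, stored_after_delete, swept, stored_at_end)
```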
Deep Dive 2: Conflict Resolution
What happens when two users edit the same file on different devices before either sync completes?
Last-Writer-Wins (LWW): The most recent write overwrites the other. Simple to implement. Causes silent data loss. Unacceptable for a file sync service that people trust with their work.
Dropbox's actual approach: conflict copies. The server maintains a version number for each file. When a client uploads changes, it includes the parent version it based its edits on:
Client A: "Here's a new version of report.docx, based on version 5"
Server: "Current version is 5. Accepted. New version is 6."
Client B: "Here's a new version of report.docx, based on version 5"
Server: "Current version is 6 now, not 5. Conflict detected."
On conflict, the server keeps Client A's version as the canonical file and stores Client B's upload as a conflict copy named after its author, for example report (Alice's conflicted copy 2026-04-18).docx. Both versions are preserved. The users resolve it manually.
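The parent-version check is a compare-and-set on the file record. The sketch below is one way to express it, assuming Alice is the user behind Client B and a hypothetical Bob is behind Client A; MetadataService, FileRecord, and conflict_name are illustrative names, and a real service would run the check and the write in a single database transaction.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    version: int = 0
    blocklist: list = field(default_factory=list)


def conflict_name(path: str, author: str, date: str) -> str:
    stem, dot, ext = path.rpartition(".")
    if not dot:                       # path has no extension
        stem, ext = path, ""
    return f"{stem} ({author}'s conflicted copy {date}){dot}{ext}"


class MetadataService:
    def __init__(self):
        self.files = {}               # path -> FileRecord

    def commit(self, path, parent_version, blocklist, author, date):
        record = self.files.setdefault(path, FileRecord())
        if parent_version == record.version:
            # The client edited the current version: fast-forward.
            record.version += 1
            record.blocklist = list(blocklist)
            return "accepted", path, record.version
        # Stale parent: keep the canonical file, save this upload beside it.
        copy_path = conflict_name(path, author, date)
        self.files[copy_path] = FileRecord(version=1, blocklist=list(blocklist))
        return "conflict", copy_path, 1


svc = MetadataService()
svc.files["report.docx"] = FileRecord(version=5, blocklist=["c1", "c2"])

result_a = svc.commit("report.docx", 5, ["c1", "c9"], "Bob", "2026-04-18")
result_b = svc.commit("report.docx", 5, ["c1", "c7"], "Alice", "2026-04-18")
print(result_a)
print(result_b)
```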
Vector clocks for conflict detection: For more sophisticated conflict detection across multiple devices, vector clocks track the causal ordering of edits. Each device maintains a vector of logical timestamps. If neither device's vector dominates the other (some entries are greater in each), a true conflict exists. If one vector dominates, it is a simple fast-forward update. Amazon Dynamo and Riak use this approach.
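The dominance test is short enough to write on a whiteboard. A minimal sketch, assuming each clock is a dict from device ID to a logical counter and that a missing entry counts as zero; the function name and return labels are illustrative.

```python
def compare_clocks(a: dict, b: dict) -> str:
    """Classify two vector clocks as equal, ordered, or concurrent."""
    devices = a.keys() | b.keys()
    a_ahead = any(a.get(d, 0) > b.get(d, 0) for d in devices)
    b_ahead = any(b.get(d, 0) > a.get(d, 0) for d in devices)
    if a_ahead and b_ahead:
        return "conflict"        # neither dominates: concurrent edits
    if a_ahead:
        return "a_newer"         # a dominates: fast-forward b
    if b_ahead:
        return "b_newer"         # b dominates: fast-forward a
    return "equal"


laptop = {"laptop": 3, "phone": 1}
phone = {"laptop": 2, "phone": 2}
print(compare_clocks(laptop, phone))
print(compare_clocks({"laptop": 3, "phone": 2}, phone))
```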
I'd use version numbers (Dropbox's approach) in an interview -- it is simpler and maps directly to the metadata service's cursor system. Mention vector clocks to show you know the theoretical foundation, but do not over-design it.
Deep Dive 3: Sync Protocol and Notifications
The naive approach -- polling every 30 seconds -- creates up to 30 seconds of sync delay and wastes resources on empty polls. The real design uses a push-based notification system with cursor-based catchup.
Server-side journal: The metadata service maintains an append-only journal for each namespace (user's file space or shared folder). Each entry records: operation type (create/modify/delete), file path, new blocklist hash, timestamp, and a monotonically increasing cursor (sequence number).
Notification flow:
- File change is committed to the metadata database.
- Metadata service writes a journal entry with cursor N.
- Notification server pushes a lightweight message to all subscribed clients: { namespace_id, new_cursor: N }.
- Each client calls the metadata service: "Give me journal entries since cursor M" (M being the client's last processed cursor).
- Client processes each journal entry, downloads new chunks if needed, and updates its local cursor to N.
Why cursors instead of timestamps? Timestamps are unreliable across distributed systems (clock skew). Cursors are monotonically increasing integers -- they provide a total ordering of events within a namespace. A client at cursor 100 knows it has seen everything up to and including event 100, regardless of wall-clock time.
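Cursor-based catchup fits in a small in-memory model. This sketch assumes cursors are dense integers starting at 1 within one namespace, so "entries since cursor M" is a list slice; NamespaceJournal, JournalEntry, and Client are hypothetical names, and chunk downloads are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalEntry:
    cursor: int
    op: str              # "create", "modify", or "delete"
    path: str
    blocklist_hash: str


class NamespaceJournal:
    """Append-only journal for one namespace."""

    def __init__(self):
        self.entries = []

    def append(self, op: str, path: str, blocklist_hash: str = "") -> int:
        entry = JournalEntry(len(self.entries) + 1, op, path, blocklist_hash)
        self.entries.append(entry)
        return entry.cursor

    def since(self, cursor: int):
        # Cursors are 1..N with no gaps, so entry N sits at list index N - 1.
        return self.entries[cursor:]


class Client:
    def __init__(self):
        self.cursor = 0
        self.files = {}   # path -> blocklist hash

    def catch_up(self, journal: NamespaceJournal) -> int:
        applied = 0
        for entry in journal.since(self.cursor):
            if entry.op == "delete":
                self.files.pop(entry.path, None)
            else:
                self.files[entry.path] = entry.blocklist_hash
            self.cursor = entry.cursor   # advance only after the entry is applied
            applied += 1
        return applied


journal = NamespaceJournal()
journal.append("create", "report.docx", "h1")
journal.append("create", "notes.txt", "h2")
journal.append("modify", "report.docx", "h3")
journal.append("delete", "notes.txt")

device = Client()
first_pass = device.catch_up(journal)
second_pass = device.catch_up(journal)   # a duplicate notification is harmless
print(first_pass, second_pass, device.cursor, device.files)
```

Replaying from the stored cursor is what makes a repeated or missed push safe: the client always converges to the journal's state.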
WebSocket + long-poll hybrid: Dropbox uses WebSockets as the primary notification channel, with periodic polling (every few minutes) as a fallback. On reconnect after a disconnect, the client uses cursor-based catchup to fetch everything it missed. The cursor makes this idempotent and crash-safe.
Notification fan-out at scale: A change to a file in a shared folder with 100 members requires notifying 100 clients, potentially on 100 different WebSocket connection servers. The notification service needs a routing layer (backed by Redis) that maps user IDs to connection server instances.
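The routing layer reduces to a lookup and a group-by. In this sketch a plain dict stands in for the Redis-backed routing table, and plan_fanout, the user IDs, and the server names are all hypothetical.

```python
from collections import defaultdict


def plan_fanout(members, routing, namespace_id, cursor):
    """Group one change notification by the connection server holding each member."""
    batches = defaultdict(list)
    for user_id in members:
        server = routing.get(user_id)     # None means no live connection
        if server is not None:
            batches[server].append(user_id)
    message = {"namespace_id": namespace_id, "new_cursor": cursor}
    return {
        server: {"users": users, "message": message}
        for server, users in batches.items()
    }


# Stand-in for the routing table: user ID -> connection server instance.
routing = {"u1": "ws-01", "u2": "ws-02", "u3": "ws-01"}
plan = plan_fanout(["u1", "u2", "u3", "u4"], routing, "shared-folder-7", 4827)
print(plan)
```

Offline members (u4 here) are simply skipped. They pick up the change through cursor-based catchup when they reconnect, so the push path never has to guarantee delivery.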
Alternative Designs
Alternative 1: Event Sourcing with Kafka
Instead of a custom journal, use Kafka as the event log. Each namespace gets a Kafka partition. File changes are published as events. Clients consume events from their last committed offset (equivalent to cursor).
Pros: mature ecosystem, built-in retention and replay, handles fan-out naturally with consumer groups.
Cons: operational complexity of running Kafka, higher latency than a direct WebSocket push (roughly 100 ms versus tens of milliseconds), and partition-count limits once you have millions of namespaces.
Alternative 2: Operational Transform / CRDT for Real-Time Collaboration
If the interviewer pushes toward Google Docs-style real-time co-editing (not just file sync), you'd need to move from file-level sync to operation-level sync using Operational Transform or CRDTs. This is a fundamentally different problem -- Dropbox syncs entire files (or chunks of files), while Google Docs syncs individual character insertions and deletions.
Mention this distinction to show you understand the boundary. Then stay on the file-sync side unless the interviewer explicitly asks you to cross it.
| Aspect | Cursor-Based Journal | Kafka Event Log | OT/CRDT |
|---|---|---|---|
| Sync granularity | File/chunk level | File/chunk level | Character/operation level |
| Latency | ~50ms (WebSocket push) | ~100ms (consumer poll) | ~20ms (real-time collab) |
| Conflict handling | Conflict copies | Conflict copies | Automatic merge |
| Implementation effort | Medium | Medium | Very high |
| Best for | File sync (Dropbox) | File sync at extreme scale | Real-time editors (Docs) |
Scaling Math Verification
Upload bandwidth (20 TB/day after delta sync):
- 20 TB / 86,400 sec = ~231 MB/sec = ~1.85 Gbps sustained
- A cluster of 10 block servers with 1 Gbps each handles this with headroom.
- S3 handles arbitrary write throughput. This is not the bottleneck.
Metadata operations (10M file changes/day):
- 10M / 86,400 = ~116 writes/sec to the metadata database
- Plus reads for sync checks: ~10x = 1,160 reads/sec
- A sharded MySQL cluster handles this easily. Dropbox uses sharded MySQL for metadata.
WebSocket connections (100M DAU, ~3 devices each = 300M potential connections):
- Not all online simultaneously. Assume 10% concurrent = 30M connections.
- Each WebSocket server handles ~50,000 concurrent connections.
- 30M / 50,000 = 600 WebSocket servers.
- This is a significant cluster but well within the realm of standard infrastructure.
Dedup savings:
- 50 TB of raw changes per day, with 60% eliminated by delta sync and dedup, leaves ~20 TB of new chunk data to store per day
- Annual new storage: 20 TB * 365 = 7.3 PB/year
- With erasure coding (1.5x overhead vs 3x for replication): 11 PB/year of raw disk
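The same verification can be scripted so the numbers stay honest when an assumption changes. A minimal sketch using only the figures in this section; the variable names are illustrative, and the values are averages rather than peaks.

```python
TB = 10**12

upload_per_day = 20 * TB
upload_mb_per_sec = upload_per_day / 86_400 / 10**6
upload_gbps = upload_per_day * 8 / 86_400 / 10**9

metadata_writes_per_sec = 10_000_000 / 86_400
metadata_reads_per_sec = metadata_writes_per_sec * 10      # ~10x reads for sync checks

potential_connections = 100_000_000 * 3                    # DAU x ~3 devices
concurrent_connections = potential_connections // 10       # 10% online at once
websocket_servers = concurrent_connections // 50_000       # 50k connections per server

annual_new_pb = 20 * 365 / 1000                            # 20 TB/day of new chunks
raw_disk_pb = annual_new_pb * 1.5                          # erasure coding overhead

print(f"upload    : {upload_mb_per_sec:.0f} MB/sec = {upload_gbps:.2f} Gbps")
print(f"metadata  : {metadata_writes_per_sec:.0f} writes/sec, {metadata_reads_per_sec:.0f} reads/sec")
print(f"websocket : {concurrent_connections:,} connections on {websocket_servers} servers")
print(f"storage   : {annual_new_pb:.1f} PB/year logical, {raw_disk_pb:.2f} PB/year raw")
```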
Failure Analysis
What breaks at 10x (5B users, 1B DAU)?
| Component | Current | At 10x | Breaks? | Fix |
|---|---|---|---|---|
| S3 / blob storage | ~800 PB | ~8 EB | Cost | Build custom storage (Magic Pocket) |
| Metadata DB | 50 TB, sharded | 500 TB | Maybe | More shards, DynamoDB for some tables |
| WebSocket servers | 600 servers | 6,000 servers | Yes | Regional clusters, smarter routing |
| Block server cluster | 10 servers | 100 servers | No | Linear scaling |
| Notification fan-out | 500/sec | 5,000/sec | No | Kafka for fan-out |
| Dedup effectiveness | 20-30% savings | Higher (more overlap) | No | Gets better at scale |
The first thing to break at 10x is the cost of S3. This is exactly what happened at Dropbox -- they built Magic Pocket because S3 was costing them tens of millions per year. The second thing is the WebSocket infrastructure: 6,000 servers for persistent connections requires careful regional deployment and connection draining during rollouts.
What if the metadata database goes down?
Users cannot sync, but they can continue working on local files. This is a key UX property of file sync: the local filesystem is the source of truth for the user. When the metadata service recovers, clients catch up using cursor-based sync. No data is lost because the client's local state is complete.
What if S3 goes down?
Existing synced files are still available on local devices. New uploads queue locally and retry. Downloads of files not yet synced to the local device fail. Cross-region replication (or Magic Pocket's multi-DC setup) provides durability even if an entire region goes down.
What's Expected at Each Level
| Aspect | Mid-Level | Senior | Staff+ |
|---|---|---|---|
| File upload | Upload whole file to server, store in S3 | Chunking for resumability, presigned URLs | Content-defined chunking with Rabin fingerprint, delta sync |
| Metadata vs content | Single database for everything | Separate metadata DB and blob store | Explain why: different consistency, scaling, failure domains |
| Sync mechanism | Polling | Long polling or WebSocket push | Cursor-based journal, WebSocket + polling hybrid, fan-out |
| Deduplication | Not mentioned | Mention content-addressable storage | Block-level dedup, cross-user dedup, garbage collection |
| Conflict resolution | "Last write wins" | Detect conflicts, create conflict copies | Version vectors, parent-version tracking, conflict copy UX |
| Durability | "Store in S3" | Replication, backups | Erasure coding, bit-rot scrubbing, cross-DC replication |
| Security | "Use HTTPS" | Encryption at rest, presigned URL expiry | Zero-knowledge trade-offs (breaks dedup, sharing, search) |
| Scale awareness | "Shard the database" | Identify what breaks first at 10x | Build-vs-buy storage (Dropbox left S3), operational cost |
The key signal interviewers look for: does the candidate understand that the hard problem is synchronization (chunking + delta sync + conflict resolution + notifications), not CRUD operations on files?