Design a Channel-Based Chat Platform
TL;DR
You are designing Slack or Discord. Not WhatsApp -- this is the opposite problem. WhatsApp is 1:1 encrypted messaging where the server is a blind relay. Slack/Discord is channel-based messaging where the server stores everything forever, indexes it for full-text search, and must fan out messages to channels that can have 50,000+ members.
The defining technical challenge is the large-channel fan-out problem: a message posted to a 50K-member channel should NOT generate 50K WebSocket pushes because 49K of those members are not currently looking at that channel. The system must efficiently answer "who cares about this message right now?" not "who is a member of this channel?"
Two real-world engineering stories will anchor your interview: Slack's Flannel edge cache (44x payload reduction for large workspaces) and Discord's Cassandra-to-ScyllaDB migration (p99 latency from 40-125ms down to 15ms). Both reveal non-obvious problems that only surface at scale.
The System
A channel-based chat platform organizes conversations into persistent, named channels within a workspace (Slack) or server (Discord). Messages are persistent, searchable, and visible to any channel member at any time. The system supports:
- Text channels with threads, reactions, mentions, and file attachments.
- Presence indicators (online, away, DND, offline).
- Typing indicators (ephemeral, never persisted).
- Full-text search across all messages a user has access to.
- Bots and integrations that post messages programmatically.
How this differs from WhatsApp (Lesson 1):
| Aspect | Slack/Discord | |
|---|---|---|
| Message storage | Temporary (30-day TTL, E2E encrypted) | Permanent, server-side, searchable |
| Fan-out | Max 1,024 recipients (group cap) | Potentially 50K+ channel members |
| Search | Client-side only (E2E encryption) | Server-side full-text (Elasticsearch) |
| Message routing | Point-to-point delivery | Pub/sub broadcast |
| History | Device-local, new device = no history | Server-side, any client sees full history |
Scale reference (Discord, 2023): 19 million peak concurrent users, 150 million MAU, trillions of messages stored, 72 ScyllaDB nodes.
Requirements
Functional Requirements
- Send messages to channels -- text, rich formatting, @mentions, reactions.
- Receive messages in real-time -- push via WebSocket to all online members viewing the channel.
- Message history and search -- scroll back through history, full-text search with filters.
- Presence -- online/away/DND/offline indicators.
- Typing indicators -- ephemeral "User X is typing..." for active channels.
- Threads -- reply chains attached to parent messages with separate notification model.
- File sharing -- upload files, generate previews, serve via presigned URLs.
Non-Functional Requirements
- Latency (message delivery): < 500ms worldwide for real-time delivery.
- Latency (message history): < 100ms p99 for fetching recent messages.
- Availability: 99.99% (Slack/Discord are business-critical for many teams).
- Durability: Zero message loss. Messages are permanent records.
- Scale: Millions of concurrent connections, hundreds of millions of channels, trillions of messages stored.
Back-of-Envelope Math
| Metric | Value | Derivation |
|---|---|---|
| Peak concurrent users | 20M | Discord-scale |
| Avg messages per user per day | ~50 | Active user estimate |
| Messages per day | ~1B | 20M x 50 |
| Messages per second (peak) | ~50K | 1B / 86,400 x 4 (peak multiplier) |
| Avg message size | ~500 bytes | Rich text + metadata + formatting |
| Storage per day | ~500 GB | 1B x 500 bytes |
| Storage per year | ~180 TB | Messages are permanent |
| WebSocket connections per server | ~50K-100K | Standard for modern servers |
| Gateway Servers needed | 200-400 | 20M / 50K-100K |
Naive Design
Client sends message via WebSocket
-> Single monolithic server
-> Write to PostgreSQL
-> Iterate through channel member list
-> Push to every member's WebSocket connection
-> Index in Elasticsearch
Everything happens in one process. Message persistence, fan-out, search indexing, and connection management are all the same server.
Where It Breaks
Channel member list iteration is O(N) per message. A channel with 50K members means iterating 50K entries and writing to 50K WebSocket buffers for every single message. At 100 messages/sec in that channel, that is 5M WebSocket writes per second just for one channel.
Most members are not watching. A 50K-member channel might have 500 people actively viewing it. Pushing to 49,500 people who will not see the message until they open the channel is pure waste. Those 49,500 people need a badge count increment, not the full message.
Bootstrap payload explosion. When a user opens the app, the client needs: user profile, workspace metadata, channel list, membership rosters, unread counts, recent messages for visible channels, and presence data for visible users. For a 32,000-user workspace, this payload was enormous. Slack measured the problem: the initial bootstrap payload for a 32K-user workspace was 44x larger than necessary.
Single-server partition. One server cannot hold connections for all members of a large channel. With load balancing, members are spread across many gateway servers. A message in a 50K-member channel must reach gateway servers across the entire fleet. How?
Real Design

Architecture: Slack's Four-Service Model
Slack's real-time messaging architecture uses four stateful, in-memory Java services. This is documented in their engineering blog and is worth knowing because it is a clean separation of concerns.
Clients (Web / Mobile / Desktop)
|
| WebSocket (persistent, per-region)
v
Gateway Servers (GS) -- per region, hold WebSocket connections
| Store each user's channel subscriptions in memory
|
| Subscribe to channels via consistent hash
v
Channel Servers (CS) -- consistent hash on channel_id
| Message routing backbone
| ~16M channels per host at peak
|
| (messages also go to persistence layer)
v
Database (Messages) + Search Index (Elasticsearch) + Object Storage (S3)
Separately:
Presence Servers (PS) -- in-memory, hash users to instances
Admin Servers (AS) -- stateless intermediary between webapp and CS
Message Delivery Flow
1. Client sends message via WebSocket to Gateway Server (or HTTP POST to API)
2. Gateway Server routes to Admin Server
3. Admin Server routes to the Channel Server owning this channel (consistent hash on channel_id)
4. Channel Server broadcasts to ALL Gateway Servers subscribed to this channel
5. Each Gateway Server pushes via WebSocket to connected clients viewing that channel
6. Asynchronously: message is persisted to DB, indexed in Elasticsearch, URL embeds unfurled
Messages delivered globally in ~500ms. This is Slack's documented benchmark.
No external message broker in the real-time path. This is the surprising fact. Slack does NOT use Kafka or RabbitMQ between Channel Servers and Gateway Servers. The Channel Server pushes directly to all subscribed Gateway Servers. Direct communication is faster than routing through a broker. The broker would add latency and complexity for zero benefit in this path.
A common candidate mistake is inserting Kafka between every component. Kafka is excellent for async processing (search indexing, analytics, push notifications). It is the wrong tool for the real-time delivery path where sub-second latency matters.
Channel Servers: The Routing Core
Channel Servers are the most important component. Each CS is responsible for a subset of channels determined by consistent hashing managed by a coordination layer (Slack calls theirs CHARM -- Consistent Hash Ring Managers).
How it works:
- Each channel has a deterministic owner CS (hash of
channel_iddetermines the ring position). - When a Gateway Server connects a user, it subscribes to the CS instances that own the user's channels.
- When a message arrives for a channel, the owning CS broadcasts to all Gateway Servers that have subscribed.
- CS failover: if a CS dies, a new instance takes its hash range. Users experience ~20 seconds of elevated latency while the replacement warms up.
Why consistent hashing, not "every server knows every channel": At peak, a single CS handles ~16 million channels. If every server had to know about every channel, the memory and coordination cost would be prohibitive. Consistent hashing partitions the responsibility.
The "channel" abstraction is broader than just Slack channels. It includes user channels (personal update streams), team channels, enterprise channels, file channels, and huddle channels. Everything that needs real-time updates is a "channel" internally.
Gateway Servers: Connection Management
- Regional deployment: Clients connect to the nearest region. This keeps WebSocket connections local and reduces cross-region traffic.
- Envoy sits in front of GS instances, forwarding WebSocket upgrade requests.
- On connection, GS fetches the user's channel list and subscribes to the appropriate Channel Servers.
- Connection draining: Before shutdown, GS seamlessly switches users to another GS in the region. Rolling deployments increase fleet capacity before starting the roll.
The Large-Channel Fan-Out Problem
This is the question that separates this design from WhatsApp. A 50K-member channel gets a message. What happens?
Naive: Push the full message to all 50K members' WebSocket connections. This generates 50K pushes per message.
Smart (what Slack and Discord actually do):
- Only push to active viewers. The Gateway Server knows which channels each connected user has open (the client tells the server which channel is visible). Only users viewing the channel get the full message push.
- Everyone else gets a badge update. A lightweight event: "channel X has N unread messages." This is far cheaper than sending the full message payload.
- On channel switch, fetch history. When a user clicks on a channel with unread messages, the client fetches recent messages via HTTP from the history endpoint, not via WebSocket.
This transforms the fan-out from O(channel_members) to O(active_viewers_of_channel), which is typically 1-5% of the membership. For a 50K-member channel with 500 active viewers, you send 500 full message pushes and 49,500 tiny badge increments.
Flannel: Slack's Edge Cache
Flannel is an application-level caching layer that sits between clients and origin servers. It solves the bootstrap payload problem.
The problem it solves:
Slack's original rtm.start API call returned the complete workspace state on connection. For a 32,000-user workspace, this was enormous: every user profile, every channel, every membership roster. Loading screens were long. Reconnections were almost as expensive as the first connection.
What Flannel does:
- Client connects to the nearest Flannel instance (regional edge PoP, assigned via consistent hashing on region + team).
- Flannel maintains a WebSocket to the main Slack servers in the primary AWS region.
- Flannel consumes the real-time event stream and maintains cached state for the workspace.
- Flannel returns a "working set" bootstrap: minimum data for the first screen render (current user identity, team metadata, channel list without full membership rosters, recent DM headers).
- Client lazy-loads additional data on demand (full user directory, channel member lists, etc.).
Scale impact:
- 1,500-user workspace: 7x smaller initial payload.
- 32,000-user workspace: 44x smaller initial payload.
- 4 million simultaneous connections at peak.
- 600,000 client queries per second handled at the edge.
Fan-out at the edge: Instead of the origin sending one copy of a message per connected client, the origin sends a single copy to each regional Flannel instance. Flannel fans out to local clients. This dramatically reduces cross-region bandwidth.
Intelligent prefetching: Flannel proactively pushes data the client will need. When broadcasting a message that mentions @user, Flannel checks whether the receiving clients have that user's profile cached. If not, it pushes the user profile data just before the message, saving a round-trip query.
Presence System
Presence (the green dot) seems simple but generates enormous traffic if done naively.
Naive: Every user broadcasts their status to all contacts. In a 10K-user workspace where everyone is a "contact," a single status change generates 10K updates. If 1,000 users change status per minute, that is 10M presence updates per minute for one workspace.
Slack's approach:
- Dedicated Presence Servers, in-memory, users hashed to instances.
- Key optimization: A client only receives presence updates for users currently visible on their screen. The client tells the server "I am showing users A, B, C, D on my screen." The server only pushes presence changes for those users.
- This transforms O(N^2) presence broadcast (every user x every contact) into O(visible_users) per client. A user typically sees 10-20 users on screen. The reduction is massive.
- Presence is ephemeral -- never persisted to the database. Typing indicators use the same transient event path.
Deep Dives

Deep Dive 1: Message Storage -- Discord's Cassandra to ScyllaDB Migration
This migration story is one of the best real-world examples of how a database choice that works at small scale can become catastrophic at large scale. It is worth knowing in detail.
Phase 1: MongoDB (early Discord)
Discord started with MongoDB, a single replica set. This worked until ~100 million messages. Then:
- Data exceeded RAM, and MongoDB's performance degraded sharply.
- Random disk I/O for reads became the bottleneck.
- No horizontal write scaling.
Phase 2: Cassandra
Discord migrated to Cassandra for its linear horizontal scalability, write-optimized LSM-tree storage, and tunable consistency.
Data model:
CREATE TABLE messages (
channel_id bigint,
bucket int, -- time bucket to bound partition size
message_id bigint, -- Snowflake ID (time-sortable)
author_id bigint,
content text,
PRIMARY KEY ((channel_id, bucket), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
The bucket is critical. Without it, a single active channel's messages form one unbounded partition. Cassandra partitions should stay under ~100MB. The bucket (10-day window) ensures bounded partition size. The most common query ("fetch latest N messages") only hits the most recent bucket.
The tombstone problem:
Cassandra does not immediately delete data. It writes a tombstone -- a deletion marker. Tombstones persist until compaction runs (default: 10 days via gc_grace_seconds). Every read must scan past all tombstones in the partition to find live data.
In a channel where messages are frequently posted and deleted (moderation, bots, spam removal), tombstones accumulate. A channel with 1 live message and 100,000 tombstones requires scanning all 100,000 tombstones to return that 1 message.
Read latency spiked from milliseconds to seconds. Message edits made it worse: internally, an edit is a delete + insert, creating an additional tombstone.
GC pauses: Cassandra runs on the JVM. Garbage collection pauses caused periodic latency spikes of several seconds. For a real-time messaging app, a 3-second freeze is unacceptable -- users think the app is broken.
Phase 3: ScyllaDB
ScyllaDB is a C++ rewrite of Cassandra that is API-compatible (same CQL queries, same data model) but architecturally different:
- No JVM, no GC pauses. Written in C++ using the Seastar framework with a shard-per-core architecture. Each CPU core runs independently with its own memory.
- Better compaction. More efficient tombstone cleanup.
- Same data model. Discord could migrate without rewriting their data access layer.
Migration results:
| Metric | Cassandra | ScyllaDB |
|---|---|---|
| Node count | 177 | 72 (59% reduction) |
| p99 read latency | 40-125ms | 15ms |
| p99 write latency | 5-70ms | 5ms (stable) |
| GC pause spikes | Seconds-long | Eliminated |
The Data Services layer: Discord added a Rust-based intermediate service between their API and ScyllaDB:
- Request coalescing: If hundreds of users simultaneously open the same channel, the data service coalesces these into a single database read and fans out the result. Critical for the thundering herd when a popular streamer mentions a channel.
- Consistent routing: Requests for the same channel go to the same data service instance, maximizing coalescing effectiveness.
- In-process caching: Hot channel data is cached, reducing database load further.
Deep Dive 2: Snowflake IDs
Both Slack and Discord use Snowflake-style IDs for messages. Understanding why reveals a lot about distributed system ID generation.
Structure (Discord's variant):
- 42 bits of timestamp: Milliseconds since Discord's epoch (January 1, 2015). Gives ~139 years of IDs.
- 5 + 5 bits of worker/process: Identifies the generating machine. 1,024 unique workers.
- 12 bits of sequence: 4,096 IDs per millisecond per worker.
Why Snowflake IDs are perfect for this problem:
- Time-sortable. Messages with higher IDs are newer. This means the Cassandra/ScyllaDB clustering key naturally sorts messages chronologically. No separate timestamp column needed for ordering.
- Globally unique without coordination. Each worker generates IDs independently. No central counter. No consensus protocol. No bottleneck.
- Compact. 64-bit integer. Fits in a database index efficiently. Compare to UUIDs (128 bits) or ULIDs (128 bits, but string-encoded to 26 chars).
- Cursor pagination. "Give me messages with ID < this_id" is a simple range query. No offset pagination (which is unstable in a fast-moving channel where new messages shift offsets).
Alternative -- ULID: Lexicographically sortable, 128-bit, encodes timestamp + randomness. Slightly more portable across systems that do not handle 64-bit integers well (JavaScript, some JSON parsers). Either works.
Deep Dive 3: Threads and Notification Routing
Threads add a sub-channel routing dimension that candidates often overlook.
The model:
- A thread is a reply chain attached to a parent message.
- Users who have replied in a thread are "following" it.
- Thread messages may or may not appear in the main channel view (configurable: "Also send to #channel").
Fan-out implications:
A message sent to a thread must be routed to:
- All users following the thread (who replied or explicitly followed).
- Users viewing the channel with "Also send to #channel" enabled.
- The thread view (separate scrollable pane).
This creates a two-level fan-out: thread followers AND channel viewers. The Channel Server must maintain subscription lists for both levels.
Notification complexity:
- @mention in a thread: notify the mentioned user (push notification + badge).
- Reply in a thread you are following: notify you (badge, possibly push).
- Reply in a thread you are NOT following: no notification.
- "Also send to #channel": all channel notification rules apply (muted? notification level?).
This matrix of notification rules is one reason Slack's engineering is more complex than it appears from the outside.
Alternative Designs
XMPP-Based Architecture
The traditional approach. WhatsApp started here (ejabberd). The protocol handles persistent connections, presence, and multi-user chat natively.
Why modern platforms abandoned XMPP:
- XML verbosity. A "hello" message wraps in ~200 bytes of XML tags. At millions of messages per second, this bandwidth waste is significant.
- Federation complexity. XMPP is designed for decentralized messaging (like email). Slack and Discord are explicitly centralized -- they want to control the full stack. Federation introduces latency, compatibility issues, and makes features like reactions and threads hard to implement consistently.
- Presence scalability. XMPP's presence model broadcasts to all contacts. Does not scale to 100K-user workspaces.
- Feature velocity. Adding threads, reactions, embeds, slash commands, code blocks, and @here mentions to a standardized protocol is slow. Proprietary protocols allow shipping features without standardization.
XMPP is useful as a reference point in interviews to show you understand the history. Then explain why a custom protocol over WebSocket is preferred.
Kafka Between Channel Servers and Gateway Servers
Some candidates insert Kafka as a message bus between CS and GS. This adds latency and is unnecessary.
Slack explicitly does NOT use an external message broker in the real-time path. The CS-to-GS communication is direct pub/sub. Gateway Servers subscribe to Channel Servers for the channels their users care about. Channel Servers push directly.
Where Kafka fits in this system:
- Async search indexing (message -> Kafka -> Elasticsearch indexer).
- Push notification pipeline (message -> Kafka -> notification service -> APNs/FCM).
- Analytics and audit logging.
- Cross-region replication.
Keep Kafka out of the hot path.
MySQL Instead of Cassandra/ScyllaDB
Slack actually uses MySQL (with Vitess for sharding) for message storage. Discord uses ScyllaDB. Both work.
MySQL tradeoffs: - Easier to reason about (relational model, ACID transactions). - Requires sharding infrastructure (Vitess, ProxySQL) at scale. - No tombstone problem. - Slower writes than Cassandra/ScyllaDB for append-heavy workloads.
ScyllaDB tradeoffs: - Write-optimized (LSM-tree). - Linear horizontal scaling. - Tombstone awareness required. - No JOINs, no transactions.
For an interview, either is valid. Pick one and defend it. If you pick Cassandra, you must mention the tombstone problem and how to mitigate it (ScyllaDB migration, or compaction strategy tuning).
Scaling Math
Connection Tier
| Metric | Value | Derivation |
|---|---|---|
| Peak concurrent users | 20M | Discord-scale |
| Connections per Gateway Server | 50K | Conservative |
| Gateway Servers needed | 400 | 20M / 50K |
| Regions | 5 | US-East, US-West, EU, Asia, Oceania |
| GS per region | 80 | 400 / 5 |
Channel Server Tier
| Metric | Value | Derivation |
|---|---|---|
| Total channels | ~500M | Across all servers/workspaces |
| Channels per CS | ~16M | Slack's documented number |
| Channel Servers needed | ~32 | 500M / 16M |
32 Channel Servers for the entire routing layer. This seems shockingly small, but the consistent hashing approach means each CS only handles its partition of channels. Most channels are low-traffic. The hot channels (large servers with active members) generate more fan-out work but are spread across the ring.
Message Storage
| Metric | Value | Derivation |
|---|---|---|
| Messages per day | 1B | -- |
| Message size (avg) | 500 bytes | Rich text + metadata |
| Storage per day | 500 GB | -- |
| Storage per year | 180 TB | -- |
| ScyllaDB nodes (Discord-scale) | 72 | Discord's actual number |
Search
| Metric | Value | Derivation |
|---|---|---|
| Searchable messages | Trillions | Discord's total |
| Elasticsearch cluster size | 100-500 nodes | Estimate for this index size |
| Index latency | < 5 seconds | Near-real-time indexing via Kafka |
Failure Analysis
Channel Server Dies
Impact: Channels owned by this CS lose real-time delivery for ~20 seconds.
Recovery: 1. CHARM (consistent hash ring manager) detects the failure. 2. A replacement CS instance takes the dead server's hash range. 3. The new CS warms up (loads active channel state from Gateway Server subscriptions). 4. Gateway Servers resubscribe to the replacement CS. 5. During the 20-second gap, messages are still persisted to the database. Users see them when they refresh or when the CS recovers. No messages are lost.
Gateway Server Dies
Impact: ~50K users lose their WebSocket connections.
Recovery: 1. Clients detect disconnect (WebSocket close event or missed heartbeat). 2. Clients reconnect to another GS in the same region (load balancer routes them). 3. New GS fetches user's channel list and subscribes to appropriate Channel Servers. 4. Client loads the current channel state from the history API. 5. Flannel (if deployed) accelerates the bootstrap by serving cached workspace data from the edge.
ScyllaDB Node Failure
Impact: Partitions owned by the failed node have reduced redundancy. Reads from replicas may return stale data.
Recovery: 1. ScyllaDB's replication factor (typically 3) means 2 other replicas have the data. 2. Reads continue from surviving replicas (at consistency level ONE or QUORUM depending on configuration). 3. A replacement node streams data from surviving replicas. With ScyllaDB's efficient streaming, this is faster than Cassandra. 4. No data loss assuming RF=3 and only one node failed.
Thundering Herd: Popular Channel Opens Simultaneously
Scenario: A popular streamer says "check out #announcements" to 100K viewers. 100K users open the channel simultaneously, each requesting the latest messages.
Without mitigation: 100K database queries for the same partition. Database melts.
Discord's solution -- request coalescing in the Data Services layer:
- First request arrives at the data service. Triggers a database read.
- Next 99,999 requests arrive while the first read is in flight. Data service detects they want the same data and queues them.
- Database returns the result. Data service fans the result out to all 100K waiting requests.
- One database query serves 100K clients.
This is the single most important optimization for handling viral channel opens. Without it, the database is the bottleneck. With it, the bottleneck shifts to network bandwidth (serving 100K responses), which is much easier to handle.
Level Expectations
| Level | What You Must Cover | What Sets You Apart |
|---|---|---|
| Mid-Level | WebSocket for real-time delivery, message DB with channel_id partitioning, separate read/write paths, basic presence system, file sharing via S3 with presigned URLs | Clean architecture diagram separating Gateway Servers from message routing, mentions Snowflake IDs for time-sorted message ordering |
| Senior | Consistent hashing for Channel Servers, large-channel fan-out optimization (push only to active viewers, badge updates for others), Cassandra/ScyllaDB data model with time buckets, Elasticsearch for full-text search, presence optimization (only push for visible users) | Explains the tombstone problem and why it motivated Discord's migration to ScyllaDB, discusses thread fan-out as a two-level routing problem, no external broker in the real-time path |
| Staff+ | Flannel edge cache with specific numbers (44x payload reduction), request coalescing for thundering herd, CHARM-based consistent hash ring management, CS failover in ~20 seconds, Snowflake ID bit layout | Knows Slack uses 4 Java services (GS/CS/AS/PS) and can explain each one's role, discusses typing indicators as transient events that follow the same CS routing but are never persisted, mentions that "channel" is an abstraction covering user streams and file streams, not just chat rooms |
References from Our Courses
- Partition and Clustering Keys — Cassandra/ScyllaDB model for time-bucketed message storage
- Redis Data Structures and Use Cases — presence tracking and edge caching for active users
- Inverted Index Internals — full-text message search across channels
Red Team This Design
Ready to stress-test this architecture? The Attack companion tears apart every decision in this design — from hardware physics to security holes to what actually happens at 10x scale.