Design a Live Stream Comment System
TL;DR
You are designing the comment stream that runs alongside a live video on a platform like Facebook Live, YouTube Live, or Twitch. This is NOT a chat system. It is a one-to-many broadcast where one person posting a comment generates a fan-out to potentially millions of viewers. The defining design challenge is that the fan-out strategy must change dynamically as a stream goes viral mid-broadcast.
The key insight that separates senior from staff answers: at mega-stream scale (100M+ viewers), nobody can read individual comments -- the screen refreshes faster than the human eye can process. The system's job shifts from "deliver every comment to every viewer" to "create the vibe of collective participation." That realization unlocks the CDN-based pull model that makes the math actually work.
The System
A live stream comment system has three distinct components:
- Comment ingestion -- viewers post comments via HTTP.
- Real-time distribution -- new comments are pushed to all viewers watching the stream.
- Historical retrieval -- viewers who join mid-stream see recent comments.
The asymmetry is extreme. A popular stream might have 500K viewers but only 1,000 comments per second. The read fan-out (broadcasting comments to viewers) dwarfs the write volume (accepting new comments) by a factor of 500x or more.
Real-world reference: Meta's 2011 engineering blog reported serving 100 million pieces of content per minute that could receive comments, with 650,000 comments submitted per minute and 16 million new viewer-to-content associations per second.
Requirements
Functional Requirements
- Post a comment -- authenticated viewers submit comments on a live video.
- Receive comments in real-time -- all viewers see new comments as they appear.
- View recent comments on join -- mid-stream joiners see the last 50-100 comments.
- Reconnect without losing comments -- viewers who briefly disconnect catch up on missed comments.
Non-Functional Requirements
- Latency (posting): < 200ms from submit to display for the commenter.
- Latency (broadcast): < 2 seconds from post to display for other viewers (for streams < 10K viewers; relaxed for mega-streams).
- Availability: 99.9% (live events are time-sensitive but brief).
- Durability: Comments should be persisted, but losing a comment during a brief outage is acceptable (this is not banking).
- Scale: Support 1M concurrent live videos, from 10-viewer streams to 100M-viewer mega-events.
Back-of-Envelope Math
Average stream (100 viewers, 10 comments/sec):
| Metric | Value |
|---|---|
| Fan-out per comment | 100 messages |
| Total messages/sec | 1,000 |
Trivial. A single server handles this without breaking a sweat.
Popular stream (500K viewers, 1,000 comments/sec):
| Metric | Value |
|---|---|
| Fan-out per comment | 500,000 messages |
| Total messages/sec | 500,000,000 (500M) |
500 million messages per second for a single video. This is where naive designs explode.
Mega-stream -- World Cup Final (100M viewers, 5,000 comments/sec):
| Metric | Value |
|---|---|
| Fan-out per comment | 100,000,000 messages |
| Total messages/sec | 500,000,000,000 (500B) |
500 billion messages per second. Obviously impossible with push. This is why the architecture must fundamentally change at scale.
The War and Peace test: At 5,000 comments/sec with an average of 4 words per comment, the system is generating the full text of War and Peace every 30 seconds. If you show the last 20 messages on screen, each message is visible for ~4 milliseconds. Nobody can read that. Sampling is the only sane approach.
Naive Design
Client posts comment
-> API Server writes to Comments DB
-> API Server loops through all connected viewers
-> API Server sends comment to each viewer's connection
Viewer opens stream
-> Opens WebSocket to API Server
-> API Server adds viewer to in-memory list for this video
-> On new comment: iterate through list, send to each
One server handles everything: the HTTP endpoint for posting, the WebSocket connections for viewers, the fan-out logic, and the database writes.
Where It Breaks
Stateful and stateless workloads on the same server. Posting a comment is stateless (HTTP POST, write to DB, done). Holding viewer connections is stateful (WebSocket, in-memory subscriber list, long-lived). These have completely different scaling characteristics and failure modes. When the comment POST handler is slow, it starves the WebSocket event loop. When the server restarts for a deploy, every viewer loses their connection.
Single-server fan-out ceiling. With 500K viewers on one video, even if the server can hold all connections, iterating through 500K subscribers and writing to 500K socket buffers for every single comment takes real time. At 1,000 comments/sec, the server is perpetually behind.
No multi-server coordination. Viewers connect to different servers via load balancing. Server A holds 100K viewers, Server B holds 100K viewers. When a comment arrives at Server A, how does Server B know about it? The naive design has no answer.
Slow consumers poison everything. A viewer on a 3G connection with a full send buffer blocks the socket write. If the fan-out loop is synchronous, one slow consumer stalls delivery to everyone behind them in the loop.
Real Design

Architecture: Separate Stateless and Stateful
This separation is the single most important architectural decision and a common candidate mistake to miss.
Viewer posts comment
|
| HTTP POST
v
Comment Service (stateless, horizontally scalable)
|
+-> Write to Comments DB (Cassandra / DynamoDB)
|
+-> Publish to Pub/Sub (Redis Pub/Sub, channel = video_id)
|
v
Realtime Messaging Servers (RMS) -- stateful, hold SSE connections
|
| Each RMS subscribes to pub/sub channels for videos its viewers are watching
|
| On message from pub/sub: fan out to all local viewers of that video
|
| SSE (Server-Sent Events)
v
Viewer browsers/apps
Why SSE and not WebSocket?
The only client-to-server action is posting a comment, which happens over a regular HTTP POST. The real-time channel is exclusively server-to-client broadcast. SSE is purpose-built for this pattern.
SSE's killer feature: built-in Last-Event-ID reconnection. When a browser loses an SSE connection and automatically reconnects, it sends the Last-Event-ID header containing the ID of the last event it received. The server replays missed events from that point forward. You get reconnection logic for free.
WebSocket would work fine here, but you would be paying the complexity cost of a bidirectional protocol to get unidirectional broadcast. SSE is the right tool.
Practical heuristic: Default to SSE. It handles 80% of real-time use cases. Upgrade to WebSocket only when you need frequent client-to-server real-time communication (gaming, collaborative editing).
The HTTP/1.1 gotcha: SSE connections count toward the browser's 6-concurrent-connections-per-domain limit on HTTP/1.1. A user with multiple tabs open to different live streams burns through this fast. HTTP/2 multiplexing eliminates this limit -- ensure your edge servers support HTTP/2.
Pub/Sub: Why Redis, Not Kafka
This is a question interviewers love to probe on, and the answer is counterintuitive.
| Criterion | Redis Pub/Sub | Kafka |
|---|---|---|
| Latency | Sub-millisecond | 15-30ms |
| Dynamic subscriptions | Instant subscribe/unsubscribe | Expensive consumer group reconfiguration |
| Message persistence | None (fire-and-forget) | Full, configurable retention |
| Topic overhead | A few bytes per channel | ~50 KB per partition |
| Best for | Dynamic, latency-sensitive, loss-tolerant | Durable event streaming, replay, audit |
Why Redis wins here:
-
Viewers constantly switch between live streams. Redis Pub/Sub handles subscribe/unsubscribe in microseconds. Kafka's consumer groups are designed for stable, long-lived consumption -- not for millions of viewers channel-surfing.
-
Creating a channel per video in Redis is nearly free. In Kafka, a topic per video means millions of topics, each with partition overhead. Operationally expensive.
-
Redis Pub/Sub is fire-and-forget -- if a message is published and nobody is subscribed, it vanishes. That is fine here because comments are already persisted to the database. The pub/sub layer is for real-time delivery only. If a viewer misses a comment during a disconnect, the catch-up endpoint (HTTP fetch) handles recovery.
Where Kafka fits: The async processing pipeline (spam detection, analytics, content moderation). Not the real-time broadcast path.
Channel Strategy: Co-locate Viewers
Naive approach: every RMS subscribes to a global channel and receives every comment for every video. Wasteful -- an RMS with 100K viewers watching 50K different videos receives comments for all 50K videos but only needs to deliver to viewers of each specific video.
Better: channel per video. Each RMS subscribes only to Redis channels for videos its viewers are watching.
Best: co-locate viewers via consistent hashing. Use consistent hashing on video_id to assign viewers of the same video to the same RMS. This minimizes the number of distinct pub/sub channels each RMS must subscribe to.
This does NOT mean all viewers of a video land on one server -- consistent hashing distributes across a ring. But it means viewers of the same video cluster on fewer servers than pure round-robin would produce.
Fan-Out Strategy: The Hybrid Approach
This is the staff-level answer. You need three strategies and dynamic switching between them.
Tier 1 -- Small streams (< 10K viewers): Pure push via SSE.
Every comment is pushed to every viewer's SSE connection. The math is easy: 10K viewers x 100 comments/sec = 1M messages/sec. A handful of RMS instances handle this.
Tier 2 -- Medium streams (10K-100K viewers): Comment sampling.
Instead of delivering every comment, sample 10-50% based on comment velocity. Prioritize:
- Comments from verified accounts or followed users.
- Comments getting reactions.
- Comments from the streamer themselves (always delivered).
- Random sampling of the remainder.
The viewer experience is still good. At 500 comments/sec, a 10% sample means 50 comments/sec on screen -- more than anyone can read.
Tier 3 -- Mega-streams (100K+ viewers): CDN-based pull.
This is where the architecture fundamentally inverts:
- Server maintains a ring buffer of the most recent 100-200 comments.
- Every 1 second, the server snapshots this buffer to a JSON file and writes it to the CDN.
- Clients poll the CDN edge every 1-2 seconds.
- Client-side animation smooths the transition between snapshots to simulate a continuous stream.
Why this works: CDNs are built to handle 100M+ requests/second across their edge network. Each edge location caches the snapshot, and 100M viewers hitting CDN edges is routine infrastructure. The server writes one file per second instead of pushing to 100M connections.
The latency tradeoff is acceptable. At mega-stream scale, the video feed itself has 5-15 seconds of latency. Adding 1-2 seconds of comment latency is imperceptible. Users experience a "vibe" of collective participation, not a readable conversation.
Threshold hysteresis: Switch UP to sampling at 50K viewers, switch DOWN at 40K viewers. Switch UP to CDN at 100K viewers, switch DOWN at 80K viewers. The gap prevents flapping between modes when viewership oscillates around the threshold.
"Read Your Own Write" Consistency
When the system switches to CDN-based delivery, there is a 1-2 second delay before a commenter sees their own comment come back through the CDN. This feels broken.
Solution: optimistic local insertion. When the user posts a comment, the client immediately inserts it into the local comment feed with a "pending" state. When the comment arrives through the CDN (or SSE), the client deduplicates by comment_id and upgrades the "pending" state to "confirmed."
This pattern applies to all three tiers. Even with pure SSE push, the round-trip time means the commenter's own comment arrives faster via local insertion than via the server path.
Deep Dives

Deep Dive 1: Meta's "Write Locally, Read Globally"
Meta's live commenting system inverted their traditional data access pattern, and this inversion is worth understanding because it reveals a non-obvious truth about live event systems.
Traditional Facebook: "Read locally, write globally." Most Facebook operations are reads (viewing a News Feed). Reads are served from the local data center. Writes (posting a status) are replicated globally. This works because reads vastly outnumber writes.
Live commenting: "Write locally, read globally." Every page view that displays content which could receive comments generates a write: "User X is now viewing Content Y." With 100 million pieces of live-commentable content and users constantly scrolling, this generates 16 million new viewer-to-content associations per second. Making these writes global would be astronomically expensive.
Instead:
- Viewer associations are written to the local data center only.
- When a comment is posted, the system reads across all data centers to find all viewers of that content.
- The comment is then pushed to those viewers.
This works because commenting is much rarer than viewing. Cross-data-center reads on comment submission (hundreds of thousands per minute) are far cheaper than cross-data-center writes on every page view (millions per second).
The takeaway for interviews: When an interviewer asks about multi-region deployment for live comments, this inversion is the staff-level answer. Most candidates reflexively say "replicate everything globally." The clever move is to ask: which operation is more frequent -- viewing or commenting? Then optimize the frequent operation for local access.
Deep Dive 2: Comment Storage and Hot Partitions
Data model:
Comments Table:
video_id (Partition Key) -- co-locates all comments for a video
comment_id (Sort Key) -- Snowflake ID for chronological sorting
content (String)
author_id (String)
created_at (Timestamp)
Common Candidate Mistake
Putting comment_id as the partition key. This scatters comments across partitions, requiring scatter-gather queries to fetch "latest N comments for this video." The video_id MUST be the partition key.
The hot partition problem: A viral stream with 100M viewers creates a hot partition if all comments share one partition key (video_id). In DynamoDB, each partition is limited to 1,000 write capacity units. A viral stream at 5,000 comments/sec blows through this 5x over.
Mitigation -- write sharding:
Add a random suffix to the partition key for popular streams: video_id#0 through video_id#N spreads writes across N partitions. Reads scatter-gather across all N suffixes and merge. The trade-off: higher read latency for write scalability. Only apply this to streams exceeding a threshold (e.g., 50K concurrent viewers or 500 comments/sec). For the 99% of streams below this threshold, a single partition key handles the load fine -- the sharding adds unnecessary read complexity for low-volume streams.
The scatter-gather read pattern works here because the historical comments endpoint is paginated and cursor-based. Each suffix-partition returns its page independently, and the client merges and sorts. Latency increases by ~2-5ms (parallel reads to N partitions) compared to a single-partition read. Acceptable for a non-latency-critical historical fetch.
Alternative -- Redis as hot store:
Use Redis Sorted Sets as the primary store for live comments. Each video has a sorted set keyed by video:{video_id}:comments, scored by timestamp. Redis handles write bursts trivially (sub-millisecond writes). A background process asynchronously persists comments to DynamoDB/Cassandra for archival.
This two-tier approach (Redis hot / DynamoDB cold) is common in production live event systems.
TTL considerations: Live comments may not need permanent storage. A 7-30 day TTL is reasonable for most content. If the live video is converted to a VOD, you might preserve comments longer. This is a business decision, not a technical one -- but mentioning TTL shows maturity.
Deep Dive 3: Rate Limiting and Spam
This is a gap many candidates miss entirely, but interviewers love to probe it because it tests your understanding of abuse patterns at scale.
Per-user rate limiting:
- Max 5 comments per 10 seconds per user.
- Implemented at the Comment Service layer using a token bucket backed by Redis (
INCRwith TTL). - Return HTTP 429 with
Retry-Afterheader.
Content moderation must be asynchronous:
Comment posted
-> Persisted to DB immediately
-> Broadcast to viewers immediately
-> Asynchronously: content analysis pipeline runs
-> If flagged: retroactively remove from stream
Why async? Content analysis (ML models, keyword matching, context analysis) takes 50-200ms. Adding this to the posting path violates the <200ms real-time requirement. The tradeoff: a spam comment is visible for a few hundred milliseconds before removal. Acceptable for a live stream; unacceptable for a banking app.
Shadow banning:
The shadow-banned user's comments are accepted by the server and appear in their own feed. Other viewers never receive them. The RMS checks a shadow_banned flag before including comments in the broadcast.
Why shadow ban instead of overt ban? An overt ban gives the spammer immediate feedback: "I am banned." They create a new account and try again. Shadow banning gives no feedback -- the spammer thinks their comments are being seen and stops creating new accounts. Detection of the ban takes much longer.
Alternative Designs
Dispatcher Service Instead of Pub/Sub
Instead of Redis Pub/Sub, a Dispatcher Service directly routes comments to the correct RMS instances:
- RMS instances register with the Dispatcher: "I have viewers for video X."
- When a comment arrives, the Comment Service asks the Dispatcher: "Which RMS instances care about video Y?"
- Dispatcher forwards the comment to those specific RMS instances.
LinkedIn uses this pattern for their live video reactions system, documented in their talk "Streaming a Million Likes/Second."
Trade-off vs. Pub/Sub:
- Dispatcher: centralized routing logic, easier to add sophisticated routing rules (priority, sampling), but the mapping must be kept accurate during rapid changes (viral streams gaining/losing viewers).
- Pub/Sub: decentralized, simpler, but routing logic is limited to "everyone subscribed to this channel gets everything."
For an interview, either approach is valid. Pub/Sub is simpler to explain. The Dispatcher is more flexible.
WebSocket Instead of SSE
Perfectly valid. The main things you lose:
Last-Event-IDauto-reconnection (must implement yourself).- HTTP/2 multiplexing compatibility (WebSocket runs on HTTP/1.1 framing).
The main thing you gain:
- Bidirectional communication on a single connection (if you need it for reactions, emoji storms, etc.).
For pure comment broadcast, SSE is the better fit. If the system also handles reactions and typing indicators, WebSocket starts to make more sense.
Long Polling as Fallback
Some corporate networks block SSE and WebSocket. Long polling is the fallback:
- Client sends HTTP request: "Give me new comments since timestamp X."
- Server holds the request open for up to 30 seconds.
- If a new comment arrives, server responds immediately.
- If 30 seconds pass with no comments, server responds with empty body.
- Client immediately sends a new request.
This generates 5x more traffic than SSE but works everywhere. Use it as a fallback behind feature detection, not as the primary path.
Scaling Math
Server Fleet Sizing
| Metric | Value | Derivation |
|---|---|---|
| Total concurrent live videos | 1M | Assumption |
| Average viewers per video | 100 | Most streams are small |
| Total concurrent connections | 100M | 1M x 100 |
| Connections per RMS server | 100K | Conservative (can push to 500K+) |
| RMS servers needed | 1,000 | 100M / 100K |
Comment Service Sizing
| Metric | Value | Derivation |
|---|---|---|
| Total comments/sec across all videos | ~100K | 1M videos x avg 0.1 comments/sec |
| Comment Service instances | 10-20 | Each handles ~10K writes/sec easily |
The write side is 100x smaller than the read side. This asymmetry is the defining characteristic of the system.
CDN Sizing for Mega-Streams
| Metric | Value | Derivation |
|---|---|---|
| Mega-stream viewers | 100M | World Cup Final |
| Snapshot frequency | 1/sec | -- |
| Snapshot size | ~50 KB | 200 comments x 250 bytes each |
| CDN bandwidth | 50 KB x 100M x 1/sec | 5 TB/sec |
5 TB/sec sounds terrifying but this is distributed across thousands of CDN edge locations worldwide. Each edge serves its local viewers from cache. The origin pushes 50 KB/sec to the CDN -- trivial. The CDN handles the fan-out.
Bandwidth Per RMS (Non-CDN Tier)
| Metric | Value | Derivation |
|---|---|---|
| Connections per RMS | 100K | -- |
| Comments/sec for those viewers | ~1,000 (popular stream) | -- |
| Comment payload size | ~250 bytes | JSON with content, author, timestamp |
| Outbound bandwidth per RMS | 100K x 250 bytes x 1 comment delivered | 25 MB/sec |
25 MB/sec outbound per server is comfortable. Network cards are typically 10 Gbps (1.25 GB/sec). Plenty of headroom.
Failure Analysis
RMS Server Dies
Impact: ~100K viewers lose their SSE connections.
Recovery:
1. Clients auto-reconnect (SSE has built-in reconnection with backoff).
2. On reconnect to a new RMS, client sends Last-Event-ID header.
3. New RMS needs access to recent comments -- served from a shared Redis cache of recent comments per video.
4. Replay missed comments from Last-Event-ID forward.
Critical detail: When reconnecting to a different RMS instance (load balancer routes to a new server), that RMS does not have the viewer's state. The shared Redis cache (or a quick DB read) provides the backfill data.
Thundering herd mitigation: If many viewers reconnect simultaneously, exponential backoff with jitter spreads the reconnection wave. Server-side: send a "go away" message before shutdown so clients reconnect before the server dies (connection draining).
Redis Pub/Sub Node Dies
Impact: Comments stop flowing in real-time for affected video channels.
Recovery: 1. Redis Sentinel or Redis Cluster promotes a replica. 2. RMS instances reconnect and resubscribe within seconds. 3. Comments posted during the outage are still in the database -- viewers can fetch them via the catch-up endpoint on reconnect.
Why this is acceptable: Redis Pub/Sub is fire-and-forget by design. Comments are persisted to the database independently. The pub/sub layer is an optimization for real-time delivery, not the source of truth. Losing a few seconds of real-time delivery during a Redis failover is a minor UX blip.
Viral Stream Overwhelms a Single RMS
Impact: One video's viewers are disproportionately concentrated on a few RMS instances (even with consistent hashing, a video with 5M viewers will land on many servers). The fan-out for each comment overwhelms those servers.
Recovery: 1. Monitoring detects comment velocity exceeding threshold. 2. System switches from Tier 1 (push) to Tier 2 (sampling) or Tier 3 (CDN pull). 3. RMS instances for that video reduce their fan-out load immediately. 4. Viewers experience a brief transition period where comment frequency changes (sampling kicks in) or there is a 1-2 second latency increase (CDN mode).
Comment Database Hot Partition
Impact: Writes to a viral video's partition are throttled (DynamoDB) or slow (Cassandra).
Recovery: 1. Write sharding (random suffix on partition key) distributes load. 2. Redis hot store absorbs burst writes immediately. 3. Background process drains Redis to durable storage at a rate the database can handle.
Level Expectations
| Level | What You Must Cover | What Sets You Apart |
|---|---|---|
| Mid-Level | SSE for server-to-client push (with reasoning over WebSocket), separate Comment Service from connection servers, database with video_id as partition key, basic reconnection handling | Mentions cursor-based pagination for historical comments, discusses the HTTP/1.1 6-connection limit |
| Senior | Redis Pub/Sub (with reasoning against Kafka), channel-per-video strategy with co-location via consistent hashing, hybrid fan-out strategy (push for small / CDN for mega), backfill endpoint with dedup, rate limiting | Calculates fan-out math showing why push fails at 500K viewers, designs the Tier 1 / Tier 2 / Tier 3 switching with hysteresis, explains hot partition mitigation |
| Staff+ | Meta's "Write Locally Read Globally" inversion, CDN snapshot approach with client-side animation smoothing, shadow banning architecture, War and Peace math showing why sampling is inevitable, async content moderation with retroactive removal | Mentions the Dispatcher alternative (LinkedIn's approach), discusses buffer management for slow consumers, explains threshold hysteresis to prevent mode flapping, knows that CDN pull latency is acceptable because video latency is already 5-15 seconds |
References from Our Courses
- Redis Data Structures and Use Cases — Pub/Sub for channel-per-video real-time delivery
- Kafka Partitions and Ordering — ordered comment ingestion at scale
- Traffic Control — rate limiting and backpressure for comment floods
Red Team This Design
Ready to stress-test this architecture? The Attack companion tears apart every decision in this design — from hardware physics to security holes to what actually happens at 10x scale.