Design a Flash Sale System
TL;DR
Build a system that sells 10,000 units of a product in 60 seconds to 10 million simultaneous users without overselling, crashing, or melting. The mantra is "Reject early, reject cheap" -- Grab's engineering principle that you should shed load at the cheapest possible layer (CDN > load balancer > application > database) before it reaches expensive, contention-prone resources. Redis DECR provides atomic inventory counting that handles 100K+ operations/sec on a single node. Amazon does NOT auto-scale for Prime Day -- they pre-provision capacity weeks in advance because auto-scaling is too slow for a thundering herd. Cell-based architecture isolates failures so that a bug in the checkout flow does not take down product browsing. The flash sale is a ticket booking problem, a payment problem, and a rate-limiting problem all at once, happening in 60 seconds.
The System
Amazon Prime Day. Alibaba Singles' Day (11.11). A $999 laptop is listed at $399 for exactly 60 seconds. 10,000 units available. 10 million users are refreshing the page. At the sale start time, 10 million clicks hit your servers. You have 10,000 units and 10 million buyers. The other 9,990,000 must be rejected instantly without degrading the experience for the 10,000 who will actually buy.
How is this different from a normal e-commerce sale? Normal sales have gradual traffic. Flash sales have a step function -- zero to maximum in one second. Auto-scaling needs 2-5 minutes to respond. By the time new servers boot, the sale is over. You also cannot tolerate a wrong inventory answer: no user may be told "sold out" while units remain, and no unit may be sold twice. If 10,000 units are available and 10,001 users click "buy," exactly one user must be told "sold out." Not zero users, not two users. Exactly one. This is an inventory consistency problem under extreme concurrency.
Requirements
Functional Requirements
| Requirement | Details |
|---|---|
| Flash sale creation | Merchant configures product, discount price, inventory count, start/end time. |
| Countdown page | Users see a countdown timer before the sale starts. |
| Purchase attempt | At sale start, user clicks "Buy Now." System validates inventory and reserves a unit. |
| Checkout | Reserved user has 5 minutes to complete payment. Unit released on timeout. |
| Inventory tracking | Real-time count of remaining units displayed on the page. |
| Order confirmation | Successful purchaser receives confirmation email and order record. |
Non-Functional Requirements
| Requirement | Target |
|---|---|
| Page load during sale | < 1 second (even under 10M concurrent users) |
| Purchase attempt response | < 500 ms |
| Inventory consistency | Zero overselling. Zero underselling after release. |
| Availability | 99.9% (graceful degradation acceptable for non-buyers) |
| Scale | 10M concurrent users, ~5M requests/sec at peak, sell-out in 60 seconds |
Back-of-Envelope Math
Concurrent users: 10 million
Requests/sec at sale start: 10M users * 1 click = 10M requests in first second
Staggered by network latency: ~5M requests/sec peak
Inventory: 10,000 units
Successful purchases: 10,000
Rejection rate: 99.9% (9,990,000 rejections)
Page loads (countdown page):
    Before sale: 10M page loads in 5 min = 33K/sec
    Static HTML + JS: 100 KB per load
    CDN bandwidth: 33K * 100 KB = 3.3 GB/sec (CDN handles this trivially)
"Buy Now" API requests:
    Peak: 5M/sec (first second)
    After inventory depletes: rapid decline (sold out message shown)
    Total API requests in 60 sec: ~15-20M
Inventory state:
    Single Redis key: flash_sale:{sale_id}:inventory = 10000
    Redis DECR operations: up to 5M/sec (one per purchase attempt)
    Redis single-node limit: ~200K ops/sec
    A single key lives on one node, so adding nodes does not help that key.
    Reaching 6M ops/sec means splitting the counter across ~30 nodes (30 keys),
    or cutting the load upstream so that one node is enough.
The key insight: 99.9% of users will not get the product. Your system's primary job is rejecting 9,990,000 users as quickly and cheaply as possible, not processing 10,000 orders.
Naive Design
Everything goes through the application server and database.
Flow:
1. User clicks "Buy Now."
2. Application server receives request.
3. BEGIN transaction.
4. SELECT inventory FROM products WHERE product_id = :id FOR UPDATE;
5. IF inventory > 0:
       UPDATE products SET inventory = inventory - 1 WHERE product_id = :id;
       INSERT INTO orders (...);
       COMMIT;
       Return "Success!"
6. ELSE:
       ROLLBACK;
       Return "Sold out."
With 5M requests per second hitting this endpoint, here is what happens: the database receives 5 million SELECT FOR UPDATE queries in one second, all competing for the same row lock on the same product. PostgreSQL handles maybe 10,000 transactions per second under ideal conditions. The connection pool is exhausted in 2ms. The query queue grows to 5 million entries. The database OOMs. Everything dies.
Where It Breaks
Problem 1: The Database Is Not Designed for This
Relational databases are designed for thousands of transactions per second, not millions. FOR UPDATE row locks serialize access, creating a single-threaded bottleneck. At 5M requests/sec, the database is dead before the first transaction commits.
Problem 2: Auto-Scaling Is Too Slow
AWS auto-scaling takes 2-5 minutes to launch new EC2 instances. The flash sale is over in 60 seconds. By the time new servers are ready, there is no work left to do. You cannot react to a thundering herd with auto-scaling.
Problem 3: Application Servers Are Overwhelmed
Even if the database could handle it, 5M HTTP requests/sec requires ~5,000 application servers (assuming 1,000 req/sec per server). If you only have 500 pre-provisioned, the load balancer drops 90% of connections. Users see "502 Bad Gateway."
Problem 4: Payment Processing Bottleneck
Even the 10,000 successful buyers must complete payment. Assume the payment gateway (Stripe, Adyen) sustains ~500 payments/sec per merchant. If all 10,000 buyers try to pay simultaneously, the gateway needs 10,000 / 500 = 20 seconds to drain the queue. The reservation timeout is 5 minutes, so this is actually fine -- the bottleneck is upstream, not at payment.
Problem 5: Inventory Display Is Stale
If the page shows "8,342 remaining" but the actual count is 5,100 because updates are propagating, users make decisions based on stale data. Showing "0 remaining" when units are actually available causes underselling (users give up).
Real Design

Architecture Overview
                ┌──────────────────────┐
                │ CDN (CloudFront)     │
                │ - countdown page     │
                │ - "sold out" page    │
                │ - JS/CSS/images      │
                │ - CDN-level reject   │
                └──────────┬───────────┘
                           │ (only "Buy Now" API calls pass through)
                ┌──────────┴───────────┐
                │ Virtual Waiting Room │
                │ (rate limit entry)   │
                └──────────┬───────────┘
                           │ (controlled flow: 5K-10K/sec)
                ┌──────────┴───────────┐
                │ API Servers          │
                │ (pre-provisioned)    │
                └──────────┬───────────┘
                           │
           ┌───────────────┼───────────────┐
           │               │               │
┌──────────┴──┐   ┌────────┴────────┐  ┌───┴──────────┐
│ Inventory   │   │ Order Service   │  │ Payment      │
│ Service     │   │ (Kafka queue)   │  │ Service      │
│ (Redis      │   │                 │  │              │
│  DECR)      │   │                 │  │              │
└─────────────┘   └─────────────────┘  └──────────────┘
Component 1: "Reject Early, Reject Cheap"
This is Grab's engineering principle, and it defines the entire architecture. Every layer rejects requests that should not proceed.
Layer 1: CDN (cheapest rejection)
Before the sale starts, the page is static HTML served from CDN. Zero backend load. After the sale ends (inventory = 0), switch the CDN to serve a static "Sold Out" page. Any request to the API after sell-out gets a CDN-level rejection (302 redirect to the "Sold Out" page, or a cached 200 response with sold-out JSON). CDN can handle 10M+ requests/sec effortlessly.
How to trigger the CDN switch: When inventory hits 0, the API server sets a flag in a distributed config store (Redis, etcd). A Lambda@Edge function checks this flag on every CDN request. If the flag is set, return the cached "Sold Out" response without forwarding to origin.
Layer 2: Load balancer (cheap rejection)
For requests that make it past CDN, the load balancer enforces:
- Rate limit: 1 purchase attempt per user per 5 seconds (based on user ID in JWT).
- Connection limit: reject new connections when server capacity is at 90%.
- Bot detection: block IPs with > 100 requests/sec (obviously automated).
Layer 3: Application server (moderate cost rejection)
Quick checks before touching any shared state:
- Is the sale still active? (in-memory flag, updated by event)
- Has this user already purchased? (in-memory Bloom filter or Redis SET check)
- Is the user's session valid? (JWT verification, CPU-only, no network call)
Layer 4: Redis inventory check (the real decision)
Only requests that pass all previous layers reach the inventory service. This is the single point of truth for "is there inventory remaining?"
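The four layers can be read as a chain of checks ordered from cheapest to most expensive. The sketch below is a minimal in-process illustration of that ordering only. The class `RejectPipeline`, its fields, and the result labels are hypothetical names, and in a real deployment each check would run in its own tier (CDN, load balancer, application server, Redis) rather than inside one function.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class RejectPipeline:
    """Hypothetical single-process model of the layered rejection order."""
    inventory: int                       # stands in for the Redis counter
    sold_out: bool = False               # stands in for the CDN "sold out" flag
    min_interval_s: float = 5.0          # 1 purchase attempt per user per 5 seconds
    last_attempt: Dict[str, float] = field(default_factory=dict)
    buyers: Set[str] = field(default_factory=set)

    def attempt(self, user_id: str, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now

        # Layer 1 (CDN): once sold out, answer without touching anything else.
        if self.sold_out:
            return "SOLD_OUT_CACHED"

        # Layer 2 (load balancer): per-user rate limit.
        last = self.last_attempt.get(user_id)
        if last is not None and now - last < self.min_interval_s:
            return "RATE_LIMITED"
        self.last_attempt[user_id] = now

        # Layer 3 (application): has this user already purchased?
        if user_id in self.buyers:
            return "ALREADY_PURCHASED"

        # Layer 4 (inventory): the only step that changes shared state.
        if self.inventory <= 0:
            self.sold_out = True
            return "SOLD_OUT"
        self.inventory -= 1
        self.buyers.add(user_id)
        if self.inventory == 0:
            self.sold_out = True         # later requests now stop at layer 1
        return "RESERVED"
```

Passing `now` explicitly makes the rate limit easy to exercise: after the last unit is reserved, every later call returns `SOLD_OUT_CACHED` without reaching the rate limiter or the inventory counter.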
Component 2: Redis Atomic DECR for Inventory
Redis DECR is the perfect primitive for flash sale inventory.
Setup:
    SET flash_sale:{sale_id}:inventory 10000
On purchase attempt:
    new_count = DECR flash_sale:{sale_id}:inventory
    if new_count >= 0:
        // Success! This user got a unit.
        // Proceed to reservation and payment.
    else:
        // Inventory exhausted. Undo the decrement.
        INCR flash_sale:{sale_id}:inventory
        // Return "Sold out."
Why DECR works:
- Atomic: Redis is single-threaded. DECR is one operation. No race condition possible. Two concurrent DECRs on value 1 produce values 0 and -1, never 0 and 0.
- Fast: 200K+ DECR operations per second on a single Redis node.
- Exact: No overselling. If inventory is 10,000 and 10,001 users call DECR, exactly 10,000 get non-negative values and 1 gets -1.
The negative value trick: When DECR returns a negative number, the inventory was already 0. You INCR to undo the decrement (returning the counter to 0) and reject the user. This is a well-known pattern at Alibaba and Amazon.
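To see why the decrement-then-undo pattern never oversells, here is a small simulation. It does not talk to Redis: `AtomicCounter` is a hypothetical stand-in whose `decr` and `incr` are made atomic with a lock, which is the one property of DECR and INCR the argument relies on. The function names and the thread count are assumptions chosen for illustration.

```python
import threading
from typing import List, Tuple


class AtomicCounter:
    """Stand-in for a Redis integer key: decr/incr are atomic, like DECR/INCR."""

    def __init__(self, value: int) -> None:
        self._value = value
        self._lock = threading.Lock()

    def decr(self) -> int:
        with self._lock:
            self._value -= 1
            return self._value

    def incr(self) -> int:
        with self._lock:
            self._value += 1
            return self._value

    def get(self) -> int:
        with self._lock:
            return self._value


def try_buy(counter: AtomicCounter) -> bool:
    """DECR first; if the result is negative, undo with INCR and reject."""
    if counter.decr() >= 0:
        return True
    counter.incr()
    return False


def simulate(units: int, buyers: int, workers: int = 8) -> Tuple[int, int]:
    """Run `buyers` attempts across threads; return (successes, final_count)."""
    counter = AtomicCounter(units)
    wins: List[int] = [0] * workers      # one slot per thread, no sharing

    def worker(idx: int, attempts: int) -> None:
        for _ in range(attempts):
            if try_buy(counter):
                wins[idx] += 1

    per_worker, extra = divmod(buyers, workers)
    threads = [
        threading.Thread(target=worker, args=(i, per_worker + (1 if i < extra else 0)))
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(wins), counter.get()
```

With more buyers than units, `simulate` reports exactly `units` successes and a final count of 0, however the threads interleave. The counter may dip below zero for a moment, but every rejected buyer restores the unit it took.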
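Redis clustering for higher throughput: If 200K ops/sec is not enough (unlikely for a single sale, but possible during Singles' Day with multiple simultaneous sales), use Redis Cluster with one key per sale. Different sales hash to different nodes, so each sale gets its own node's ~200K ops/sec. Clustering alone does not speed up a single sale, because one key always lives on one node. If you need 500K ops/sec for a single sale, cut the number of commands per purchase (for example, a Lua script that batches multiple DECRs) or split the inventory into several counters on different nodes, as the cell-based design below does.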
Component 3: Pre-Provisioning Over Auto-Scaling
Amazon does not auto-scale for Prime Day. They pre-provision.
Why? Auto-scaling is reactive. It detects high CPU/memory/connection count and launches new instances. This takes 2-5 minutes (instance boot + application startup + health check). A flash sale's peak load arrives in 1 second and is over in 60 seconds. Auto-scaling would provision new capacity 3 minutes after the sale ended.
Pre-provisioning strategy:
- D-7 (one week before): Estimate traffic from historical data and pre-registration count. Target: 2x expected peak.
- D-1: Spin up all servers. Run load tests against them. Warm up caches (product data, user data, CDN edge caches).
- D-0 (sale day): Servers are running and idle. At sale start, they absorb the traffic spike instantly.
- D+0 (after sale): Scale down over 2 hours (gradual, not immediate, in case of issues).
Cost: 2,000 servers for 4 hours at $0.10/hour = $800. This is trivially cheap compared to the revenue from a flash sale (10,000 * $399 = $3.99M).
Component 4: Cell-Based Architecture
Amazon uses cell-based architecture to isolate failures. The system is divided into independent cells, each handling a subset of traffic. A failure in one cell does not cascade to others.
How cells work for flash sales:
Cell A: handles users A-M (by user ID hash)
Cell B: handles users N-Z (by user ID hash)
Each cell has its own:
- API servers
- Redis inventory shard (with its own pre-allocated inventory)
- Order processing queue
- Payment connection pool
Inventory partitioning: 10,000 units are split across cells. Cell A gets 5,000. Cell B gets 5,000. Each cell's Redis manages its own inventory independently. No cross-cell communication for the purchase path.
What if Cell A sells out before Cell B? Cell A's users see "Sold out" while Cell B still has units. This is slightly unfair but operationally safe. Alternative: when a cell sells out, it redirects overflow to other cells. But this adds cross-cell dependency, which defeats the purpose.
Blast radius: If Cell A's Redis crashes, only 50% of users are affected. Cell B continues operating. Without cells, a single Redis failure would kill the entire sale.
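A minimal sketch of the two mechanics described here: splitting inventory across cells and routing each user to a fixed cell. The helper names (`split_inventory`, `cell_for_user`, `Cell`) are hypothetical, and SHA-256 is simply one stable hash; the text does not say which hash a real system uses.

```python
import hashlib
from typing import Dict, List


def split_inventory(total_units: int, cell_names: List[str]) -> Dict[str, int]:
    """Divide units as evenly as possible; the first cells absorb any remainder."""
    base, extra = divmod(total_units, len(cell_names))
    return {name: base + (1 if i < extra else 0) for i, name in enumerate(cell_names)}


def cell_for_user(user_id: str, cell_names: List[str]) -> str:
    """Stable hash routing: the same user always lands in the same cell."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return cell_names[int.from_bytes(digest[:8], "big") % len(cell_names)]


class Cell:
    """One isolated cell with its own inventory counter (no cross-cell calls)."""

    def __init__(self, name: str, units: int) -> None:
        self.name = name
        self.units = units
        self.healthy = True

    def try_buy(self) -> str:
        if not self.healthy:
            return "CELL_UNAVAILABLE"    # blast radius stops at this cell
        if self.units == 0:
            return "SOLD_OUT"
        self.units -= 1
        return "RESERVED"
```

Marking one cell unhealthy leaves the others selling from their own allocation, which is the blast-radius property. The cost is also visible: units stranded in a failed or slow cell cannot be sold by a healthy one without adding the cross-cell dependency discussed above.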
Component 5: Reservation TTL and Payment Flow
After a successful DECR, the user has a reservation but has not paid yet. They have 5 minutes.
Reservation flow:
- DECR returns >= 0. Create a reservation record in Redis: reservation:{sale_id}:{user_id} = 1, TTL=300.
- User proceeds to payment checkout.
- If payment succeeds within 5 minutes: write order to database, remove reservation key, send confirmation.
- If TTL expires (user abandons): Redis deletes the key. A keyspace-notification subscriber INCRs the inventory counter.
- The released unit is available for the next user.
Releasing inventory on TTL:
-- Redis keyspace notification subscriber
on_key_expired(key):
    if key matches "reservation:*":
        sale_id = extract_sale_id(key)
        INCR flash_sale:{sale_id}:inventory
        -- Update CDN: sale is no longer sold out
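The reservation lifecycle can be modelled in memory to check the accounting. This sketch replaces the notification subscriber with an explicit `sweep` call. That is a deliberate substitution: Redis keyspace notifications are delivered over Pub/Sub, which is fire-and-forget, so a subscriber that is disconnected when a key expires never sees the event. A periodic sweep, or a reconciliation pass, is one way to catch missed releases. `ReservationStore` and its methods are hypothetical names.

```python
from typing import Dict, Set


class ReservationStore:
    """In-memory model of the inventory counter plus TTL'd reservation keys."""

    def __init__(self, units: int, ttl_s: float = 300.0) -> None:
        self.inventory = units
        self.ttl_s = ttl_s
        self.expires_at: Dict[str, float] = {}   # user_id -> expiry timestamp
        self.orders: Set[str] = set()

    def reserve(self, user_id: str, now: float) -> bool:
        """Take one unit and start the payment clock for this user."""
        if user_id in self.expires_at or user_id in self.orders:
            return False
        if self.inventory == 0:
            return False
        self.inventory -= 1
        self.expires_at[user_id] = now + self.ttl_s
        return True

    def confirm_payment(self, user_id: str, now: float) -> bool:
        """Payment succeeded: turn a still-live reservation into an order."""
        expiry = self.expires_at.get(user_id)
        if expiry is None or now >= expiry:
            return False
        del self.expires_at[user_id]
        self.orders.add(user_id)
        return True

    def sweep(self, now: float) -> int:
        """Release every expired reservation back to inventory; return the count."""
        expired = [u for u, t in self.expires_at.items() if now >= t]
        for user_id in expired:
            del self.expires_at[user_id]
            self.inventory += 1
        return len(expired)
```

At any point, `inventory + len(expires_at) + len(orders)` equals the initial unit count. That invariant is what a post-sale check should confirm.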
Payment pipeline: Use Airbnb's Orpheus pattern (from Lesson 4) with idempotency keys. The payment for a flash sale is identical to any other payment -- the flash sale system's job is to get the inventory reservation right; payment is a downstream concern.
Component 6: Virtual Waiting Room
Same concept as ticket booking (Lesson 1), adapted for flash sales.
Adaptation: Flash sales have products, not seats. The waiting room admits users at a rate the inventory service can handle (5K-10K/sec). Users who do not get a unit are rejected and shown the "Sold out" page.
Key difference from ticket booking: In ticket booking, the waiting room exists because the booking process is slow (seat selection + payment). In flash sales, the purchase decision is instant ("Buy Now" at a predetermined price). The waiting room exists purely to rate-limit the thundering herd so the Redis inventory check does not get overwhelmed.
Optimization: If pre-registration count exceeds inventory by 100x (1M registrations for 10,000 units), run a lottery at sale start time. Randomly select 50,000 winners (5x inventory for abandonment buffer). Only winners are admitted. Everyone else is immediately rejected. This is fairer than first-come-first-served (which rewards fast internet connections) and reduces backend load by 20x (1M registrants down to 50,000 admitted).
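The lottery itself is a uniform draw without replacement. A minimal sketch follows, with `draw_winners` and its parameters as hypothetical names. The optional `seed` exists only so the example is reproducible; a real draw should use an unpredictable source such as `random.SystemRandom`, which is what the function falls back to when no seed is given.

```python
import random
from typing import List, Optional, Set


def draw_winners(registered: List[str], inventory: int,
                 buffer_factor: int = 5, seed: Optional[int] = None) -> Set[str]:
    """Pick buffer_factor * inventory winners uniformly, without replacement."""
    quota = min(len(registered), buffer_factor * inventory)
    # Seeded PRNG for reproducible demos; OS entropy otherwise.
    rng = random.Random(seed) if seed is not None else random.SystemRandom()
    return set(rng.sample(registered, quota))


def load_reduction(registered_count: int, admitted_count: int) -> float:
    """How many times fewer users reach the backend after the draw."""
    return registered_count / admitted_count
```

For the numbers in the text, `load_reduction(1_000_000, 50_000)` gives 20.0. The ratio grows with the number of registrants, since the admitted count is fixed by inventory.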
Deep Dives

Deep Dive 1: Preventing Overselling with Exactly-Once Semantics
The DECR approach guarantees no overselling at the Redis level. But what about the full flow?
Scenario: Double reservation.
- User clicks "Buy Now." Request hits Server A. Server A DECRs (count 9999 -> 9998). Server A writes reservation. Server A returns "Success."
- Network glitch. User does not see response. Clicks again. Request hits Server B. Server B DECRs (count 9998 -> 9997). Server B writes a second reservation for the same user.
- User now has two reservations. If they pay for both, they get two units from what they intended as a single purchase.
Solution: Before DECR, check if the user already has a reservation.
-- Atomic Lua script.
-- KEYS[1] = reservation:{sale_id}:{user_id}
-- KEYS[2] = flash_sale:{sale_id}:inventory
-- ARGV[1] = reservation TTL in seconds (300)
-- In Redis Cluster, both keys must map to the same hash slot.
local existing = redis.call("GET", KEYS[1])
if existing then
    return "ALREADY_RESERVED"
end
local new_count = redis.call("DECR", KEYS[2])
if new_count >= 0 then
    redis.call("SET", KEYS[1], "1", "EX", ARGV[1])
    return "RESERVED"
else
    -- Inventory exhausted: undo the decrement.
    redis.call("INCR", KEYS[2])
    return "SOLD_OUT"
end
This Lua script is atomic. The check + DECR + SET happens as one operation. No window for double reservation.
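The double-click scenario can be reproduced without Redis by treating the whole check-and-reserve sequence as one critical section. In this sketch a lock plays the role that atomic script execution plays in Redis. `FlashSale` and `concurrent_clicks` are hypothetical names, and the model ignores TTLs to keep the focus on idempotency.

```python
import threading
from typing import List, Optional, Set


class FlashSale:
    """Models check + decrement + record as a single atomic step."""

    def __init__(self, units: int) -> None:
        self._lock = threading.Lock()    # stands in for atomic script execution
        self.inventory = units
        self.reservations: Set[str] = set()

    def reserve(self, user_id: str) -> str:
        with self._lock:
            if user_id in self.reservations:
                return "ALREADY_RESERVED"
            if self.inventory == 0:
                return "SOLD_OUT"
            self.inventory -= 1
            self.reservations.add(user_id)
            return "RESERVED"


def concurrent_clicks(sale: FlashSale, user_id: str, clicks: int) -> List[Optional[str]]:
    """Fire `clicks` simultaneous attempts for one user; return every result."""
    results: List[Optional[str]] = [None] * clicks   # one slot per thread

    def click(slot: int) -> None:
        results[slot] = sale.reserve(user_id)

    threads = [threading.Thread(target=click, args=(i,)) for i in range(clicks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

However the retries interleave, exactly one call returns `RESERVED` and the inventory drops by one. Moving the existence check outside the critical section reopens the window described in the scenario.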
Deep Dive 2: Inventory Display Without Stampede
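Showing the real-time inventory count creates a problem. If you query Redis on every page load (10M users refreshing), that is 10M reads/sec on the inventory key. A GET is cheaper than a contended write, but a single key is still served by one node at roughly 200K ops/sec, so these reads would swamp the same node that handles purchases, all for a number most users do not need to see exactly.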
Approach: Tiered display accuracy.
Inventory > 1000: Show "In Stock" (no count). Update every 10 seconds.
Inventory 100-1000: Show approximate count (rounded to nearest 100). Update every 5 seconds.
Inventory 1-100: Show exact count. Update every 1 second via WebSocket.
Inventory 0: Show "Sold Out." Serve from CDN.
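The tiers translate directly into a small lookup. In this sketch, `display_policy` and the label wording are hypothetical. The listed ranges overlap at 100 and 1000, so the sketch assumes that exactly 100 falls in the exact-count tier and exactly 1000 in the approximate tier.

```python
from typing import Tuple


def display_policy(inventory: int) -> Tuple[str, int]:
    """Return (label, refresh_interval_seconds) for a given inventory level.

    A refresh interval of 0 means the response is static and served from the CDN.
    """
    if inventory <= 0:
        return ("Sold Out", 0)
    if inventory <= 100:
        return ("%d left" % inventory, 1)            # exact count, 1-second updates
    if inventory <= 1000:
        rounded = (inventory + 50) // 100 * 100      # nearest 100, halves round up
        return ("About %d left" % rounded, 5)
    return ("In Stock", 10)
```

Because the label changes only when the rounded value changes, most updates in the upper tiers can be skipped, which is where the bandwidth saving comes from.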
Implementation: A background job reads the inventory count from Redis every second and publishes it to a Pub/Sub channel. WebSocket gateway servers broadcast to connected clients. The broadcast message is ~20 bytes ("count: 42"). 10M users * 20 bytes/sec = 200 MB/sec WebSocket bandwidth, before framing overhead. This is manageable with 40 gateway servers (5 MB/sec and 250K connections each).
Alternative: Do not show the inventory count at all. Just show "Available" or "Sold Out." This is what Amazon does for most Prime Day deals. Simpler, less infrastructure, and avoids creating a countdown anxiety that pushes users to click faster.
Deep Dive 3: Post-Sale Reconciliation
After the sale, verify that the system state is consistent.
Checks:
- Inventory balance: Initial inventory (10,000) = confirmed orders + active (unexpired) reservations + current Redis count. Released reservations are already back in the Redis count, so they are not a separate term. If these do not sum to 10,000, something is wrong.
- Payment reconciliation: Every successful DECR should end as either a confirmed order or a released reservation, and every order should have a payment. Run:
    SELECT COUNT(*) FROM orders WHERE sale_id = :sale AND status = 'CONFIRMED';
    -- Should equal: 10000 - current_redis_count - active_reservations
- Duplicate detection: Check for users with multiple orders for the same sale.
Timing: Run reconciliation 1 hour after the sale ends (to allow all reservation TTLs to expire and all payments to process). If discrepancies are found, investigate and resolve within 24 hours.
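The checks above can be bundled into one function. The sketch uses SQLite from the Python standard library so that it runs anywhere. The `orders` schema (columns `sale_id`, `user_id`, `status`) and the function name `reconcile` are assumptions made for illustration, since the text does not specify the table layout. The Redis count and active-reservation count are passed in as plain integers.

```python
import sqlite3
from typing import Any, Dict


def reconcile(conn: sqlite3.Connection, sale_id: str, initial_units: int,
              redis_count: int, active_reservations: int) -> Dict[str, Any]:
    """Compare confirmed orders with the inventory balance and list duplicate buyers."""
    confirmed = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE sale_id = ? AND status = 'CONFIRMED'",
        (sale_id,),
    ).fetchone()[0]

    # Users holding more than one confirmed order for the same sale.
    duplicates = [row[0] for row in conn.execute(
        "SELECT user_id FROM orders "
        "WHERE sale_id = ? AND status = 'CONFIRMED' "
        "GROUP BY user_id HAVING COUNT(*) > 1 ORDER BY user_id",
        (sale_id,),
    )]

    expected = initial_units - redis_count - active_reservations
    return {
        "confirmed": confirmed,
        "expected": expected,
        "balanced": confirmed == expected,
        "duplicates": duplicates,
    }
```

A result with `balanced` false or a non-empty `duplicates` list is the signal to investigate. The function only reports discrepancies and does not try to repair them.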
Alternative Designs
| Approach | Pros | Cons | When to Use |
|---|---|---|---|
| Redis DECR + CDN + waiting room (described above) | Handles 10M users. Zero overselling. "Reject early, reject cheap." | Complex multi-layer architecture. Pre-provisioning required. | Amazon Prime Day, Alibaba Singles' Day, any major flash sale. |
| Database pessimistic locking | Simple. ACID compliant. No external dependencies. | Cannot handle > 5K/sec. Thundering herd kills DB. | Small flash sales (< 1,000 concurrent users). |
| Kafka queue for orders | Absorbs burst. Process in FIFO order. Guaranteed delivery. | Users wait in queue without knowing if they will get the product. Latency is seconds, not milliseconds. | When fairness is more important than speed. |
| Lottery system | Fair. No thundering herd. Simple implementation. | Users cannot "buy immediately." Lottery takes minutes to process. Less exciting UX. | When scale is extreme (100M+ users for 1,000 units). |
| Token-based admission | Pre-distribute tokens (via app notification) to eligible users. Only token holders can buy. | Requires pre-registration. Token distribution is another scalability problem. | Limited-edition product drops (Nike SNKRS, Supreme). |
Scaling Math Verification
CDN Layer
Concurrent users: 10M
Countdown page size: 100 KB (static HTML + JS + CSS)
CDN requests before sale: 10M * 3 loads = 30M requests in 5 min = 100K/sec
CDN bandwidth: 100K * 100 KB = 10 GB/sec (CloudFront handles 100+ Tbps)
CDN cost: 30M * 100 KB = 3 TB * $0.085/GB = $255 (trivial)
"Sold Out" page after inventory depletes:
All subsequent requests (~9.99M) served from CDN
Cost: 9.99M * 10 KB = 100 GB * $0.085/GB = $8.50
Redis Inventory Service
Redis DECR operations: Peak 10K/sec (after waiting room rate limiting)
Redis capacity: 200K ops/sec on one node
Utilization: 5% (massive headroom)
With cell-based architecture (2 cells):
Each cell's Redis: 5K ops/sec
Inventory per cell: 5,000 units
Total Redis memory for sale: ~1 MB (one counter + 10K reservation keys * 100 bytes)
API Server Fleet
Requests past waiting room: 10K/sec (rate-limited)
Per-request processing: 5 ms (Redis check + response)
Requests per server: 1,000/sec
Servers needed: 10 + buffer = 20 pre-provisioned
Pre-provisioning cost: 20 servers * $0.10/hour * 4 hours = $8
Revenue from sale: 10,000 * $399 = $3.99M
Infrastructure cost ratio: 0.0002% of revenue
Payment Processing
Successful purchases: 10,000
Payment completion window: 5 minutes
Payment processing rate: 10,000 / 300 sec = 33/sec
Stripe capacity per merchant: ~500/sec
Utilization: 7% (no bottleneck)
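The sizing arithmetic in this section can be re-derived with two small helpers, which also makes it easy to swap in different assumptions. The function names are hypothetical, and every input in the usage note is a figure already stated above.

```python
import math


def servers_needed(peak_rps: float, per_server_rps: float,
                   buffer_factor: float = 2.0) -> int:
    """Servers required for a peak rate, multiplied by a safety buffer."""
    return math.ceil(peak_rps / per_server_rps * buffer_factor)


def utilization(load: float, capacity: float) -> float:
    """Fraction of capacity consumed by the given load."""
    return load / capacity


def fleet_cost(servers: int, dollars_per_hour: float, hours: float) -> float:
    """Total cost of holding a fleet for a fixed window."""
    return servers * dollars_per_hour * hours
```

With the figures above: `servers_needed(10_000, 1_000)` gives 20 servers, `utilization(10_000, 200_000)` gives 0.05 for Redis, and `utilization(10_000 / 300, 500)` gives about 0.067 for payments. `fleet_cost(20, 0.10, 4)` comes to $8 against $3.99M in revenue, a ratio of about 0.0002%.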
Failure Analysis
| Failure | Impact | Mitigation |
|---|---|---|
| Redis inventory node crashes | Cannot process purchase attempts. Sale is paused. | Redis replication with automatic failover (< 5 sec). Cell isolation: only one cell affected. Alternative: fall back to database-backed inventory check (slower but functional). |
| CDN goes down | 10M users cannot load the page. | Multi-CDN (CloudFront + Fastly). DNS failover. Origin servers can serve static pages directly (but at reduced capacity). |
| Waiting room floods backend | Rate limiting fails. 5M requests/sec hit API servers. | Second rate limit at load balancer level. Circuit breaker in API servers (reject requests when Redis latency > 50ms). Degrade to "try again later" message. |
| Payment service fails during checkout | Users have reservations but cannot pay. Reservations expire. Units are released. | Extend reservation TTL if payment service is degraded. Retry payment. If payment service is fully down, pause the sale and extend all TTLs until it recovers. |
| Inventory count goes negative in Redis | Overselling. More orders than inventory. | Lua script prevents negative (INCR on negative DECR result). Secondary check: database-level unique constraint on order table prevents duplicate fulfillment. |
| Bots grab all inventory | Genuine users get nothing. PR disaster. | CAPTCHA at waiting room entry. Device fingerprinting. Purchase limit per account. IP rate limiting. |
| One cell sells out, other has remaining stock | Unfair distribution. Some users see "Sold out" while others can buy. | After primary cell depletes, redirect overflow to cells with remaining inventory. Adds cross-cell latency (~50ms) but ensures all inventory is sold. |
Level Expectations
| Level | What the Interviewer Expects |
|---|---|
| Mid (L4) | Database with inventory counter and SELECT FOR UPDATE. Knows it does not scale. Mentions caching and queuing. Basic payment flow. |
| Senior (L5) | Redis DECR for atomic inventory. CDN for static page serving. Pre-provisioning vs. auto-scaling trade-off. Reservation with TTL for payment window. "Reject early, reject cheap" principle with multi-layer rejection. Waiting room for rate limiting. |
| Staff+ (L6) | Cell-based architecture with inventory partitioning. Lua script for atomic check+reserve. Post-sale reconciliation pipeline. Lottery vs. FIFO fairness analysis. Tiered inventory display strategy. Reference to Amazon Prime Day and Alibaba Singles' Day real architectures. Quantified cost analysis showing infrastructure cost as fraction of revenue. |