
Design an Online Auction Platform

TL;DR

Build a system where users bid on items, the highest bidder wins, and the whole thing does not fall apart when 50,000 people bid on the same item in the last 30 seconds. The defining mechanic is proxy bidding -- eBay's innovation where you enter your maximum willingness to pay, and the system bids on your behalf in minimum increments. This means the "current price" is not the highest bid; it is the second-highest bid plus one increment. The architecture challenge is the single-writer pattern for hot auctions: all bids on one item must be serialized through a single node to maintain bid ordering correctness. You cannot shard a single auction across multiple writers without risking bid ordering violations. Anti-sniping mechanisms, post-auction payment flows, and the Vickrey (second-price) auction model round out a design that eBay handles at 1.7 billion live listings and $74 billion in annual GMV.


The System

Think eBay. A seller lists a vintage Rolex with a starting price of $500 and a 7-day auction window. Bidders place bids over the week. In the final 30 seconds, a flurry of last-second bids comes in -- "sniping." The auction closes. The highest bidder wins and must pay. The seller ships the item. eBay takes a commission of roughly 13% (the exact fee varies by category).

What makes auctions fundamentally different from fixed-price sales? Time pressure and contention. A fixed-price sale has exactly one state transition: available -> sold. An auction has continuous state transitions as bids come in, and the final state is determined by a deadline. The system must handle real-time bid updates, enforce bid validity rules, determine the winner atomically at auction close, and handle all the ways a buyer or seller can misbehave after the auction ends. eBay hosts about 1.7 billion live listings at any time and receives millions of bids per day. Most listings have zero or few bids. A small percentage -- the hot items -- receive thousands of bids, concentrated in the final minutes.


Requirements

Functional Requirements

| Requirement | Details |
|---|---|
| List item | Seller creates auction with starting price, reserve price (optional), duration, and item details. |
| Place bid | Bidder submits a maximum bid amount. System places proxy bids automatically. |
| Proxy bidding | System bids on behalf of the user up to their maximum, in minimum increments. |
| Real-time updates | All viewers see current price update within 2 seconds of a new bid. |
| Auction close | At deadline, determine winner. Handle anti-snipe extension if configured. |
| Post-auction | Winner pays. Seller ships. System handles non-payment, disputes, returns. |
| Buy It Now | Optional fixed-price purchase that ends the auction immediately. |

Non-Functional Requirements

| Requirement | Target |
|---|---|
| Bid acceptance latency | < 500 ms (user sees "bid accepted" or "outbid") |
| Bid ordering correctness | No bid can be accepted after a higher valid bid. Zero ordering violations. |
| Auction close accuracy | Winner determined within 1 second of auction end time. |
| Availability | 99.9% |
| Scale | 1.7B listings, 50M bids/day, 500 bids/sec on a single hot auction |

Back-of-Envelope Math

Total live listings:         1.7 billion
Bids per day:                50 million
Bids per second (avg):       ~580
Bids per second (peak):      ~3,000

Bid distribution:
  90% of listings:           0-2 bids total (low interest)
  9% of listings:            3-50 bids (moderate interest)
  1% of listings:            50-10,000+ bids (hot items)

Hot auction profile:
  Duration:                  7 days
  Total bids:                5,000
  Bids in last hour:         3,000 (60% of all bids)
  Bids in last minute:       500
  Bids in last 10 seconds:   100 (10 bids/sec on one item)

Storage:
  Bid record:                (bid_id, auction_id, user_id, max_amount, current_bid, timestamp) = ~100 bytes
  50M bids/day * 100 bytes = 5 GB/day = 1.8 TB/year
  Listing metadata:          ~2 KB per listing * 1.7B = 3.4 TB

The key insight: 99% of auctions have near-zero contention and can use simple database writes. The 1% hot auctions have extreme write contention on a single row (the current bid). Your design must handle both.
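The arithmetic in the estimate above is easy to re-derive. The following is a minimal sketch that recomputes the average bid rate and the storage figures from the stated inputs; the constant names are illustrative, and decimal units (1 KB = 1,000 bytes) are assumed.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Inputs copied from the estimate above (decimal units: 1 KB = 1,000 bytes).
BIDS_PER_DAY = 50_000_000
BID_RECORD_BYTES = 100
LIVE_LISTINGS = 1_700_000_000
LISTING_BYTES = 2_000

avg_bids_per_sec = BIDS_PER_DAY / SECONDS_PER_DAY
bid_gb_per_day = BIDS_PER_DAY * BID_RECORD_BYTES / 1e9
bid_tb_per_year = bid_gb_per_day * 365 / 1e3
listing_tb = LIVE_LISTINGS * LISTING_BYTES / 1e12

print(f"avg bids/sec:      {avg_bids_per_sec:,.0f}")
print(f"bid storage:       {bid_gb_per_day:.1f} GB/day, {bid_tb_per_year:.2f} TB/year")
print(f"listing metadata:  {listing_tb:.1f} TB")
```

The peak figure of ~3,000 bids/sec is roughly five times the average; that multiplier is a planning assumption in the estimate, not something the arithmetic produces.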


Naive Design

PostgreSQL database, REST API.

Schema:

CREATE TABLE auctions (
    auction_id BIGINT PRIMARY KEY,
    seller_id BIGINT,
    title TEXT,
    starting_price DECIMAL,
    reserve_price DECIMAL,
    current_price DECIMAL,
    current_winner_id BIGINT,
    highest_max_bid DECIMAL,    -- proxy bidding: the actual max bid
    end_time TIMESTAMP,
    status VARCHAR(20)          -- ACTIVE, ENDED, SOLD, UNSOLD
);

CREATE TABLE bids (
    bid_id BIGINT PRIMARY KEY,
    auction_id BIGINT,
    user_id BIGINT,
    max_amount DECIMAL,         -- user's maximum willingness to pay
    actual_amount DECIMAL,      -- what the proxy system actually bid
    created_at TIMESTAMP
);

Bid placement:

BEGIN;
SELECT current_price, highest_max_bid, current_winner_id
FROM auctions WHERE auction_id = :auction FOR UPDATE;

-- Proxy bidding logic (in application code):
-- If new_max > highest_max_bid:
--   new_current_price = highest_max_bid + increment
--   new_highest_max = new_max
--   new_winner = new_bidder
-- If new_max <= highest_max_bid:
--   new_current_price = new_max + increment (capped at highest_max_bid)
--   winner unchanged

UPDATE auctions SET
    current_price = :new_current_price,
    highest_max_bid = :new_highest_max,
    current_winner_id = :new_winner
WHERE auction_id = :auction;

INSERT INTO bids (...) VALUES (...);
COMMIT;

This works. For most auctions, it works fine. The problem is hot auctions.


Where It Breaks

Problem 1: Row-Level Lock Contention on Hot Auctions

SELECT ... FOR UPDATE on the auction row serializes all bids. At 10 bids/second on a hot auction, each bid holds the lock for ~50ms (application logic + DB write). That is 500ms of lock time per second -- 50% lock saturation. At 20 bids/sec the lock is held 100% of the time; any rate above that exceeds what the row can absorb, and bids start queuing. Response times balloon. Users see timeouts.
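The saturation figures follow from a one-line utilization formula: arrival rate times lock hold time. A small sketch, assuming the ~50 ms hold time quoted above (the function names are illustrative):

```python
def lock_utilization(bids_per_sec: float, hold_ms: float) -> float:
    """Fraction of each second the row lock is held (values above 1.0 mean a growing queue)."""
    return bids_per_sec * hold_ms / 1000.0


def max_sustainable_rate(hold_ms: float) -> float:
    """Highest bid rate one serialized row can absorb without queuing."""
    return 1000.0 / hold_ms


for rate in (10, 20, 40):
    print(f"{rate} bids/sec -> {lock_utilization(rate, 50):.0%} lock utilization")
print(f"ceiling at 50 ms hold time: {max_sustainable_rate(50):.0f} bids/sec")
```

At a 50 ms hold time the ceiling is 20 bids/sec per auction row, which is why the 500 bids/sec target for a hot auction cannot be met with row locking alone.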

Problem 2: Proxy Bidding Creates Hidden Complexity

Proxy bidding means the system bids on behalf of a previous bidder. When user B bids $100 and user A's proxy max is $150 (assuming a $5 increment), the system must:

  1. Keep A as the winner (A's proxy automatically outbids B).
  2. Set the current price to $105 (B's bid + increment).
  3. Notify B: "You've been outbid."
  4. Do NOT reveal A's max bid ($150) to anyone.

This logic must be atomic. If it is not, a race condition could reveal A's max bid or set the price incorrectly.

Problem 3: Auction Sniping

Sniping is bidding in the last 1-3 seconds before auction close. The sniper waits until no one can respond, then places a bid just above the current price. This is a legitimate strategy on eBay, but it frustrates sellers (items sell below true market value) and other bidders.

Problem 4: Auction Close Is a Thundering Herd

Thousands of auctions end every minute. At the close time, the system must atomically determine the winner, update the auction status, and initiate the payment flow. If this is done by a cron job scanning for ended auctions, the batch size creates spiky load.

Problem 5: Post-Auction Failure Modes

The winner does not pay. The seller does not ship. The item is counterfeit. These are not edge cases -- eBay reports that ~2% of auctions have post-sale disputes. The system must handle non-payment (relist the item, give to second-highest bidder), shipping disputes, and returns.


Real Design


Architecture Overview

┌──────────────┐     ┌──────────────┐
│  Web/Mobile  │     │  WebSocket   │
│  Client      │────▶│  Gateway     │ (real-time bid updates)
└──────┬───────┘     └──────┬───────┘
       │                    │
┌──────┴───────┐     ┌──────┴───────┐
│  API Gateway │     │  Notification │
│  (REST)      │     │  Service      │
└──────┬───────┘     └──────────────┘
┌──────┴───────┐
│  Bid Service │ ── routes to correct auction owner
└──────┬───────┘
┌──────┴───────────────────────────┐
│  Auction Servers (sharded)       │
│  Each server "owns" a set of     │
│  auctions. Single-writer per     │
│  auction. In-memory bid state.   │
└──────┬───────────────────────────┘
┌──────┴───────┐     ┌──────────────┐
│  Bid Log     │     │  Auction DB   │
│  (Kafka)     │     │  (PostgreSQL) │
└──────────────┘     └──────────────┘

Component 1: Single-Writer Pattern for Hot Auctions

This is the defining architectural decision. Each auction is owned by exactly one server process. All bids for that auction are routed to that server. The server processes bids sequentially -- no locks needed because there is no concurrency within a single auction.

Why single-writer?

Bid ordering must be strict. If user A bids $100 at T1 and user B bids $105 at T2 (T2 > T1), user B must win. With distributed writers, you risk clock skew (is T1 < T2?) and race conditions (both writers see the same current price and both accept). A single writer eliminates these problems entirely.

How it works:

  1. Consistent hashing maps auction_id to an auction server.
  2. The bid service routes each bid to the correct server.
  3. The auction server holds the current state in memory: (current_price, highest_max_bid, winner_id).
  4. Bids are processed sequentially from an in-memory queue. Processing a bid takes ~10 microseconds (just comparisons and updates).
  5. After processing, the bid is written to Kafka (durability) and the result is returned to the client.
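Steps 1 and 2 can be sketched with a small consistent hash ring. This is a minimal illustration, not a production router: the class name, the use of SHA-256, the server names, and the choice of 100 virtual nodes per server are all assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Maps auction ids to servers; each server gets several virtual nodes."""

    def __init__(self, servers, vnodes=100):
        points = [(self._hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes)]
        points.sort()
        self._hashes = [h for h, _ in points]
        self._owners = [s for _, s in points]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def owner(self, auction_id: int) -> str:
        # First ring point clockwise from the auction's hash, wrapping at the end.
        i = bisect.bisect_right(self._hashes, self._hash(f"auction:{auction_id}"))
        return self._owners[i % len(self._owners)]


servers = [f"auction-server-{n}" for n in range(5)]
ring = HashRing(servers)
print(ring.owner(12345))
```

The property that matters for failover is that removing one server only moves the auctions that server owned; every other auction keeps its owner, so a crash does not reshuffle the whole fleet.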

Throughput: At ~10 microseconds of in-memory work per bid, a single auction server can process on the order of 100,000 bids/sec for a single auction (bounded by the sequential, CPU-bound bid loop). The hottest eBay auctions see ~500 bids/sec, so single-writer leaves ample headroom.

Failover: If the auction server crashes, the consistent hash ring reassigns the auction to another server. That server replays the bid log from Kafka to reconstruct the current state. Replay of 5,000 bids takes < 100 ms. During failover (typically 5-10 seconds), bids for that auction return a "temporarily unavailable" error.

Component 2: Proxy Bidding Engine

Proxy bidding is eBay's core mechanic. Here is how it works precisely.

Scenario: Auction for a watch, starting at $100, increment $5.

  1. User A bids max $200. Current price = $100 (starting price). A is winning.
  2. User B bids max $150. System proxy-bids for A up to $155 ($150 + increment). Current price = $155. A is still winning. B sees "You've been outbid."
  3. User C bids max $300. System proxy-bids for C up to $205 ($200 + increment). Current price = $205. C is winning. A sees "You've been outbid."

The rules:

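On new bid (bidder, max_amount):
  // Before the first bid, highest_max_bid = 0 and there is no current_winner.
  if no bids yet and max_amount < starting_price:
    REJECT "Your bid must be at least $starting_price"
  if bids exist and max_amount <= current_price:
    REJECT "Your bid must exceed $current_price"

  if bidder == current_winner:
    // The leader is only raising their own max; the price must not move.
    new_current_price = current_price
    new_highest_max = max(highest_max_bid, max_amount)
    new_winner = current_winner
  else if max_amount > highest_max_bid:
    // New leader pays one increment over the old max,
    // but never more than their own max (old max $200, new max $202 -> $202).
    new_current_price = max(highest_max_bid + increment, starting_price)
    new_current_price = min(new_current_price, max_amount)
    new_winner = bidder
    new_highest_max = max_amount
  else:
    // Current winner stays, but price rises to new bidder's max + increment
    // (capped at current winner's max). A tie goes to the earlier bid.
    new_current_price = min(max_amount + increment, highest_max_bid)
    new_highest_max = highest_max_bid  // unchanged
    new_winner = current_winner  // unchanged

  Update state:
    current_price = new_current_price
    highest_max_bid = new_highest_max
    current_winner = new_winner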

Privacy: The highest_max_bid is NEVER revealed to other bidders. They only see current_price. If A's max is $200 and the current price is $155, other bidders do not know how much more A is willing to pay. This is fundamental to eBay's auction mechanism.

Tie-breaking: If two bids arrive with the same max amount, the earlier bid wins. This is another reason single-writer matters -- the arrival order at the single writer determines the winner. No clock synchronization issues.
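The rules and the watch scenario can be exercised with a short runnable sketch. It is one possible reading of the rules, not eBay's implementation. Three details are assumptions beyond the pseudocode: the first bid may equal the starting price, a leader raising their own max does not move the price, and a new leader's price is capped at their own max. Amounts are integer currency units (whole dollars here, to match the scenario).

```python
from dataclasses import dataclass
from typing import Optional


class BidRejected(Exception):
    """Raised when a bid does not beat the current price."""


@dataclass
class AuctionState:
    starting_price: int            # integer currency units (dollars here)
    increment: int
    current_price: int = 0
    highest_max_bid: int = 0
    winner: Optional[str] = None   # None until the first valid bid


def place_bid(state: AuctionState, bidder: str, max_amount: int) -> str:
    """Apply one bid in arrival order; return 'winning' or 'outbid'."""
    if state.winner is None:
        # Assumption: the first bid may equal the starting price.
        if max_amount < state.starting_price:
            raise BidRejected(f"bid must be at least {state.starting_price}")
        state.current_price = state.starting_price
        state.highest_max_bid = max_amount
        state.winner = bidder
        return "winning"

    if max_amount <= state.current_price:
        raise BidRejected(f"bid must exceed {state.current_price}")

    if bidder == state.winner:
        # Assumption: the leader raising their own max never moves the price.
        state.highest_max_bid = max(state.highest_max_bid, max_amount)
        return "winning"

    if max_amount > state.highest_max_bid:
        # New leader pays one increment over the old max, capped at their own max.
        state.current_price = min(max_amount, state.highest_max_bid + state.increment)
        state.highest_max_bid = max_amount
        state.winner = bidder
        return "winning"

    # Lower or equal max: the earlier bidder keeps the lead (ties go to the earlier bid).
    state.current_price = min(max_amount + state.increment, state.highest_max_bid)
    return "outbid"


auction = AuctionState(starting_price=100, increment=5)
for name, amount in [("A", 200), ("B", 150), ("C", 300)]:
    result = place_bid(auction, name, amount)
    print(name, amount, result, auction.current_price, auction.winner)
```

Running the three bids reproduces the scenario: the price moves from $100 to $155 to $205, and C ends up winning. Only current_price and winner would ever be sent to clients; highest_max_bid stays server-side.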

Component 3: Anti-Snipe Mechanisms

eBay does not extend auction deadlines (they allow sniping as a feature). But many other platforms do.

eBay approach: Auctions end at the scheduled time, period. If you snipe, you snipe. eBay's proxy bidding is the counter-mechanism -- if the current leader has a high proxy max, sniping with a low bid will not work.

Extension approach (used by many platforms):

If a bid is placed within 5 minutes of auction end:
  Extend the auction by 5 minutes from the bid timestamp.
  Maximum 3 extensions.
  Total possible extension: 15 minutes.

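Implementation: The auction server checks whether bid.timestamp > auction.end_time - 5_minutes and fewer than 3 extensions have been used. If both hold, it sets auction.end_time = bid.timestamp + 5_minutes and increments the extension count. Since the auction server is single-writer, this is a simple in-memory update; the close timer must then be rescheduled to the new end time.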
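A minimal sketch of the extension rule, including the cap of three extensions. The dataclass and function names are hypothetical, and times are plain seconds so the example is easy to follow.

```python
from dataclasses import dataclass

EXTENSION_WINDOW_S = 5 * 60   # bids this close to the end trigger an extension
MAX_EXTENSIONS = 3


@dataclass
class AuctionClock:
    end_time: float           # seconds since some epoch
    extensions_used: int = 0


def maybe_extend(clock: AuctionClock, bid_time: float) -> bool:
    """Extend the auction if the bid lands in the final window; return True if extended."""
    if bid_time >= clock.end_time:
        return False          # auction is already over; the bid itself should be rejected
    in_window = bid_time > clock.end_time - EXTENSION_WINDOW_S
    if in_window and clock.extensions_used < MAX_EXTENSIONS:
        clock.end_time = bid_time + EXTENSION_WINDOW_S
        clock.extensions_used += 1
        return True
    return False


clock = AuctionClock(end_time=1000.0)
for t in (600.0, 900.0, 1100.0, 1399.0, 1698.0):
    print(t, maybe_extend(clock, t), clock.end_time)
```

Because each extension is measured from the bid timestamp rather than from the old end time, three extensions add strictly less than 15 minutes in total.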

Trade-off: Extensions prevent sniping but make auction end times unpredictable. Users cannot plan around a fixed end time. eBay's position: proxy bidding makes sniping less effective, so extensions are unnecessary.

Component 4: Auction Close and Winner Determination

Auctions end at a specific time. The system must determine the winner atomically.

Scheduled close:

  1. A timer service (or the auction server itself) fires at auction.end_time.
  2. The auction server stops accepting bids for that auction.
  3. The winner is already known (current_winner from the in-memory state).
  4. The final state is written to PostgreSQL: status = ENDED, winner = current_winner, final_price = current_price.
  5. An "auction ended" event is published to Kafka, triggering:
     • Email to the winner: "You won! Pay within 48 hours."
     • Email to the seller: "Your item sold for $X."
     • Email to all other bidders: "You did not win."

Timer precision: Auction end times must be accurate to within 1 second. The auction server uses a priority queue (min-heap) of end times, checked every 100ms. When now >= end_time, close the auction.

Batch close handling: If 1,000 auctions end in the same minute (common for 7-day auctions listed at the same time), the server processes them sequentially. At ~100 microseconds per close, 1,000 closes take 100 ms. No bottleneck.
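The min-heap timer can be sketched with Python's heapq. The class below is illustrative: it uses lazy deletion so that an anti-snipe extension can move an end time without searching the heap. The 100 ms polling loop is left out; due(now) is what that loop would call on each tick.

```python
import heapq


class CloseScheduler:
    """Min-heap of (end_time, auction_id) with lazy deletion of stale entries."""

    def __init__(self):
        self._heap = []
        self._end_times = {}     # auction_id -> authoritative end_time

    def schedule(self, auction_id: int, end_time: float) -> None:
        """Set or move an auction's end time (for example after an extension)."""
        self._end_times[auction_id] = end_time
        heapq.heappush(self._heap, (end_time, auction_id))

    def due(self, now: float) -> list:
        """Pop and return auction ids whose end time has passed, earliest first."""
        closed = []
        while self._heap and self._heap[0][0] <= now:
            end_time, auction_id = heapq.heappop(self._heap)
            if self._end_times.get(auction_id) != end_time:
                continue         # stale entry: the auction was rescheduled or already closed
            del self._end_times[auction_id]
            closed.append(auction_id)
        return closed


scheduler = CloseScheduler()
scheduler.schedule(1, 100.0)
scheduler.schedule(2, 50.0)
scheduler.schedule(1, 130.0)     # auction 1 extended by a late bid
print(scheduler.due(60.0), scheduler.due(110.0), scheduler.due(130.0))
```

Each push and pop costs O(log n), so closing a batch of 1,000 auctions in one tick stays well within the budget described above.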

Component 5: Vickrey (Second-Price) Auction Variant

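eBay's proxy bidding closely approximates a Vickrey auction (a sealed-bid, second-price auction): the winner pays the second-highest bid plus one increment, not their maximum bid. The difference is that eBay's format is open and ascending, so bidders watch the current price move while maximums stay hidden.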

Why This Matters for System Design

The system stores two values -- current_price (what the winner will pay) and highest_max_bid (the winner's true maximum). These are different. The public-facing value is current_price. The internal value is highest_max_bid. Conflating them is a common design error.

Sealed-bid variant: Some platforms (art auctions, government contracts) use sealed bids where bidders submit once and cannot see other bids. This is simpler to implement (no proxy bidding, no real-time updates) but requires a trusted system that does not reveal bids before the close.

Component 6: Post-Auction Payment Flow

The auction ending is not the end of the transaction. It is the beginning of the payment and fulfillment process.

Payment timeline:

  1. T+0 (auction ends): Winner notified. Payment due within 48 hours.
  2. T+4 hours: First payment reminder.
  3. T+24 hours: Second payment reminder.
  4. T+48 hours, still unpaid: "Non-paying buyer" strike. Offer to second-highest bidder ("Second Chance Offer"). Seller can relist.
  5. On payment (any time before the 48-hour deadline): Seller notified. Must ship within 3 business days.
  6. On delivery: Buyer confirms receipt or opens dispute.

Escrow pattern: The platform holds the payment until the buyer confirms receipt (or a dispute window closes). This protects both parties:

  • Buyer is protected: if the item does not arrive or is counterfeit, the payment is refunded.
  • Seller is protected: once the buyer confirms or the dispute window closes, the payment is released.

Implementation: Use a state machine for each transaction:

AUCTION_ENDED -> PAYMENT_PENDING -> PAYMENT_RECEIVED -> SHIPPED -> DELIVERED -> COMPLETED
DELIVERED -> DISPUTED -> RESOLVED
PAYMENT_PENDING -> PAYMENT_OVERDUE -> NON_PAYMENT -> RELISTED

Each state transition is logged, timestamped, and triggers notifications. Background jobs check for stuck transactions (e.g., PAYMENT_PENDING for > 48 hours) and advance the state machine.
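One way to encode the state machine is a table of allowed transitions plus a small background sweep. This is a sketch under assumptions: the dispute branch is taken to start at DELIVERED, and the class and function names are hypothetical.

```python
from enum import Enum, auto


class TxState(Enum):
    AUCTION_ENDED = auto()
    PAYMENT_PENDING = auto()
    PAYMENT_RECEIVED = auto()
    SHIPPED = auto()
    DELIVERED = auto()
    COMPLETED = auto()
    DISPUTED = auto()
    RESOLVED = auto()
    PAYMENT_OVERDUE = auto()
    NON_PAYMENT = auto()
    RELISTED = auto()


S = TxState
ALLOWED = {
    S.AUCTION_ENDED: {S.PAYMENT_PENDING},
    S.PAYMENT_PENDING: {S.PAYMENT_RECEIVED, S.PAYMENT_OVERDUE},
    S.PAYMENT_RECEIVED: {S.SHIPPED},
    S.SHIPPED: {S.DELIVERED},
    S.DELIVERED: {S.COMPLETED, S.DISPUTED},   # assumption: disputes open after delivery
    S.DISPUTED: {S.RESOLVED},
    S.PAYMENT_OVERDUE: {S.NON_PAYMENT},
    S.NON_PAYMENT: {S.RELISTED},
}
PAYMENT_WINDOW_S = 48 * 3600


class Transaction:
    def __init__(self, auction_id: int, now: float):
        self.auction_id = auction_id
        self.state = S.AUCTION_ENDED
        self.history = [(S.AUCTION_ENDED, now)]   # every transition is logged with its time

    def advance(self, new_state: TxState, now: float) -> None:
        if new_state not in ALLOWED.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        self.history.append((new_state, now))


def sweep_overdue(tx: Transaction, now: float) -> bool:
    """Background-job step: move a stale PAYMENT_PENDING transaction to PAYMENT_OVERDUE."""
    entered_at = tx.history[-1][1]
    if tx.state is S.PAYMENT_PENDING and now - entered_at >= PAYMENT_WINDOW_S:
        tx.advance(S.PAYMENT_OVERDUE, now)
        return True
    return False


tx = Transaction(auction_id=12345, now=0.0)
tx.advance(S.PAYMENT_PENDING, now=0.0)
print(sweep_overdue(tx, now=3600.0), sweep_overdue(tx, now=48 * 3600.0), tx.state.name)
```

Keeping the transitions in one table makes illegal jumps (for example PAYMENT_OVERDUE straight to SHIPPED) fail loudly instead of silently corrupting a transaction.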


Deep Dives


Deep Dive 1: Bid Log and Event Sourcing

The single-writer pattern naturally fits event sourcing. Instead of storing the "current state" of an auction, store the ordered sequence of bids.

Bid log (Kafka topic, partitioned by auction_id):

Partition 42 (auction_id = 12345):
  [bid_1: user_A, max=$200, t=T1]
  [bid_2: user_B, max=$150, t=T2]
  [bid_3: user_C, max=$300, t=T3]

Benefits:

  1. Audit trail: Every bid is recorded in order. Regulators, dispute resolution, and fraud investigation can replay the exact sequence.
  2. State reconstruction: If the auction server crashes, replay the bid log to reconstruct the current state. Deterministic replay guarantees identical results.
  3. Analytics: Bidding patterns (when bids cluster, how prices escalate) drive product decisions.
  4. Debugging: When a user claims "I bid $500 but the system said I was outbid," you can replay the log and prove the sequence.
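State reconstruction works because the bid transition is a pure function of (state, bid): folding the ordered log always gives the same result. A compact sketch using the three bids from the log above; the starting price, increment, and type names are assumptions, and the transition is a condensed form of the proxy rules.

```python
from functools import reduce
from typing import NamedTuple, Optional

STARTING_PRICE = 100   # hypothetical auction parameters
INCREMENT = 5


class Bid(NamedTuple):
    bidder: str
    max_amount: int


class State(NamedTuple):
    current_price: int = 0
    highest_max_bid: int = 0
    winner: Optional[str] = None


def apply(state: State, bid: Bid) -> State:
    """Pure transition: the same state and bid always give the same new state."""
    if state.winner is None:
        if bid.max_amount < STARTING_PRICE:
            return state
        return State(STARTING_PRICE, bid.max_amount, bid.bidder)
    if bid.max_amount <= state.current_price:
        return state   # rejected bids leave the state untouched
    if bid.bidder == state.winner:
        return state._replace(highest_max_bid=max(state.highest_max_bid, bid.max_amount))
    if bid.max_amount > state.highest_max_bid:
        price = min(bid.max_amount, state.highest_max_bid + INCREMENT)
        return State(price, bid.max_amount, bid.bidder)
    price = min(bid.max_amount + INCREMENT, state.highest_max_bid)
    return state._replace(current_price=price)


def replay(log, snapshot: State = State()) -> State:
    """Fold the ordered bid log over a starting snapshot."""
    return reduce(apply, log, snapshot)


log = [Bid("user_A", 200), Bid("user_B", 150), Bid("user_C", 300)]
print(replay(log))
```

The same fold supports snapshots: replaying the tail of the log on top of a saved state gives the same answer as replaying from the beginning, which keeps failover time bounded for long-running auctions.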

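Retention and compaction: The bid topic uses time-based retention of 30 days, longer than any auction, so the full bid sequence is always available for replay while an auction is live. After auction close, the bid log is archived to S3 (cold storage) for compliance. Do not enable log compaction on the bid topic itself: compaction keeps only the latest record per key, which would discard the bid history that replay depends on. If a compact "final state per auction" view is needed, publish it to a separate compacted topic keyed by auction_id.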

Why keep OCC as a safety net: Since each auction has a single writer and its bids land in one Kafka partition (keyed by auction_id), bids are processed sequentially, which makes optimistic concurrency control theoretically unnecessary. However, keep OCC on the database write path: during failover or a consumer group rebalance, two processes can briefly both believe they own the same auction. A version check on the auction row catches this edge case at negligible cost in the happy path. Belt and suspenders.
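The OCC check amounts to a conditional UPDATE on a version column. The sketch below uses SQLite in memory as a stand-in for PostgreSQL so that it runs anywhere; the version column and the save_state helper are assumptions that do not appear in the schema earlier in this document, and prices are stored as integer cents.

```python
import sqlite3


def save_state(conn, auction_id, expected_version, price_cents, winner_id):
    """Write only if nobody else has written since we read expected_version."""
    cur = conn.execute(
        "UPDATE auctions "
        "SET current_price = ?, current_winner_id = ?, version = version + 1 "
        "WHERE auction_id = ? AND version = ?",
        (price_cents, winner_id, auction_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1   # 0 rows means a concurrent writer got there first


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE auctions (auction_id INTEGER PRIMARY KEY, current_price INTEGER, "
    "current_winner_id INTEGER, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO auctions VALUES (12345, 10000, NULL, 0)")

# Two owners both read version 0 during a rebalance; only the first write lands.
first = save_state(conn, 12345, expected_version=0, price_cents=15500, winner_id=1)
second = save_state(conn, 12345, expected_version=0, price_cents=10500, winner_id=2)
print(first, second)
```

The losing writer sees zero affected rows and should stop processing that auction and re-check ownership rather than retry blindly.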

Deep Dive 2: Real-Time Bid Updates at Scale

When someone bids on a hot auction, 10,000 watchers need to see the price update within 2 seconds.

Architecture:

  1. After processing a bid, the auction server publishes a BidPlaced event to a pub/sub system (Redis Pub/Sub or a dedicated WebSocket broadcast service).
  2. WebSocket gateway servers subscribe to auction-specific channels.
  3. When an event arrives, the gateway broadcasts to all connected clients watching that auction.

Fan-out math:

Hot auction watchers:        10,000 concurrent connections
Bids per second on hot item: 10
Messages per second:         10 * 10,000 = 100,000
Message size:                ~200 bytes (auction_id, current_price, winner_name, bid_count)
Bandwidth:                   100K * 200B = 20 MB/sec (manageable for a WebSocket cluster)

Optimization: Do not push every bid individually. Batch updates: aggregate bids over a 500ms window and push a single update with the latest state. This reduces message count by 5x at the cost of 500ms additional latency (acceptable for a "real-time" display).
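The batching idea can be shown as a pure function over timestamped events: within each 500 ms window only the latest state survives. This is a sketch; the function name is hypothetical, and a real gateway would flush on a timer rather than over a finished list.

```python
def coalesce(events, window_ms=500):
    """events: (timestamp_ms, state) pairs sorted by time.

    Returns one (flush_time_ms, latest_state) pair per non-empty window.
    """
    updates = []
    window_end = None
    latest = None
    for ts, state in events:
        if window_end is not None and ts >= window_end:
            updates.append((window_end, latest))   # flush the finished window
            window_end = None
        if window_end is None:
            window_end = (ts // window_ms + 1) * window_ms
        latest = state
    if window_end is not None:
        updates.append((window_end, latest))
    return updates


# 10 bids in one second, 100 ms apart, each raising the price by 5.
events = [(i * 100, 100 + 5 * i) for i in range(10)]
updates = coalesce(events)
print(len(events), "bids ->", len(updates), "pushed updates:", updates)
```

Ten bids in one second collapse into two pushes, which is the 5x reduction quoted above; quiet windows produce no message at all.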

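Fallback: For clients that cannot use WebSockets (some corporate firewalls block them), fall back to HTTP polling. Polling every 3 seconds would miss the 2-second update target, so either poll every 1-2 seconds or use long polling, where the server holds the request open until the price changes.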

Deep Dive 3: Fraud and Shill Bidding Detection

Shill bidding is when the seller (or an accomplice) bids on their own item to inflate the price. It is prohibited by most platforms, including under eBay's terms, and is illegal in many jurisdictions.

Detection signals:

  1. Bidder-seller relationship: Bidder and seller share an IP address, shipping address, or payment method. Flag for review.
  2. Bidder wins but never pays: A shill bidder who accidentally "wins" will not pay. High non-payment rate from a bidder across multiple auctions is suspicious.
  3. Bidding patterns: A bidder who always bids on the same seller's items but never wins is likely a shill.
  4. Retracted bids: Shill bidders sometimes retract bids just before auction close (to avoid winning). High retraction rate is a signal.
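Signal 3 (a bidder concentrated on one seller who never wins) is simple enough to sketch as a batch heuristic. The thresholds and the input shape below are hypothetical; a real pipeline would combine several signals and send hits to human review rather than act on one rule.

```python
from collections import defaultdict


def flag_concentrated_bidders(participations, min_auctions=5, min_share=0.8):
    """participations: (bidder_id, seller_id, auction_id, won) tuples, one per auction bid on.

    Flags bidders who bid mostly on one seller's auctions and never won any of them.
    """
    per_bidder = defaultdict(list)
    for bidder, seller, _auction, won in participations:
        per_bidder[bidder].append((seller, won))

    flagged = []
    for bidder, rows in per_bidder.items():
        if len(rows) < min_auctions:
            continue                      # too little history to judge
        by_seller = defaultdict(list)
        for seller, won in rows:
            by_seller[seller].append(won)
        seller, outcomes = max(by_seller.items(), key=lambda kv: len(kv[1]))
        share = len(outcomes) / len(rows)
        if share >= min_share and not any(outcomes):
            flagged.append((bidder, seller, share))
    return flagged


history = (
    [("b1", "s1", i, False) for i in range(6)]
    + [("b2", "s1", 10, True)]
    + [("b2", f"s{n}", 10 + n, False) for n in range(2, 6)]
    + [("b3", "s9", 20, False)]
)
print(flag_concentrated_bidders(history))
```

Because this runs over stored bid history, it fits the asynchronous model: it never sits on the bid path.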

System design implications:

  • Bid history is stored permanently for forensic analysis.
  • A background ML pipeline processes bid patterns and flags suspicious accounts.
  • Flagged auctions get human review before payment is released to the seller.
  • The fraud detection system is asynchronous -- it does not slow down bid processing.

Alternative Designs

| Approach | Pros | Cons | When to Use |
|---|---|---|---|
| Single-writer + Kafka (described above) | Perfect bid ordering. Event sourcing for audit. Fast failover via replay. | Hot auction bottleneck on one server. Consistent hashing adds routing complexity. | eBay-scale with hot auctions. |
| PostgreSQL with SERIALIZABLE isolation | Strong consistency out of the box. No custom infrastructure. | SERIALIZABLE is 5-10x slower than READ COMMITTED. Hot rows cause massive retry storms. | Small platforms, < 1M listings. |
| Redis sorted set for bid leaderboard | Fast reads for "top bids." Sub-millisecond updates. | Does not support proxy bidding logic natively. Need application-level atomicity. | Simple highest-bid-wins auctions without proxy bidding. |
| CQRS (Command Query Responsibility Segregation) | Separate read model (fast queries for bid history, leaderboard) from write model (bid processing). Scales reads independently. | Eventual consistency between write and read models. Added complexity. | When read traffic (auction watchers) is 100x write traffic (bidders). |
| Distributed lock per auction | Allows any server to process bids for any auction. | Lock acquisition adds 5-10ms latency per bid. Network partition can cause split-brain. | Moderate scale, when operational simplicity outweighs performance. |

Scaling Math Verification

Auction Server Sizing

Total live auctions:           1.7 billion
Auctions with active bids:     ~10 million (most listings have zero bids)
In-memory state per auction:   ~200 bytes (prices, winner, bid count)
Total in-memory state:         10M * 200 = 2 GB (fits on one server, but shard for availability)

Auction servers:               20 servers (consistent hash ring)
Auctions per server:           500K active auctions
Bids per second (total):       3,000 peak
Bids per server:               150/sec avg (but hot auctions spike one server to 500/sec)
Single-server capacity:        100,000 bids/sec (CPU-bound on proxy logic)
Headroom:                      200x (even with hot auction spikes)

Kafka Bid Log

Bids per day:                  50 million
Bid event size:                ~200 bytes
Daily volume:                  50M * 200B = 10 GB/day
Kafka partitions:              100 (partitioned by auction_id hash)
Retention:                     30 days = 300 GB
Throughput:                    3,000 events/sec peak (one Kafka broker handles 100K/sec)

WebSocket Gateway

Concurrent auction watchers:   500,000 (across all auctions)
WebSocket connections:          500K
Messages per second (all):     50,000 (batched updates)
Gateway servers:               10 (50K connections each, commodity hardware)
Memory per connection:          ~10 KB (connection state + buffers)
Total gateway memory:           500K * 10KB = 5 GB (spread across 10 servers)
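The sizing figures in this section can be re-derived from their inputs. A minimal sketch, assuming decimal units (1 KB = 1,000 bytes) and using illustrative constant names:

```python
# Auction servers
ACTIVE_AUCTIONS = 10_000_000
STATE_BYTES = 200
SERVERS = 20
PEAK_BIDS_PER_SEC = 3_000
HOT_AUCTION_BIDS_PER_SEC = 500
SERVER_CAPACITY_BIDS_PER_SEC = 100_000

# Kafka bid log
BIDS_PER_DAY = 50_000_000
EVENT_BYTES = 200
RETENTION_DAYS = 30

# WebSocket gateways
CONNECTIONS = 500_000
GATEWAYS = 10
BYTES_PER_CONNECTION = 10_000

state_gb = ACTIVE_AUCTIONS * STATE_BYTES / 1e9
auctions_per_server = ACTIVE_AUCTIONS // SERVERS
avg_bids_per_server = PEAK_BIDS_PER_SEC / SERVERS
headroom = SERVER_CAPACITY_BIDS_PER_SEC / HOT_AUCTION_BIDS_PER_SEC

kafka_gb_per_day = BIDS_PER_DAY * EVENT_BYTES / 1e9
kafka_retained_gb = kafka_gb_per_day * RETENTION_DAYS   # before replication

connections_per_gateway = CONNECTIONS // GATEWAYS
gateway_memory_gb = CONNECTIONS * BYTES_PER_CONNECTION / 1e9

print(f"in-memory state: {state_gb:.0f} GB, {auctions_per_server:,} auctions/server")
print(f"bids/server: {avg_bids_per_server:.0f}/sec avg, headroom {headroom:.0f}x")
print(f"kafka: {kafka_gb_per_day:.0f} GB/day, {kafka_retained_gb:.0f} GB retained")
print(f"gateways: {connections_per_gateway:,} conns each, {gateway_memory_gb:.0f} GB total")
```

The 300 GB retention figure is per copy; with the 3x replication mentioned under Failure Analysis, the cluster would store roughly three times that.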

Failure Analysis

| Failure | Impact | Mitigation |
|---|---|---|
| Auction server crashes | Auctions owned by that server temporarily cannot accept bids. | Consistent hash ring reassigns to backup. Replay bid log from Kafka (< 1 sec for most auctions). Failover time: 5-10 seconds. |
| Kafka goes down | New bids are processed in memory but not durably logged. Risk of data loss on server crash. | Kafka cluster with 3x replication. If all Kafka brokers fail, auction servers buffer bids in local write-ahead log. Replay when Kafka recovers. |
| Clock skew between auction server and timer service | Auction closes too early or too late. | Use a single authoritative clock (the auction server's clock, synced via NTP). Auction close is triggered by the auction server, not an external timer. |
| Database (PostgreSQL) write fails | Final auction results not persisted. | Retry with exponential backoff. Kafka bid log is the source of truth during outage. Reconciliation job reconstructs DB state from Kafka after recovery. |
| WebSocket gateway overloaded | Users see stale prices. | Add more gateway servers (stateless, horizontal scaling). Degrade to polling for excess connections. |
| Shill bidding goes undetected | Inflated prices. Buyer overpays. Loss of platform trust. | ML detection pipeline. Manual review for high-value auctions. Post-auction price comparison with market value. |
| Non-paying winner | Seller has to relist. Lost time. | Second Chance Offer to runner-up. Non-payment strike system (3 strikes = account suspension). Require payment method on file before bidding. |

Level Expectations

| Level | What the Interviewer Expects |
|---|---|
| Mid (L4) | Database schema with auctions and bids tables. Simple "highest bid wins" logic. SELECT FOR UPDATE for concurrency. Knows about the contention problem but does not solve it cleanly. Basic notification on auction end. |
| Senior (L5) | Proxy bidding with correct price calculation (second-price mechanics). Single-writer pattern for hot auctions. Kafka for bid durability and event sourcing. Anti-snipe extensions. Post-auction payment flow with state machine. Real-time WebSocket updates. |
| Staff+ (L6) | Vickrey auction analysis and why proxy bidding implements it. Event sourcing with deterministic replay for failover. Shill bidding detection as a system concern. Consistent hashing for auction-to-server mapping. Batched WebSocket updates for efficiency. Post-auction escrow pattern. Reference to eBay's actual architecture and scale numbers. |
