Skip to content

Building a Market Order Book

Level: L3-L5 Topics: Sorted Data Structures, Heaps, Balanced BSTs, System Design

Problem Statement

You are building a marketplace for a single product "ABC". Sellers enter the marketplace by posting an offer with a price (in the range $0.01 to $100.00) and a timestamp (when they entered).

Design an order book that supports two operations:

  1. insert_order(seller_id, price, timestamp) -- Add a new sell order to the book.
  2. get_lowest_seller_id() -- Return the seller_id of the cheapest available order. If multiple sellers have the same price, return the one who entered earliest (lowest timestamp).

Background & Constraints

  • Prices are in the range $0.01 to $100.00, with two decimal places (i.e., prices are in cents from 1 to 10000).
  • Timestamps are unique across all sellers.
  • The order book may contain thousands to millions of active orders.
  • Both operations should be as fast as possible.

Examples

insert_order("seller_A", 5.00, 100)
insert_order("seller_B", 3.50, 200)
insert_order("seller_C", 3.50, 150)

get_lowest_seller_id() → "seller_C"
  (seller_C and seller_B both have price 3.50, but seller_C has earlier timestamp 150)
insert_order("seller_D", 3.50, 250)

get_lowest_seller_id() → "seller_C"
  (still seller_C -- earliest at the lowest price)

Hints & Common Pitfalls

Approach 1: Brute Force -- Unsorted List

  • insert_order: O(1) -- just append.
  • get_lowest_seller_id: O(N) -- scan all orders.
  • Simple but slow for frequent queries.

Approach 2: Min-Heap

  • insert_order: O(log N) -- heap insertion with key (price, timestamp).
  • get_lowest_seller_id: O(1) -- peek at the top of the heap.
  • Good general-purpose approach.

Approach 3: Sorted Map / Balanced BST

  • Use a tree map with key (price, timestamp).
  • insert_order: O(log N).
  • get_lowest_seller_id: O(1) -- look at the minimum key.
  • Also supports efficient deletion (needed for follow-ups).

Approach 4: Bucket Array (Exploiting Price Range)

  • Since prices have only 10,000 possible values (cents from 1 to 10000), create an array of 10,000 buckets. Each bucket holds a queue of seller IDs sorted by timestamp.
  • insert_order: O(1) -- append to the bucket for this price.
  • get_lowest_seller_id: O(1) amortized -- maintain a pointer to the lowest non-empty bucket.
  • This is the optimal approach given the bounded price range.

Follow-Up Questions

  1. Remove impatient sellers: Add an operation remove_order(seller_id) for sellers who get tired of waiting and leave. How does each approach handle deletion? Which data structure makes removal efficient?

  2. Extend to any price and quantity: Remove the \(0.01-\)100.00 constraint. Prices can be any positive decimal. Each order also has a quantity. get_lowest_seller_id() now also returns the quantity. How does this change your design?

  3. Matching engine: Add a buy_order(buyer_id, max_price, quantity) operation that matches the buyer with the cheapest available sell orders up to max_price, consuming orders from the book. How do you implement this efficiently?

  4. Concurrency: If multiple threads are inserting and querying simultaneously, how do you ensure thread safety? What locking strategy minimizes contention?

  5. Persistence: If the system crashes, how do you recover the order book? What logging or snapshotting strategy would you use?