Skip Lists — The Engine Inside Redis Sorted Sets
TL;DR
A skip list is a probabilistic data structure that provides O(log n) search, insert, and delete -- like a balanced BST -- but is dramatically simpler to implement. Each element is assigned a random height via coin flips, creating express lanes above the base linked list. Redis uses skip lists for sorted sets (ZRANGEBYSCORE, ZRANK) because they are simpler than red-black trees, naturally support range queries, and have cache-friendly sequential access. LevelDB and RocksDB use skip lists for their in-memory write buffer (MemTable). If a balanced BST feels like overkill and a linked list feels too slow, a skip list is the answer.
The Problem
You need an ordered data structure that supports:
- Insert in O(log n)
- Delete in O(log n)
- Search in O(log n)
- Range queries: "give me all elements between score 100 and 200"
- Rank queries: "what is the rank of this element?"
A sorted array gives you O(log n) search but O(n) insert. A linked list gives you O(1) insert at a known position but O(n) search. A balanced BST (red-black tree, AVL tree) gives you O(log n) for everything but is complex to implement, hard to debug, and makes range queries awkward (in-order traversal with callbacks).
Skip lists achieve the same O(log n) bounds as balanced BSTs with a probabilistic guarantee, a simpler implementation, and natural support for range queries (just walk the bottom-level linked list).
The Algorithm
Structure

A skip list is a hierarchy of linked lists. The bottom level (level 0) is a standard sorted linked list containing all elements. Each higher level is a "fast lane" containing a random subset of the elements below it.
Level 3: HEAD ───────────────────────────── 50 ─────────────────── NIL
Level 2: HEAD ──────── 20 ────────────────── 50 ──── 70 ────────── NIL
Level 1: HEAD ── 10 ── 20 ──── 30 ────────── 50 ──── 70 ── 80 ── NIL
Level 0: HEAD ── 10 ── 20 ── 25 ── 30 ── 40 ── 50 ── 60 ── 70 ── 80 ── 90 ── NIL
Each node has a random height. A node at height h has forward pointers at levels 0 through h-1.
Level Assignment: Coin Flip

When inserting a new element, determine its height by flipping a biased coin:
With p = 0.5: 50% of nodes are height 1, 25% are height 2, 12.5% are height 3, etc. This gives an expected O(log n) levels and O(n) total space.
With p = 0.25 (used by Redis): 75% of nodes are height 1. Fewer levels, slightly worse search constant but better space usage. Redis uses MAX_LEVEL = 32.
Search
def search(target):
current = HEAD
for level in range(max_level - 1, -1, -1): # top to bottom
while current.forward[level] is not None
and current.forward[level].key < target:
current = current.forward[level] # move right
current = current.forward[0] # drop to bottom
if current is not None and current.key == target:
return current
return None
Worked Example: Search for 40
Level 3: HEAD ───────────────────────────── 50
Start here. 50 > 40, can't go right. Drop down.
Level 2: HEAD ──────── 20 ────────────────── 50
20 < 40, move right to 20.
50 > 40, can't go right. Drop down.
Level 1: HEAD ── 10 ── 20 ──── 30 ────────── 50
At 20, move right to 30.
30 < 40, move right.
50 > 40, can't go right. Drop down.
Level 0: ... 30 ── 40 ── 50 ...
At 30, move right to 40. Found!
Total comparisons: 5. In a flat linked list: 5 (walk from start).
For n = 1000, the difference is ~10 vs 1000.
Insert
def insert(key, value):
# 1. Find the position at each level (save predecessors)
update = [None] * MAX_LEVEL
current = HEAD
for level in range(max_level - 1, -1, -1):
while current.forward[level] and current.forward[level].key < key:
current = current.forward[level]
update[level] = current
# 2. Determine random level for new node
new_level = random_level()
# 3. Create new node and splice into each level
new_node = Node(key, value, new_level)
for level in range(new_level):
new_node.forward[level] = update[level].forward[level]
update[level].forward[level] = new_node
Delete
def delete(key):
update = [None] * MAX_LEVEL
current = HEAD
for level in range(max_level - 1, -1, -1):
while current.forward[level] and current.forward[level].key < key:
current = current.forward[level]
update[level] = current
target = current.forward[0]
if target and target.key == key:
for level in range(target.level):
update[level].forward[level] = target.forward[level]
No rotations. No rebalancing. No color flipping. Just pointer updates, like deleting from a linked list.
Range Queries
This is where skip lists truly shine. To get all elements in [lo, hi]:
def range_query(lo, hi):
node = search(lo) # O(log n) to find start
results = []
while node and node.key <= hi:
results.append(node)
node = node.forward[0] # O(1) per element
return results
Total: O(log n + k) where k is the number of results. In a balanced BST, range queries require in-order traversal which is cache-unfriendly due to pointer chasing between non-adjacent tree nodes. In a skip list, the bottom-level linked list has sequential access -- far more cache-friendly.
Proof/Correctness Intuition
Why O(log n) Expected Search Time
At each level, you skip over a geometrically distributed number of nodes. With promotion probability p = 0.5:
- Level 0: all n nodes
- Level 1: ~n/2 nodes
- Level 2: ~n/4 nodes
- Level k: ~n/2^k nodes
- Expected max level: log_2(n)
When searching, you start at the top level and move right until you overshoot, then drop down. At each level, the expected number of nodes examined is 1/p = 2 (for p=0.5). Total expected comparisons: (1/p) * log_{1/p}(n) = 2 * log_2(n) = O(log n).
The key insight: the randomized level assignment produces a structure that is balanced in expectation without any explicit balancing operations. The probability that a skip list degenerates to a linked list (all nodes at level 1) is exponentially small.
Comparison with BSTs
A balanced BST maintains balance through rotations, which are complex and require careful implementation. A skip list maintains balance through randomization, which is simple and requires only a random number generator. The expected performance is the same: O(log n) for all operations.
The worst case differs: a BST guarantees O(log n) worst case (if balanced), while a skip list has O(n) worst case (if all coin flips come up the same). In practice, the worst case for skip lists is astronomically unlikely -- comparable to a hash table having all elements in one bucket.
Real-World Usage
| System | Where Skip Lists Are Used | Why |
|---|---|---|
| Redis | Sorted sets (ZSET) | Range queries, rank queries, simplicity |
| LevelDB | MemTable (in-memory write buffer) | Concurrent writes, simple implementation |
| RocksDB | MemTable (default implementation) | Same as LevelDB, with concurrent skip list |
| Lucene | In-memory term dictionary during indexing | Fast ordered insertion |
| MemSQL/SingleStore | Lock-free skip list for in-memory indexes | Concurrent access without locks |
Why Redis Chose Skip Lists Over Red-Black Trees
Antirez (Salvatore Sanfilippo, Redis creator) explained the decision directly:
- Simpler to implement: Skip lists are conceptually simpler and have fewer edge cases than red-black trees. Redis values code simplicity.
- Range queries:
ZRANGEBYSCOREandZRANGEBYLEXnaturally traverse the bottom-level list. In a red-black tree, range queries require in-order traversal with stack management. - O(log n) rank: Redis augments skip list nodes with a
spanfield (how many level-0 nodes each forward pointer skips). This enables O(log n)ZRANKoperations. - Comparable performance: For Redis's typical workload, skip lists and red-black trees have similar real-world performance. The theoretical constant factor difference is irrelevant compared to network latency.
- Lock-free variants exist: If Redis ever needed concurrent access (for multi-threaded operations), lock-free skip lists are well-studied. Lock-free red-black trees are a research problem.
Skip Lists vs B-Trees
| Aspect | Skip List | B-Tree |
|---|---|---|
| On disk | Poor (random pointers) | Excellent (block-aligned nodes) |
| In memory | Good | Good |
| Range queries | Excellent (linked list walk) | Excellent (leaf-level scan) |
| Concurrency | Lock-free variants exist | Latch coupling (complex) |
| Implementation | Simple | Complex (splits, merges, redistribution) |
| Cache behavior | Moderate (pointer chasing) | Excellent (large, contiguous nodes) |
B-trees win on disk because they minimize random I/O. Skip lists win in memory when implementation simplicity and concurrent access matter.
Interview Application
When to mention skip lists:
- "How does Redis implement sorted sets?" -- Skip list with span metadata for O(log n) rank queries.
- "What data structure would you use for an in-memory ordered index?" -- Skip list if simplicity matters, B-tree if cache performance is critical.
- "How would you implement a concurrent ordered map?" -- Lock-free skip list (Java's
ConcurrentSkipListMap). - "Design a leaderboard system" -- Skip list for real-time rank queries.
What interviewers want to hear:
- You understand the probabilistic level assignment and why it gives O(log n).
- You know skip lists are simpler than balanced BSTs and naturally support range queries.
- You can explain why Redis chose skip lists (simplicity + range queries + rank queries).
- You know the trade-off: skip lists are not great on disk (use B-trees there).
Trade-offs
| Advantage | Disadvantage |
|---|---|
| Simple implementation | O(n) worst case (extremely unlikely) |
| Natural range queries | More memory than BST (forward pointers) |
| Lock-free concurrent variants | Not disk-friendly (random pointers) |
| No rebalancing needed | Cache behavior worse than B-tree |
| Easy to augment (span, backward ptrs) | Randomized -- not deterministic |
| O(log n) insert without rotations | Constant factors slightly worse than BST |
Memory Overhead
Each node stores level forward pointers. With p = 0.5, the expected number of pointers per node is 1/(1-p) = 2. So total pointer overhead is ~2 pointers per node, similar to a binary tree's left/right pointers. With p = 0.25 (Redis), the expected overhead is 1.33 pointers per node -- less than a binary tree.
Common Mistakes
Skip lists are always better than balanced BSTs
No. Balanced BSTs have deterministic O(log n) worst case. Skip lists have probabilistic O(log n). For safety-critical systems where worst-case guarantees matter, a balanced BST is the right choice. For most practical systems, the probabilistic guarantee is sufficient.
Skip lists use more memory than BSTs
With p = 0.25, a skip list uses 1.33 pointers per node on average. A binary tree uses 2 pointers per node (left, right). A red-black tree adds a color bit. Memory usage is comparable or better for skip lists.
Skip lists are only useful for Redis
Skip lists are used in LevelDB, RocksDB, Lucene, and Java's ConcurrentSkipListMap. They are a general-purpose ordered data structure, not a Redis-specific optimization.
The coin flip must be fair (p = 0.5)
The promotion probability p is a tuning parameter. Redis uses p = 0.25 for lower memory overhead. p = 0.5 gives faster search (fewer nodes per level traversal). The optimal p depends on the workload.
Skip lists cannot support rank queries efficiently
Standard skip lists do not, but augmented skip lists do. Redis adds a span field to each forward pointer: the number of level-0 nodes that pointer skips over. The rank of an element is the sum of spans along the search path. This is O(log n), same as an order-statistic tree.
You should use skip lists for on-disk storage
No. Skip lists have random pointer access patterns that cause excessive disk seeks. For on-disk ordered storage, B-trees (or LSM trees) are the right choice. Skip lists are an in-memory data structure.