Skip Lists — The Engine Inside Redis Sorted Sets

TL;DR

A skip list is a probabilistic data structure that provides O(log n) search, insert, and delete -- like a balanced BST -- but is dramatically simpler to implement. Each element is assigned a random height via coin flips, creating express lanes above the base linked list. Redis uses skip lists for sorted sets (ZRANGEBYSCORE, ZRANK) because they are simpler than red-black trees, naturally support range queries, and have cache-friendly sequential access. LevelDB and RocksDB use skip lists for their in-memory write buffer (MemTable). If a balanced BST feels like overkill and a linked list feels too slow, a skip list is the answer.

The Problem

You need an ordered data structure that supports:

Insert in O(log n)
Delete in O(log n)
Search in O(log n)
Range queries: "give me all elements between score 100 and 200"
Rank queries: "what is the rank of this element?"

A sorted array gives you O(log n) search but O(n) insert. A linked list gives you O(1) insert at a known position but O(n) search. A balanced BST (red-black tree, AVL tree) gives you O(log n) for everything but is complex to implement, hard to debug, and makes range queries awkward (in-order traversal with callbacks).

Skip lists achieve the same O(log n) bounds as balanced BSTs with a probabilistic guarantee, a simpler implementation, and natural support for range queries (just walk the bottom-level linked list).

The Algorithm

Structure

Skip List Multi-Level Structure

A skip list is a hierarchy of linked lists. The bottom level (level 0) is a standard sorted linked list containing all elements. Each higher level is a "fast lane" containing a random subset of the elements below it.

Level 3:  HEAD ───────────────────────────── 50 ─────────────────── NIL
Level 2:  HEAD ──────── 20 ────────────────── 50 ──── 70 ────────── NIL
Level 1:  HEAD ── 10 ── 20 ──── 30 ────────── 50 ──── 70 ── 80 ── NIL
Level 0:  HEAD ── 10 ── 20 ── 25 ── 30 ── 40 ── 50 ── 60 ── 70 ── 80 ── 90 ── NIL

Each node has a random height. A node at height h has forward pointers at levels 0 through h-1.

Level Assignment: Coin Flip

Skip List Level Probability Distribution

When inserting a new element, determine its height by flipping a biased coin:

level = 1
while random() < p and level < MAX_LEVEL:
    level += 1
return level

With p = 0.5: 50% of nodes are height 1, 25% are height 2, 12.5% are height 3, etc. This gives an expected O(log n) levels and O(n) total space.

With p = 0.25 (used by Redis): 75% of nodes are height 1. Fewer levels, slightly worse search constant but better space usage. Redis uses MAX_LEVEL = 32.

Search

def search(target):
    current = HEAD
    for level in range(max_level - 1, -1, -1):   # top to bottom
        while current.forward[level] is not None
              and current.forward[level].key < target:
            current = current.forward[level]       # move right
    current = current.forward[0]                   # drop to bottom
    if current is not None and current.key == target:
        return current
    return None

Worked Example: Search for 40

Level 3:  HEAD ───────────────────────────── 50
          Start here. 50 > 40, can't go right. Drop down.

Level 2:  HEAD ──────── 20 ────────────────── 50
          20 < 40, move right to 20.
          50 > 40, can't go right. Drop down.

Level 1:  HEAD ── 10 ── 20 ──── 30 ────────── 50
          At 20, move right to 30.
          30 < 40, move right.
          50 > 40, can't go right. Drop down.

Level 0:  ... 30 ── 40 ── 50 ...
          At 30, move right to 40. Found!

Total comparisons: 5. In a flat linked list: 5 (walk from start).
For n = 1000, the difference is ~10 vs 1000.

Insert

def insert(key, value):
    # 1. Find the position at each level (save predecessors)
    update = [None] * MAX_LEVEL
    current = HEAD
    for level in range(max_level - 1, -1, -1):
        while current.forward[level] and current.forward[level].key < key:
            current = current.forward[level]
        update[level] = current

    # 2. Determine random level for new node
    new_level = random_level()

    # 3. Create new node and splice into each level
    new_node = Node(key, value, new_level)
    for level in range(new_level):
        new_node.forward[level] = update[level].forward[level]
        update[level].forward[level] = new_node

Delete

def delete(key):
    update = [None] * MAX_LEVEL
    current = HEAD
    for level in range(max_level - 1, -1, -1):
        while current.forward[level] and current.forward[level].key < key:
            current = current.forward[level]
        update[level] = current

    target = current.forward[0]
    if target and target.key == key:
        for level in range(target.level):
            update[level].forward[level] = target.forward[level]

No rotations. No rebalancing. No color flipping. Just pointer updates, like deleting from a linked list.

Range Queries

This is where skip lists truly shine. To get all elements in [lo, hi]:

def range_query(lo, hi):
    node = search(lo)  # O(log n) to find start
    results = []
    while node and node.key <= hi:
        results.append(node)
        node = node.forward[0]  # O(1) per element
    return results

Total: O(log n + k) where k is the number of results. In a balanced BST, range queries require in-order traversal which is cache-unfriendly due to pointer chasing between non-adjacent tree nodes. In a skip list, the bottom-level linked list has sequential access -- far more cache-friendly.

Proof/Correctness Intuition

Why O(log n) Expected Search Time

At each level, you skip over a geometrically distributed number of nodes. With promotion probability p = 0.5:

Level 0: all n nodes
Level 1: ~n/2 nodes
Level 2: ~n/4 nodes
Level k: ~n/2^k nodes
Expected max level: log_2(n)

When searching, you start at the top level and move right until you overshoot, then drop down. At each level, the expected number of nodes examined is 1/p = 2 (for p=0.5). Total expected comparisons: (1/p) * log_{1/p}(n) = 2 * log_2(n) = O(log n).

The key insight: the randomized level assignment produces a structure that is balanced in expectation without any explicit balancing operations. The probability that a skip list degenerates to a linked list (all nodes at level 1) is exponentially small.

Comparison with BSTs

A balanced BST maintains balance through rotations, which are complex and require careful implementation. A skip list maintains balance through randomization, which is simple and requires only a random number generator. The expected performance is the same: O(log n) for all operations.

The worst case differs: a BST guarantees O(log n) worst case (if balanced), while a skip list has O(n) worst case (if all coin flips come up the same). In practice, the worst case for skip lists is astronomically unlikely -- comparable to a hash table having all elements in one bucket.

Real-World Usage

System	Where Skip Lists Are Used	Why
Redis	Sorted sets (ZSET)	Range queries, rank queries, simplicity
LevelDB	MemTable (in-memory write buffer)	Concurrent writes, simple implementation
RocksDB	MemTable (default implementation)	Same as LevelDB, with concurrent skip list
Lucene	In-memory term dictionary during indexing	Fast ordered insertion
MemSQL/SingleStore	Lock-free skip list for in-memory indexes	Concurrent access without locks

Why Redis Chose Skip Lists Over Red-Black Trees

Antirez (Salvatore Sanfilippo, Redis creator) explained the decision directly:

Simpler to implement: Skip lists are conceptually simpler and have fewer edge cases than red-black trees. Redis values code simplicity.
Range queries: ZRANGEBYSCORE and ZRANGEBYLEX naturally traverse the bottom-level list. In a red-black tree, range queries require in-order traversal with stack management.
O(log n) rank: Redis augments skip list nodes with a span field (how many level-0 nodes each forward pointer skips). This enables O(log n) ZRANK operations.
Comparable performance: For Redis's typical workload, skip lists and red-black trees have similar real-world performance. The theoretical constant factor difference is irrelevant compared to network latency.
Lock-free variants exist: If Redis ever needed concurrent access (for multi-threaded operations), lock-free skip lists are well-studied. Lock-free red-black trees are a research problem.

Skip Lists vs B-Trees

Aspect	Skip List	B-Tree
On disk	Poor (random pointers)	Excellent (block-aligned nodes)
In memory	Good	Good
Range queries	Excellent (linked list walk)	Excellent (leaf-level scan)
Concurrency	Lock-free variants exist	Latch coupling (complex)
Implementation	Simple	Complex (splits, merges, redistribution)
Cache behavior	Moderate (pointer chasing)	Excellent (large, contiguous nodes)

B-trees win on disk because they minimize random I/O. Skip lists win in memory when implementation simplicity and concurrent access matter.

Interview Application

When to mention skip lists:

"How does Redis implement sorted sets?" -- Skip list with span metadata for O(log n) rank queries.
"What data structure would you use for an in-memory ordered index?" -- Skip list if simplicity matters, B-tree if cache performance is critical.
"How would you implement a concurrent ordered map?" -- Lock-free skip list (Java's ConcurrentSkipListMap).
"Design a leaderboard system" -- Skip list for real-time rank queries.

What interviewers want to hear:

You understand the probabilistic level assignment and why it gives O(log n).
You know skip lists are simpler than balanced BSTs and naturally support range queries.
You can explain why Redis chose skip lists (simplicity + range queries + rank queries).
You know the trade-off: skip lists are not great on disk (use B-trees there).

Trade-offs

Advantage	Disadvantage
Simple implementation	O(n) worst case (extremely unlikely)
Natural range queries	More memory than BST (forward pointers)
Lock-free concurrent variants	Not disk-friendly (random pointers)
No rebalancing needed	Cache behavior worse than B-tree
Easy to augment (span, backward ptrs)	Randomized -- not deterministic
O(log n) insert without rotations	Constant factors slightly worse than BST

Memory Overhead

Each node stores level forward pointers. With p = 0.5, the expected number of pointers per node is 1/(1-p) = 2. So total pointer overhead is ~2 pointers per node, similar to a binary tree's left/right pointers. With p = 0.25 (Redis), the expected overhead is 1.33 pointers per node -- less than a binary tree.

Common Mistakes

Skip lists are always better than balanced BSTs

No. Balanced BSTs have deterministic O(log n) worst case. Skip lists have probabilistic O(log n). For safety-critical systems where worst-case guarantees matter, a balanced BST is the right choice. For most practical systems, the probabilistic guarantee is sufficient.

Skip lists use more memory than BSTs

With p = 0.25, a skip list uses 1.33 pointers per node on average. A binary tree uses 2 pointers per node (left, right). A red-black tree adds a color bit. Memory usage is comparable or better for skip lists.

Skip lists are only useful for Redis

Skip lists are used in LevelDB, RocksDB, Lucene, and Java's ConcurrentSkipListMap. They are a general-purpose ordered data structure, not a Redis-specific optimization.

The coin flip must be fair (p = 0.5)

The promotion probability p is a tuning parameter. Redis uses p = 0.25 for lower memory overhead. p = 0.5 gives faster search (fewer nodes per level traversal). The optimal p depends on the workload.

Skip lists cannot support rank queries efficiently

Standard skip lists do not, but augmented skip lists do. Redis adds a span field to each forward pointer: the number of level-0 nodes that pointer skips over. The rank of an element is the sum of spans along the search path. This is O(log n), same as an order-statistic tree.

You should use skip lists for on-disk storage

No. Skip lists have random pointer access patterns that cause excessive disk seeks. For on-disk ordered storage, B-trees (or LSM trees) are the right choice. Skip lists are an in-memory data structure.