Skip to content

Binary Search in RLE List

Level: L4-L5 Topics: Binary Search, Run-Length Encoding, Data Structures

Problem Statement

Implement a List class that stores data internally using run-length encoding (RLE). The key operation is:

int find(int position)
    // Returns the value at the given position in the logical list.

The underlying storage is a sequence of (value, count) pairs representing consecutive runs of the same value.

Background & Constraints

  • The logical list can be very large (billions of elements), but the number of distinct runs is much smaller.
  • The RLE representation is stored as an array of (value, count) pairs in order.
  • position is a 0-based index into the logical (uncompressed) list.
  • You should not decompress the list — the whole point is to work with the compressed representation.
  • The list is immutable after construction.

Examples

RLE representation:

[(3, 4), (7, 2), (1, 5), (9, 3)]

Logical list:

[3, 3, 3, 3, 7, 7, 1, 1, 1, 1, 1, 9, 9, 9]
 0  1  2  3  4  5  6  7  8  9  10 11 12 13

Queries:

find(0)  → 3   (in first run)
find(3)  → 3   (last element of first run)
find(4)  → 7   (first element of second run)
find(5)  → 7   (last element of second run)
find(6)  → 1   (first element of third run)
find(10) → 1   (last element of third run)
find(13) → 9   (last element of fourth run)

Hints & Common Pitfalls

  • Precompute cumulative lengths: Build a prefix sum array of run lengths. For the example above: [4, 6, 11, 14]. The cumulative array tells you that positions 0-3 are in run 0, positions 4-5 in run 1, etc.

  • Binary search on cumulative lengths: To find which run contains position, binary search the cumulative array for the smallest cumulative value that is greater than position. This gives O(log R) time where R is the number of runs.

  • Alternative: Linear scan through runs works but is O(R) per query. For many queries on a large RLE list, binary search is essential.

  • Common mistake: Off-by-one errors in the binary search. If cumulative lengths are [4, 6, 11, 14] and you query position 4, you need to find the first cumulative value > 4, which is 6 at index 1.

  • Edge cases: Position 0, position equal to total length - 1, position beyond the list (error).

Follow-Up Questions

  1. Range lookup with sorted values: Suppose the values in the RLE list are sorted (non-decreasing). Given a target value, find the range of positions [start, end] that contain that value. How can you do this efficiently using binary search on both the values and the cumulative lengths?

  2. Dynamic updates: If you need to support inserting or deleting elements at arbitrary positions while maintaining RLE, what data structure would you use? Consider a balanced BST of runs.

  3. Two-dimensional RLE: How would you extend this to a 2D grid where rows are RLE-encoded? How would you efficiently query the value at position (row, col)?