Binary Search in RLE List
Level: L4-L5 Topics: Binary Search, Run-Length Encoding, Data Structures
Problem Statement
Implement a List class that stores data internally using run-length encoding (RLE). The key operation is:
The underlying storage is a sequence of (value, count) pairs representing consecutive runs of the same value.
Background & Constraints
- The logical list can be very large (billions of elements), but the number of distinct runs is much smaller.
- The RLE representation is stored as an array of
(value, count)pairs in order. positionis a 0-based index into the logical (uncompressed) list.- You should not decompress the list — the whole point is to work with the compressed representation.
- The list is immutable after construction.
Examples
RLE representation:
Logical list:
Queries:
find(0) → 3 (in first run)
find(3) → 3 (last element of first run)
find(4) → 7 (first element of second run)
find(5) → 7 (last element of second run)
find(6) → 1 (first element of third run)
find(10) → 1 (last element of third run)
find(13) → 9 (last element of fourth run)
Hints & Common Pitfalls
-
Precompute cumulative lengths: Build a prefix sum array of run lengths. For the example above:
[4, 6, 11, 14]. The cumulative array tells you that positions 0-3 are in run 0, positions 4-5 in run 1, etc. -
Binary search on cumulative lengths: To find which run contains
position, binary search the cumulative array for the smallest cumulative value that is greater thanposition. This gives O(log R) time where R is the number of runs. -
Alternative: Linear scan through runs works but is O(R) per query. For many queries on a large RLE list, binary search is essential.
-
Common mistake: Off-by-one errors in the binary search. If cumulative lengths are
[4, 6, 11, 14]and you query position 4, you need to find the first cumulative value > 4, which is 6 at index 1. -
Edge cases: Position 0, position equal to total length - 1, position beyond the list (error).
Follow-Up Questions
-
Range lookup with sorted values: Suppose the values in the RLE list are sorted (non-decreasing). Given a target value, find the range of positions
[start, end]that contain that value. How can you do this efficiently using binary search on both the values and the cumulative lengths? -
Dynamic updates: If you need to support inserting or deleting elements at arbitrary positions while maintaining RLE, what data structure would you use? Consider a balanced BST of runs.
-
Two-dimensional RLE: How would you extend this to a 2D grid where rows are RLE-encoded? How would you efficiently query the value at position
(row, col)?