Skip to content

Block Compression

When the Array Has Too Much Detail

Lessons 1 and 2 built prefix and suffix arrays over the raw input. That works when each element matters independently. But when an array is massive and filled with long runs of elements that behave the same way, treating every single index as a unique decision is exhausting — and unnecessary. The interesting decisions happen at the boundaries between runs.

Instead of asking "what do I do at each index?", ask: "can I compress contiguous segments into summary blocks and sweep over the blocks instead?"

That is block compression. You replace \(N\) raw elements with \(M\) macro-blocks (where \(M\) is often much smaller), then apply a greedy sweep over the compressed representation. The sweep itself is usually a single pass tracking a running surplus or deficit.

Block compression: raw array compressed into (deficit, gain) macro-blocks

The diagram shows an array with alternating runs of "demand" (negative) and "supply" (positive) elements. Each run compresses into a single block labeled with its total deficit and total gain. The sweep processes blocks left to right, checking whether the running surplus ever drops below zero.


The Gas Station Problem

Problem (LeetCode 134). There are \(N\) gas stations arranged in a circle. Station \(i\) has gasAmounts[i] fuel and costs travelCosts[i] fuel to reach station \(i+1\). Find the starting station from which you can complete the full circuit, or report that no solution exists.

The Greedy Choice: Scan character by character, tracking a running fuel surplus. When the tank goes negative, reset the starting station to the next position.

The Naive Approach

The brute force: try each of the \(N\) starting stations, simulate the full trip in \(O(N)\), total \(O(N^2)\). You might try to be clever with prefix sums, but the circular nature makes it awkward — the prefix "wraps around." People try to track the "worst point" in a single linear pass, but they do not account for the circularity correctly. Off-by-one errors everywhere.

The Fix: Compress, Then Sweep

Step 1: Compute net fuel at each station.

\[\text{net}[i] = \text{gasAmounts}[i] - \text{travelCosts}[i]\]

If \(\text{net}[i] > 0\), station \(i\) is a "supplier" (you gain fuel). If \(\text{net}[i] < 0\), station \(i\) is a "consumer" (you lose fuel).

Step 2: Think in blocks.

We are conceptually grouping consecutive negative stations into a Deficit Block and consecutive positive stations into a Gain Block. Each block summarizes its segment:

  • A deficit block has total demand \(= \sum |\text{net}[i]|\) for consecutive negative stations
  • A gain block has total supply \(= \sum \text{net}[i]\) for consecutive positive stations

The key realization: within a gain block, the order does not matter — you just accumulate fuel. Within a deficit block, the order does not matter either — you just burn fuel. The interesting decisions are at the block boundaries: do you have enough surplus to survive the next deficit block?

Step 3: Find the optimal start via surplus sweep.

Starting from each block boundary, simulate the trip by maintaining a running surplus. The optimal starting point is where the surplus reaches its global minimum — starting after that point guarantees the surplus never goes negative during the circuit.

Full State Trace

Let's trace the exact fuel states.

Input: gasAmounts \(= [1, 2, 3, 4, 5]\), travelCosts \(= [3, 4, 5, 1, 2]\)

Net fuel:

Station gas cost net Block
0 1 3 \(-2\) B0 (deficit)
1 2 4 \(-2\) B0 (deficit)
2 3 5 \(-2\) B0 (deficit)
3 4 1 \(+3\) B1 (gain)
4 5 2 \(+3\) B1 (gain)

Compressed blocks:

Block Stations Type Total
B0 0, 1, 2 Deficit \(-6\)
B1 3, 4 Gain \(+6\)

Total net \(= 0\), so a solution exists.

Surplus sweep (trying each block as start):

Start block Surplus after B0 Surplus after B1 Min surplus during trip
B0 \(-6\) \(0\) \(-6\) (fails — went negative)
B1 \(+6\) \(0\) \(0\) (never goes negative)

Answer: station 3 (start of block B1).

Try It

Before reading the code, think about this: if the total net fuel across all stations is negative, can any starting point work? Why not? And if the total is exactly zero or positive, why is exactly one valid starting block guaranteed?

The Mental Model vs The Code

While our mental model explicitly builds deficit and gain blocks, our code can optimize this to \(O(1)\) space by processing the blocks on the fly. Instead of materializing the blocks, we track currentTank and reset it to zero whenever it goes negative. Each reset implicitly ends a deficit block and starts scanning for the next gain block.

int canCompleteCircuit(vector<int>& gasAmounts, vector<int>& travelCosts) {
    int numStations = gasAmounts.size();
    int totalNetGas = 0;
    int currentTank = 0;
    int startingStation = 0;

    for (int i = 0; i < numStations; i++) {
        int netGas = gasAmounts[i] - travelCosts[i];
        totalNetGas += netGas;
        currentTank += netGas;

        if (currentTank < 0) {
            startingStation = i + 1;
            currentTank = 0;
        }
    }

    return totalNetGas >= 0 ? startingStation : -1;
}

The if (currentTank < 0) check is where block compression happens implicitly. Each reset marks the end of a deficit block. The startingStation pointer jumps past it.


Minimum Number of Refueling Stops

Problem (LeetCode 871). A car starts with initialFuel fuel. It needs to travel targetDistance miles. Along the way there are gas stations at positions stationLocations[i] with stationLocations[i][1] gallons. The car uses 1 gallon per mile. What is the minimum number of stops to reach the target?

Why Brute Force Fails

This looks like a DP problem. You might try dp[i] = min stops to reach station \(i\). But the state space is large and the transitions are not obvious — at each station you choose to stop or skip, and the "skip" decision depends on how much fuel you have, which depends on all previous decisions.

Instead of asking "which stations should I stop at?", ask: "when I run out of fuel, which past station do I wish I had stopped at?" The answer is always the one with the most fuel. This is a regret-based greedy — the same philosophy as the undo trick from Chapter 5, applied to the physical distance blocks between stations.

Block Compression View

Here, our "blocks" aren't consecutive elements of the same sign — they're the physical distances between stations. Each block is a gap that the car must cross:

  • Each block is the gap between two consecutive stations (or start/end)
  • Deficit of a block \(=\) how much more fuel you need beyond what you currently have
  • Gain \(=\) fuel available at the station before this block

The greedy insight: always use the largest available fuel stop when you need it. This is a max-heap of fuels from passed stations.

Full State Trace

Input: targetDistance \(= 100\), initialFuel \(= 10\), stations \(= [(10, 60), (20, 30), (60, 40)]\)

Blocks (gaps between stations):

Block From To Distance Station fuel at start
B0 0 10 10 — (start, have 10)
B1 10 20 10 60 (station 0)
B2 20 60 40 30 (station 1)
B3 60 100 40 40 (station 2)

Greedy sweep with max-heap of passed stations:

Block Fuel before Enough? Heap state Action Stops Fuel after
B0 10 Yes (\(10 \geq 10\)) {} Cross directly 0 0
B1 0 No (\(0 < 10\)) {60} Pop 60. Now have 60. Cross. 1 50
B2 50 Yes (\(50 \geq 40\)) {30} Cross directly 1 10
B3 10 No (\(10 < 40\)) {40, 30} Pop 40. Now have 50. Cross. 2 10

Answer: 2 stops.

Try It

What if initialFuel \(= 70\)? Trace through the blocks again. You should find you need only 1 stop. Which station does the greedy pick, and why is it provably optimal? (Hint: exchange argument — if any other single stop could also work, it would give at most the same fuel as the one the heap picks.)

Code

int minRefuelStops(int targetDistance, int initialFuel, vector<vector<int>>& stationLocations) {
    auto maxHeapCompare = [](int a, int b) {
        return a < b;
    };
    priority_queue<int, vector<int>, decltype(maxHeapCompare)> availableFuelHeap(maxHeapCompare);

    int currentFuel = initialFuel;
    int totalStops = 0;
    int previousLocation = 0;

    stationLocations.push_back({targetDistance, 0});

    for (const auto& station : stationLocations) {
        int distanceToCross = station[0] - previousLocation;
        currentFuel -= distanceToCross;

        while (currentFuel < 0 && !availableFuelHeap.empty()) {
            currentFuel += availableFuelHeap.top();
            availableFuelHeap.pop();
            totalStops++;
        }

        if (currentFuel < 0) return -1;

        availableFuelHeap.push(station[1]);
        previousLocation = station[0];
    }

    return totalStops;
}

The General Block Compression Pattern

Here is the blueprint that ties both problems together:

1. COMPRESS: Group contiguous elements into macro-blocks
   Each block summarizes: what it demands (deficit) and what it provides (gain)

2. SWEEP: Process blocks left to right with a running state
   - Accumulate gains, subtract deficits
   - When deficit exceeds surplus, take corrective action
     (refuel from heap, reset start pointer, etc.)

3. ANSWER: The sweep state at the end gives the answer
   (total stops, feasibility, optimal start, etc.)

What changes per problem:

Component Gas Station Refueling Stops Job Scheduling
Block definition Consecutive same-sign net fuel Gap between stations Task with deadline + penalty
Deficit Fuel consumed Distance to cross Penalty if missed
Gain Fuel added Fuel available Reward if completed
Corrective action Reset start Pop from max-heap Pop from min-heap

The shape is the same as Lessons 1-2, but instead of sweeping a split point between two arrays, you sweep a single running state across compressed blocks.


When to Recognize Block Compression

Watch for these signals:

  1. Long runs of similar behavior. The array has stretches where elements are all positive, all negative, or all "the same kind." The interesting decisions are at the transitions.

  2. Circular sweep. The problem wraps around (gas stations, circular buffers). Block compression + finding the optimal start is the standard approach.

  3. "Greedy with regret" over segments. You process segments in order but keep a heap of past options you can retroactively use. The heap acts as a deferred-decision buffer — exactly the undo trick from Chapter 5, applied at the block level.

  4. The deficit/surplus framing. If you can describe each portion of the input as "it needs \(X\) and provides \(Y\)," you are looking at block compression.

Challenge

Harder variant: what if the gas stations are on a line (not a circle), but you can also drive backward? How does the block compression change? Think about what the "blocks" represent when direction is a choice, and whether you need two sets of blocks (one per direction) merged with a decoupled sweep from Lesson 1.


Complexity

Step Time Space
Compression \(O(N)\) \(O(M)\) where \(M\) = number of blocks
Sweep \(O(M)\) or \(O(M \log M)\) with heap \(O(M)\)
Total \(O(N)\) or \(O(N \log N)\) \(O(N)\)

The compression itself is always linear. The sweep cost depends on whether you use a heap for corrective actions (\(O(\log M)\) per action) or a simple running variable (\(O(1)\) per step).


Try It

Problem: Minimum Platforms (Classic). Given arrival and departure times of \(N\) trains, find the minimum number of platforms needed.

Reframe this as block compression: sort all events by time. Each "block" between consecutive events either adds a train (demand \(+1\)) or removes one (demand \(-1\)). Sweep the blocks tracking peak demand. This is the compression view of the standard sweep-line algorithm.

Reasoning prompt: Why is this block compression rather than a decoupled sweep? What makes it fundamentally different from Lessons 1-2? Think about whether the problem decomposes into left-half vs right-half, or whether it requires a single running state across the whole input.


Problems

Problem Link Difficulty
LeetCode 134 — Gas Station leetcode.com/problems/gas-station Medium
LeetCode 871 — Minimum Number of Refueling Stops leetcode.com/problems/minimum-number-of-refueling-stops Hard
LeetCode 1094 — Car Pooling leetcode.com/problems/car-pooling Medium
Minimum Platforms GeeksforGeeks Medium
LC 42 — Trapping Rain Water leetcode.com/problems/trapping-rain-water Hard
LC 135 — Candy leetcode.com/problems/candy Hard
LC 689 — Max Sum of 3 Non-Overlapping Subarrays leetcode.com/problems/maximum-sum-of-3-non-overlapping-subarrays Hard