Skip to content

Greedy Tree Traversals

Trees and Greedy: A Natural Pairing

Trees are the most greedy-friendly structure in all of algorithms. No cycles means every pair of nodes has a unique path. Subtrees are independent once you fix the root. These properties make greedy algorithms surprisingly powerful — often more powerful than you'd expect.

This lesson covers three themes: optimizing tree traversal costs, Huffman coding as the canonical tree greedy, and the general question of when greedy beats DP on trees.


Theme 1: Optimal Tree Traversal

The Problem: Minimizing Travel Cost on a Tree

You start at a root node and must visit every node in a tree, then return to the root. Each edge has a weight. What is the minimum total travel cost?

Observation: In any complete traversal that returns to the root, every edge is traversed exactly twice (once down, once up) — this is the Euler tour. The total cost of a full Euler tour is \(2 \times (\text{sum of all edge weights})\). There's no way around this.

But what if you don't need to return to the root? Then you can save the cost of the longest path from root to any leaf. You walk down that path last and simply stop.

\[\text{Minimum traversal cost (no return)} = 2 \times \text{totalEdgeWeight} - \text{longestRootToLeafPath}\]

The Greedy Choice: Among all possible "leave-open" paths, always choose the longest one. Any other choice leaves a shorter open path and wastes more on backtracking.

Why Order Doesn't Matter

The instinct is to think about which order to visit the children. Should you visit the heaviest subtree first? The lightest? It doesn't matter — the Euler tour cost is the same regardless of visit order. The only degree of freedom is which single path to leave open at the end.

Computing the Longest Path

A single DFS from the root computes the longest root-to-leaf distance:

vector<vector<pair<int, int>>> adjacencyList(numNodes);
vector<long long> depthFromRoot(numNodes, 0);

function<void(int, int)> computeDepths = [&](int currentNode, int parentNode) {
    for (auto [neighbor, edgeWeight] : adjacencyList[currentNode]) {
        if (neighbor == parentNode) continue;
        depthFromRoot[neighbor] = depthFromRoot[currentNode] + edgeWeight;
        computeDepths(neighbor, currentNode);
    }
};

computeDepths(0, -1);
long long longestPath = *max_element(depthFromRoot.begin(), depthFromRoot.end());
long long totalEdgeWeight = 0;
for (int u = 0; u < numNodes; u++)
    for (auto [v, w] : adjacencyList[u]) totalEdgeWeight += w;
totalEdgeWeight /= 2;

cout << 2 * totalEdgeWeight - longestPath << "\n";

Variant: Free Start and End Points

A harder variant: you can start at any node, visit all nodes, and end at any node. Now you save the cost of the diameter (longest path between any two nodes):

\[\text{Minimum traversal cost (free endpoints)} = 2 \times \text{totalEdgeWeight} - \text{diameter}\]

The diameter can be found with two BFS/DFS passes: BFS from any node to find the farthest node \(u\), then BFS from \(u\) to find the farthest node \(v\). The distance \(u\)-\(v\) is the diameter.


Theme 2: Huffman Coding — The Classic Tree Greedy

The Problem

Given \(N\) symbols with frequencies \(f_1, f_2, \ldots, f_N\), construct a binary prefix-free code that minimizes the weighted path length — the total cost \(\sum f_i \times \text{depth}_i\) where \(\text{depth}_i\) is the code length (depth in the binary tree) of symbol \(i\).

The Greedy Algorithm

  1. Insert all frequencies into a min-heap
  2. While the heap has more than one element:
  3. Extract the two smallest frequencies, \(a\) and \(b\)
  4. Create an internal node with combined frequency \(a + b\)
  5. Insert \(a + b\) back into the heap
  6. The resulting tree is the optimal Huffman tree

The Greedy Choice: Always merge the two smallest frequencies. They belong deepest in the tree because their large depths are multiplied by the smallest weights — minimizing the total cost.

Huffman tree construction: step-by-step merging of the two smallest nodes into a binary tree

Trace: Frequencies \([5, 9, 12, 13, 16, 45]\)

Let's trace the complete construction:

Step Heap before Extract \(a\) Extract \(b\) Merge \(a+b\) Heap after Running cost
1 {5, 9, 12, 13, 16, 45} 5 9 14 {12, 13, 14, 16, 45} 14
2 {12, 13, 14, 16, 45} 12 13 25 {14, 16, 25, 45} \(14 + 25 = 39\)
3 {14, 16, 25, 45} 14 16 30 {25, 30, 45} \(39 + 30 = 69\)
4 {25, 30, 45} 25 30 55 {45, 55} \(69 + 55 = 124\)
5 {45, 55} 45 55 100 {100} \(124 + 100 = 224\)

Total cost \(= 224\). Let's verify by computing \(\sum f_i \times \text{depth}_i\):

Resulting tree:

           (100)
          /     \
        (45)   (55)
               /   \
             (25)  (30)
             / \   / \
           12  13 (14) 16
                  / \
                 5   9
Symbol Frequency Depth \(f \times \text{depth}\)
45 45 1 45
12 12 3 36
13 13 3 39
5 5 4 20
9 9 4 36
16 16 3 48
Total 224

The running cost (sum of all merge costs) equals the weighted path length. This is not a coincidence — each symbol's frequency gets added once for every merge on its path from leaf to root, which is exactly \(f_i \times \text{depth}_i\).

Why Greedy Works: The Exchange Argument

Claim: The two symbols with the smallest frequencies must be siblings at the maximum depth of the optimal tree.

Proof by exchange: Suppose in an optimal tree \(T\), the two smallest-frequency symbols \(x\) and \(y\) are NOT at the deepest level. Let \(a\) and \(b\) be two siblings at the deepest level. Since \(f_x \leq f_a\) and \(f_y \leq f_b\), swapping \(x\) with \(a\) and \(y\) with \(b\):

\[\Delta\text{cost} = (\text{depth}_a - \text{depth}_x)(f_x - f_a) + (\text{depth}_b - \text{depth}_y)(f_y - f_b)\]

Since \(\text{depth}_a \geq \text{depth}_x\) and \(f_x \leq f_a\), the first term is \(\leq 0\). Similarly for the second. Total change \(\leq 0\), so the swap doesn't increase cost.

Connection to CSES Stick Divisions

CSES "Stick Divisions" is Huffman coding in reverse. Dividing a stick of length \(S\) into pieces of lengths \(l_1, l_2, \ldots, l_k\) costs the length of the stick at each cut. The minimum cost division is exactly the Huffman tree built from the target piece lengths.

Full Solution

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int numSymbols;
    cin >> numSymbols;

    auto minHeapCompare = [](long long a, long long b) {
        return a > b;
    };
    priority_queue<long long, vector<long long>, decltype(minHeapCompare)> symbolHeap(minHeapCompare);

    for (int i = 0; i < numSymbols; i++) {
        long long frequency;
        cin >> frequency;
        symbolHeap.push(frequency);
    }

    long long totalEncodingCost = 0;
    while (symbolHeap.size() > 1) {
        long long smallest = symbolHeap.top(); symbolHeap.pop();
        long long secondSmallest = symbolHeap.top(); symbolHeap.pop();
        totalEncodingCost += smallest + secondSmallest;
        symbolHeap.push(smallest + secondSmallest);
    }

    cout << totalEncodingCost << "\n";
    return 0;
}

Complexity: \(O(N \log N)\) — each of the \(N-1\) merges does two heap extractions and one insertion, each \(O(\log N)\).


Theme 3: Tree DP vs. Tree Greedy — When Can You Skip DP?

Many tree problems are solved with DP (compute values bottom-up from leaves to root). But sometimes a greedy approach works and is simpler. Here are the signals to look for.

Signal 1: Monotonic Structure Along Root-to-Leaf Paths

If the optimal choice at each node depends only on the path from root to that node (not on the full subtree below), greedy works. Examples:

  • Assigning values that increase along root-to-leaf paths: Process in BFS order, greedily assign the smallest valid value.
  • Pruning subtrees below a threshold: A DFS that prunes whenever the accumulated cost exceeds a budget.

Signal 2: Optimal Subtree Solutions Compose Independently

If the optimal solution for each subtree doesn't depend on solutions for sibling subtrees, you can solve each subtree greedily in isolation.

Example: Assign colors to nodes such that no two adjacent nodes share a color, minimizing the number of colors. On a tree, this is always 2-colorable (trees are bipartite). A simple DFS alternating colors is greedy and optimal — no subtree choice affects another.

Signal 3: Leaf-to-Root Aggregation with a Single Pass

If the answer is built by aggregating leaf values upward and the aggregation function is associative and monotonic, a single bottom-up pass suffices.

Signal 4: The Exchange Argument Works on Sibling Subtrees

If swapping the processing order of two sibling subtrees never worsens the result (or always improves it when swapped in a specific direction), greedy ordering by that criterion is optimal.

When Greedy Fails on Trees

Greedy fails when subtree decisions interact. Classic example: maximum independent set on a tree. Greedily picking all leaves seems good, but it can block better configurations higher up. This requires DP: dp[v][0] \(=\) best excluding \(v\), dp[v][1] \(=\) best including \(v\).

The dividing line: if the problem has optimal substructure but NOT the greedy choice property (locally optimal choices don't compose into globally optimal), you need DP.


Worked Example: Even Subtree Cuts

Problem: Find the minimum number of edges to remove so that every remaining connected component has an even number of nodes.

Greedy insight: In a DFS, count subtree sizes. Whenever a subtree has even size, "cut" the edge to its parent. Each cut is locally optimal and doesn't affect other subtrees.

int totalCuts = 0;
vector<int> subtreeSize(numNodes, 1);

function<void(int, int)> computeSubtrees = [&](int currentNode, int parentNode) {
    for (int neighbor : adjacencyList[currentNode]) {
        if (neighbor == parentNode) continue;
        computeSubtrees(neighbor, currentNode);
        if (subtreeSize[neighbor] % 2 == 0) {
            totalCuts++;
        } else {
            subtreeSize[currentNode] += subtreeSize[neighbor];
        }
    }
};

computeSubtrees(0, -1);

Trace: 8-Node Tree

        1
       / \
      2   3
     /|    \
    4  5    6
   /
  7
  |
  8
Node (post-order) Children sizes subtreeSize[node] Even? Action
8 1 Odd, propagate up
7 sz[8]=1 (odd) \(1+1 = 2\) Even, cut edge (4,7)! cuts=1
5 1 Odd, propagate up
4 sz[7] cut 1 Odd, propagate up
2 sz[4]=1, sz[5]=1 (both odd) \(1+1+1 = 3\) Odd, propagate up
6 1 Odd, propagate up
3 sz[6]=1 (odd) \(1+1 = 2\) Even, cut edge (1,3)! cuts=2
1 sz[2]=3, sz[3] cut \(1+3 = 4\) Even (root, no edge above)

Result: 2 cuts. Edges removed: (4,7) and (1,3).

Components: \(\{7, 8\}\), \(\{3, 6\}\), \(\{1, 2, 4, 5\}\). Sizes: 2, 2, 4. All even. Correct.


Putting It Together: A Decision Flowchart

When you see a tree problem, ask:

  1. Is it a traversal optimization? Euler tour minus longest path. Greedy.
  2. Is it a merging/splitting problem? Huffman-style PQ greedy. Check exchange argument.
  3. Do subtrees interact? If no, solve each subtree greedily. If yes, consider tree DP.
  4. Is there a monotonic ordering property? Sort children by some criterion, process in order. Exchange argument to verify.
  5. Does the problem reduce to 2-coloring or matching? Trees are bipartite; exploit this structure.

Try It

Exercise 1 — Huffman by hand: Given frequencies \([3, 5, 7, 11, 14, 20]\), build the Huffman tree. What is the total weighted path length? Draw the tree and verify by computing \(\sum f_i \times \text{depth}_i\).

Exercise 2 — Traversal cost: Consider a tree rooted at node 1 with edges: 1-2 (weight 3), 1-3 (weight 5), 2-4 (weight 2), 3-5 (weight 7). You start at node 1, visit all nodes, but don't need to return. What is the minimum travel cost? Which path do you "leave open"?

Hint for Exercise 2: Total edge weight \(= 3+5+2+7 = 17\). Full Euler tour \(= 34\). Longest root-to-leaf path \(= 1 \to 3 \to 5 = 12\). Answer \(= 34 - 12 = 22\). Verify by constructing the actual traversal: \(1 \to 2 \to 4 \to 2 \to 1 \to 3 \to 5\) (cost \(= 3+2+2+3+5+7 = 22\)).

Predict before running: For a star graph (one center node connected to 10 leaves, all edges weight 1), what's the minimum traversal cost without return? What about with return?

Challenge: Modify the even-subtree-cuts problem: instead of even components, you need all components to have size divisible by 3. Does the same greedy work? Why or why not?


Problems

Problem Link Difficulty
CSES — Stick Divisions cses.fi/problemset/task/1161 Medium
CSES — Even Subtree Cuts cses.fi Medium
Huffman Coding Classic / UVa 10954 Medium
LC 630 — Course Schedule III leetcode.com/problems/course-schedule-iii Hard
LC 310 — Minimum Height Trees leetcode.com/problems/minimum-height-trees Medium