Skip to content

Difference Arrays on Trees

Difference array marking on a tree path: +1 at endpoints, -1 at LCA and parent of LCA

The Problem That Needs This

You're given a tree with \(n\) nodes and \(m\) paths. Each path goes from node \(u\) to node \(v\). For every node in the tree, count how many paths pass through it.

Constraints: \(n, m \leq 2 \times 10^5\).

This is CSES "Counting Paths" — and it shows up in competitive programming constantly.


The Naive Approach: Walk Every Path

For each path \((u, v)\), find the path from \(u\) to \(v\) (via LCA), walk every node on it, increment a counter.

A single path can have \(O(n)\) nodes. With \(m\) paths, that's \(O(nm)\) total work. For \(n = m = 2 \times 10^5\), that's \(4 \times 10^{10}\) operations. Far too slow.

The question is: can we "mark" an entire path in \(O(\log n)\) instead of \(O(n)\)?


The Key Insight: Difference Arrays on Arrays (Review)

On a flat array, if you want to add +1 to every element in range \([l, r]\), you don't touch each element. You set:

\[\text{diff}[l] \mathrel{+}= 1, \quad \text{diff}[r+1] \mathrel{-}= 1\]

Then a prefix sum over diff gives the final values. Each range update is \(O(1)\).

The tree version of this same idea: instead of a prefix sum on an array, we do a subtree sum via DFS. Instead of marking endpoints of a range, we mark endpoints of a path.


The Technique: Path Marking with LCA

For a path from \(u\) to \(v\) with \(\text{LCA}(u, v) = w\):

  1. \(\text{diff}[u] \mathrel{+}= 1\)
  2. \(\text{diff}[v] \mathrel{+}= 1\)
  3. \(\text{diff}[w] \mathrel{-}= 1\)
  4. \(\text{diff}[\text{parent}(w)] \mathrel{-}= 1\)

After processing all paths, run a single DFS that computes the subtree sum of diff for every node. The subtree sum at node \(x\) equals the number of paths passing through \(x\).

Why does this work? Think about what flows upward during the subtree sum:

  • Node \(u\) contributes +1. This +1 "flows up" through every ancestor of \(u\) — all the way to the root.
  • Node \(v\) contributes +1. Same thing — flows up through every ancestor of \(v\).
  • Node \(w\) (LCA) contributes -1. This cancels the double-counting: without it, every ancestor of \(w\) would be counted twice (once from \(u\)'s side, once from \(v\)'s side).
  • Parent of \(w\) contributes -1. This stops the path from extending above \(w\) — the path goes through \(w\) but not above it.

The net effect: exactly the nodes on the path from \(u\) to \(v\) get a count of +1.


Full Trace

Tree structure:

        1
       / \
      2   3
     / \   \
    4   5   6

Paths to process: - Path 1: \(4 \to 6\) (LCA = 1) - Path 2: \(4 \to 5\) (LCA = 2)

After marking Path 1 (\(u=4, v=6, w=1, \text{parent}(w) = \text{none}\)):

Node diff
1 -1
2 0
3 0
4 +1
5 0
6 +1

(No parent of root, so we skip the 4th step — or treat parent(root) as a virtual node.)

After marking Path 2 (\(u=4, v=5, w=2\)):

Node diff
1 -1 - 1 = -2
2 0 - 1 = -1
3 0
4 +1 + 1 = +2
5 0 + 1 = +1
6 +1

Subtree sums (DFS from leaves up):

Node diff[node] Children's subtree sums Subtree sum Meaning
4 +2 0 2 2 paths through node 4
5 +1 0 1 1 path through node 5
2 -1 2 + 1 = 3 -1 + 3 = 2 2 paths through node 2
6 +1 0 1 1 path through node 6
3 0 1 1 1 path through node 3
1 -2 2 + 1 = 3 -2 + 3 = 1 1 path through node 1

Verification: - Path \(4 \to 6\): goes through 4, 2, 1, 3, 6. Nodes 4, 2, 1, 3, 6 should each get +1 from this path. - Path \(4 \to 5\): goes through 4, 2, 5. Nodes 4, 2, 5 should each get +1 from this path. - Node 4: on both paths → count = 2. Correct. - Node 2: on both paths → count = 2. Correct. - Node 1: on path 1 only → count = 1. Correct. - Node 3: on path 1 only → count = 1. Correct. - Node 5: on path 2 only → count = 1. Correct. - Node 6: on path 1 only → count = 1. Correct.


The Algorithm

  1. Build LCA structure — binary lifting from Lesson 2 (\(O(n \log n)\) preprocessing).
  2. For each path \((u, v)\): compute \(w = \text{LCA}(u, v)\), apply the four diff operations (\(O(\log n)\) for LCA).
  3. Single DFS: compute subtree sums of the diff array (\(O(n)\)).

Total: \(O((n + m) \log n)\).


The Code

// Build LCA structure (binary lifting)
auto buildLCA = [&]() {
    vector<bool> visited(nodeCount + 1, false);
    queue<int> bfsQueue;
    bfsQueue.push(1);
    visited[1] = true;
    while (!bfsQueue.empty()) {
        int node = bfsQueue.front();
        bfsQueue.pop();
        for (int neighbor : adjacency[node]) {
            if (!visited[neighbor]) {
                visited[neighbor] = true;
                depth[neighbor] = depth[node] + 1;
                up[neighbor][0] = node;
                for (int k = 1; k < LOG; k++)
                    up[neighbor][k] = up[up[neighbor][k - 1]][k - 1];
                bfsQueue.push(neighbor);
            }
        }
    }
};

auto lca = [&](int u, int v) -> int {
    if (depth[u] < depth[v]) swap(u, v);
    int diff = depth[u] - depth[v];
    for (int k = 0; k < LOG; k++)
        if ((diff >> k) & 1)
            u = up[u][k];
    if (u == v) return u;
    for (int k = LOG - 1; k >= 0; k--)
        if (up[u][k] != up[v][k]) {
            u = up[u][k];
            v = up[v][k];
        }
    return up[u][0];
};

// For each path (u, v), apply difference array technique
diff[u]++;
diff[v]++;
diff[lca(u, v)]--;
if (up[lca(u, v)][0] != 0)
    diff[up[lca(u, v)][0]]--;

// DFS to compute subtree sums
function<int(int, int)> computeSubtreeSum = [&](int node, int parent) -> int {
    int total = diff[node];
    for (int child : adjacency[node])
        if (child != parent)
            total += computeSubtreeSum(child, node);
    return result[node] = total;
};

Complexity: \(O((n + m) \log n)\) time, \(O(n \log n)\) space.


The Pattern: Difference Arrays Generalize to Trees

This technique works for any "increment all nodes/edges on a path" operation:

  • Node counting (this lesson): diff at endpoints and LCA, subtree sum gives node counts
  • Edge counting: mark the child end of each edge instead of the node — diff[u]++, diff[v]++, diff[lca]-=2
  • Weighted increments: use +w instead of +1 for weighted paths

The idea is always the same: avoid walking the path. Mark the endpoints, let the subtree sum propagate the effect.


Variant: Edge Difference

For "count paths through each edge" instead of each node, a small change:

\[\text{diff}[u] \mathrel{+}= 1, \quad \text{diff}[v] \mathrel{+}= 1, \quad \text{diff}[w] \mathrel{-}= 2\]

No parent decrement needed. The subtree sum at node \(x\) gives the count for the edge from \(x\) to its parent.


Try It

Predict before running: Given a star graph (node 1 connected to nodes 2, 3, 4, 5) with paths: \(2 \to 3\), \(4 \to 5\), \(2 \to 5\). What's the count at node 1?

All three paths pass through node 1 (the center). Count = 3.

Challenge: What if you need to count paths through each edge, not node? Modify the diff formula.

Edge Cases to Watch For

  • Path whose LCA is the root: \(parent(\text{root})\) doesn't exist. The fourth decrement \(diff[parent(LCA)]--\) accesses a nonexistent node. Use a sentinel node 0 or guard with if (LCA != root).
  • Edge counting variant: The formula changes: use \(diff[w] -= 2\) instead of \(diff[w]--; diff[parent(w)]--\). The subtree sum at node \(x\) then gives the count for the edge \((x, parent(x))\). Using the node-counting formula for edge counting is off by one per path.

Problems

Problem Link Difficulty
CSES — Counting Paths cses.fi/problemset/task/1136 Medium
CF — Edge Queries codeforces.com/problemset/problem/191/C Hard

What's Next?

You now have three LCA methods and can handle path/distance/counting queries. Chapter 14 pushes further: Heavy-Light Decomposition chains entire paths for segment-tree-powered path queries in \(O(\log^2 n)\).