Skip to content

Distance and Path Queries

Platform Connection — Metric Spaces: On an array, distance between positions \(a\) and \(b\) is \(|b - a|\). Simple. A tree is a different metric space — you can't subtract positions because there's no linear ordering. Distance must route through the Lowest Common Ancestor. The formula \(d(u,v) = \text{depth}(u) + \text{depth}(v) - 2 \cdot \text{depth}(\text{LCA}(u,v))\) defines a true metric on the tree: it satisfies positivity, symmetry, and the triangle inequality. This is mathematically precise, not an analogy.

Distance formula: dist(u,v) = depth[u] + depth[v] - 2*depth[LCA(u,v)]

The Payoff of LCA

You've spent two lessons building LCA machinery. Now you get to use it. This lesson covers the two most common applications: computing the distance between two nodes and computing the sum along a path. Both reduce to a single LCA call plus some arithmetic.


Distance Between Two Nodes

Problem: Given a tree with \(n\) nodes (edges may have weights), answer \(q\) queries: "what is the distance between nodes \(u\) and \(v\)?"

The path from \(u\) to \(v\) in a tree is unique — it goes up from \(u\) to \(LCA(u, v)\), then down to \(v\). The distance is:

\[dist(u, v) = dist(u, root) + dist(v, root) - 2 \cdot dist(LCA(u, v), root)\]

Precompute \(distFromRoot[v]\) for every node during the initial DFS. Then each query is one LCA call plus one formula.

Why the formula works: The path from \(u\) to root passes through \(LCA(u,v)\), and so does the path from \(v\) to root. The segment from root to \(LCA\) is counted twice (once from \(u\)'s side, once from \(v\)'s side). Subtracting \(2 \cdot dist(LCA, root)\) removes the double-counted segment, leaving exactly the path from \(u\) to \(v\).

For unweighted trees, \(distFromRoot[v] = depth[v]\), and the formula simplifies to:

\[dist(u, v) = depth[u] + depth[v] - 2 \cdot depth[LCA(u, v)]\]

Trace: Distance Queries

          1
        / | \
       2  3  4
      / \    |
     5   6   7
    /
   8

Unweighted tree. Depths:

Node 1 2 3 4 5 6 7 8
Depth 0 1 1 1 2 2 2 3

Query: dist(8, 6).

\(LCA(8, 6) = 2\) (from our earlier traces). \(depth[8] + depth[6] - 2 \cdot depth[2] = 3 + 2 - 2 \cdot 1 = 3\).

Manual check: path is 8 → 5 → 2 → 6. Three edges. Correct.

Query: dist(8, 7).

\(LCA(8, 7) = 1\). \(3 + 2 - 2 \cdot 0 = 5\).

Path: 8 → 5 → 2 → 1 → 4 → 7. Five edges. Correct.

Query: dist(5, 6).

\(LCA(5, 6) = 2\). \(2 + 2 - 2 \cdot 1 = 2\).

Path: 5 → 2 → 6. Two edges. Correct.

Query: dist(3, 4).

\(LCA(3, 4) = 1\). \(1 + 1 - 2 \cdot 0 = 2\).

Path: 3 → 1 → 4. Two edges. Correct.


Weighted Example

Now add weights:

          1
        / | \
     (3) (1) (4)
      2    3   4
    / \       |
  (2) (5)   (6)
   5    6    7
  /
 (7)
  8

\(distFromRoot\):

Node 1 2 3 4 5 6 7 8
dist 0 3 1 4 5 8 10 12

Query: dist(8, 6).

\(LCA(8, 6) = 2\). \(distFromRoot[8] + distFromRoot[6] - 2 \cdot distFromRoot[2] = 12 + 8 - 2 \cdot 3 = 14\).

Path: 8 →(7)→ 5 →(2)→ 2 →(5)→ 6. Weights: 7 + 2 + 5 = 14. Correct.

Query: dist(6, 7).

\(LCA(6, 7) = 1\). \(8 + 10 - 2 \cdot 0 = 18\).

Path: 6 →(5)→ 2 →(3)→ 1 →(4)→ 4 →(6)→ 7. Weights: 5 + 3 + 4 + 6 = 18. Correct.


Path Sum Queries

Problem: Each node has a value \(w[v]\). Answer queries: "what is the sum of node values on the path from \(u\) to \(v\)?"

Define \(prefixSum[v]\) = sum of node values from the root to \(v\) (inclusive). Compute it during DFS:

\[prefixSum[v] = prefixSum[parent[v]] + w[v]\]

Then:

\[pathSum(u, v) = prefixSum[u] + prefixSum[v] - 2 \cdot prefixSum[LCA(u,v)] + w[LCA(u,v)]\]

Why the \(+ w[LCA]\) at the end? Because \(prefixSum[u]\) includes \(w[LCA]\), and so does \(prefixSum[v]\). Subtracting \(2 \cdot prefixSum[LCA]\) removes \(w[LCA]\) twice, but we want it once (the LCA is on the path). So we add it back.


Trace: Path Sum

Using the unweighted tree, node values: \(w = [-, 3, 5, 2, 7, 1, 4, 6, 8]\) (1-indexed).

\(prefixSum\):

Node 1 2 3 4 5 6 7 8
\(w\) 3 5 2 7 1 4 6 8
\(prefixSum\) 3 8 5 10 9 12 16 17

Query: pathSum(8, 6).

\(LCA(8, 6) = 2\). \(prefixSum[8] + prefixSum[6] - 2 \cdot prefixSum[2] + w[2] = 17 + 12 - 2 \cdot 8 + 5 = 18\).

Path nodes: {8, 5, 2, 6}. Sum: \(8 + 1 + 5 + 4 = 18\). Correct.

Query: pathSum(8, 7).

\(LCA(8, 7) = 1\). \(17 + 16 - 2 \cdot 3 + 3 = 30\).

Path nodes: {8, 5, 2, 1, 4, 7}. Sum: \(8 + 1 + 5 + 3 + 7 + 6 = 30\). Correct.

Query: pathSum(3, 3).

\(LCA(3, 3) = 3\). \(5 + 5 - 2 \cdot 5 + 2 = 2\). That's just \(w[3] = 2\). Correct.


What About Path Max?

You might want "maximum node value on the path from \(u\) to \(v\)." Prefix sums don't help here — max isn't invertible like sum.

Two approaches:

  1. Heavy-Light Decomposition (HLD): Decompose the tree into chains, use a segment tree on each chain. Supports path max, path min, path updates. Covered in Chapter 14.

  2. Binary lifting extension: Store \(maxOnPath[v][k]\) = maximum value on the path from \(v\) to its \(2^k\)-th ancestor. Build it alongside the \(up\) table. Query by decomposing the path into \(O(\log n)\) jumps and taking the max across all jumps. This works for max/min but not for sum (you'd double-count overlapping regions).

For now, know that path sum and path distance are solved with LCA + prefix sums. Path max/min requires heavier machinery.


The Code

const int MAXLOG = 18;
vector<pair<int, long long>> adjacency[200005];
int up[200005][MAXLOG];
int depthOf[200005];
long long distFromRoot[200005];
long long nodeValue[200005];
long long prefixSum[200005];

void dfs(int node, int parent, int currentDepth, long long currentDist) {
    depthOf[node] = currentDepth;
    distFromRoot[node] = currentDist;
    prefixSum[node] = prefixSum[parent] + nodeValue[node];
    up[node][0] = parent;
    for (int k = 1; k < MAXLOG; k++) {
        up[node][k] = up[up[node][k - 1]][k - 1];
    }
    for (auto [child, weight] : adjacency[node]) {
        if (child != parent) {
            dfs(child, node, currentDepth + 1, currentDist + weight);
        }
    }
}

int lca(int u, int v) {
    if (depthOf[u] < depthOf[v]) swap(u, v);
    int difference = depthOf[u] - depthOf[v];
    for (int k = 0; k < MAXLOG; k++) {
        if ((difference >> k) & 1) {
            u = up[u][k];
        }
    }
    if (u == v) return u;
    for (int k = MAXLOG - 1; k >= 0; k--) {
        if (up[u][k] != up[v][k]) {
            u = up[u][k];
            v = up[v][k];
        }
    }
    return up[u][0];
}

long long distance(int u, int v) {
    return distFromRoot[u] + distFromRoot[v] - 2 * distFromRoot[lca(u, v)];
}

long long pathSum(int u, int v) {
    int ancestor = lca(u, v);
    return prefixSum[u] + prefixSum[v] - 2 * prefixSum[ancestor] + nodeValue[ancestor];
}

Complexity: \(O(n \log n)\) preprocessing, \(O(\log n)\) per query (dominated by the LCA call).


The Building Blocks for HLD

Everything in this lesson — distance, path sum, and the observation that path max needs something more — sets the stage for Heavy-Light Decomposition in Chapter 14. HLD breaks every root-to-leaf path into \(O(\log n)\) "heavy chains," each backed by a segment tree. It handles path queries with updates (change a node value, then query path max) in \(O(\log^2 n)\).

You don't need HLD for distance or path sum — the techniques from this lesson are simpler and faster. But when you hit a problem that requires path aggregates with updates, or path aggregates that aren't prefix-sum-friendly (like max or min), HLD is where you'll go.


Try It

Predict before running: On the unweighted chain 1 → 2 → 3 → 4 → 5, what is \(dist(1, 5)\)? \(depth[1] + depth[5] - 2 \cdot depth[LCA(1,5)] = 0 + 4 - 2 \cdot 0 = 4\). The LCA of 1 and 5 is 1 (since 1 is the root and ancestor of everything).

Challenge: If all edge weights are 1, does the distance formula reduce to the depth formula? Yes. \(distFromRoot[v] = depth[v]\) when all edges have weight 1. The formulas are identical.

Challenge 2: Can you handle edge updates (change the weight of an edge) with prefix sums? No. Changing one edge weight changes \(distFromRoot\) for an entire subtree. You'd need Euler tour flattening + BIT for range updates, or HLD.

Edge Cases to Watch For

  • pathSum where LCA is an endpoint: When \(u\) is an ancestor of \(v\), the formula \(prefixSum[u] + prefixSum[v] - 2 \cdot prefixSum[LCA] + w[LCA]\) must include \(w[LCA]\) (the LCA's own weight). Forgetting the \(+w[LCA]\) term undercounts by exactly one node's contribution.
  • Weighted edges stored on child nodes: If edge weight \((u, v)\) is stored on the child \(v\), the path sum formula changes — the LCA's stored weight is the edge to its parent, not part of the \(u\)-to-\(v\) path. Using it as-is double-counts or includes the wrong edge.

Problems

Problem Link Difficulty
CSES — Distance Queries cses.fi/problemset/task/1135 Medium
CSES — Path Queries cses.fi/problemset/task/1138 Medium
CSES — Path Queries II cses.fi/problemset/task/1138 Hard
CF 191C — Fools and Roads codeforces.com/problemset/problem/191/C Hard
LC 1740 — Find Distance in a Binary Tree leetcode.com/problems/find-distance-in-a-binary-tree Medium