Distance and Path Queries
Platform Connection — Metric Spaces: On an array, distance between positions \(a\) and \(b\) is \(|b - a|\). Simple. A tree is a different metric space — you can't subtract positions because there's no linear ordering. Distance must route through the Lowest Common Ancestor. The formula \(d(u,v) = \text{depth}(u) + \text{depth}(v) - 2 \cdot \text{depth}(\text{LCA}(u,v))\) defines a true metric on the tree: it satisfies positivity, symmetry, and the triangle inequality. This is mathematically precise, not an analogy.
![Distance formula: dist(u,v) = depth[u] + depth[v] - 2*depth[LCA(u,v)]](../../../assets/images/tree-course/img_12_04_distance_formula.png)
The Payoff of LCA
You've spent two lessons building LCA machinery. Now you get to use it. This lesson covers the two most common applications: computing the distance between two nodes and computing the sum along a path. Both reduce to a single LCA call plus some arithmetic.
Distance Between Two Nodes
Problem: Given a tree with \(n\) nodes (edges may have weights), answer \(q\) queries: "what is the distance between nodes \(u\) and \(v\)?"
The path from \(u\) to \(v\) in a tree is unique — it goes up from \(u\) to \(LCA(u, v)\), then down to \(v\). The distance is:
Precompute \(distFromRoot[v]\) for every node during the initial DFS. Then each query is one LCA call plus one formula.
Why the formula works: The path from \(u\) to root passes through \(LCA(u,v)\), and so does the path from \(v\) to root. The segment from root to \(LCA\) is counted twice (once from \(u\)'s side, once from \(v\)'s side). Subtracting \(2 \cdot dist(LCA, root)\) removes the double-counted segment, leaving exactly the path from \(u\) to \(v\).
For unweighted trees, \(distFromRoot[v] = depth[v]\), and the formula simplifies to:
Trace: Distance Queries
Unweighted tree. Depths:
| Node | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Depth | 0 | 1 | 1 | 1 | 2 | 2 | 2 | 3 |
Query: dist(8, 6).
\(LCA(8, 6) = 2\) (from our earlier traces). \(depth[8] + depth[6] - 2 \cdot depth[2] = 3 + 2 - 2 \cdot 1 = 3\).
Manual check: path is 8 → 5 → 2 → 6. Three edges. Correct.
Query: dist(8, 7).
\(LCA(8, 7) = 1\). \(3 + 2 - 2 \cdot 0 = 5\).
Path: 8 → 5 → 2 → 1 → 4 → 7. Five edges. Correct.
Query: dist(5, 6).
\(LCA(5, 6) = 2\). \(2 + 2 - 2 \cdot 1 = 2\).
Path: 5 → 2 → 6. Two edges. Correct.
Query: dist(3, 4).
\(LCA(3, 4) = 1\). \(1 + 1 - 2 \cdot 0 = 2\).
Path: 3 → 1 → 4. Two edges. Correct.
Weighted Example
Now add weights:
\(distFromRoot\):
| Node | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| dist | 0 | 3 | 1 | 4 | 5 | 8 | 10 | 12 |
Query: dist(8, 6).
\(LCA(8, 6) = 2\). \(distFromRoot[8] + distFromRoot[6] - 2 \cdot distFromRoot[2] = 12 + 8 - 2 \cdot 3 = 14\).
Path: 8 →(7)→ 5 →(2)→ 2 →(5)→ 6. Weights: 7 + 2 + 5 = 14. Correct.
Query: dist(6, 7).
\(LCA(6, 7) = 1\). \(8 + 10 - 2 \cdot 0 = 18\).
Path: 6 →(5)→ 2 →(3)→ 1 →(4)→ 4 →(6)→ 7. Weights: 5 + 3 + 4 + 6 = 18. Correct.
Path Sum Queries
Problem: Each node has a value \(w[v]\). Answer queries: "what is the sum of node values on the path from \(u\) to \(v\)?"
Define \(prefixSum[v]\) = sum of node values from the root to \(v\) (inclusive). Compute it during DFS:
Then:
Why the \(+ w[LCA]\) at the end? Because \(prefixSum[u]\) includes \(w[LCA]\), and so does \(prefixSum[v]\). Subtracting \(2 \cdot prefixSum[LCA]\) removes \(w[LCA]\) twice, but we want it once (the LCA is on the path). So we add it back.
Trace: Path Sum
Using the unweighted tree, node values: \(w = [-, 3, 5, 2, 7, 1, 4, 6, 8]\) (1-indexed).
\(prefixSum\):
| Node | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| \(w\) | 3 | 5 | 2 | 7 | 1 | 4 | 6 | 8 |
| \(prefixSum\) | 3 | 8 | 5 | 10 | 9 | 12 | 16 | 17 |
Query: pathSum(8, 6).
\(LCA(8, 6) = 2\). \(prefixSum[8] + prefixSum[6] - 2 \cdot prefixSum[2] + w[2] = 17 + 12 - 2 \cdot 8 + 5 = 18\).
Path nodes: {8, 5, 2, 6}. Sum: \(8 + 1 + 5 + 4 = 18\). Correct.
Query: pathSum(8, 7).
\(LCA(8, 7) = 1\). \(17 + 16 - 2 \cdot 3 + 3 = 30\).
Path nodes: {8, 5, 2, 1, 4, 7}. Sum: \(8 + 1 + 5 + 3 + 7 + 6 = 30\). Correct.
Query: pathSum(3, 3).
\(LCA(3, 3) = 3\). \(5 + 5 - 2 \cdot 5 + 2 = 2\). That's just \(w[3] = 2\). Correct.
What About Path Max?
You might want "maximum node value on the path from \(u\) to \(v\)." Prefix sums don't help here — max isn't invertible like sum.
Two approaches:
-
Heavy-Light Decomposition (HLD): Decompose the tree into chains, use a segment tree on each chain. Supports path max, path min, path updates. Covered in Chapter 14.
-
Binary lifting extension: Store \(maxOnPath[v][k]\) = maximum value on the path from \(v\) to its \(2^k\)-th ancestor. Build it alongside the \(up\) table. Query by decomposing the path into \(O(\log n)\) jumps and taking the max across all jumps. This works for max/min but not for sum (you'd double-count overlapping regions).
For now, know that path sum and path distance are solved with LCA + prefix sums. Path max/min requires heavier machinery.
The Code
const int MAXLOG = 18;
vector<pair<int, long long>> adjacency[200005];
int up[200005][MAXLOG];
int depthOf[200005];
long long distFromRoot[200005];
long long nodeValue[200005];
long long prefixSum[200005];
void dfs(int node, int parent, int currentDepth, long long currentDist) {
depthOf[node] = currentDepth;
distFromRoot[node] = currentDist;
prefixSum[node] = prefixSum[parent] + nodeValue[node];
up[node][0] = parent;
for (int k = 1; k < MAXLOG; k++) {
up[node][k] = up[up[node][k - 1]][k - 1];
}
for (auto [child, weight] : adjacency[node]) {
if (child != parent) {
dfs(child, node, currentDepth + 1, currentDist + weight);
}
}
}
int lca(int u, int v) {
if (depthOf[u] < depthOf[v]) swap(u, v);
int difference = depthOf[u] - depthOf[v];
for (int k = 0; k < MAXLOG; k++) {
if ((difference >> k) & 1) {
u = up[u][k];
}
}
if (u == v) return u;
for (int k = MAXLOG - 1; k >= 0; k--) {
if (up[u][k] != up[v][k]) {
u = up[u][k];
v = up[v][k];
}
}
return up[u][0];
}
long long distance(int u, int v) {
return distFromRoot[u] + distFromRoot[v] - 2 * distFromRoot[lca(u, v)];
}
long long pathSum(int u, int v) {
int ancestor = lca(u, v);
return prefixSum[u] + prefixSum[v] - 2 * prefixSum[ancestor] + nodeValue[ancestor];
}
Complexity: \(O(n \log n)\) preprocessing, \(O(\log n)\) per query (dominated by the LCA call).
The Building Blocks for HLD
Everything in this lesson — distance, path sum, and the observation that path max needs something more — sets the stage for Heavy-Light Decomposition in Chapter 14. HLD breaks every root-to-leaf path into \(O(\log n)\) "heavy chains," each backed by a segment tree. It handles path queries with updates (change a node value, then query path max) in \(O(\log^2 n)\).
You don't need HLD for distance or path sum — the techniques from this lesson are simpler and faster. But when you hit a problem that requires path aggregates with updates, or path aggregates that aren't prefix-sum-friendly (like max or min), HLD is where you'll go.
Try It
Predict before running: On the unweighted chain 1 → 2 → 3 → 4 → 5, what is \(dist(1, 5)\)? \(depth[1] + depth[5] - 2 \cdot depth[LCA(1,5)] = 0 + 4 - 2 \cdot 0 = 4\). The LCA of 1 and 5 is 1 (since 1 is the root and ancestor of everything).
Challenge: If all edge weights are 1, does the distance formula reduce to the depth formula? Yes. \(distFromRoot[v] = depth[v]\) when all edges have weight 1. The formulas are identical.
Challenge 2: Can you handle edge updates (change the weight of an edge) with prefix sums? No. Changing one edge weight changes \(distFromRoot\) for an entire subtree. You'd need Euler tour flattening + BIT for range updates, or HLD.
Edge Cases to Watch For
- pathSum where LCA is an endpoint: When \(u\) is an ancestor of \(v\), the formula \(prefixSum[u] + prefixSum[v] - 2 \cdot prefixSum[LCA] + w[LCA]\) must include \(w[LCA]\) (the LCA's own weight). Forgetting the \(+w[LCA]\) term undercounts by exactly one node's contribution.
- Weighted edges stored on child nodes: If edge weight \((u, v)\) is stored on the child \(v\), the path sum formula changes — the LCA's stored weight is the edge to its parent, not part of the \(u\)-to-\(v\) path. Using it as-is double-counts or includes the wrong edge.
Problems
| Problem | Link | Difficulty |
|---|---|---|
| CSES — Distance Queries | cses.fi/problemset/task/1135 | Medium |
| CSES — Path Queries | cses.fi/problemset/task/1138 | Medium |
| CSES — Path Queries II | cses.fi/problemset/task/1138 | Hard |
| CF 191C — Fools and Roads | codeforces.com/problemset/problem/191/C | Hard |
| LC 1740 — Find Distance in a Binary Tree | leetcode.com/problems/find-distance-in-a-binary-tree | Medium |