Skip to content

Virtual Tree

Virtual tree compresses the full tree to only important nodes and their LCAs

The Problem: Queries on a Small Subset

You have a tree with \(n = 10^5\) nodes. A query gives you \(k\) "important" nodes where \(k\) is small — maybe 100 or 1000. You need to answer something about the paths connecting these important nodes (e.g., minimum edge weight connecting them all, or total distance between all pairs).

Running any algorithm on the full \(n\)-node tree for each query wastes time on nodes that don't matter. The key observation: the only nodes that affect the answer are the \(k\) important nodes and their pairwise LCAs.

The virtual tree (also called auxiliary tree) is a compressed tree containing at most \(2k - 1\) nodes. It preserves the ancestor-descendant relationships and distances between important nodes, discarding everything else.


What Goes Into the Virtual Tree

Given important nodes \(\{v_1, v_2, \ldots, v_k\}\):

  1. Include all \(k\) important nodes.
  2. Include all pairwise LCAs of consecutive important nodes when sorted by DFS entry time.
  3. Include the LCA of the entire set (which is the LCA of the first and last nodes in DFS order).

The total number of nodes is at most \(2k - 1\). Edges in the virtual tree represent paths in the original tree — you can annotate them with the original path's length or minimum weight.


Construction Algorithm

  1. Preprocess the original tree: compute DFS entry times and binary lifting tables for LCA queries.
  2. Sort the important nodes by DFS entry time: \(v_1, v_2, \ldots, v_k\).
  3. Compute pairwise LCAs: for each consecutive pair \((v_i, v_{i+1})\), compute \(\text{LCA}(v_i, v_{i+1})\).
  4. Collect all relevant nodes: the \(k\) important nodes + their pairwise LCAs. Remove duplicates and sort by DFS entry time.
  5. Build the virtual tree using a stack:
  6. Process nodes in DFS order.
  7. Maintain a stack representing the current root-to-node path in the virtual tree.
  8. For each new node: pop the stack until the top is an ancestor of the new node. The new node becomes a child of the stack top.

Trace: Building a Virtual Tree

Original tree (10 nodes):

          1
        / | \
       2  3  4
      / \    |
     5   6   7
    / \
   8   9
       |
      10

DFS entry times (one possible DFS order):

Node 1 2 5 8 9 10 6 3 4 7
tin 0 1 2 3 4 5 6 7 8 9

Important nodes: {6, 8, 10, 7}

Step 1: Sort by DFS entry time: 8 (tin=3), 10 (tin=5), 6 (tin=6), 7 (tin=9).

Step 2: Compute pairwise LCAs: - LCA(8, 10) = 5 (both under node 5) - LCA(10, 6) = 2 (10 is under 5 under 2; 6 is under 2) - LCA(6, 7) = 1 (6 is under 2 under 1; 7 is under 4 under 1)

Step 3: All relevant nodes: {8, 10, 6, 7, 5, 2, 1}. Sort by DFS entry: 1, 2, 5, 8, 10, 6, 7.

Step 4: Build with stack:

Process Stack (before) Action Stack (after) Edge added
1 [] Push 1 [1]
2 [1] 1 is ancestor of 2. Push 2 [1, 2] 1→2
5 [1, 2] 2 is ancestor of 5. Push 5 [1, 2, 5] 2→5
8 [1, 2, 5] 5 is ancestor of 8. Push 8 [1, 2, 5, 8] 5→8
10 [1, 2, 5, 8] 8 is NOT ancestor of 10. Pop 8. 5 is ancestor of 10. Push 10 [1, 2, 5, 10] 5→10
6 [1, 2, 5, 10] 10 is NOT ancestor of 6. Pop 10. 5 is NOT ancestor of 6. Pop 5. 2 is ancestor of 6. Push 6 [1, 2, 6] 2→6
7 [1, 2, 6] 6 is NOT ancestor of 7. Pop 6. 2 is NOT ancestor of 7. Pop 2. 1 is ancestor of 7. Push 7 [1, 7] 1→7

Virtual tree:

    1
   / \
  2   7
 / \
5   6
|\ 
8  10

7 nodes instead of 10. For larger trees with \(n = 10^5\) and \(k = 100\), the compression is dramatic: 199 nodes instead of 100,000.


The Code

void dfs(int node, int parentNode, int dep) {
    tin[node] = timer++;
    depthNode[node] = dep;
    ancestor[node][0] = parentNode;

    for (int k = 1; k < LOG; k++) {
        ancestor[node][k] = ancestor[ancestor[node][k-1]][k-1];
    }

    for (int neighbor : adj[node]) {
        if (neighbor == parentNode) continue;
        dfs(neighbor, node, dep + 1);
    }

    tout[node] = timer++;
}

bool isAncestor(int u, int v) {
    return tin[u] <= tin[v] && tout[v] <= tout[u];
}

int lca(int u, int v) {
    if (isAncestor(u, v)) return u;
    if (isAncestor(v, u)) return v;

    for (int k = LOG - 1; k >= 0; k--) {
        if (!isAncestor(ancestor[u][k], v)) {
            u = ancestor[u][k];
        }
    }
    return ancestor[u][0];
}

struct VirtualTree {
    vector<int> nodes;
    map<int, vector<pair<int,int>>> virtualAdj;

    void build(vector<int>& importantNodes) {
        sort(importantNodes.begin(), importantNodes.end(),
             [](int a, int b) { return tin[a] < tin[b]; });

        vector<int> allNodes = importantNodes;
        for (int i = 0; i + 1 < (int)importantNodes.size(); i++) {
            allNodes.push_back(lca(importantNodes[i], importantNodes[i+1]));
        }

        sort(allNodes.begin(), allNodes.end(),
             [](int a, int b) { return tin[a] < tin[b]; });
        allNodes.erase(unique(allNodes.begin(), allNodes.end()), allNodes.end());

        nodes = allNodes;
        virtualAdj.clear();

        vector<int> stk;
        for (int node : allNodes) {
            while (!stk.empty() && !isAncestor(stk.back(), node)) {
                stk.pop_back();
            }
            if (!stk.empty()) {
                int parentInVT = stk.back();
                int edgeWeight = depthNode[node] - depthNode[parentInVT];
                virtualAdj[parentInVT].push_back({node, edgeWeight});
            }
            stk.push_back(node);
        }
    }
};

Complexity: \(O(k \log k + k \log n)\) per query — sorting the important nodes and computing LCAs. The preprocessing on the original tree is \(O(n \log n)\) for binary lifting.


When to Use Virtual Trees

Virtual trees shine when: - The full tree is large but each query involves a small subset. - You have multiple queries with different subsets. - The algorithm you want to run on the important nodes would be too slow on the full tree.

Common applications: Steiner tree on trees (connect \(k\) nodes with minimum total edge weight), distance sums between \(k\) nodes, and problems where \(\sum k\) across all queries is bounded but \(n\) per query is not.


Try It

Input: 10-node tree as in the trace, important nodes {6, 8, 10, 7}
Output: virtual tree with 7 nodes (1, 2, 5, 6, 7, 8, 10)

Predict before running: If all important nodes lie on a single root-to-leaf path, what does the virtual tree look like? A single chain — no LCAs add new nodes because every pairwise LCA is already in the set.

Challenge: Using the virtual tree, compute the minimum total edge weight to connect all important nodes (Steiner tree on a tree). Hint: the answer is the sum of all edge weights in the virtual tree.

Edge Cases to Watch For

  • An important node is the root: The root must always be included in the virtual tree (it anchors the structure). If your construction algorithm only adds LCAs of consecutive nodes in DFS order, the root may be missing when it isn't the LCA of any adjacent pair — add it explicitly.
  • Pairwise LCAs that are themselves important nodes: When an LCA of two important nodes is also in the important set, the deduplication step must handle this. Double-counting such a node creates duplicate edges in the virtual tree.

Problems

Problem Link Difficulty
CF 613D — Kingdom and its Cities codeforces.com/contest/613/problem/D Hard
CF 1320E — Treeland and Viruses codeforces.com/contest/1320/problem/E Hard