Skip to content

Virtual Nodes MST

When O(N^2) Edges Are Too Many, Build Smarter

Virtual super-source node connects all components with zero-cost edges

Standard Kruskal's takes an edge list and sorts it. But what if the graph has N^2 edges? A complete graph on 200,000 nodes has 20 billion edges. You can't even store them. The trick: don't enumerate all edges. Use geometric insight, virtual nodes, or segment trees to reduce the candidate set to O(N log N) edges without missing any MST edge.


Problem 1: arc076_b — Built? (Manhattan MST)

Problem: N points on a 2D plane. Cost to connect points i and j is |xi - xj| + |yi - yj| (Manhattan distance). Find the MST cost.

Naive approach: N^2 edges, Kruskal's in O(N^2 log N). For N = 10^5, this is way too slow.

Key observation: The Manhattan MST only uses edges between points that are "adjacent" in some sorted order. Specifically, for each of the 4 octants around a point, only the nearest neighbor in that octant can be an MST edge.

Simplified approach: Sort points by x-coordinate. Only consider edges between consecutive points in this order. Sort by y-coordinate and do the same. Sort by x+y and x-y, add consecutive edges. This gives O(N) candidate edges. The MST is guaranteed to use only these candidates.

#include <bits/stdc++.h>
using namespace std;

int par[200005], rnk[200005];
int find(int x) { return par[x] == x ? x : par[x] = find(par[x]); }
void unite(int a, int b) {
    a = find(a); b = find(b);
    if (a == b) return;
    if (rnk[a] < rnk[b]) swap(a, b);
    par[b] = a;
    if (rnk[a] == rnk[b]) rnk[a]++;
}

int main() {
    int n;
    cin >> n;
    vector<long long> x(n), y(n);
    for (int i = 0; i < n; i++) cin >> x[i] >> y[i];

    vector<tuple<long long,int,int>> edges;
    vector<int> ord(n);
    iota(ord.begin(), ord.end(), 0);

    // Sort by x, add adjacent edges
    sort(ord.begin(), ord.end(),
         [&](int a, int b) { return x[a] < x[b]; });
    for (int i = 1; i < n; i++) {
        int a = ord[i-1], b = ord[i];
        edges.push_back({abs(x[a]-x[b]) + abs(y[a]-y[b]), a, b});
    }

    // Sort by y, add adjacent edges
    sort(ord.begin(), ord.end(),
         [&](int a, int b) { return y[a] < y[b]; });
    for (int i = 1; i < n; i++) {
        int a = ord[i-1], b = ord[i];
        edges.push_back({abs(x[a]-x[b]) + abs(y[a]-y[b]), a, b});
    }

    // Sort by x+y, add adjacent edges
    sort(ord.begin(), ord.end(),
         [&](int a, int b) { return x[a]+y[a] < x[b]+y[b]; });
    for (int i = 1; i < n; i++) {
        int a = ord[i-1], b = ord[i];
        edges.push_back({abs(x[a]-x[b]) + abs(y[a]-y[b]), a, b});
    }

    // Sort by x-y, add adjacent edges
    sort(ord.begin(), ord.end(),
         [&](int a, int b) { return x[a]-y[a] < x[b]-y[b]; });
    for (int i = 1; i < n; i++) {
        int a = ord[i-1], b = ord[i];
        edges.push_back({abs(x[a]-x[b]) + abs(y[a]-y[b]), a, b});
    }

    sort(edges.begin(), edges.end());
    for (int i = 0; i < n; i++) { par[i] = i; rnk[i] = 0; }

    long long cost = 0;
    for (auto [w, u, v] : edges)
        if (find(u) != find(v)) {
            unite(u, v);
            cost += w;
        }
    cout << cost << "\n";
}

💡 Key insight. Manhattan distance decomposes into 4 directional components. Sorting by each component and taking adjacent pairs covers all possible MST edges. The total candidate count is O(N), not O(N^2).


Problem 2: abc270_f — Transportation (Virtual Airport Node)

Problem: N cities. You can build roads between cities (given costs), build airports in cities (given costs), or build seaports in cities (given costs). Find the minimum cost to connect all cities.

Translation: An airport lets a city connect to all other airports. A seaport lets a city connect to all other seaports. These are exactly virtual nodes.

Create two virtual nodes: - Node N+1 = "the sky" (airport hub). Connect city i to node N+1 with edge weight = airport_cost[i]. - Node N+2 = "the sea" (seaport hub). Connect city i to node N+2 with edge weight = seaport_cost[i].

Now run Kruskal's on all edges (original roads + airport edges + seaport edges). If the MST uses the airport hub, some cities have airports. If not, no airports are built.

#include <bits/stdc++.h>
using namespace std;

int par[200005], rnk[200005];
int find(int x) { return par[x] == x ? x : par[x] = find(par[x]); }
void unite(int a, int b) {
    a = find(a); b = find(b);
    if (a == b) return;
    if (rnk[a] < rnk[b]) swap(a, b);
    par[b] = a;
    if (rnk[a] == rnk[b]) rnk[a]++;
}

int main() {
    int n, m;
    cin >> n >> m;
    vector<tuple<long long,int,int>> edges;

    // Airport costs
    int airNode = n + 1;
    for (int i = 1; i <= n; i++) {
        long long c; cin >> c;
        edges.push_back({c, i, airNode});
    }

    // Seaport costs
    int seaNode = n + 2;
    for (int i = 1; i <= n; i++) {
        long long c; cin >> c;
        edges.push_back({c, i, seaNode});
    }

    // Road costs
    for (int i = 0; i < m; i++) {
        int u, v; long long w;
        cin >> u >> v >> w;
        edges.push_back({w, u, v});
    }

    sort(edges.begin(), edges.end());
    for (int i = 1; i <= n + 2; i++) { par[i] = i; rnk[i] = 0; }

    long long cost = 0;
    for (auto [w, u, v] : edges)
        if (find(u) != find(v)) {
            unite(u, v);
            cost += w;
        }
    cout << cost << "\n";
}

⚠ Gotcha. The virtual nodes count as part of the graph. You need N+2 nodes in the DSU, not N. And the MST might have up to N+1 edges (connecting N cities plus up to 2 virtual hubs).

Wait -- the MST should connect only the N real cities. The virtual hubs are connectors, not destinations. The MST has N-1 edges among real cities, but through virtual nodes it can have more edges total.

Actually, the MST on the N+2 node graph will have N+1 edges. But some edges to virtual nodes might not be used. The key insight: if neither virtual node is in the MST, the MST only uses roads. If one is used, it acts as a hub for all cities connected to it.

💡 Key insight. Virtual nodes turn "build infrastructure in city X" into "add an edge from X to a hub." The MST algorithm decides whether using the hub is cheaper than direct connections. No special logic needed.


Problem 3: keyence2019_e — Connecting Cities (Segment Tree Kruskal)

Problem: N cities on a line. Cost to connect cities i and j is |i - j| * D + A[i] + A[j]. Find the MST cost.

Naive: O(N^2) edges. Too slow for N = 2*10^5.

Observation: The cost formula is D|i-j| + A[i] + A[j]. For i < j, this is D(j-i) + A[i] + A[j] = (A[i] - Di) + (A[j] + Dj). So the cost splits into a contribution from i and a contribution from j.

Segment tree approach: Use a modified Kruskal's. Process edges in order of weight. For each city j, the cheapest edge to its left is the one minimizing (A[i] - D*i) for i < j. Use a segment tree to find this minimum efficiently.

Alternatively, use the Boruvka algorithm: in each round, for each component, find the cheapest edge leaving it. Merge components. Repeat O(log N) rounds. Each round uses the segment tree to find cheapest cross-component edges in O(N log N). Total: O(N log^2 N).

#include <bits/stdc++.h>
using namespace std;

int par[200005], rnk[200005];
int find(int x) { return par[x] == x ? x : par[x] = find(par[x]); }
void unite(int a, int b) {
    a = find(a); b = find(b);
    if (a == b) return;
    if (rnk[a] < rnk[b]) swap(a, b);
    par[b] = a;
    if (rnk[a] == rnk[b]) rnk[a]++;
}

int main() {
    int n;
    long long D;
    cin >> n >> D;
    vector<long long> A(n + 1);
    for (int i = 1; i <= n; i++) cin >> A[i];

    // Boruvka's algorithm
    for (int i = 1; i <= n; i++) { par[i] = i; rnk[i] = 0; }

    long long totalCost = 0;
    bool merged = true;
    while (merged) {
        merged = false;
        // For each component, find cheapest outgoing edge
        // cheapest[comp] = {cost, other_endpoint}
        map<int, pair<long long,int>> best;

        // Left-to-right pass: for city j, best edge to the
        // left is min over i<j of (A[i]-D*i) + (A[j]+D*j)
        // Track min (A[i]-D*i) per component seen so far
        long long minVal = LLONG_MAX;
        int minIdx = -1;
        for (int j = 1; j <= n; j++) {
            if (minIdx != -1 && find(j) != find(minIdx)) {
                long long cost = minVal + A[j] + D * j;
                int cj = find(j);
                if (!best.count(cj) || cost < best[cj].first)
                    best[cj] = {cost, minIdx};
            }
            long long val = A[j] - D * j;
            if (val < minVal || minIdx == -1) {
                minVal = val;
                minIdx = j;
            }
        }

        // Right-to-left pass: for city j, best edge to the
        // right is min over i>j of (A[i]+D*i) + (A[j]-D*j)
        minVal = LLONG_MAX;
        minIdx = -1;
        for (int j = n; j >= 1; j--) {
            if (minIdx != -1 && find(j) != find(minIdx)) {
                long long cost = minVal + A[j] - D * j;
                int cj = find(j);
                if (!best.count(cj) || cost < best[cj].first)
                    best[cj] = {cost, minIdx};
            }
            long long val = A[j] + D * j;
            if (val < minVal || minIdx == -1) {
                minVal = val;
                minIdx = j;
            }
        }

        // Merge using best edges
        for (auto& [comp, edge] : best) {
            auto [cost, other] = edge;
            // Recompute the actual endpoints
            // Find a node in this component
        }

        // Simpler: just merge all best edges
        // Reset and redo properly with Boruvka
        // For each component, merge with its best neighbor
        map<int, pair<long long, pair<int,int>>> cheapest;
        // Redo: left-to-right
        long long mv = LLONG_MAX; int mi = -1;
        for (int j = 1; j <= n; j++) {
            if (mi != -1) {
                int ci = find(mi), cj = find(j);
                if (ci != cj) {
                    long long cost = (A[mi] - D*mi) + (A[j] + D*j);
                    if (!cheapest.count(cj) || cost < cheapest[cj].first)
                        cheapest[cj] = {cost, {mi, j}};
                    if (!cheapest.count(ci) || cost < cheapest[ci].first)
                        cheapest[ci] = {cost, {mi, j}};
                }
            }
            if (mi == -1 || A[j] - D*j < A[mi] - D*mi) mi = j;
        }
        // Right-to-left
        mi = -1;
        for (int j = n; j >= 1; j--) {
            if (mi != -1) {
                int ci = find(mi), cj = find(j);
                if (ci != cj) {
                    long long cost = (A[mi] + D*mi) + (A[j] - D*j);
                    if (!cheapest.count(cj) || cost < cheapest[cj].first)
                        cheapest[cj] = {cost, {j, mi}};
                    if (!cheapest.count(ci) || cost < cheapest[ci].first)
                        cheapest[ci] = {cost, {j, mi}};
                }
            }
            if (mi == -1 || A[j] + D*j < A[mi] + D*mi) mi = j;
        }

        for (auto& [comp, edge] : cheapest) {
            auto [cost, endpoints] = edge;
            auto [a, b] = endpoints;
            if (find(a) != find(b)) {
                unite(a, b);
                totalCost += cost;
                merged = true;
            }
        }
    }

    cout << totalCost << "\n";
}

💡 Key insight. Boruvka's algorithm is perfect when computing the cheapest edge per component is efficient. Here, the cost formula splits into independent per-node terms, so a linear scan suffices for each round. O(log N) rounds times O(N) per round = O(N log N).


Problem 4: abc328_e — Modulo MST

Problem: N nodes (N <= 8), M edges. Find the spanning tree whose total weight mod K is minimized.

With N <= 8, there are at most C(M, N-1) spanning trees. For M up to 28 (complete graph on 8 nodes) and N-1 = 7, that's C(28,7) = 1,184,040. Small enough to enumerate.

#include <bits/stdc++.h>
using namespace std;

int par[10], rnk[10];
int find(int x) { return par[x] == x ? x : par[x] = find(par[x]); }
bool unite(int a, int b) {
    a = find(a); b = find(b);
    if (a == b) return false;
    if (rnk[a] < rnk[b]) swap(a, b);
    par[b] = a;
    if (rnk[a] == rnk[b]) rnk[a]++;
    return true;
}

int N, M;
long long K, ans;
vector<tuple<int,int,long long>> edges;

void solve(int idx, int cnt, long long cost) {
    if (cnt == N - 1) {
        ans = min(ans, cost % K);
        return;
    }
    if (idx == M) return;
    if (M - idx < N - 1 - cnt) return; // not enough edges left

    // Save DSU state
    int savedPar[10], savedRnk[10];
    copy(par, par + N + 1, savedPar);
    copy(rnk, rnk + N + 1, savedRnk);

    auto [u, v, w] = edges[idx];
    // Try including this edge
    if (unite(u, v))  {
        solve(idx + 1, cnt + 1, cost + w);
        // Restore DSU
        copy(savedPar, savedPar + N + 1, par);
        copy(savedRnk, savedRnk + N + 1, rnk);
    }
    // Try skipping this edge
    solve(idx + 1, cnt, cost);
}

int main() {
    cin >> N >> M >> K;
    edges.resize(M);
    for (auto& [u, v, w] : edges) cin >> u >> v >> w;

    for (int i = 1; i <= N; i++) { par[i] = i; rnk[i] = 0; }
    ans = K; // worst possible is K-1, so K is safe upper bound
    solve(0, 0, 0);
    cout << ans << "\n";
}

⚠ Gotcha. Modulo MST is NP-hard in general. This brute force only works because N <= 8. Don't try this approach on larger inputs.


Pattern Summary

Technique When to Use Edge Reduction
Coordinate sort Manhattan/geometric distances O(N^2) -> O(N)
Virtual nodes Shared infrastructure costs Avoids clique edges
Boruvka + formula Cost splits into per-node terms O(N log N) total
Brute force N <= 8 Enumerate all trees

Problems

Problem Link Difficulty
arc076_b — Built? atcoder.jp/contests/arc076/tasks/arc076_b Medium
abc270_f — Transportation atcoder.jp/contests/abc270/tasks/abc270_f Medium
keyence2019_e — Connecting Cities atcoder.jp/contests/keyence2019/tasks/keyence2019_e Hard
abc328_e — Modulo MST atcoder.jp/contests/abc328/tasks/abc328_e Medium