Skip to content

DSU Core

Two Lines of Code That Answer Connectivity in Constant Time

Graph connectivity seems like it should require traversal. BFS or DFS, right? Wrong. Disjoint Set Union answers "are nodes A and B connected?" with two array lookups and a pointer chase that, in practice, never exceeds 4 steps. No adjacency list needed. No visited array. Just two flat arrays.


The Data Structure

Union-Find forest before and after a union operation

DSU maintains two arrays:

  • par[i] -- the parent of node i. Initially par[i] = i (every node is its own root).
  • rnk[i] -- an upper bound on the height of node i's subtree. Initially 0.

Every set is a rooted tree. The root is the set's representative. Two elements are in the same set if and only if they have the same root.


Find: Chase the Root

find(x) follows the parent chain from x upward until it hits a node that is its own parent. That's the root.

// Naive find — no optimization
int find(int x) {
    while (par[x] != x)
        x = par[x];
    return x;
}

This is O(tree height). In the worst case, the tree is a long chain and find is O(n). We need to fix that.


Path Compression: Flatten on the Way Up

Path compression: all nodes point directly to root after find

After finding the root, set every node on the path to point directly to the root. Next time any of these nodes calls find, it's one hop.

int find(int x) {
    if (par[x] != x)
        par[x] = find(par[x]);
    return par[x];
}

That's it. One line of recursion. The tree collapses into a star shape over time.

💡 Key insight. Path compression doesn't change which set a node belongs to. It only changes the internal tree structure to make future queries faster.


Union by Rank: Keep Trees Short

Union by rank: attach shorter tree under taller tree

When merging two sets, attach the shorter tree under the taller one. This prevents the tree from growing unnecessarily tall.

void unite(int a, int b) {
    a = find(a); b = find(b);
    if (a == b) return;
    if (rnk[a] < rnk[b]) swap(a, b);
    par[b] = a;
    if (rnk[a] == rnk[b]) rnk[a]++;
}

Why rank and not size? Either works. Rank is simpler: it only increments when two trees of the same rank merge. Size-based union is equally valid and sometimes more convenient when you need component sizes.

⚠ Gotcha. Always call find() on both arguments before comparing or linking. Operating on non-root nodes corrupts the structure.


Complexity: Inverse Ackermann

With both path compression and union by rank, any sequence of m operations on n elements takes O(m * alpha(n)) time, where alpha is the inverse Ackermann function.

alpha(n) <= 4 for any n up to 2^(2^(2^(2^16))). That's a number with about 20,000 digits. For all practical purposes, alpha(n) = constant.

n alpha(n)
1 0
2-3 1
4-7 2
8-2047 3
2048 to 2^(2^16) 4

You will never see alpha(n) = 5 in any real problem.


Full Trace

Start with 6 elements {1, 2, 3, 4, 5, 6}. Process these unions in order: unite(1,2), unite(3,4), unite(5,6), unite(1,3), unite(1,5).

Operation Action par[] (after) rnk[] (after) Sets
init -- [1,2,3,4,5,6] [0,0,0,0,0,0] {1}{2}{3}{4}{5}{6}
unite(1,2) 2's root under 1, rank tie -> rnk[1]++ [1,1,3,4,5,6] [1,0,0,0,0,0] {1,2}{3}{4}{5}{6}
unite(3,4) 4's root under 3, rank tie -> rnk[3]++ [1,1,3,3,5,6] [1,0,1,0,0,0] {1,2}{3,4}{5}{6}
unite(5,6) 6's root under 5, rank tie -> rnk[5]++ [1,1,3,3,5,5] [1,0,1,0,1,0] {1,2}{3,4}{5,6}
unite(1,3) rnk[1]==rnk[3], attach 3 under 1, rnk[1]++ [1,1,1,3,5,5] [2,0,1,0,1,0] {1,2,3,4}{5,6}
unite(1,5) rnk[1]>rnk[5], attach 5 under 1 [1,1,1,3,1,5] [2,0,1,0,1,0] {1,2,3,4,5,6}

After unite(1,5), call find(6): 6->5->1. Path compression sets par[5]=1 and par[6]=1. Next find(6) is one hop.


CSES Building Roads

Problem: A country has N cities and M roads. Find the minimum number of new roads needed to connect all cities.

Translation: Count connected components after processing all edges. Answer = components - 1.

#include <bits/stdc++.h>
using namespace std;

int par[200005], rnk[200005];

int find(int x) {
    if (par[x] != x)
        par[x] = find(par[x]);
    return par[x];
}

void unite(int a, int b) {
    a = find(a); b = find(b);
    if (a == b) return;
    if (rnk[a] < rnk[b]) swap(a, b);
    par[b] = a;
    if (rnk[a] == rnk[b]) rnk[a]++;
}

int main() {
    int n, m;
    cin >> n >> m;
    for (int i = 1; i <= n; i++) {
        par[i] = i;
        rnk[i] = 0;
    }

    for (int i = 0; i < m; i++) {
        int u, v;
        cin >> u >> v;
        unite(u, v);
    }

    // Collect component roots
    vector<int> roots;
    for (int i = 1; i <= n; i++)
        if (find(i) == i) roots.push_back(i);

    cout << (int)roots.size() - 1 << "\n";
    for (int i = 1; i < (int)roots.size(); i++)
        cout << roots[i - 1] << " " << roots[i] << "\n";
}

💡 Key insight. The problem also asks you to print which roads to build. Just chain the component roots together: connect root 1 to root 2, root 2 to root 3, etc.


Try It

The starter code has the DSU skeleton. Fill in find and unite, then count components.

Test case 1:

Input:
5 3
1 2
3 4
1 3

Output:
1
1 5

Test case 2:

Input:
4 0

Output:
3
1 2
2 3
3 4

Challenge: Rewrite the DSU using union by size instead of rank. Maintain a sz[] array and always attach the smaller component under the larger one. Verify that the results are identical.


Problems

Problem Link Difficulty
CSES Building Roads cses.fi/problemset/task/1666 Easy
LC 547 — Number of Provinces leetcode.com/problems/number-of-provinces Medium
LC 684 — Redundant Connection leetcode.com/problems/redundant-connection Medium