Set-Membership Graphs
You're given a collection of sets. Merge two sets if they share an element. Find the minimum merges to connect two target values. No graph anywhere in the statement. Just sets and elements.
The trick: every set-membership relationship is secretly a bipartite graph.
Problem: ABC302-F Merge Set

You have \(N\) sets \(S_1, S_2, \ldots, S_N\) of integers. In one operation, pick two sets that share at least one common element and merge them into one set. Find the minimum number of operations to produce a set containing both 1 and \(M\).
Constraints: \(1 \le N \le 2 \times 10^5\), \(1 \le M \le 2 \times 10^5\), total elements across all sets \(\le 5 \times 10^5\).
The Bipartite Construction
Create two types of nodes:
- Set-nodes: One per set, indexed \(0\) to \(N-1\).
- Element-nodes: One per distinct element value, indexed \(N\) to \(N + M - 1\).
Add an edge between set-node \(i\) and element-node \(x\) whenever \(x \in S_i\).
This is a bipartite graph. Sets only connect to elements. Elements only connect to sets. Every path alternates: element -> set -> element -> set -> ...
💡 Key insight. Two elements are in the same set if and only if they're distance 2 apart in this bipartite graph (both connect to the same set-node). Merging two sets that share an element corresponds to a path of length 2 through that shared element.
BFS Gives the Answer
BFS from element-node 1 to element-node M. The BFS distance \(d\) counts hops in the bipartite graph. Each merge operation corresponds to 2 hops (element -> set -> element). But the first hop is free — element 1 is already in some set. So:
If element-node M is unreachable, output \(-1\).
#include <bits/stdc++.h>
using namespace std;
int main() {
int N, M;
cin >> N >> M;
// set-nodes: 0..N-1, element-nodes: N..N+M-1
int total = N + M;
vector<vector<int>> adj(total);
for (int i = 0; i < N; i++) {
int a;
cin >> a;
for (int j = 0; j < a; j++) {
int x;
cin >> x;
int elem_node = N + x - 1;
adj[i].push_back(elem_node);
adj[elem_node].push_back(i);
}
}
// BFS from element-node for 1
int src = N + 1 - 1; // = N
int dst = N + M - 1;
vector<int> dist(total, -1);
queue<int> q;
dist[src] = 0;
q.push(src);
while (!q.empty()) {
int u = q.front(); q.pop();
for (int v : adj[u]) {
if (dist[v] == -1) {
dist[v] = dist[u] + 1;
q.push(v);
}
}
}
if (dist[dst] == -1)
cout << -1 << endl;
else
cout << dist[dst] / 2 - 1 << endl;
return 0;
}
Complexity: \(O(N + M + \sum |S_i|)\). One BFS over the bipartite graph. No quadratic blowup.
Trace: 4 Sets
\(N = 4\), \(M = 5\). Sets: \(S_1 = \{1, 2\}\), \(S_2 = \{2, 3\}\), \(S_3 = \{4\}\), \(S_4 = \{3, 4, 5\}\).
Nodes: set-nodes {A, B, C, D}, element-nodes {1, 2, 3, 4, 5}.
Bipartite edges:
| Set | Elements |
|---|---|
| A | 1, 2 |
| B | 2, 3 |
| C | 4 |
| D | 3, 4, 5 |
BFS from element-node 1:
| Step | Pop | dist | Push neighbors | Queue after |
|---|---|---|---|---|
| init | — | — | — | [1] (dist[1]=0) |
| 1 | 1 | 0 | A (dist=1) | [A] |
| 2 | A | 1 | 2 (dist=2) | [2] |
| 3 | 2 | 2 | B (dist=3) | [B] |
| 4 | B | 3 | 3 (dist=4) | [3] |
| 5 | 3 | 4 | D (dist=5) | [D] |
| 6 | D | 5 | 4 (dist=6), 5 (dist=6) | [4, 5] |
| 7 | 4 | 6 | C (dist=7) | [5, C] |
| 8 | 5 | 6 | — | [C] |
Element-node 5 has dist = 6. Answer = \(6 / 2 - 1 = 2\).
Verification: Merge \(S_1 \cup S_2\) (share element 2) -> \(\{1,2,3\}\). Then merge with \(S_4\) (share element 3) -> \(\{1,2,3,4,5\}\). Two merges.
Why the Formula Works
The BFS path from element 1 to element 5 is:
Six hops. Every pair of hops (element -> set -> element) represents "this element is in this set, which also contains that element." The first set-node visit doesn't cost a merge — element 1 is already in set A. Each subsequent set-node visit is one merge. With \(k\) set-nodes visited, that's \(k - 1\) merges. Since \(d = 2k\), we get \(k = d/2\) and answer \(= k - 1 = d/2 - 1\).
⚠ Gotcha. The answer is \(d/2 - 1\), not \(d/2\). The first set is free — you're already starting inside it. Off-by-one errors here are extremely common.
The Virtual Node Pattern: LC 127 Word Ladder
Word Ladder: given two words and a dictionary, find the shortest transformation sequence where each step changes exactly one character. Classic BFS problem.
Naive approach: For each pair of words, check if they differ by one character. \(O(N^2 \times L)\) where \(N\) = dictionary size and \(L\) = word length. For \(N = 5000\) and \(L = 10\), that's 250 million — too slow.
Virtual node approach: For each word, generate wildcard patterns by replacing each character with *. Word "hot" generates *ot, h*t, ho*. These patterns are virtual nodes.
int ladderLength(string begin, string end, vector<string>& words) {
unordered_map<string, vector<string>> pattern_to_words;
words.push_back(begin);
for (auto& w : words) {
for (int i = 0; i < (int)w.size(); i++) {
string pat = w;
pat[i] = '*';
pattern_to_words[pat].push_back(w);
}
}
unordered_map<string, int> dist;
queue<string> q;
dist[begin] = 1;
q.push(begin);
while (!q.empty()) {
string word = q.front(); q.pop();
if (word == end) return dist[word];
for (int i = 0; i < (int)word.size(); i++) {
string pat = word;
pat[i] = '*';
for (auto& neighbor : pattern_to_words[pat]) {
if (!dist.count(neighbor)) {
dist[neighbor] = dist[word] + 1;
q.push(neighbor);
}
}
pattern_to_words[pat].clear(); // avoid revisiting
}
}
return 0;
}
The virtual pattern nodes aren't even stored explicitly — the hash map serves the same purpose. Each word connects to \(L\) patterns. Each pattern connects to at most \(N\) words. Total edges: \(O(N \times L)\).
💡 Key insight. Virtual nodes compress dense connections. Instead of checking every pair of words for a one-character difference (\(O(N^2)\)), route through pattern nodes (\(O(N \times L)\)). This is the same bipartite trick as Merge Set.
When to Use Virtual Bipartite Nodes
The trigger phrase is: "objects that share a property are connected."
- Sets sharing an element -> bipartite on (sets, elements).
- Words differing by one char -> bipartite on (words, patterns).
- Numbers sharing a prime factor -> bipartite on (numbers, primes).
- People in the same group -> bipartite on (people, groups).
Without virtual nodes, "all pairs sharing property P" gives \(O(N^2)\) edges in the worst case. With a virtual node for each property value, it's \(O(N \times |\text{properties per object}|)\).
Common Mistakes
Node ID collisions. Set-nodes and element-nodes must have distinct IDs. If sets are numbered 0 to \(N-1\) and elements are 1 to \(M\), offset element nodes by \(N\): element \(x\) maps to node \(N + x - 1\).
Forgetting it's bipartite. The BFS distance alternates between the two node types. The answer requires dividing by 2. If you forget the bipartite structure, you'll report raw BFS distance and get wrong answers.
Not clearing pattern groups. In Word Ladder, once you've processed all words matching a pattern, clear that group to avoid revisiting. Otherwise you get TLE on large inputs.
Problems
| Problem | Link | Difficulty |
|---|---|---|
| ABC302-F Merge Set | atcoder.jp/contests/abc302/tasks/abc302_f | Medium |
| LC 127 — Word Ladder | leetcode.com/problems/word-ladder | Hard |
| LC 1258 — Synonymous Sentences | leetcode.com/problems/synonymous-sentences | Medium |