Find Largest Connected Graph
Level: L3-L5 Topics: Graphs, Geometric Graphs, Union-Find, BFS/DFS, Connected Components
Problem Statement
You are given a set of points in 2D space and a distance threshold d. Build a geometric graph by connecting every pair of points whose Euclidean distance is at most d. Find the largest connected component in this graph (i.e., the maximum number of points that are mutually reachable through edges).
Background & Constraints
- Each point is represented as (x, y) coordinates.
- Two points are connected by an edge if and only if their Euclidean distance is <= d.
- The graph is undirected.
- You need to find the connected component with the most vertices and return its size.
- The number of points can range from a handful to millions in the follow-up discussion.
Examples
Example 1:
Points: [(0,0), (1,0), (3,0), (3,1), (10,10)]
d = 1.5
Edges: (0,0)-(1,0) [distance 1], (3,0)-(3,1) [distance 1]
Connected components:
- {(0,0), (1,0)} — size 2
- {(3,0), (3,1)} — size 2
- {(10,10)} — size 1
Largest connected component size: 2
Example 2:
Points: [(0,0), (1,0), (2,0), (3,0)]
d = 1.5
Edges: (0,0)-(1,0), (1,0)-(2,0), (2,0)-(3,0)
All points are in one connected component.
Largest connected component size: 4
Hints & Common Pitfalls
- Brute-force approach: Compute all pairwise distances O(n^2), build adjacency list, then run BFS/DFS or Union-Find to identify connected components. This works but is slow for large inputs.
- Union-Find is a natural fit: iterate over all pairs, union them if distance <= d. Use path compression and union by rank for near-constant time per operation.
- Spatial indexing (e.g., k-d tree, grid hashing) can speed up neighbor lookups dramatically. Instead of checking all pairs, only check points in nearby grid cells.
- Grid hashing optimization: Divide the plane into cells of size d. A point can only be connected to points in its own cell or the 8 adjacent cells. This reduces the number of distance computations significantly.
- Common mistake: Using a sorted list or single-dimensional projection and missing connections in the other dimension.
Follow-Up Questions
- Your brute-force solution is O(n^2). Can you do better using spatial data structures?
- How would you parallelize the computation across multiple machines? Consider how to partition the 2D space while handling points near partition boundaries.
- What if points are added dynamically one at a time? Can you maintain the largest connected component size efficiently without rebuilding from scratch? (Think about incremental Union-Find.)
- What if the distance threshold d changes over time? Can you precompute something useful?
- How would you extend this to 3D space or higher dimensions?