Remove Edges from Tree

Level: L3-L5 Topics: Trees, Subtree Sizes, Connected Components, DFS

Problem Statement

You are given a tree with N vertices (numbered 1 to N) and M queries. Each query provides a pair (E_i, K_i), where E_i is an edge in the tree and K_i is a threshold value. For each query, determine the number of vertices in each connected component that would result from removing edge E_i, and report how many components have size greater than or equal to K_i.

More specifically: removing a single edge from a tree splits it into exactly two connected components. You need to compute their sizes efficiently across many queries.

Background & Constraints

The tree has N vertices and N-1 edges.
There are M independent queries — each query removes a single edge (the tree is restored between queries).
1 <= N <= 10^5
1 <= M <= 10^5
An O(N * M) brute-force approach (re-running BFS/DFS per query) is suboptimal. The interviewer expects a better solution.

Examples

Example 1:

Tree (5 vertices):
    1
   / \
  2   3
 / \
4   5

Edges: (1,2), (1,3), (2,4), (2,5)

Query: Remove edge (1,2), K=2
Result: Two components — {1,3} with size 2 and {2,4,5} with size 3.
Both components have size >= 2, so the answer is 2.

Query: Remove edge (2,4), K=3
Result: Two components — {4} with size 1 and {1,2,3,5} with size 4.
Only one component has size >= 3, so the answer is 1.

Hints & Common Pitfalls

Key insight: Root the tree at any vertex and pre-calculate the subtree size for every node using a single DFS. When you remove the edge between a node and its parent, the two resulting components have sizes subtree_size[child] and N - subtree_size[child].
Pre-computation makes queries O(1): After one O(N) DFS to compute all subtree sizes, each query can be answered in constant time by looking up which node is the child endpoint of the removed edge.
Common mistake: Re-running a full traversal for each query, leading to O(N * M) time.
Edge representation: You need to map each edge to the child node in the rooted tree. If the edge is given as (u, v), determine which of u or v is the child (the one farther from the root).
Be careful with 1-indexed vs 0-indexed vertices.

Follow-Up Questions

Can you answer each query in O(1) time after O(N) preprocessing?
What if instead of removing one edge per query, you remove multiple edges simultaneously? How does the problem change?
How would you handle the case where the graph is not a tree (i.e., it has cycles)?
What if queries are cumulative — edges stay removed across queries? How would you track component sizes efficiently?
Can you extend this to weighted trees where the "size" of a component is the sum of vertex weights?