BFS on Graphs

BFS shortest path tree with distances from source

BFS Is a Shortest Path Algorithm

BFS wavefront expanding level by level from source

BFS traversal with queue states at each step

On trees, BFS is a nice way to get level-order output. On graphs, BFS solves a fundamentally harder problem: shortest paths in unweighted graphs.

This isn't a side effect. It's the defining property. BFS visits all nodes at distance 1 before any node at distance 2, all at distance 2 before any at distance 3, and so on. The first time BFS reaches a node, it found the shortest path to that node. Always.

If you only remember one thing from this lesson: BFS on an unweighted graph gives shortest paths. DFS does not.

Why BFS Gives Shortest Paths

The argument is simple. BFS uses a FIFO queue. When you process a node at distance \(d\), you enqueue its unvisited neighbors at distance \(d+1\). Those go to the back of the queue. All other distance-\(d\) nodes are still at the front. So you finish all distance-\(d\) nodes before touching any distance-\((d+1)\) node.

By induction: the first time you reach any node, you reached it with the minimum number of edges.

💡 Key insight. DFS can find a path, but it might find a long winding path through the entire graph. BFS always finds the shortest one. This is why "shortest path in unweighted graph" is always BFS.

The BFS Template

int dist[200005];
bool visited[200005];
int parent[200005];
vector<int> adj[200005];

void bfs(int start) {
    memset(dist, -1, sizeof(dist));
    memset(parent, -1, sizeof(parent));
    queue<int> q;

    q.push(start);
    dist[start] = 0;
    visited[start] = true;

    while (!q.empty()) {
        int u = q.front();
        q.pop();
        for (int v : adj[u]) {
            if (!visited[v]) {
                visited[v] = true;
                dist[v] = dist[u] + 1;
                parent[v] = u;
                q.push(v);
            }
        }
    }
}

Three arrays:

visited[] -- prevents revisiting nodes (same role as in DFS).
dist[] -- stores the shortest distance from start to each node.
parent[] -- stores how we reached each node (for path reconstruction).

Mark Visited on Push, Not on Pop

This is the single most common BFS bug. It matters.

Wrong -- marking on pop:

while (!q.empty()) {
    int u = q.front();
    q.pop();
    visited[u] = true;    // TOO LATE
    for (int v : adj[u]) {
        if (!visited[v]) {
            q.push(v);
        }
    }
}

Why it breaks: Suppose nodes A and B are both in the queue, and both have neighbor C. A is popped first, pushes C. B is popped next, C isn't marked visited yet (we mark on pop, and C hasn't been popped), so B pushes C again. Now C is in the queue twice.

In the best case, you waste time processing C twice. In the worst case -- on dense graphs -- the queue explodes to \(O(m)\) size instead of \(O(n)\).

Correct -- marking on push:

while (!q.empty()) {
    int u = q.front();
    q.pop();
    for (int v : adj[u]) {
        if (!visited[v]) {
            visited[v] = true;   // mark IMMEDIATELY
            q.push(v);
        }
    }
}

The moment you decide to push a node, mark it visited. This guarantees each node enters the queue exactly once.

⚠ Gotcha. Mark visited when pushing, not when popping. This is the #1 BFS bug. It causes TLE on large graphs because nodes get pushed into the queue multiple times.

BFS Trace

Consider this undirected graph:

1 --- 2 --- 5
|     |
3 --- 4 --- 6

Adjacency list:

1: [2, 3]
2: [1, 4, 5]
3: [1, 4]
4: [2, 3, 6]
5: [2]
6: [4]

BFS from node 1:

Step	Pop	Queue After Pop	Process Neighbors	Push	Queue After Push	dist[] update
0	--	--	Init	1	[1]	dist[1]=0
1	1	[]	2 (unvisited), 3 (unvisited)	2, 3	[2, 3]	dist[2]=1, dist[3]=1
2	2	[3]	1 (visited), 4 (unvisited), 5 (unvisited)	4, 5	[3, 4, 5]	dist[4]=2, dist[5]=2
3	3	[4, 5]	1 (visited), 4 (visited)	--	[4, 5]	--
4	4	[5]	2 (visited), 3 (visited), 6 (unvisited)	6	[5, 6]	dist[6]=3
5	5	[6]	2 (visited)	--	[6]	--
6	6	[]	4 (visited)	--	[]	--

Final distances from node 1:

Node	1	2	3	4	5	6
dist	0	1	1	2	2	3

Notice the layered pattern: all distance-1 nodes (2, 3) are processed before any distance-2 node (4, 5). All distance-2 nodes are processed before the distance-3 node (6). This is the shortest-path guarantee in action.

Path Reconstruction

BFS gives you shortest distances, but many problems ask for the actual path. The parent[] array handles this.

During BFS, when you discover node v from node u, set parent[v] = u. After BFS finishes, backtrack from the target to the source using parent[]:

vector<int> reconstructPath(int start, int target) {
    if (dist[target] == -1) return {};  // unreachable

    vector<int> path;
    for (int v = target; v != -1; v = parent[v]) {
        path.push_back(v);
    }
    reverse(path.begin(), path.end());
    return path;
}

Path Reconstruction Trace

From the BFS above, find the path from 1 to 6.

Parent array after BFS:

Node	1	2	3	4	5	6
parent	-1	1	1	2	2	4

Backtrack from 6:

Step	Current	parent[current]	Path So Far
1	6	4	[6]
2	4	2	[6, 4]
3	2	1	[6, 4, 2]
4	1	-1 (stop)	[6, 4, 2, 1]

After reversing: 1 -> 2 -> 4 -> 6 (length 3, matching dist[6] = 3).

BFS vs DFS: When to Use Which

This decision comes up on nearly every graph problem. Here's the rule:

Situation	Use	Why
Shortest path (unweighted)	BFS	BFS explores by distance; first arrival = shortest
Cycle detection	DFS	Back edges in DFS tree = cycles
Connected components	Either	Both explore full components; DFS is simpler to code
Topological sort	DFS	Post-order gives reverse topo order (Chapter 3)
Path existence (any path)	Either	Both find a path if one exists
Level/layer structure	BFS	BFS naturally gives level boundaries
Reachability from one source	Either	Same time complexity
Shortest path (weighted)	Neither	Use Dijkstra or Bellman-Ford (Chapter 5)

💡 Key insight. The default choice for "can I reach X from Y" is DFS (simpler code, less memory). The moment the problem says "shortest," "minimum edges," or "fewest steps," switch to BFS.

Complexity

For a graph with \(n\) nodes and \(m\) edges:

Time: \(O(n + m)\). Each node is pushed and popped exactly once (\(O(n)\)). Each edge is examined exactly twice in an undirected graph (\(O(m)\)).
Space: \(O(n)\) for the queue, visited array, dist array, and parent array.

This is the same complexity as DFS. The difference is not speed -- it's what each algorithm computes. DFS computes traversal order and discovers structure (cycles, components). BFS computes distances.

Try It

The starter code has a BFS skeleton with TODOs for three things: marking visited and setting distance/parent inside the loop, and path reconstruction at the end. Fill them in.

Test case 1 -- path exists:

Input:
5 5
1 2
2 3
3 5
1 4
4 5

Output:
1 4 5

Note: the shortest path from 1 to 5 has length 2 (either 1-4-5 or 1-2-3... wait, 1-2-3-5 is length 3). So BFS finds 1-4-5.

Test case 2 -- no path:

Input:
4 2
1 2
3 4

Output:
IMPOSSIBLE

Predict before running: In test case 1, which node does BFS discover first -- node 3 or node 4? Both are distance 1 from node 1... wrong. Node 2 is distance 1, node 4 is distance 1. Node 3 is distance 2. So nodes 2 and 4 are discovered first (both at distance 1), then nodes 3 and 5 (both at distance 2).

Challenge: What if you want ALL shortest paths from 1 to n, not just one? How would you modify the parent array? (Hint: instead of one parent per node, store a list of parents -- all nodes u where dist[u] + 1 == dist[v].)

Challenge 2: Can BFS detect cycles? Yes, but it's unusual. If during BFS you encounter a visited neighbor that isn't the parent, there's a cycle. Why is DFS preferred for cycle detection despite this? (Hint: BFS finds the cycle exists but reconstructing the actual cycle path is harder than with DFS.)

Problems

Problem	Link	Difficulty
CSES Message Route	cses.fi/problemset/task/1667	Easy
LC 1091 — Shortest Path in Binary Matrix	leetcode.com/problems/shortest-path-in-binary-matrix	Medium
LC 127 — Word Ladder	leetcode.com/problems/word-ladder	Hard