Skip to content

BFS on Graphs

BFS shortest path tree with distances from source

BFS Is a Shortest Path Algorithm

BFS wavefront expanding level by level from source

BFS traversal with queue states at each step

On trees, BFS is a nice way to get level-order output. On graphs, BFS solves a fundamentally harder problem: shortest paths in unweighted graphs.

This isn't a side effect. It's the defining property. BFS visits all nodes at distance 1 before any node at distance 2, all at distance 2 before any at distance 3, and so on. The first time BFS reaches a node, it found the shortest path to that node. Always.

If you only remember one thing from this lesson: BFS on an unweighted graph gives shortest paths. DFS does not.


Why BFS Gives Shortest Paths

The argument is simple. BFS uses a FIFO queue. When you process a node at distance \(d\), you enqueue its unvisited neighbors at distance \(d+1\). Those go to the back of the queue. All other distance-\(d\) nodes are still at the front. So you finish all distance-\(d\) nodes before touching any distance-\((d+1)\) node.

By induction: the first time you reach any node, you reached it with the minimum number of edges.

💡 Key insight. DFS can find a path, but it might find a long winding path through the entire graph. BFS always finds the shortest one. This is why "shortest path in unweighted graph" is always BFS.


The BFS Template

int dist[200005];
bool visited[200005];
int parent[200005];
vector<int> adj[200005];

void bfs(int start) {
    memset(dist, -1, sizeof(dist));
    memset(parent, -1, sizeof(parent));
    queue<int> q;

    q.push(start);
    dist[start] = 0;
    visited[start] = true;

    while (!q.empty()) {
        int u = q.front();
        q.pop();
        for (int v : adj[u]) {
            if (!visited[v]) {
                visited[v] = true;
                dist[v] = dist[u] + 1;
                parent[v] = u;
                q.push(v);
            }
        }
    }
}

Three arrays:

  • visited[] -- prevents revisiting nodes (same role as in DFS).
  • dist[] -- stores the shortest distance from start to each node.
  • parent[] -- stores how we reached each node (for path reconstruction).

Mark Visited on Push, Not on Pop

This is the single most common BFS bug. It matters.

Wrong -- marking on pop:

while (!q.empty()) {
    int u = q.front();
    q.pop();
    visited[u] = true;    // TOO LATE
    for (int v : adj[u]) {
        if (!visited[v]) {
            q.push(v);
        }
    }
}

Why it breaks: Suppose nodes A and B are both in the queue, and both have neighbor C. A is popped first, pushes C. B is popped next, C isn't marked visited yet (we mark on pop, and C hasn't been popped), so B pushes C again. Now C is in the queue twice.

In the best case, you waste time processing C twice. In the worst case -- on dense graphs -- the queue explodes to \(O(m)\) size instead of \(O(n)\).

Correct -- marking on push:

while (!q.empty()) {
    int u = q.front();
    q.pop();
    for (int v : adj[u]) {
        if (!visited[v]) {
            visited[v] = true;   // mark IMMEDIATELY
            q.push(v);
        }
    }
}

The moment you decide to push a node, mark it visited. This guarantees each node enters the queue exactly once.

⚠ Gotcha. Mark visited when pushing, not when popping. This is the #1 BFS bug. It causes TLE on large graphs because nodes get pushed into the queue multiple times.


BFS Trace

Consider this undirected graph:

1 --- 2 --- 5
|     |
3 --- 4 --- 6

Adjacency list:

  • 1: [2, 3]
  • 2: [1, 4, 5]
  • 3: [1, 4]
  • 4: [2, 3, 6]
  • 5: [2]
  • 6: [4]

BFS from node 1:

Step Pop Queue After Pop Process Neighbors Push Queue After Push dist[] update
0 -- -- Init 1 [1] dist[1]=0
1 1 [] 2 (unvisited), 3 (unvisited) 2, 3 [2, 3] dist[2]=1, dist[3]=1
2 2 [3] 1 (visited), 4 (unvisited), 5 (unvisited) 4, 5 [3, 4, 5] dist[4]=2, dist[5]=2
3 3 [4, 5] 1 (visited), 4 (visited) -- [4, 5] --
4 4 [5] 2 (visited), 3 (visited), 6 (unvisited) 6 [5, 6] dist[6]=3
5 5 [6] 2 (visited) -- [6] --
6 6 [] 4 (visited) -- [] --

Final distances from node 1:

Node 1 2 3 4 5 6
dist 0 1 1 2 2 3

Notice the layered pattern: all distance-1 nodes (2, 3) are processed before any distance-2 node (4, 5). All distance-2 nodes are processed before the distance-3 node (6). This is the shortest-path guarantee in action.


Path Reconstruction

BFS gives you shortest distances, but many problems ask for the actual path. The parent[] array handles this.

During BFS, when you discover node v from node u, set parent[v] = u. After BFS finishes, backtrack from the target to the source using parent[]:

vector<int> reconstructPath(int start, int target) {
    if (dist[target] == -1) return {};  // unreachable

    vector<int> path;
    for (int v = target; v != -1; v = parent[v]) {
        path.push_back(v);
    }
    reverse(path.begin(), path.end());
    return path;
}

Path Reconstruction Trace

From the BFS above, find the path from 1 to 6.

Parent array after BFS:

Node 1 2 3 4 5 6
parent -1 1 1 2 2 4

Backtrack from 6:

Step Current parent[current] Path So Far
1 6 4 [6]
2 4 2 [6, 4]
3 2 1 [6, 4, 2]
4 1 -1 (stop) [6, 4, 2, 1]

After reversing: 1 -> 2 -> 4 -> 6 (length 3, matching dist[6] = 3).


BFS vs DFS: When to Use Which

This decision comes up on nearly every graph problem. Here's the rule:

Situation Use Why
Shortest path (unweighted) BFS BFS explores by distance; first arrival = shortest
Cycle detection DFS Back edges in DFS tree = cycles
Connected components Either Both explore full components; DFS is simpler to code
Topological sort DFS Post-order gives reverse topo order (Chapter 3)
Path existence (any path) Either Both find a path if one exists
Level/layer structure BFS BFS naturally gives level boundaries
Reachability from one source Either Same time complexity
Shortest path (weighted) Neither Use Dijkstra or Bellman-Ford (Chapter 5)

💡 Key insight. The default choice for "can I reach X from Y" is DFS (simpler code, less memory). The moment the problem says "shortest," "minimum edges," or "fewest steps," switch to BFS.


Complexity

For a graph with \(n\) nodes and \(m\) edges:

  • Time: \(O(n + m)\). Each node is pushed and popped exactly once (\(O(n)\)). Each edge is examined exactly twice in an undirected graph (\(O(m)\)).
  • Space: \(O(n)\) for the queue, visited array, dist array, and parent array.

This is the same complexity as DFS. The difference is not speed -- it's what each algorithm computes. DFS computes traversal order and discovers structure (cycles, components). BFS computes distances.


Try It

The starter code has a BFS skeleton with TODOs for three things: marking visited and setting distance/parent inside the loop, and path reconstruction at the end. Fill them in.

Test case 1 -- path exists:

Input:
5 5
1 2
2 3
3 5
1 4
4 5

Output:
1 4 5

Note: the shortest path from 1 to 5 has length 2 (either 1-4-5 or 1-2-3... wait, 1-2-3-5 is length 3). So BFS finds 1-4-5.

Test case 2 -- no path:

Input:
4 2
1 2
3 4

Output:
IMPOSSIBLE

Predict before running: In test case 1, which node does BFS discover first -- node 3 or node 4? Both are distance 1 from node 1... wrong. Node 2 is distance 1, node 4 is distance 1. Node 3 is distance 2. So nodes 2 and 4 are discovered first (both at distance 1), then nodes 3 and 5 (both at distance 2).

Challenge: What if you want ALL shortest paths from 1 to n, not just one? How would you modify the parent array? (Hint: instead of one parent per node, store a list of parents -- all nodes u where dist[u] + 1 == dist[v].)

Challenge 2: Can BFS detect cycles? Yes, but it's unusual. If during BFS you encounter a visited neighbor that isn't the parent, there's a cycle. Why is DFS preferred for cycle detection despite this? (Hint: BFS finds the cycle exists but reconstructing the actual cycle path is harder than with DFS.)


Problems

Problem Link Difficulty
CSES Message Route cses.fi/problemset/task/1667 Easy
LC 1091 — Shortest Path in Binary Matrix leetcode.com/problems/shortest-path-in-binary-matrix Medium
LC 127 — Word Ladder leetcode.com/problems/word-ladder Hard