Skip to content

De Bruijn Sequences

De Bruijn graph for binary k=3: Euler path produces the sequence

A lock with 4 buttons accepts any 4-digit binary code. You want to try all 16 possible codes by sliding a window across one long string. The shortest such string has length 19, not 64. That compression is a de Bruijn sequence, and finding it is just an Euler circuit in disguise.


What Is a De Bruijn Sequence?

A de Bruijn sequence \(B(k, n)\) over an alphabet of size \(k\) is the shortest cyclic string that contains every possible string of length \(n\) as a substring. For binary (\(k = 2\)), order \(n\):

  • There are \(2^n\) binary strings of length \(n\).
  • The de Bruijn sequence has length \(2^n\) (cyclic) or \(2^n + n - 1\) (linear).
  • Every \(n\)-length binary string appears exactly once as a contiguous window.

For \(n = 3\): the 8 binary strings are 000, 001, 010, 011, 100, 101, 110, 111. One valid de Bruijn sequence: 00010111. Wrap around to get 000, 001, 010, 101, 011, 111, 110, 100. All 8 present.

Key insight. A brute-force approach concatenating all strings gives length \(n \cdot 2^n\). The de Bruijn sequence achieves \(2^n\) by reusing overlaps. Each new character reveals exactly one new \(n\)-length window.


The De Bruijn Graph

The magic comes from a graph construction. Build a directed graph where:

  • Nodes are all binary strings of length \(n - 1\). There are \(2^{n-1}\) nodes.
  • Edges: from node \(s_1 s_2 \dots s_{n-1}\) to node \(s_2 s_3 \dots s_{n-1} c\), labeled \(c\). Each node has exactly 2 outgoing edges (append 0 or 1) and 2 incoming edges.

For \(n = 3\), the nodes are: 00, 01, 10, 11. The edges:

From Append 0 → To Append 1 → To
00 00 01
01 10 11
10 00 01
11 10 11

Every node has in-degree 2 and out-degree 2. The graph is strongly connected. By the Euler circuit conditions for directed graphs, an Euler circuit exists.

Key insight. Each edge in the de Bruijn graph represents one \(n\)-length string. The edge from "01" to "11" (appending 1) represents "011". An Euler circuit visits every edge once -- therefore it produces every \(n\)-length string once.


Building the Graph with Bit Manipulation

We represent each \((n-1)\)-length binary string as an integer. Node \(u\) represents the binary string of length \(n-1\) whose value is \(u\).

The edge from \(u\) appending bit \(c\) goes to node:

\[v = ((u \ll 1) \mid c) \mathbin{\&} ((1 \ll (n-1)) - 1)\]

Shift \(u\) left by 1, OR in the new bit, then mask to keep only the bottom \(n-1\) bits.

int numNodes = 1 << (n - 1);
int mask = numNodes - 1;
vector<vector<int>> adj(numNodes);
for (int u = 0; u < numNodes; u++) {
    for (int c = 0; c < 2; c++) {
        int v = ((u << 1) | c) & mask;
        adj[u].push_back(v);
    }
}

Every node has out-degree exactly 2. Total edges: \(2^n\). This matches the number of \(n\)-length binary strings.


Euler Circuit Gives the Sequence

Run Hierholzer's algorithm starting from node 0. The circuit visits every edge exactly once. To extract the de Bruijn sequence, read the last bit of each node visited (except the final duplicate of the start node).

Actually, there's an even simpler extraction. As you traverse edge \(u \to v\) by appending bit \(c\), that bit \(c\) is the next character in the sequence.

string deBruijn(int n) {
    if (n == 1) return "01";
    int numNodes = 1 << (n - 1);
    int mask = numNodes - 1;
    vector<vector<int>> adj(numNodes);
    for (int u = 0; u < numNodes; u++) {
        for (int c = 0; c < 2; c++) {
            int v = ((u << 1) | c) & mask;
            adj[u].push_back(v);
        }
    }
    // Hierholzer
    vector<int> idx(numNodes, 0);
    stack<int> stk;
    vector<int> circuit;
    stk.push(0);
    while (!stk.empty()) {
        int u = stk.top();
        if (idx[u] < (int)adj[u].size()) {
            stk.push(adj[u][idx[u]++]);
        } else {
            circuit.push_back(u);
            stk.pop();
        }
    }
    reverse(circuit.begin(), circuit.end());
    // Build sequence: start with the first node, then append last bit of each subsequent node
    string result(n - 1, '0'); // first node is 00...0
    for (int i = 1; i < (int)circuit.size(); i++) {
        result += ('0' + (circuit[i] & 1));
    }
    return result;
}

Trace for n = 3

Nodes: 0 (="00"), 1 (="01"), 2 (="10"), 3 (="11").

Step Stack Top Node Edge Taken Next Node
1 [0] 00 append 0 00
2 [0, 0] 00 append 1 01
3 [0, 0, 1] 01 append 0 10
4 [0, 0, 1, 2] 10 append 0 00
5 [0, 0, 1, 2, 0] 00 no unused edges pop 0
6 [0, 0, 1, 2] 10 append 1 01
7 [0, 0, 1, 2, 1] 01 append 1 11
8 [0, 0, 1, 2, 1, 3] 11 append 0 10
9 [0, 0, 1, 2, 1, 3, 2] 10 no unused edges pop 2
10 [0, 0, 1, 2, 1, 3] 11 append 1 11
11 [0, 0, 1, 2, 1, 3, 3] 11 no unused edges pop 3
12-16 ... ... pop remaining ...

Circuit (reversed): 0 → 0 → 1 → 2 → 1 → 3 → 2 → 0 → 3.

Wait -- let me re-derive. The result list (before reverse) gets nodes popped in order: 0, 2, 3, 2, 1, 3, 1, 0, 0. Reversed: 0, 0, 1, 3, 1, 2, 3, 2, 0.

Extracting last bits of nodes 1 through 8: 0, 1, 1, 1, 0, 1, 0, 0. Prepend "00" (first node): sequence = "0001 1010 0" -- that's 00011010. Length 8 = \(2^3\). Contains all 3-bit strings as substrings.

Gotcha. The CSES problem asks for a linear de Bruijn sequence of length \(2^n + n - 1\). Append the first \(n - 1\) characters of the cyclic sequence to handle the wraparound windows.


CSES De Bruijn Sequence

Problem. Given \(n\), print the shortest bit string of length \(2^n + n - 1\) containing every binary string of length \(n\) as a substring.

The solution is exactly the algorithm above. Build the de Bruijn graph on \(2^{n-1}\) nodes, find the Euler circuit, extract the sequence, then append the first \(n - 1\) characters to make it linear.

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    scanf("%d", &n);
    if (n == 1) {
        puts("01");
        return 0;
    }
    int numNodes = 1 << (n - 1);
    int mask = numNodes - 1;
    // Each node u has edges to (u<<1|0)&mask and (u<<1|1)&mask
    // Use Hierholzer with implicit adjacency (2 edges per node)
    vector<int> idx(numNodes, 0);
    stack<int> stk;
    vector<int> circuit;
    stk.push(0);
    while (!stk.empty()) {
        int u = stk.top();
        if (idx[u] < 2) {
            int c = idx[u]++;
            int v = ((u << 1) | c) & mask;
            stk.push(v);
        } else {
            circuit.push_back(u);
            stk.pop();
        }
    }
    reverse(circuit.begin(), circuit.end());
    // Build result
    string result(n - 1, '0');
    for (int i = 1; i < (int)circuit.size(); i++) {
        result += ('0' + (circuit[i] & 1));
    }
    // Append first n-1 chars for linear version
    result += result.substr(0, n - 1);
    printf("%s\n", result.c_str());
}

Time: \(O(2^n)\). Space: \(O(2^n)\). Both are dominated by the output size.


Why This Works: The Connection to Euler Circuits

Each edge in the de Bruijn graph corresponds to one unique \(n\)-length string. An Euler circuit visits every edge exactly once. Therefore, the circuit produces every \(n\)-length string exactly once.

The overlap property is automatic. Consecutive edges in the circuit share \(n - 2\) characters. The second edge's string is the first edge's string shifted left by 1, plus one new character. This is exactly how a sliding window works over the output string.

Key insight. De Bruijn sequences are not a separate topic. They're Euler circuits on a specific graph. Once you recognize the graph, the algorithm is identical to Lesson 1.


Problems

Problem Link Difficulty
CSES De Bruijn Sequence cses.fi/problemset/task/1692 Medium