Counting Orderings

All valid topological orderings of a small DAG

A Deceptively Hard Question

How many valid topological orderings does a DAG have? For a chain 1 → 2 → 3, the answer is 1. For three disconnected nodes, it's 3! = 6. For anything in between, the answer depends on the exact structure of the DAG.

This problem is #P-hard in general -- meaning there's no known polynomial-time algorithm. But when N is small (say, N ≤ 20), bitmask DP solves it in O(2^N * N).

The Bitmask DP Idea

Think of building the topological ordering one node at a time, left to right. At each step, you've placed some subset of nodes. The next node you place must have all its predecessors already placed.

State: A bitmask mask representing which nodes have been placed so far.

Transition: For each node i not yet in mask, if all predecessors of i are in mask, then i is a valid next choice.

Recurrence:

dp[mask | (1 << i)] += dp[mask]

for each valid i.

Base: dp[0] = 1 (empty arrangement, one way to have placed nothing).

Answer: dp[(1 << N) - 1] (all nodes placed).

💡 Key insight. This is the same bitmask DP pattern used for TSP and Hamiltonian paths. The "which items have been placed" bitmask is one of the most versatile DP states in competitive programming.

Predecessor Bitmask

To check "all predecessors of i are in mask" efficiently, precompute a bitmask pred[i] where bit j is set if j must come before i.

int pred[20] = {};
for (int i = 0; i < m; i++) {
    int u, v;
    cin >> u >> v;
    u--; v--; // 0-indexed
    pred[v] |= (1 << u);
}

Now the check becomes: (pred[i] & mask) == pred[i].

This is a subset check. It asks: "Is pred[i] a subset of mask?" If yes, every predecessor of i has been placed.

The Full Algorithm

long long dp[1 << 20];
memset(dp, 0, sizeof(dp));
dp[0] = 1;

for (int mask = 0; mask < (1 << n); mask++) {
    if (dp[mask] == 0) continue;
    for (int i = 0; i < n; i++) {
        if (mask & (1 << i)) continue;       // i already placed
        if ((pred[i] & mask) == pred[i])      // all preds placed
            dp[mask | (1 << i)] += dp[mask];
    }
}

cout << dp[(1 << n) - 1] << endl;

Trace on a Small Example

N = 3, constraints: 0 → 1, 0 → 2 (node 0 must come before both 1 and 2).

Predecessors: pred[0] = 000, pred[1] = 001, pred[2] = 001

mask (binary)	mask (set)	dp[mask]	Valid next nodes	Transitions
000	{}	1	0 (pred=000 ⊆ {})	dp[001] += 1
001	{0}	1	1 (pred=001 ⊆ {0}), 2 (pred=001 ⊆ {0})	dp[011] += 1, dp[101] += 1
010	{1}	0	--	skip
011	{0,1}	1	2 (pred=001 ⊆ {0,1})	dp[111] += 1
100	{2}	0	--	skip
101	{0,2}	1	1 (pred=001 ⊆ {0,2})	dp[111] += 1
110	{1,2}	0	--	skip
111	{0,1,2}	2	--	done

Answer: 2 orderings. They are: [0, 1, 2] and [0, 2, 1].

Makes sense: 0 must go first, then 1 and 2 can go in either order.

AtCoder abc041_d

Problem: N items (N ≤ 16) with M precedence constraints. Count valid orderings.

This is the exact problem our bitmask DP solves:

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n, m;
    cin >> n >> m;
    int pred[16] = {};
    for (int i = 0; i < m; i++) {
        int u, v;
        cin >> u >> v;
        u--; v--;
        pred[v] |= (1 << u);
    }

    vector<long long> dp(1 << n, 0);
    dp[0] = 1;
    for (int mask = 0; mask < (1 << n); mask++) {
        if (dp[mask] == 0) continue;
        for (int i = 0; i < n; i++) {
            if (mask & (1 << i)) continue;
            if ((pred[i] & mask) == pred[i])
                dp[mask | (1 << i)] += dp[mask];
        }
    }

    cout << dp[(1 << n) - 1] << endl;
}

For N = 16, we have 2^16 * 16 = 1,048,576 transitions. Very fast.

Why This Works But Polynomial Algorithms Don't

The number of topological orderings can be exponential. A DAG with no edges on N nodes has N! orderings. Any algorithm that counts them must at least keep track of which subsets have been "completed," and there are 2^N subsets.

The bitmask DP is optimal in the sense that it visits each (mask, node) pair at most once. You can't do much better without exploiting special DAG structure.

⚠ Gotcha. For N > 20, this approach is infeasible. 2^20 * 20 is about 20 million -- fine. 2^25 * 25 is about 800 million -- too slow. For large N, counting topological orderings requires specialized techniques (or the problem has extra structure to exploit).

Larger Trace (N = 4)

N = 4, constraints: 0 → 2, 1 → 3

Predecessors: pred[0] = 0000, pred[1] = 0000, pred[2] = 0001, pred[3] = 0010

mask	Set	dp	Valid next	Key transitions
0000	{}	1	0, 1	dp[0001]+=1, dp[0010]+=1
0001	{0}	1	1, 2	dp[0011]+=1, dp[0101]+=1
0010	{1}	1	0, 3	dp[0011]+=1, dp[1010]+=1
0011	{0,1}	2	2, 3	dp[0111]+=2, dp[1011]+=2
0101	{0,2}	1	1	dp[0111]+=1
1010	{1,3}	1	0	dp[1011]+=1
0111	{0,1,2}	3	3	dp[1111]+=3
1011	{0,1,3}	3	2	dp[1111]+=3
1111	all	6	--	done

(Masks not shown have dp = 0.)

Answer: 6. The constraints 0→2 and 1→3 are weak. Without them, 4! = 24. Each constraint roughly halves the count.

Connection to the DP Course

If you've studied bitmask DP before (e.g., TSP), the pattern is identical:

Problem	State	Transition condition
TSP	mask = visited cities	Last city has edge to next city
Topo orderings	mask = placed nodes	All predecessors of next node in mask
Assignment problem	mask = assigned tasks	Task i available for person

The bitmask represents "what's been done." The inner loop asks "what can be done next." This pattern appears everywhere in competitive programming when N ≤ 20.

Try It

The starter code has the structure. Fill in the DP loop.

Test case 1 -- chain:

Input:
3 2
1 2
2 3

Output:
1

Test case 2 -- no constraints:

Input:
3 0

Output:
6

Test case 3 -- partial constraints:

Input:
4 2
1 3
2 4

Output:
6

Challenge: Modify the code to also print all valid topological orderings, not just count them. Hint: backtrack from dp[(1<<n)-1], choosing which node was placed last.

Problems

Problem	Link	Difficulty
AtCoder abc041_d — 切り絵	atcoder.jp/contests/abc041/tasks/abc041_d	Medium
LC 1857 — Largest Color Value in a Directed Graph	leetcode.com/problems/largest-color-value-in-a-directed-graph	Hard