Catalan Numbers
Pillar: Chain — "\(C_n = \sum C_k \times C_{n-1-k}\): split at the \(k\)-th return to zero. Each half is an independent sub-chain." Pillar: Shape — "The ballot problem: paths that never dip below zero form a restricted Shape — use the reflection principle."
The Setup
You want to count the number of distinct binary search trees you can build with \(n\) nodes. Try \(n = 3\): the keys are \(1, 2, 3\). If \(1\) is the root, the left subtree is empty and the right subtree holds \(\{2, 3\}\) — two BSTs there. If \(2\) is the root, the left holds \(\{1\}\) and the right holds \(\{3\}\) — one choice each, so one tree. If \(3\) is the root, the left holds \(\{1, 2\}\) and the right is empty — two BSTs on the left.
Total: \(2 + 1 + 2 = 5\).
Now try counting the number of ways to arrange three pairs of balanced parentheses: ((())), (()()), (())(), ()(()), ()()(). Five again.
Triangulations of a convex pentagon (a \((3+2)\)-gon)? Also five. Monotone lattice paths from \((0,0)\) to \((3,3)\) that stay below the diagonal? Five. This is not a coincidence. The number \(5\) is \(C_3\), the third Catalan number, and all these problems are the same problem wearing different costumes.
The Catalan Sequence
The first several Catalan numbers:
| \(n\) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| \(C_n\) | 1 | 1 | 2 | 5 | 14 | 42 | 132 | 429 |
The sequence grows roughly as \(\frac{4^n}{n^{3/2} \sqrt{\pi}}\). For competitive programming, the exact values overflow long long around \(n = 35\), so anything beyond small \(n\) requires modular arithmetic.
Closed-Form Formula
The Catalan number \(C_n\) has two equivalent closed-form expressions:
The first form divides \(\binom{2n}{n}\) by \(n+1\). The second subtracts the "bad" paths from the total (you'll see why in the reflection principle section below). The two are algebraically identical:
Verify for \(n = 3\): \(C_3 = \binom{6}{3}/4 = 20/4 = 5\). Or: \(\binom{6}{3} - \binom{6}{2} = 20 - 15 = 5\).
Recursive Formula
This IS the Chain pillar in action. The idea: take any valid structure of size \(n\), and identify the first "atomic unit." Everything before it forms a valid sub-structure of size \(k\), and everything after it forms a valid sub-structure of size \(n - 1 - k\). The two halves are independent, so their counts multiply. Sum over all possible split points \(k\).
Trace for \(n = 3\)
Each term corresponds to a split point:
| Split \(k\) | Left size | Right size | Count |
|---|---|---|---|
| \(k = 0\) | \(C_0 = 1\) | \(C_2 = 2\) | \(1 \times 2 = 2\) |
| \(k = 1\) | \(C_1 = 1\) | \(C_1 = 1\) | \(1 \times 1 = 1\) |
| \(k = 2\) | \(C_2 = 2\) | \(C_0 = 1\) | \(2 \times 1 = 2\) |
For BSTs with \(n = 3\) keys: if the root is the \((k+1)\)-th smallest element, the left subtree has \(k\) nodes and the right subtree has \(n - 1 - k\) nodes. The recurrence falls out directly.
For balanced parentheses: the first open parenthesis matches some close parenthesis. If that matching close is at position \(2k + 1\) (zero-indexed), then \(k\) pairs sit inside and \(n - 1 - k\) pairs sit to the right.
The Ten Interpretations
All of these are counted by \(C_n\). Each is a different encoding of the same combinatorial structure.
1. Balanced parentheses of length \(2n\).
Strings of \(n\) open and \(n\) close parentheses where every prefix has at least as many opens as closes. For \(n = 3\): ((())), (()()), (())(), ()(()), ()()(). The Chain decomposition: the first ( matches some ) at position \(2k+1\), creating \(k\) pairs inside and \(n-1-k\) pairs outside.
2. Distinct BSTs with \(n\) nodes. The number of structurally unique binary search trees storing keys \(1, 2, \ldots, n\). Choose a root \(r\): the left subtree is a BST on \(r - 1\) keys, the right on \(n - r\) keys. The split point is the root choice.
3. Triangulations of a convex \((n+2)\)-gon. The number of ways to cut a convex polygon with \(n + 2\) vertices into \(n\) triangles using non-crossing diagonals. Fix one edge of the polygon. The triangle containing that edge has its third vertex at some other vertex, splitting the remaining polygon into two smaller polygons. For \(n = 3\): a pentagon has \(5\) triangulations.
4. Monotone lattice paths from \((0,0)\) to \((n,n)\) staying below \(y = x\). Each step is either right \((+1, 0)\) or up \((0, +1)\), and the path never crosses strictly above the diagonal. These are sometimes called Dyck paths when drawn as up-steps and down-steps. The ballot problem (below) proves this count equals \(C_n\) using the reflection principle.
5. Stack-sortable permutations of \([n]\). Permutations of \(\{1, 2, \ldots, n\}\) that can be sorted by a single pass through a stack. Push elements onto the stack, pop to the output whenever the top equals the next needed element. The permutation \(231\) is the forbidden pattern — any permutation avoiding \(231\) as a subsequence is stack-sortable.
6. Non-crossing handshakes among \(2n\) people. \(2n\) people sit in a circle. Pair them up so that the \(n\) "handshake" chords don't cross. Pick any person — they shake hands with someone, dividing the remaining people into two arcs. Each arc must pair off independently. For \(n = 3\): six people, five non-crossing pairings.
7. Full binary trees with \(n + 1\) leaves. A full binary tree is one where every internal node has exactly two children. With \(n\) internal nodes, there are \(n + 1\) leaves. The number of distinct full binary tree shapes is \(C_n\). The split: the root's left subtree has \(k\) internal nodes, the right has \(n - 1 - k\).
8. Mountain ranges with \(n\) up-steps and \(n\) down-steps. Sequences of \(n\) ascents and \(n\) descents that start and end at ground level and never go underground. Identical to Dyck paths — just drawn differently. Each mountain range encodes a balanced parenthesization: up = open, down = close.
9. Ways to multiply \(n + 1\) factors (Associahedron). The number of ways to fully parenthesize a product \(a_1 \cdot a_2 \cdots a_{n+1}\) with \(n\) multiplication operations. For \(n = 3\): \(((a \cdot b) \cdot c) \cdot d\), \((a \cdot (b \cdot c)) \cdot d\), \((a \cdot b) \cdot (c \cdot d)\), \(a \cdot ((b \cdot c) \cdot d)\), \(a \cdot (b \cdot (c \cdot d))\). Five ways. The split: the last multiplication applied divides the factors into a left group and a right group.
10. Non-crossing partitions of \([n]\). The number of ways to partition \(\{1, 2, \ldots, n\}\) into blocks so that, when elements are arranged on a circle, no two blocks interleave. For \(n = 4\): there are \(14 = C_4\) non-crossing partitions. Two blocks \(\{a, c\}\) and \(\{b, d\}\) with \(a < b < c < d\) would be crossing — non-crossing partitions forbid this.
The key insight across all ten: each one has a natural decomposition into two independent sub-problems at a chosen split point, which produces the same recurrence \(C_n = \sum C_k \cdot C_{n-1-k}\).
The Ballot Problem and the Reflection Principle
This is the most important derivation technique for Catalan numbers. It proves the closed form and generalizes to many counting problems involving lattice paths with constraints.
The Problem
Count the number of monotone lattice paths from \((0, 0)\) to \((n, n)\) that never go strictly above the line \(y = x\). Each step is either right \((+1, 0)\) or up \((0, +1)\).
Think of it as a voting scenario: candidate A and candidate B each receive \(n\) votes. Votes arrive one at a time. A path staying below \(y = x\) means A is never behind — at every point, A has received at least as many votes as B.
Step 1: Total Paths
Any path from \((0, 0)\) to \((n, n)\) consists of \(n\) right-steps and \(n\) up-steps, in some order. The total number of such paths is:
Choose which \(n\) of the \(2n\) steps are right-steps; the rest are up-steps.
Step 2: Counting Bad Paths
A bad path is one that touches the line \(y = x + 1\) at some point — meaning at some step, the number of up-steps strictly exceeds the number of right-steps.
The reflection principle gives a bijection between bad paths and ALL paths from \((0, 0)\) to a different endpoint.
Take any bad path. Find the first point where it touches \(y = x + 1\). Reflect every step after that first touch across the line \(y = x + 1\): swap every right-step with an up-step and vice versa.
Before the reflection, the path ends at \((n, n)\). After the first touch point, the path had some number of right-steps and up-steps. Reflecting swaps them. The reflected path now ends at \((n - 1, n + 1)\) instead of \((n, n)\).
The critical observation: this reflection is a bijection. Every bad path maps to a unique path from \((0, 0)\) to \((n - 1, n + 1)\), and every path from \((0, 0)\) to \((n - 1, n + 1)\) maps back to a unique bad path. The mapping is its own inverse.
The number of paths from \((0, 0)\) to \((n - 1, n + 1)\) is:
This is the count of bad paths.
Step 3: Good Paths
Verify for \(n = 3\): \(\binom{6}{3} - \binom{6}{2} = 20 - 15 = 5\). Five paths that never cross above the diagonal.
Worked Example: \(n = 3\)
Total paths from \((0,0)\) to \((3,3)\): \(\binom{6}{3} = 20\).
Bad paths (those that touch \(y = x + 1\)): each bijects to a path from \((0,0)\) to \((2, 4)\). Count: \(\binom{6}{2} = 15\).
Good paths: \(20 - 15 = 5 = C_3\).
Trace one bad path concretely. Consider the path UURRRR — wait, that has 2 ups and 4 rights, wrong endpoint. Let's use the encoding where R means right-step and U means up-step.
Take a bad path to \((3,3)\): UURR UR. The positions visited are:
At \((0,1)\), we have \(y = 1\) and \(x = 0\), so \(y = x + 1\). This is the first touch of the forbidden line. Reflect the suffix after \((0,1)\): the remaining steps are URRUR. Swap R and U to get URRU R \(\to\) RUURR. Wait — swap each step: U \(\to\) R, R \(\to\) U, R \(\to\) U, U \(\to\) R, R \(\to\) U. The reflected suffix is RUURU. The reflected path is: U (to reach \((0,1)\)) then RUURU, ending at \((0 + 1 + 0 + 0 + 1 + 0, 1 + 0 + 1 + 1 + 0 + 1) = (2, 4)\).
The reflected path has 2 right-steps and 4 up-steps, landing at \((2, 4) = (n-1, n+1)\). Every bad path maps to exactly one path to \((2, 4)\), and vice versa. So bad paths number \(\binom{6}{2} = 15\), and good paths number \(20 - 15 = 5\).
Why Does the Bijection Work?
Every path to \((n - 1, n + 1)\) must cross \(y = x + 1\) at some point — you can't reach a point with more up-steps than right-steps without crossing the line at some step. So finding "the first touch" is always well-defined. Reflecting at that touch restores a path to \((n, n)\) that is bad. No path is missed, no path is double-counted.
This is the Shape pillar: the constraint that the path stays below a boundary (a geometric shape) reduces the count by exactly the number of reflected paths.
Generalization
The reflection principle extends beyond Catalan numbers. To count lattice paths from \((0, 0)\) to \((a, b)\) that never touch \(y = x + k\), reflect across \(y = x + k\). Bad paths biject to paths ending at \((b - k, a + k)\). This technique appears in problems involving ballot sequences with unequal vote counts, or paths with arbitrary forbidden boundaries.
Recognizing Catalan in the Wild
In a competition, you won't see "compute the \(n\)-th Catalan number." Instead, you'll see a counting problem and need to recognize the Catalan structure. The telltale signs:
- Balanced pairing. You're matching \(n\) things with \(n\) other things under some non-crossing or non-nesting constraint.
- Binary split. The structure decomposes by choosing a root/pivot, leaving two independent sub-structures whose sizes sum to \(n - 1\).
- Path constraint. You're counting lattice paths (or equivalent sequences) that stay on one side of a line.
- The sequence 1, 2, 5, 14, 42. If you compute small cases and see this, you're almost certainly looking at Catalan.
Once you recognize the pattern, pick whichever interpretation makes the problem easiest — often the parenthesization or BST interpretation gives the cleanest recurrence.
Computing \(C_n \bmod p\)
For large \(n\), the Catalan number overflows any integer type. Two approaches depending on the constraints.
Approach 1: DP (when \(n\) is small)
Use the recurrence directly. Build a table of \(C_0, C_1, \ldots, C_n\) bottom-up. Each \(C_i\) depends on all previous values. If you need the result under a mod, apply % mod inside the inner loop accumulation.
Time: \(O(n^2)\). Space: \(O(n)\). Practical for \(n \leq 5000\) or so. The quadratic cost comes from the fact that \(C_i\) sums over \(i\) terms — the total across all \(i\) from \(1\) to \(n\) is \(\sum_{i=1}^{n} i = O(n^2)\).
Approach 2: Closed-Form with Precomputed Factorials (when \(p\) is prime)
Use precomputed factorials and inverse factorials from Ch5 L2 to get \(\binom{2n}{n}\) in \(O(1)\) after \(O(n)\) precomputation. Then multiply by the modular inverse of \(n + 1\) (via Fermat's little theorem, since \(p\) is prime).
Time: \(O(n)\) precomputation, \(O(1)\) per query. Handles \(n\) up to \(10^6\) or more.
A Note on the Generating Function
The recurrence \(C_n = \sum_{k=0}^{n-1} C_k \cdot C_{n-1-k}\) is a convolution — it says the sequence \((C_n)\) satisfies \(C(x) = 1 + x \cdot C(x)^2\) where \(C(x) = \sum_{n=0}^{\infty} C_n x^n\) is the generating function. Solving the quadratic:
The positive root is discarded because it diverges at \(x = 0\). Expanding via the generalized binomial theorem recovers \(C_n = \binom{2n}{n}/(n+1)\). You don't need generating functions for competitions — but recognizing that the Catalan recurrence is a convolution can help you spot when a problem's DP is secretly computing Catalan numbers.
The Code
Catalan DP (Small \(n\))
long long catalanDP(int count) {
vector<long long> catalan(count + 1, 0);
catalan[0] = 1;
for (int size = 1; size <= count; size++) {
for (int leftSize = 0; leftSize < size; leftSize++) {
catalan[size] += catalan[leftSize] * catalan[size - 1 - leftSize];
}
}
return catalan[count];
}
\(O(n^2)\) time, \(O(n)\) space.
Each size is a sub-problem of the Chain. The inner loop sums over all split points: left sub-chain of size leftSize, right sub-chain of size size - 1 - leftSize. This directly implements \(C_n = \sum_{k=0}^{n-1} C_k \cdot C_{n-1-k}\).
Without modular arithmetic, this overflows long long around \(n = 35\). For larger values under a mod, add % mod to the inner accumulation:
This version handles \(n\) up to around \(5000\) comfortably within typical time limits.
Catalan Formula (Mod \(p\), using Precomputed Factorials from Ch5 L2)
long long catalanFormula(int count, long long mod) {
vector<long long> factorial(2 * count + 1);
vector<long long> inverseFactorial(2 * count + 1);
factorial[0] = 1;
for (int i = 1; i <= 2 * count; i++) {
factorial[i] = factorial[i - 1] * i % mod;
}
inverseFactorial[2 * count] = modpow(factorial[2 * count], mod - 2, mod);
for (int i = 2 * count - 1; i >= 0; i--) {
inverseFactorial[i] = inverseFactorial[i + 1] * (i + 1) % mod;
}
long long ncrTwoNChooseN = factorial[2 * count] % mod
* inverseFactorial[count] % mod
* inverseFactorial[count] % mod;
long long inverseDenominator = modpow(count + 1, mod - 2, mod);
return ncrTwoNChooseN * inverseDenominator % mod;
}
\(O(n)\) time, \(O(n)\) space.
The modpow function is from Ch4 L2. The factorial precomputation and inverse factorial trick are from Ch5 L2. The only new piece is the final multiplication by \(\text{modInverse}(n + 1, p)\).
If you're answering multiple queries for different \(n\) values, precompute factorials up to the maximum \(2n\) once and answer each query in \(O(1)\).
Edge case: when \(n = 0\), the function should return \(1\). The formula gives \(\binom{0}{0} / 1 = 1\). The code handles this correctly since factorial[0] = 1 and modpow(1, mod - 2, mod) = 1.
Common Mistakes
-
Off-by-one in the polygon interpretation. Triangulations of a convex polygon with \(m\) vertices is \(C_{m-2}\), not \(C_m\). A pentagon (\(m = 5\)) gives \(C_3 = 5\), not \(C_5 = 42\). The offset of \(2\) trips people up constantly.
-
Forgetting the base case \(C_0 = 1\). The empty structure counts as one valid configuration. If you initialize \(C_0 = 0\), the entire DP produces zeros. One empty BST, one empty parenthesization, one trivial path.
-
Integer overflow in the DP version. \(C_{35} \approx 3.1 \times 10^{18}\), which is near the
long longlimit. \(C_{36}\) overflows. If the problem asks for exact values beyond \(n = 35\), you must use modular arithmetic. If you're using the DP under a mod, apply% modinside the inner loop, not just at the end.
Quick Recap
- The Catalan number \(C_n\) counts structures that decompose into two independent sub-structures at a split point: BSTs, balanced parentheses, triangulations, lattice paths, and six more.
- Closed form: \(C_n = \binom{2n}{n}/(n+1) = \binom{2n}{n} - \binom{2n}{n-1}\).
- Recurrence: \(C_n = \sum_{k=0}^{n-1} C_k \cdot C_{n-1-k}\). This is the Chain pillar — split, solve independently, multiply, sum.
- Reflection principle: Bad lattice paths biject to paths ending at \((n-1, n+1)\). Subtracting \(\binom{2n}{n-1}\) from \(\binom{2n}{n}\) yields the Catalan number. This is the Shape pillar.
- Computation: DP in \(O(n^2)\) for small \(n\); closed form with modular inverse in \(O(n)\) for large \(n\) under a prime mod.
Problems
| Problem | Link | Difficulty | Focus |
|---|---|---|---|
| CSES — Bracket Sequences I | cses.fi/problemset/task/2064 | Medium | Direct Catalan: count balanced bracket strings of length \(2n\) |
| CSES — Bracket Sequences II | cses.fi/problemset/task/2187 | Medium-Hard | Catalan with a fixed prefix; reflection principle with an offset starting point |
| LC 96 — Unique Binary Search Trees | leetcode.com/problems/unique-binary-search-trees | Medium | The BST interpretation; implement the recurrence or closed form |
| LC 22 — Generate Parentheses | leetcode.com/problems/generate-parentheses | Medium | Enumerate (not just count) all \(C_n\) balanced parenthesizations via backtracking |