Catalan Numbers and Trees

The Number That Keeps Showing Up
You want to know how many structurally distinct binary search trees you can build with \(n\) keys. You try small cases by hand. For \(n = 3\), you draw all the BSTs and count 5. For \(n = 4\), you get 14. For \(n = 5\), you get 42.
These numbers — 1, 1, 2, 5, 14, 42, 132, 429 — are the Catalan numbers, and they appear in an absurd number of combinatorial problems. Once you understand why they count BSTs, you'll recognize them everywhere.
The BST Recurrence
Suppose you have keys \(1, 2, \ldots, n\) and you want to build a BST. You must choose a root. If you pick key \(k\) as the root:
- Keys \(1, 2, \ldots, k-1\) go into the left subtree. That's \(k - 1\) keys.
- Keys \(k+1, k+2, \ldots, n\) go into the right subtree. That's \(n - k\) keys.
The left and right subtrees are independent — every valid left can pair with every valid right. So the number of BSTs with root \(k\) is:
Summing over all choices of root:
with \(C(0) = 1\) (the empty tree is one valid structure).
This recurrence IS the Catalan recurrence. It's not an analogy or a coincidence. The Catalan numbers are defined by this exact equation.
Computing the First 8 Catalan Numbers
| \(n\) | Expansion | \(C(n)\) |
|---|---|---|
| 0 | (base case) | 1 |
| 1 | \(C(0) \cdot C(0)\) | 1 |
| 2 | \(C(0) \cdot C(1) + C(1) \cdot C(0) = 1 + 1\) | 2 |
| 3 | \(C(0)C(2) + C(1)C(1) + C(2)C(0) = 2 + 1 + 2\) | 5 |
| 4 | \(C(0)C(3) + C(1)C(2) + C(2)C(1) + C(3)C(0) = 5+2+2+5\) | 14 |
| 5 | \(C(0)C(4) + C(1)C(3) + C(2)C(2) + C(3)C(1) + C(4)C(0)\) | 42 |
| 6 | (6 terms) | 132 |
| 7 | (7 terms) | 429 |
Notice the symmetry: \(C(k-1) \cdot C(n-k)\) reads the same forwards and backwards. Choosing root \(k=1\) (everything goes right) gives the same count as choosing root \(k=n\) (everything goes left).
The Closed Form
The Catalan numbers have a beautiful closed-form:
You can verify: \(C(3) = \frac{1}{4}\binom{6}{3} = \frac{20}{4} = 5\). Checks out.
This formula is derived from the reflection principle on lattice paths, but for problem-solving, the DP recurrence is what you'll actually code.
LC 96: Unique Binary Search Trees
Problem: Given integer \(n\), return the number of structurally unique BSTs that store values \(1\) through \(n\).
This is the Catalan recurrence directly.
int countUniqueBSTs(int totalKeys) {
vector<long long> numTrees(totalKeys + 1, 0);
numTrees[0] = 1;
for (int nodes = 1; nodes <= totalKeys; nodes++) {
for (int root = 1; root <= nodes; root++) {
numTrees[nodes] += numTrees[root - 1] * numTrees[nodes - root];
}
}
return numTrees[totalKeys];
}
Complexity: \(O(n^2)\) time, \(O(n)\) space.
LC 95: Unique Binary Search Trees II
Problem: Given integer \(n\), generate all structurally unique BSTs that store values \(1\) through \(n\).
The DP counted them. Now you need to construct them. The approach: for each root choice, recursively build all left subtrees and all right subtrees, then combine every pair.
vector<TreeNode*> buildAllTrees(int lo, int hi) {
if (lo > hi) return {nullptr};
vector<TreeNode*> allTrees;
for (int rootVal = lo; rootVal <= hi; rootVal++) {
vector<TreeNode*> leftOptions = buildAllTrees(lo, rootVal - 1);
vector<TreeNode*> rightOptions = buildAllTrees(rootVal + 1, hi);
for (TreeNode* leftRoot : leftOptions) {
for (TreeNode* rightRoot : rightOptions) {
allTrees.push_back(new TreeNode(rootVal, leftRoot, rightRoot));
}
}
}
return allTrees;
}
vector<TreeNode*> generateTrees(int totalKeys) {
if (totalKeys == 0) return {};
return buildAllTrees(1, totalKeys);
}
For \(n = 3\), this returns 5 trees — exactly \(C(3)\).
What Else Catalan Counts
The Catalan number \(C(n)\) also counts:
- Valid parenthesizations: the number of ways to arrange \(n\) pairs of parentheses so they're balanced. \(C(3) = 5\):
((())),(()()),(())(),()(()),()()(). - Dyck paths: lattice paths from \((0,0)\) to \((2n,0)\) using up-steps \((+1)\) and down-steps \((-1)\) that never go below zero.
- Polygon triangulations: the number of ways to triangulate a convex polygon with \(n+2\) sides.
- Full binary trees: the number of full binary trees with \(n+1\) leaves.
These are all the same number because there are bijections between them.
The Bijection: BSTs and Dyck Paths
Here's the connection between binary trees and Dyck paths. Take any binary tree with \(n\) nodes and do a preorder traversal. At each node:
- Going left (descending to a left child) → write an up step
- Going right (descending to a right child) → write a down step
- Hitting
nullptr→ write nothing (the step was already recorded)
More precisely: encode the tree's structure as a sequence of \(2n\) steps by writing "(" when you visit a node for the first time and ")" when you finish its subtree. This gives a balanced parenthesization — which is exactly a Dyck path.
If you're solving a counting problem and the answer turns out to be a Catalan number, look for the hidden binary tree. There's almost certainly a bijection that maps your objects to binary trees, and that bijection often reveals a recursive structure you can exploit.
Try It
The starter code has a countUniqueBSTs function. Fill it in using the DP recurrence.
Predict before running: What does countUniqueBSTs(0) return? It should be 1 — there is exactly one BST with zero keys (the empty tree).
Challenge: Modify the code to use the closed-form \(C(n) = \frac{1}{n+1}\binom{2n}{n}\) instead of the DP. Be careful with integer overflow — you'll need to compute the binomial coefficient incrementally.
Challenge 2: In LC 95, how many tree nodes are allocated in total across all \(C(n)\) trees for \(n = 4\)? Think about it before running: each tree has 4 nodes, and there are 14 trees, but many trees share subtree pointers.
Edge Cases to Watch For
- Large \(n\) overflow: \(C(n)\) grows exponentially. \(C(19) = 1767263190\) fits in a 32-bit integer, but \(C(20) = 6564120420\) does not. Use
long longfor \(n \geq 20\). - Closed-form intermediate overflow: The closed-form \(\frac{1}{n+1}\binom{2n}{n}\) computes \(\binom{2n}{n}\) first, which is much larger than \(C(n)\). For \(n = 30\), \(\binom{60}{30} \approx 1.2 \times 10^{17}\) — dividing by \(31\) gives the right answer, but the intermediate overflows
int. Compute withlong longor use the DP recurrence.
Problems
| Problem | Link | Difficulty |
|---|---|---|
| LC 96 — Unique Binary Search Trees | leetcode.com/problems/unique-binary-search-trees | Medium |
| LC 95 — Unique Binary Search Trees II | leetcode.com/problems/unique-binary-search-trees-ii | Medium |
| LC 22 — Generate Parentheses | leetcode.com/problems/generate-parentheses | Medium |