Skip to content

Catalan Numbers and Trees

All 5 structurally unique BSTs for n=3, illustrating the Catalan number C(3)=5

The Number That Keeps Showing Up

You want to know how many structurally distinct binary search trees you can build with \(n\) keys. You try small cases by hand. For \(n = 3\), you draw all the BSTs and count 5. For \(n = 4\), you get 14. For \(n = 5\), you get 42.

These numbers — 1, 1, 2, 5, 14, 42, 132, 429 — are the Catalan numbers, and they appear in an absurd number of combinatorial problems. Once you understand why they count BSTs, you'll recognize them everywhere.


The BST Recurrence

Suppose you have keys \(1, 2, \ldots, n\) and you want to build a BST. You must choose a root. If you pick key \(k\) as the root:

  • Keys \(1, 2, \ldots, k-1\) go into the left subtree. That's \(k - 1\) keys.
  • Keys \(k+1, k+2, \ldots, n\) go into the right subtree. That's \(n - k\) keys.

The left and right subtrees are independent — every valid left can pair with every valid right. So the number of BSTs with root \(k\) is:

\[\text{(BSTs from } k-1 \text{ keys)} \times \text{(BSTs from } n-k \text{ keys)}\]

Summing over all choices of root:

\[C(n) = \sum_{k=1}^{n} C(k-1) \cdot C(n-k)\]

with \(C(0) = 1\) (the empty tree is one valid structure).

This recurrence IS the Catalan recurrence. It's not an analogy or a coincidence. The Catalan numbers are defined by this exact equation.


Computing the First 8 Catalan Numbers

\(n\) Expansion \(C(n)\)
0 (base case) 1
1 \(C(0) \cdot C(0)\) 1
2 \(C(0) \cdot C(1) + C(1) \cdot C(0) = 1 + 1\) 2
3 \(C(0)C(2) + C(1)C(1) + C(2)C(0) = 2 + 1 + 2\) 5
4 \(C(0)C(3) + C(1)C(2) + C(2)C(1) + C(3)C(0) = 5+2+2+5\) 14
5 \(C(0)C(4) + C(1)C(3) + C(2)C(2) + C(3)C(1) + C(4)C(0)\) 42
6 (6 terms) 132
7 (7 terms) 429

Notice the symmetry: \(C(k-1) \cdot C(n-k)\) reads the same forwards and backwards. Choosing root \(k=1\) (everything goes right) gives the same count as choosing root \(k=n\) (everything goes left).


The Closed Form

The Catalan numbers have a beautiful closed-form:

\[C(n) = \frac{1}{n+1}\binom{2n}{n}\]

You can verify: \(C(3) = \frac{1}{4}\binom{6}{3} = \frac{20}{4} = 5\). Checks out.

This formula is derived from the reflection principle on lattice paths, but for problem-solving, the DP recurrence is what you'll actually code.


LC 96: Unique Binary Search Trees

Problem: Given integer \(n\), return the number of structurally unique BSTs that store values \(1\) through \(n\).

This is the Catalan recurrence directly.

int countUniqueBSTs(int totalKeys) {
    vector<long long> numTrees(totalKeys + 1, 0);
    numTrees[0] = 1;

    for (int nodes = 1; nodes <= totalKeys; nodes++) {
        for (int root = 1; root <= nodes; root++) {
            numTrees[nodes] += numTrees[root - 1] * numTrees[nodes - root];
        }
    }

    return numTrees[totalKeys];
}

Complexity: \(O(n^2)\) time, \(O(n)\) space.


LC 95: Unique Binary Search Trees II

Problem: Given integer \(n\), generate all structurally unique BSTs that store values \(1\) through \(n\).

The DP counted them. Now you need to construct them. The approach: for each root choice, recursively build all left subtrees and all right subtrees, then combine every pair.

vector<TreeNode*> buildAllTrees(int lo, int hi) {
    if (lo > hi) return {nullptr};

    vector<TreeNode*> allTrees;

    for (int rootVal = lo; rootVal <= hi; rootVal++) {
        vector<TreeNode*> leftOptions = buildAllTrees(lo, rootVal - 1);
        vector<TreeNode*> rightOptions = buildAllTrees(rootVal + 1, hi);

        for (TreeNode* leftRoot : leftOptions) {
            for (TreeNode* rightRoot : rightOptions) {
                allTrees.push_back(new TreeNode(rootVal, leftRoot, rightRoot));
            }
        }
    }

    return allTrees;
}

vector<TreeNode*> generateTrees(int totalKeys) {
    if (totalKeys == 0) return {};
    return buildAllTrees(1, totalKeys);
}

For \(n = 3\), this returns 5 trees — exactly \(C(3)\).


What Else Catalan Counts

The Catalan number \(C(n)\) also counts:

  • Valid parenthesizations: the number of ways to arrange \(n\) pairs of parentheses so they're balanced. \(C(3) = 5\): ((())), (()()), (())(), ()(()), ()()().
  • Dyck paths: lattice paths from \((0,0)\) to \((2n,0)\) using up-steps \((+1)\) and down-steps \((-1)\) that never go below zero.
  • Polygon triangulations: the number of ways to triangulate a convex polygon with \(n+2\) sides.
  • Full binary trees: the number of full binary trees with \(n+1\) leaves.

These are all the same number because there are bijections between them.


The Bijection: BSTs and Dyck Paths

Here's the connection between binary trees and Dyck paths. Take any binary tree with \(n\) nodes and do a preorder traversal. At each node:

  • Going left (descending to a left child) → write an up step
  • Going right (descending to a right child) → write a down step
  • Hitting nullptr → write nothing (the step was already recorded)

More precisely: encode the tree's structure as a sequence of \(2n\) steps by writing "(" when you visit a node for the first time and ")" when you finish its subtree. This gives a balanced parenthesization — which is exactly a Dyck path.

If you're solving a counting problem and the answer turns out to be a Catalan number, look for the hidden binary tree. There's almost certainly a bijection that maps your objects to binary trees, and that bijection often reveals a recursive structure you can exploit.


Try It

The starter code has a countUniqueBSTs function. Fill it in using the DP recurrence.

Input: 3
Output: 5

Input: 5
Output: 42

Predict before running: What does countUniqueBSTs(0) return? It should be 1 — there is exactly one BST with zero keys (the empty tree).

Challenge: Modify the code to use the closed-form \(C(n) = \frac{1}{n+1}\binom{2n}{n}\) instead of the DP. Be careful with integer overflow — you'll need to compute the binomial coefficient incrementally.

Challenge 2: In LC 95, how many tree nodes are allocated in total across all \(C(n)\) trees for \(n = 4\)? Think about it before running: each tree has 4 nodes, and there are 14 trees, but many trees share subtree pointers.

Edge Cases to Watch For

  • Large \(n\) overflow: \(C(n)\) grows exponentially. \(C(19) = 1767263190\) fits in a 32-bit integer, but \(C(20) = 6564120420\) does not. Use long long for \(n \geq 20\).
  • Closed-form intermediate overflow: The closed-form \(\frac{1}{n+1}\binom{2n}{n}\) computes \(\binom{2n}{n}\) first, which is much larger than \(C(n)\). For \(n = 30\), \(\binom{60}{30} \approx 1.2 \times 10^{17}\) — dividing by \(31\) gives the right answer, but the intermediate overflows int. Compute with long long or use the DP recurrence.

Problems

Problem Link Difficulty
LC 96 — Unique Binary Search Trees leetcode.com/problems/unique-binary-search-trees Medium
LC 95 — Unique Binary Search Trees II leetcode.com/problems/unique-binary-search-trees-ii Medium
LC 22 — Generate Parentheses leetcode.com/problems/generate-parentheses Medium