Chains in DP and Greedy

The first two pages of this series covered logical chains (implications that cascade) and structural chains (orderings forced by data). This page covers the remaining two types: transition chains and dependency chains.

These are the chains that power dynamic programming and greedy algorithms. And — here's the punchline — DP and greedy aren't fundamentally different techniques. They're both about building chains of decisions. The difference is how aggressive you are about pruning.

Transition Chains: DP as Chain-Building

A transition chain is a sequence of states where each state is computed from previous states. That's it. That's what DP is.

\[dp[i] = f(dp[i-1], dp[i-2], \ldots)\]

The arrow goes from \(dp[i-1]\) to \(dp[i]\). The chain is the sequence of DP states, and the transition function is the rule for forging the next link.

What You're Really Storing

When you write dp[i], you're not storing the full solution up to index \(i\). You're storing a summary — just enough information to compute the next state. The Defining States page covered this: a DP state is the minimal sufficient summary of your container.

In chain terms: each link doesn't carry the full history. It carries the delta — the change from the previous link — plus whatever compressed information the next link needs.

For example, in the prefix sum \(P[i] = P[i-1] + A[i]\), each link stores a cumulative sum. You don't need the original array values anymore. The chain compresses the information.

In knapsack, dp[w] stores the best profit for weight \(w\). You don't know which items were picked. The chain discards that detail because it's irrelevant to future transitions.

The O(1) Transition Principle

Most DP transitions involve \(O(1)\) previous states. Think about it:

Fibonacci: \(F(n) = F(n-1) + F(n-2)\). Two previous states.
Knapsack: \(dp[i][w] = \max(dp[i-1][w], \, dp[i-1][w - w_i] + v_i)\). Two lookups into the previous row.
LIS (basic): \(dp[i] = \max_{j < i, A[j] < A[i]}(dp[j] + 1)\). This looks like \(O(N)\) per state, but can be optimized to \(O(\log N)\) with the tails array.

Why is \(O(1)\) so common? Because of a principle the session called the passthrough property: to go from state \(i\) to state \(i+2\), you can always pass through state \(i+1\). You never need to "skip" states. So each state only needs its immediate predecessor(s).

When a transition does depend on many previous states (like the basic LIS recurrence), that's a signal to look for optimization — a data structure (segment tree, monotonic queue) or a smarter state representation (the tails array) that reduces the per-transition work.

Transition Chains on Intervals

Not all DP transitions go from \(i\) to \(i+1\). Some go from shorter intervals to longer ones.

In interval DP, the state is a range \([L, R]\), and the transition tries every possible split point:

\[dp[L][R] = \min_{L \le k < R} \big( dp[L][k] + dp[k+1][R] + \text{cost}(L, R) \big)\]

The chain goes from length 1 → length 2 → length 3 → ... → length \(N\). Each length depends on all shorter lengths.

You process lengths in increasing order. By the time you compute \(dp[L][R]\) for length 5, all intervals of length 1 through 4 are already done. That ordering is itself a chain — a dependency chain.

Dependency Chains: What Must Come First

A dependency chain answers the question: in what order should I process things so that every prerequisite is ready?

This is topological sorting. If state B depends on state A, process A first. If multiple states are independent, process them in any order. The dependency chain gives you a valid processing sequence.

DP Ordering Is Dependency Resolution

Every DP problem has an implicit dependency graph. The states are nodes. If \(dp[i]\) depends on \(dp[j]\), there's an edge from \(j\) to \(i\).

In 1D DP (\(dp[i]\) depends on \(dp[i-1]\)): the dependency chain is just \(0 \to 1 \to 2 \to \ldots \to N\). Process left to right.
In 2D DP (\(dp[i][j]\) depends on \(dp[i-1][j]\) and \(dp[i][j-1]\)): process row by row, left to right within each row.
In interval DP: process by increasing interval length.
In bitmask DP (\(dp[\text{mask}]\) depends on \(dp[\text{subset of mask}]\)): process masks from 0 to \(2^N - 1\). Subsets always have smaller bitmask values.
In tree DP: process leaves first, then their parents. That's post-order traversal.

In every case, the processing order is just the topological order of the dependency graph. Bottom-up DP processes states in topological order explicitly. Top-down DP (memoized recursion) discovers the order implicitly — it follows the chain of dependencies recursively and caches results so each state is computed only once.

They're the same dependency resolution, just traversed differently.

When Dependencies Aren't a Simple Chain

Sometimes the dependency graph is more complex than a linear chain. In DP on a DAG, the states form an arbitrary directed acyclic graph. You can't just iterate left to right — you need a genuine topological sort.

But the principle is the same: find an order where every state's prerequisites are computed before the state itself. That's all dependency resolution is.

Greedy as Chain-Building with Pruning

Here's the connection that the session emphasized: greedy algorithms are just DP where you prune all but one path.

In DP, you explore every possible chain and keep the best answer. In greedy, you explore only the chain that looks best at each step. You're betting that the locally optimal choice leads to the globally optimal chain.

The DP Perspective

Think of a DP table as a tree of chains. From the starting state, you branch into multiple choices. Each choice leads to a new state, which branches again. The full tree has exponentially many paths. DP avoids the exponential blowup by merging paths that reach the same state (memoization).

Greedy goes further: it doesn't just merge paths, it prunes them. At each step, it picks one choice and throws away the rest. The resulting "tree" is just a single chain — one path from start to finish.

When Pruning Is Safe

Greedy works when the locally optimal choice is guaranteed to be part of a globally optimal solution. This usually requires a structural property:

Exchange argument: If you can show that swapping any non-greedy choice for the greedy choice doesn't make things worse, the greedy works.
Matroid structure: If the feasible sets form a matroid, a greedy algorithm that always picks the highest-weight feasible element is optimal.
Concavity / diminishing returns: If each additional unit of effort gives less benefit, taking the most productive option first is optimal.

Without these properties, pruning is dangerous. You might cut off the path that leads to the true optimum.

Dijkstra's Algorithm: Dependency Chaining, Not "Graph Theory"

Here's an example that ties everything together. Dijkstra's algorithm finds shortest paths from a source node. The standard explanation is: "it's a graph algorithm that uses a priority queue."

But think about it in terms of chains:

Dependency chain: You can't finalize node B's distance until all nodes that might provide a shorter path to B are finalized. The processing order is a dependency chain — process nodes in order of increasing distance.
Greedy pruning: When you extract the minimum-distance node from the priority queue, you're making a greedy choice: "this node's distance is final." Why? Because all unprocessed nodes have equal or greater distance. No future path through them could improve the current node's distance.
Transition chain: Each node's distance is computed from its predecessor's distance plus the edge weight. That's a transition: \(d[v] = d[u] + w(u, v)\).

Dijkstra's is simultaneously a dependency chain (process in topological order of distance), a greedy algorithm (always pick the closest unfinalized node), and a transition chain (each distance is computed from the previous one). It's all three chain types working together.

The session's point: don't think of Dijkstra as a "graph topic." Think of it as a dependency resolution topic that happens to operate on a graph.

The Unified View

Here's the big picture from the session:

Concept	Chain Perspective
DP	Build all possible chains, keep the best
Greedy	Build one chain, pruning the rest
BFS	Build chains layer by layer (distance 0, then 1, then 2...)
Dijkstra	Build chains in order of shortest distance
Topological sort	Find a valid order for a dependency chain
Monotonic stack	Maintain a structural chain, handle disturbances
Two pointers	Two ends of a logical chain moving in sync

These aren't separate algorithms you memorize from different textbook chapters. They're all chain-construction with different rules for which chains to keep and in what order to build them.

The practical benefit: when you see a new problem, instead of asking "is this a DP problem or a greedy problem or a graph problem?", ask:

What are the links in the chain? (What depends on what?)
Can I build the chain in \(O(1)\) per link? (Or do I need a data structure?)
Can I prune to a single chain? (Greedy.) Or do I need to explore all chains? (DP.)
What order should I build links? (Dependency resolution.)

Those four questions, combined with the logical and structural chain types from the previous pages, give you a complete framework for approaching any algorithmic problem.

Wrapping Up: The Chain Framework

Page	What It Covers
Thinking in Chains	The four chain types. Logical chains: implications, prefix sums, difference constraints
Structural Chains	Monotonic stacks, chain disturbance, Cartesian trees, functional graphs
Chains in DP and Greedy	Transition chains (\(O(1)\) DP logic), dependency resolution, greedy as pruning, the unified view