Skip to content

Expected Value DP

Pillar: Chain — "The expected cost from state \(s\) depends on expected costs of successor states — a Chain recurrence."


When Distributions Get Too Large

In Ch7 L1 you tracked full probability distributions: the probability of every possible sum, every possible position. That works when the state space is small. But consider a random walk on a line of length \(1000\), running for \(10^6\) steps. Tracking the full distribution over positions at each step is expensive — and often unnecessary.

If all you need is the expected number of steps to reach some target, you don't need the full distribution. You need one number per state: the expected cost from that state. This is expected value DP.

The shift is simple but powerful. Instead of computing "what's the probability of being at position \(k\) after \(t\) steps?", you compute "starting at position \(k\), what's the expected number of steps until I'm done?"


The EV Recurrence

Here's the general pattern. You have a set of states, and from each state \(s\) you transition to some next state with known probabilities. Each transition has an immediate cost (often just \(1\) for "one step"). You want:

\[E[s] = \text{expected total cost starting from state } s\]

The recurrence is:

\[E[s] = \sum_{\text{transitions from } s} P(\text{transition}) \times \bigl(\text{immediate cost} + E[\text{next state}]\bigr)\]

For absorbing states (terminal states where you stop), \(E[\text{absorbing}] = 0\).

This looks like any DP recurrence, and it is one. The key difference from typical DP: the transitions might form cycles, not a DAG. A random walk can go left, then right, then left again. That means you can't always evaluate the recurrence bottom-up. Sometimes you get a system of linear equations instead.


Example: Random Walk Absorption

Start at position \(k\) on a line \([0, n]\). Each step: move left with probability \(1/2\), move right with probability \(1/2\). What's the expected number of steps to reach \(0\) or \(n\)?

Define \(E[k]\) = expected steps to reach an endpoint starting from position \(k\).

Boundary conditions: \(E[0] = 0\), \(E[n] = 0\) (you're already at an endpoint).

Recurrence for \(1 \leq k \leq n-1\):

\[E[k] = 1 + \frac{1}{2} E[k-1] + \frac{1}{2} E[k+1]\]

The \(1\) is the cost of one step. With probability \(1/2\) you move to \(k-1\), with probability \(1/2\) to \(k+1\).

This is cyclic: \(E[k]\) depends on both \(E[k-1]\) and \(E[k+1]\), so you can't compute it in a single forward or backward pass. It's a system of \(n-1\) linear equations in \(n-1\) unknowns.

Solving the System

Rearrange the recurrence:

\[E[k] - \frac{1}{2} E[k-1] - \frac{1}{2} E[k+1] = 1\]

Multiply by \(2\):

\[2E[k] - E[k-1] - E[k+1] = 2\]

Rearrange as:

\[E[k+1] - E[k] = (E[k] - E[k-1]) - 2\]

Let \(D[k] = E[k+1] - E[k]\). Then \(D[k] = D[k-1] - 2\). This is an arithmetic sequence with common difference \(-2\):

\[D[k] = D[0] - 2k\]

where \(D[0] = E[1] - E[0] = E[1]\).

Telescoping: \(E[n] - E[0] = \sum_{k=0}^{n-1} D[k] = \sum_{k=0}^{n-1} (D[0] - 2k) = nD[0] - 2 \cdot \frac{n(n-1)}{2}\).

Since \(E[n] = E[0] = 0\):

\[0 = nD[0] - n(n-1) \implies D[0] = n - 1\]

Now reconstruct \(E[k]\). Since \(D[j] = (n-1) - 2j\):

\[E[k] = E[0] + \sum_{j=0}^{k-1} D[j] = \sum_{j=0}^{k-1} \bigl((n-1) - 2j\bigr) = k(n-1) - k(k-1)\]
\[\boxed{E[k] = k(n - k)}\]

Trace for \(n = 4\):

Position \(k\) \(E[k] = k(4 - k)\)
\(0\) \(0\)
\(1\) \(3\)
\(2\) \(4\)
\(3\) \(3\)
\(4\) \(0\)

Sanity check for \(k = 1\): \(E[1] = 1 + E[0]/2 + E[2]/2 = 1 + 0 + 2 = 3\). Correct.

The closed form \(k(n-k)\) is elegant, but what matters is the technique: set up the EV recurrence, recognize it's a linear system, solve it.


Example: Coupon Collector via DP

There are \(n\) distinct coupon types. Each step you receive a uniformly random coupon. What's the expected number of steps to collect all \(n\) types?

Ch7 L2 solved this with linearity of expectation. Here's the EV DP approach — it gives the same answer but illustrates the general method.

State: \(k\) = number of distinct coupons collected so far.

Base case: \(E[n] = 0\) (you have all coupons, done).

Recurrence: From state \(k\), the next coupon is one you already have with probability \(k/n\), or a new one with probability \((n-k)/n\).

\[E[k] = 1 + \frac{k}{n} \cdot E[k] + \frac{n-k}{n} \cdot E[k+1]\]

The \(E[k]\) on the right side means: if you get a duplicate, you're back in the same state. This is a self-loop — the state depends on itself.

Handling the Self-Loop

Move the \(E[k]\) term to the left:

\[E[k] - \frac{k}{n} E[k] = 1 + \frac{n-k}{n} E[k+1]\]
\[\frac{n-k}{n} E[k] = 1 + \frac{n-k}{n} E[k+1]\]
\[E[k] = \frac{n}{n-k} + E[k+1]\]

The self-loop vanishes. Now this is a standard backward DP: fill \(E[n] = 0\), then \(E[n-1]\), then \(E[n-2]\), down to \(E[0]\).

\[E[0] = \sum_{k=0}^{n-1} \frac{n}{n-k} = n \sum_{j=1}^{n} \frac{1}{j} = n \cdot H_n\]

Same result as Ch7 L2's linearity approach.

Trace for \(n = 4\):

\(k\) \(\frac{n}{n-k}\) \(E[k]\)
\(4\) \(0\)
\(3\) \(4/1 = 4\) \(4\)
\(2\) \(4/2 = 2\) \(6\)
\(1\) \(4/3 \approx 1.33\) \(7.33\)
\(0\) \(4/4 = 1\) \(8.33\)

Expected steps to collect all \(4\) coupons: \(E[0] = 4(1 + 1/2 + 1/3 + 1/4) = 8.33\overline{3}\).


The Self-Loop Rearrangement

The coupon collector trick — moving \(E[k]\) from the right side to the left — works whenever a state has a self-loop (positive probability of staying put). The general pattern:

\[E[s] = 1 + p_{\text{stay}} \cdot E[s] + \sum_{\text{other transitions}} p_t \cdot E[t]\]

Rearrange:

\[(1 - p_{\text{stay}}) \cdot E[s] = 1 + \sum_{\text{other transitions}} p_t \cdot E[t]\]
\[E[s] = \frac{1 + \sum p_t \cdot E[t]}{1 - p_{\text{stay}}}\]

This eliminates the self-loop and reduces the dependency graph. If the remaining transitions form a DAG, you can compute the answer with standard backward DP. If they still have cycles, you need Gaussian elimination.


Cyclic Dependencies

When the DP graph has cycles — like the random walk, where \(E[k]\) depends on both \(E[k-1]\) and \(E[k+1]\) — you can't do a topological-order sweep. Instead, you have a system of linear equations.

Three strategies:

  1. Closed-form solution. Some systems have structure that yields a formula (like \(E[k] = k(n-k)\) for the random walk). Look for arithmetic/geometric patterns in the differences.

  2. Gaussian elimination. For \(m\) states, set up \(m\) equations in \(m\) unknowns and solve in \(O(m^3)\). This always works but is slow for large \(m\).

  3. Tridiagonal systems. Many 1D random walks produce tridiagonal matrices (each equation involves only \(E[k-1]\), \(E[k]\), \(E[k+1]\)). These solve in \(O(m)\) using the Thomas algorithm — just forward elimination and back-substitution on three diagonals.

Most competitive programming EV DP problems are designed so that either the self-loop trick eliminates cycles (coupon collector pattern) or the state space is small enough for Gaussian elimination.


Absorption Probabilities

The same recurrence structure computes probabilities, not just expected values. Instead of "expected steps to reach an endpoint," ask "probability of reaching endpoint \(A\) before endpoint \(B\)."

Random walk example. Start at position \(k\) on \([0, n]\). Each step: left or right with probability \(1/2\). What's \(P[k]\) = probability of reaching \(0\) before \(n\)?

Boundary: \(P[0] = 1\), \(P[n] = 0\).

Recurrence:

\[P[k] = \frac{1}{2} P[k-1] + \frac{1}{2} P[k+1]\]

No \(+1\) term this time — we're tracking probability, not expected cost.

Rearranging: \(P[k+1] - P[k] = P[k] - P[k-1]\). The differences are constant, so \(P\) is linear:

\[P[k] = 1 - \frac{k}{n}\]

Absorption probabilities use the same framework as EV DP. The only change: no immediate cost term, and boundary values are \(0\) or \(1\) (not \(0\)).


Code: Coupon Collector EV DP

#include <bits/stdc++.h>
using namespace std;

int main() {
    int totalCoupons;
    cin >> totalCoupons;

    vector<double> expectedSteps(totalCoupons + 1, 0.0);
    for (int collected = totalCoupons - 1; collected >= 0; collected--) {
        double probNewCoupon = (double)(totalCoupons - collected) / totalCoupons;
        expectedSteps[collected] = 1.0 / probNewCoupon + expectedSteps[collected + 1];
    }

    cout << fixed << setprecision(9) << expectedSteps[0] << "\n";
    return 0;
}

\(O(n)\) time, \(O(n)\) space.


Code: Random Walk EV System

For the random walk on \([0, n]\), the closed form is \(E[k] = k(n-k)\). But here's the general approach using the tridiagonal structure, which works even when step probabilities aren't \(1/2\):

#include <bits/stdc++.h>
using namespace std;

int main() {
    int lineLength;
    cin >> lineLength;

    double probLeft = 0.5;
    double probRight = 0.5;

    vector<double> lower(lineLength + 1, 0.0);
    vector<double> diag(lineLength + 1, 0.0);
    vector<double> upper(lineLength + 1, 0.0);
    vector<double> rhs(lineLength + 1, 0.0);

    diag[0] = 1.0;
    rhs[0] = 0.0;
    diag[lineLength] = 1.0;
    rhs[lineLength] = 0.0;

    for (int position = 1; position < lineLength; position++) {
        lower[position] = -probLeft;
        diag[position] = 1.0;
        upper[position] = -probRight;
        rhs[position] = 1.0;
    }

    for (int position = 1; position <= lineLength; position++) {
        double factor = lower[position] / diag[position - 1];
        diag[position] -= factor * upper[position - 1];
        rhs[position] -= factor * rhs[position - 1];
    }

    vector<double> expectedSteps(lineLength + 1, 0.0);
    expectedSteps[lineLength] = rhs[lineLength] / diag[lineLength];
    for (int position = lineLength - 1; position >= 0; position--) {
        expectedSteps[position] = (rhs[position] - upper[position] * expectedSteps[position + 1]) / diag[position];
    }

    for (int position = 0; position <= lineLength; position++) {
        cout << "E[" << position << "] = " << fixed << setprecision(4) << expectedSteps[position] << "\n";
    }
    return 0;
}

\(O(n)\) time, \(O(n)\) space. The Thomas algorithm solves any tridiagonal system in linear time.


Common Mistakes

  • Forgetting to rearrange self-loops. If your recurrence has \(E[s]\) on both sides, you must algebraically isolate it. Leaving the self-loop in and trying to iterate will diverge or give wrong answers.

  • Assuming a DAG when cycles exist. A random walk is not a DAG. If you try to compute \(E[k]\) from left to right assuming \(E[k-1]\) is already known, you'll miss the dependency on \(E[k+1]\). Always check whether transitions can go "backwards."

  • Floating-point drift in large systems. Gaussian elimination on \(10^3\) or more variables accumulates rounding errors. Use partial pivoting and double (not float). For exact answers modulo a prime, do everything in modular arithmetic.


Quick Recap

  • When tracking full distributions is too expensive, track only the expected value at each state — this is expected value DP.
  • The recurrence is \(E[s] = \sum P(\text{transition}) \times (\text{cost} + E[\text{next state}])\), with \(E[\text{absorbing}] = 0\).
  • Self-loops (state transitions back to itself) are eliminated by rearranging: divide by \((1 - p_{\text{stay}})\).
  • Cyclic dependencies produce systems of linear equations. Solve via closed-form analysis, tridiagonal solvers (\(O(n)\)), or Gaussian elimination (\(O(n^3)\)).
  • Absorption probabilities use the same recurrence structure but without the immediate cost term and with boundary values \(0/1\).

Problems

Problem Link Difficulty Focus
CSES — Moving Robots cses.fi/problemset/task/2209 Medium EV via probability distribution on a grid; independence across robots
CSES — Candy Lottery cses.fi/problemset/task/1727 Medium Expected maximum of \(k\) independent uniform draws; CDF approach
CF 839C — Journey codeforces.com/problemset/problem/839/C Medium EV DP on a tree; expected depth of a random walk from root
CF 235B — Let's Play Osu! codeforces.com/problemset/problem/235/B Medium EV DP with quadratic contribution; track \(E[\text{streak}]\) and \(E[\text{streak}^2]\)