Linearity of Expectation

Pillar: Fix — "Fix one element (or one pair). Its expected contribution = its probability times its value. Sum over all elements — you never need to track dependence."

The Setup

You shuffle a deck of \(n\) cards. How many cards end up in their original position?

If you try to compute this by enumerating all \(n!\) permutations and averaging, you're dead — even \(n = 20\) is out of reach. The direct approach requires tracking which cards are in position and which aren't, across all possible shuffles.

But there's a shortcut that sidesteps all of that. Instead of asking "what's the expected number of fixed cards overall?", ask a much simpler question for each card individually: "what's the probability that card \(i\) lands in position \(i\)?" That's \(1/n\). There are \(n\) cards, so the expected number of fixed points is \(n \times 1/n = 1\).

Done. No need to consider permutations, no need to worry about whether card 3 being in position affects card 7. This is linearity of expectation, and it's the single most useful technique in probabilistic problem-solving for competitive programming.

Expected Value

The expected value of a discrete random variable \(X\) is:

\[E[X] = \sum_{x} x \cdot P(X = x)\]

Think of it as the weighted average: each possible outcome \(x\) weighted by how likely it is. Roll a fair die: \(E[X] = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + \cdots + 6 \cdot \frac{1}{6} = 3.5\).

The expected value doesn't need to be a possible outcome — you can never roll \(3.5\) — but it tells you what to "expect on average" over many trials.

For problem solving, the key fact isn't the definition itself. It's what happens when you add random variables together.

Linearity of Expectation

Linearity of expectation. For any random variables \(X_1, X_2, \ldots, X_n\):

\[E[X_1 + X_2 + \cdots + X_n] = E[X_1] + E[X_2] + \cdots + E[X_n]\]

This holds regardless of dependence. The variables can be correlated, anti-correlated, or dependent in any way whatsoever. The expectation of the sum is always the sum of the expectations.

Why is this so powerful? Because direct computation of \(E[X]\) often requires summing over an exponential number of outcomes. But if you can decompose \(X\) into \(n\) simpler pieces \(X_1, \ldots, X_n\) where each \(E[X_i]\) is easy to compute, you bypass all that complexity.

The pillar connection is immediate: fix one element (or one pair), compute its expected contribution in isolation, then sum over all elements. The dependencies between elements never enter the picture.

Indicator Random Variables

The workhorse for applying linearity is the indicator random variable. Define:

\[X_i = \begin{cases} 1 & \text{if event } i \text{ occurs} \\ 0 & \text{otherwise} \end{cases}\]

Then \(E[X_i] = 1 \cdot P(\text{event } i) + 0 \cdot P(\text{not event } i) = P(\text{event } i)\).

The expected value of an indicator is just the probability of the event. That's the entire trick. If you can express your quantity of interest as a sum of indicators, the expected value is just the sum of probabilities:

\[E\left[\sum_{i} X_i\right] = \sum_{i} P(\text{event } i)\]

No dependence analysis needed. No joint distributions. Just: what's the probability of each individual event?

The Fix-and-Sum Technique

Here's the recipe that solves the majority of linearity-of-expectation problems:

Identify the quantity you're computing the expectation of (fixed points, inversions, edges, etc.).
Decompose it into a sum of indicator variables — one per element, pair, edge, or whatever unit contributes.
Fix one such unit. Compute the probability that it contributes.
Sum over all units.

The hard part is step 2: choosing the right decomposition. The rest is mechanical. Let's see this in action across three classic examples.

Example 1: Hat-Check Problem (Fixed Points)

\(n\) people check their hats. The hats are returned in a uniformly random permutation. What is the expected number of people who get their own hat back?

Decompose. Let \(X_i = 1\) if person \(i\) gets their own hat, \(0\) otherwise. The total number of fixed points is \(X = X_1 + X_2 + \cdots + X_n\).

Fix one person. Person \(i\) gets their hat back if position \(i\) holds value \(i\) in the random permutation. Since every element is equally likely to land in position \(i\), we have \(P(X_i = 1) = 1/n\).

Sum.

\[E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \frac{1}{n} = 1\]

The expected number of fixed points is \(1\), for any \(n\). Whether \(n = 3\) or \(n = 10^6\), the answer is the same.

Notice that the \(X_i\)'s are dependent — if person 1 gets their hat, that affects the probabilities for everyone else. But linearity doesn't care. Each \(E[X_i] = 1/n\), and the sum is \(1\).

Callback to derangements (Ch6 L2). A derangement is a permutation with zero fixed points. The expected number of fixed points being \(1\) gives intuition for why the fraction of derangements approaches \(1/e\) — most permutations have around \(1\) fixed point, and the Poisson distribution with mean \(1\) assigns probability \(e^{-1}\) to the outcome \(0\).

Example 2: Expected Inversions in a Random Permutation

A random permutation of \([1, 2, \ldots, n]\) is chosen uniformly at random. What is the expected number of inversions?

An inversion is a pair \((i, j)\) with \(i < j\) but \(\sigma(i) > \sigma(j)\) — a pair of elements that are "out of order."

Decompose. For each pair \((i, j)\) with \(1 \leq i < j \leq n\), let \(X_{ij} = 1\) if \(\sigma(i) > \sigma(j)\), and \(0\) otherwise. The total inversions are:

\[X = \sum_{1 \leq i < j \leq n} X_{ij}\]

Fix one pair. In a random permutation, the relative order of any two positions is equally likely to be either way. So \(P(X_{ij} = 1) = 1/2\).

Sum. There are \(\binom{n}{2}\) pairs, so:

\[E[X] = \binom{n}{2} \cdot \frac{1}{2} = \frac{n(n-1)}{4}\]

For \(n = 4\): expected inversions \(= 4 \cdot 3 / 4 = 3\). You can verify by averaging over all \(24\) permutations.

Trace the verification for \(n = 3\). The six permutations and their inversion counts:

Permutation	Inversions	Count
\((1,2,3)\)	none	\(0\)
\((1,3,2)\)	\((2,3)\)	\(1\)
\((2,1,3)\)	\((1,2)\)	\(1\)
\((2,3,1)\)	\((1,3),(2,3)\)	\(2\)
\((3,1,2)\)	\((1,2),(1,3)\)	\(2\)
\((3,2,1)\)	\((1,2),(1,3),(2,3)\)	\(3\)

Average: \((0 + 1 + 1 + 2 + 2 + 3) / 6 = 9/6 = 3/2 = \binom{3}{2}/2\). Confirmed.

Example 3: Coupon Collector

A cereal box contains one of \(n\) distinct coupons, chosen uniformly at random. How many boxes must you buy, on average, to collect all \(n\) coupon types?

This problem doesn't start as a sum of indicators, but it becomes one with the right decomposition.

Decompose into phases. Define phase \(k\) as the period when you have exactly \(k\) distinct coupons and are waiting for the \((k+1)\)-th new one. Phase \(k\) starts when you collect your \((k+1)\)-th distinct coupon and begins at \(k = 0\) (you have \(0\) coupons and need the first).

Let \(T_k\) = number of draws during phase \(k\). The total draws are \(T = T_0 + T_1 + \cdots + T_{n-1}\).

Fix one phase. During phase \(k\), you have \(k\) distinct coupons. Each new draw gives a coupon you haven't seen with probability \((n - k)/n\). So \(T_k\) follows a geometric distribution with success probability \(p_k = (n - k)/n\), and:

\[E[T_k] = \frac{1}{p_k} = \frac{n}{n - k}\]

Sum.

\[E[T] = \sum_{k=0}^{n-1} \frac{n}{n - k} = n \sum_{j=1}^{n} \frac{1}{j} = n \cdot H_n\]

where \(H_n = 1 + 1/2 + 1/3 + \cdots + 1/n\) is the \(n\)-th harmonic number.

Trace for \(n = 4\):

Phase \(k\)	Coupons held	\(P(\text{new})\)	\(E[T_k]\)
\(0\)	\(0\)	\(4/4 = 1\)	\(1\)
\(1\)	\(1\)	\(3/4\)	\(4/3\)
\(2\)	\(2\)	\(2/4 = 1/2\)	\(2\)
\(3\)	\(3\)	\(1/4\)	\(4\)

\[E[T] = 1 + \frac{4}{3} + 2 + 4 = \frac{25}{3} \approx 8.33\]

So with \(4\) coupon types, you expect about \(8.33\) purchases to collect them all. The harmonic sum grows as \(\ln n + \gamma\) (where \(\gamma \approx 0.5772\) is the Euler-Mascheroni constant), so the coupon collector expected value is approximately \(n \ln n + \gamma n\).

The Code

Expected Inversions

The formula gives us the answer directly — no simulation needed. But here's the clean computation, useful when \(n\) is part of a larger problem:

#include <bits/stdc++.h>
using namespace std;

int main() {
    int permutationSize;
    cin >> permutationSize;
    long long totalPairs = (long long)permutationSize * (permutationSize - 1) / 2;
    double expectedInversions = totalPairs / 2.0;
    cout << fixed << setprecision(9) << expectedInversions << "\n";
    return 0;
}

Coupon Collector (Exact Computation)

#include <bits/stdc++.h>
using namespace std;

int main() {
    int numCoupons;
    cin >> numCoupons;
    double expectedDraws = 0.0;
    for (int collected = 0; collected < numCoupons; collected++) {
        double probabilityNew = (double)(numCoupons - collected) / numCoupons;
        expectedDraws += 1.0 / probabilityNew;
    }
    cout << fixed << setprecision(9) << expectedDraws << "\n";
    return 0;
}

Coupon Collector (Monte Carlo Simulation)

When you want to verify an expected value formula, run a simulation. This is also useful for problems where no closed form exists:

#include <bits/stdc++.h>
using namespace std;

int main() {
    int numCoupons = 10;
    int numTrials = 1000000;
    mt19937 rng(42);
    uniform_int_distribution<int> dist(0, numCoupons - 1);

    double totalDraws = 0;
    for (int trial = 0; trial < numTrials; trial++) {
        vector<bool> seen(numCoupons, false);
        int distinctSeen = 0;
        int draws = 0;
        while (distinctSeen < numCoupons) {
            int coupon = dist(rng);
            draws++;
            if (!seen[coupon]) {
                seen[coupon] = true;
                distinctSeen++;
            }
        }
        totalDraws += draws;
    }

    double simulated = totalDraws / numTrials;
    double exact = 0;
    for (int collected = 0; collected < numCoupons; collected++) {
        exact += (double)numCoupons / (numCoupons - collected);
    }

    cout << "Simulated: " << fixed << setprecision(4) << simulated << "\n";
    cout << "Exact:     " << fixed << setprecision(4) << exact << "\n";
    return 0;
}

For \(n = 10\), the exact answer is \(10 \cdot H_{10} \approx 29.2897\). A million trials will give you something within \(0.05\) of that.

Complexity

Expected inversions: \(O(1)\) time, \(O(1)\) space.

Coupon collector exact formula: \(O(n)\) time, \(O(1)\) space.

Coupon collector simulation: \(O(\text{numTrials} \times n \cdot H_n)\) expected time, \(O(n)\) space.

When Does Linearity Apply?

Linearity of expectation always applies — it's not a technique you need to "check conditions" for. The real question is: can you find a good decomposition?

The technique works best when:

The quantity is naturally a sum over elements or pairs (inversions, fixed points, edges in a random graph, connected components).
Each indicator's probability is easy to compute in isolation, even if the indicators are heavily dependent.
Direct computation would require summing over exponentially many configurations.

The technique does not help when you need \(E[f(X)]\) for a nonlinear function \(f\). For example, \(E[X^2] \neq (E[X])^2\) in general. If the problem asks for the expected value of a product, a maximum, or another nonlinear function, linearity alone won't suffice — you'll likely need expected value DP (Ch7 L3).

Common Mistakes

Assuming linearity requires independence. It doesn't. This is the whole point. You can have \(n\) heavily correlated indicator variables and still sum their expectations. Independence is needed for \(E[X \cdot Y] = E[X] \cdot E[Y]\), but never for \(E[X + Y] = E[X] + E[Y]\).
Wrong indicator decomposition. The trickiest step is choosing what each \(X_i\) represents. For inversions, the indicator is per pair, not per element. For the hat-check problem, it's per element. Getting the unit wrong leads to double-counting or missed contributions.
Floating-point precision. Many problems ask for the expected value modulo a prime, not as a floating-point number. In that case, represent the expectation as a fraction \(p/q\) and compute \(p \cdot q^{-1} \mod M\) using modular inverse. Don't mix floating-point arithmetic into modular computations.

Quick Recap

Linearity of expectation: \(E[\sum X_i] = \sum E[X_i]\), always, regardless of dependence.
Indicator trick: If \(X_i \in \{0, 1\}\), then \(E[X_i] = P(X_i = 1)\). Express your quantity as a sum of indicators, and the expected value becomes a sum of probabilities.
Fix and sum: Fix one element/pair, compute its contribution probability, sum over all. This is the Fix pillar in action.
Coupon collector: \(E = n \cdot H_n\). Decompose into geometric phases, each with a different success probability.
Hat-check: Expected fixed points \(= 1\) for any \(n\). Each position contributes \(1/n\), and there are \(n\) positions.
Expected inversions: \(\binom{n}{2}/2\). Each pair is inverted with probability \(1/2\).

Problems

Problem	Link	Difficulty	Focus
CSES — Inversion Probability	cses.fi/problemset/task/1725	Medium	Linearity over pairs; each pair's inversion probability from given distributions
CF 280C — Game on Tree	codeforces.com/problemset/problem/280/C	Medium	Fix one node; probability it's removed before any ancestor; linearity over nodes
CF 235B — Let's Play Osu!	codeforces.com/problemset/problem/235/B	Medium	Decompose squared run-lengths into pair contributions; linearity of expectation on pairs