Skip to content

Numbers Are Sets, Operations Are Everything

Pillar: Set — "Every integer is secretly a multiset of prime factors. GCD and LCM are just set intersection and union on that representation."


Containers, Sets, and Operations

Here's a problem most beginners don't realize they have: they can use data structures, but they can't talk about what those data structures are doing. They say things like "I'm storing the answer so far" or "I keep track of stuff I've seen." Vague. Fuzzy. And when the problem gets harder, that fuzziness kills them.

This lesson gives you a precise vocabulary — borrowed from set theory — for describing what you're storing and how you're combining it. You don't need a math degree. You just need to name what you're already doing.

What Is a Container?

Strip away the fancy terminology. A container is anything that holds a collection of objects. That's it.

You already use containers constantly:

  • A visited array in BFS? That's a container. It holds "nodes I've seen."
  • A hash map in Two Sum? Container. It holds "values I've encountered so far."
  • A prefix sum array? Container. It holds "cumulative sums up to each index."
  • A DP table? Container. It holds "best answers for each state."

In C++, a container might be a vector, set, map, or linked list. Mathematically, it could be a set of indices, a multiset of values, a sequence. The specific implementation doesn't matter yet. What matters is the concept: you have a collection of things.

Why bother naming this? Because the first question in any problem isn't "should I use a segment tree?" It's "what am I collecting?" The data structure is a decision you make after you know what you're tracking. The concept of "I have a collection of X" comes first.

Buckets: When Your Container Is Too Big

Sometimes your container has too much stuff in it. Processing every element one by one is too slow. So you split it.

A bucket is a piece of your container. You partition the whole collection into smaller chunks of size \(B\), handle each chunk separately, and combine the results.

The simplest version: square root decomposition. Split an array of \(N\) elements into \(\sqrt{N}\) blocks. Each block stores a precomputed summary (like its sum or max). A query spanning full blocks uses the summary; a partial overlap processes elements individually. Total work: \(O(\sqrt{N})\) per query instead of \(O(N)\).

But bucketing goes much deeper than that. The real power is in decomposing ranges into full blocks + partial remainders, then handling different interaction types separately.

The Range Decomposition Trick

Given a range \([l, r]\) and block size \(B\), split it into three pieces:

  1. Left partial block\([l, \text{end of } l\text{'s block}]\) — at most \(B\) elements
  2. Full blocks in the middle — complete blocks from \(l\)'s block \(+ 1\) through \(r\)'s block \(- 1\)
  3. Right partial block\([\text{start of } r\text{'s block}, r]\) — at most \(B\) elements

Full blocks can be handled in bulk (one operation per block). Partial blocks require element-by-element processing but contain at most \(B\) elements each.

Why This Matters: Interaction Types

When two ranges interact (like counting pairs across two paths), the decomposition creates three types of interactions:

Type What Cost
Self Both elements in the same partial block Brute force \(O(B^2)\) per block, but there are few partial blocks
Cross One element in a partial block, the other in a full block Process the partial block element-by-element against full-block summaries
Full × Full Both elements in full blocks Aggregate with prefix sums — \(O(1)\) per block pair

The total cost balances when \(B \approx \sqrt{N}\): partial blocks cost \(O(\sqrt{N})\) each (at most \(\sqrt{N}\) elements), and there are \(O(\sqrt{N})\) full blocks. This is the pattern behind Mo's algorithm (sort queries by left-endpoint block, move pointers — total movement \(O((N + Q)\sqrt{N})\)) and many advanced tree problems where Euler tour intervals get decomposed into blocks.

Counting Sort: Bucketing by Value

A different flavor of bucketing: instead of spatial blocks, create buckets by value. Drop each element into its bucket, then read the buckets in order. The "split into groups by value" step is bucketing, and it gives \(O(N + V)\) sorting when the value range \(V\) is small.

The common thread across all forms: your container is too big to process naively, so you split it into buckets and handle each bucket cheaply. Don't think of this as tied to any one algorithm. It's a general-purpose move: decompose to conquer.


Numbers Are Multisets

Take the number \(60\). You know its prime factorization: \(60 = 2^2 \times 3^1 \times 5^1\).

Now look at that differently. Instead of thinking "\(60\) is a product of primes," think: "\(60\) is a bag (multiset) of primes." Specifically:

\[60 \;\longleftrightarrow\; \{2^2,\; 3^1,\; 5^1\}\]

The number \(60\) IS this multiset. Two copies of \(2\), one copy of \(3\), one copy of \(5\). Nothing else. Every positive integer greater than \(1\) has exactly one such multiset (this is the Fundamental Theorem of Arithmetic, but you already know that — the point is to USE it, not admire it).

Why care? Because the moment you see numbers as multisets of primes, two operations you've used since middle school — GCD and LCM — stop being mysterious algorithms and become simple set operations.


Prime Factorization as a Multiset

Any integer \(n > 1\) can be written uniquely as:

\[n = p_1^{e_1} \times p_2^{e_2} \times \cdots \times p_k^{e_k}\]

Read this as a multiset: for each prime \(p_i\), the number \(n\) "contains" \(e_i\) copies of it. Primes that don't appear have exponent \(0\).

Some examples:

Number Factorization Multiset
\(12\) \(2^2 \times 3^1\) \(\{2^2, 3^1\}\)
\(18\) \(2^1 \times 3^2\) \(\{2^1, 3^2\}\)
\(60\) \(2^2 \times 3^1 \times 5^1\) \(\{2^2, 3^1, 5^1\}\)
\(7\) \(7^1\) \(\{7^1\}\)
\(1\) (empty product) \(\{\}\)

The number \(1\) is the empty multiset. No prime factors at all. Remember this — it matters later.


GCD = Intersection, LCM = Union

Here is the core insight of this lesson.

Given two numbers \(a\) and \(b\) with their prime multisets, \(\gcd(a, b)\) takes the minimum exponent of each prime, and \(\text{lcm}(a, b)\) takes the maximum exponent of each prime.

Minimum exponent = set intersection (keep only what both share). Maximum exponent = set union (keep everything from either).

Worked example: \(\gcd(12, 18)\) and \(\text{lcm}(12, 18)\).

\[12 = 2^2 \times 3^1 \qquad 18 = 2^1 \times 3^2\]

For \(\gcd\), take \(\min\) of each exponent:

\[\gcd = 2^{\min(2,1)} \times 3^{\min(1,2)} = 2^1 \times 3^1 = 6\]

For \(\text{lcm}\), take \(\max\) of each exponent:

\[\text{lcm} = 2^{\max(2,1)} \times 3^{\max(1,2)} = 2^2 \times 3^2 = 36\]

Now verify the identity \(\gcd(a,b) \times \text{lcm}(a,b) = a \times b\):

\[6 \times 36 = 216 = 12 \times 18 \quad \checkmark\]

This identity always holds, and from the multiset perspective it's obvious: for each prime, \(\min(e_a, e_b) + \max(e_a, e_b) = e_a + e_b\). The intersection plus the union equals the sum of the two bags.


What IS an Operation on a Set?

You've just seen two operations on multisets (min-exponent and max-exponent). Before going further, step back. What does it mean for an operation to "work well" on a set?

Three properties matter. We'll use the symbol \(\oplus\) for a generic binary operation.

Closure — applying \(\oplus\) to two elements of the set produces an element still in the set. Addition on integers: \(3 + 5 = 8\), still an integer. Division on integers: \(3 / 5 = 0.6\), NOT an integer. Addition is closed on integers; division is not.

Associativity — the grouping doesn't matter: \((a \oplus b) \oplus c = a \oplus (b \oplus c)\). Addition: \((2 + 3) + 4 = 2 + (3 + 4) = 9\). Subtraction: \((10 - 3) - 2 = 5\), but \(10 - (3 - 2) = 9\). Addition is associative; subtraction is not.

Identity element — there exists an element \(e\) such that \(a \oplus e = e \oplus a = a\) for all \(a\). For addition on integers, \(e = 0\). For multiplication, \(e = 1\). For \(\max\) on integers... think about it. What value \(e\) satisfies \(\max(a, e) = a\) for every integer \(a\)? It would need to be \(-\infty\).


The Definition That Unlocks Everything

A set \(S\) equipped with a binary operation \(\oplus\) is a monoid if:

  1. \(\oplus\) is closed on \(S\)
  2. \(\oplus\) is associative
  3. There exists an identity element \(e \in S\)

That's it. Three properties. The word monoid might sound intimidating, but you've been using monoids your entire programming life. Every time you write a prefix sum, you're scanning over the additive monoid. Every time a segment tree merges two children, it applies a monoid operation.

This is the KEY algebraic vocabulary that the DP and Graph courses will depend on. Whenever you see "combine partial answers," you're combining monoid elements. When a problem says "answer range queries" or "accumulate results left to right," your first instinct should be: what's the monoid?


Monoid Examples

Set Operation Identity Name
integers \(+\) \(0\) Additive monoid
integers \(\times\) \(1\) Multiplicative monoid
integers \(\max\) \(-\infty\) Max monoid
integers \(\min\) \(+\infty\) Min monoid
non-negative integers \(\gcd\) \(0\) GCD monoid
positive integers \(\text{lcm}\) \(1\) LCM monoid
strings concatenation \(\text{""}\) Free monoid

The GCD identity deserves a pause. \(\gcd(a, 0) = a\) for any \(a\). Why? Because \(0\) is divisible by every integer, so every integer is a common divisor of \(a\) and \(0\), and the greatest one is \(a\) itself. In multiset terms, \(0\) has "infinite" copies of every prime, so \(\min(\text{any exponent}, \infty) = \text{any exponent}\).

The LCM identity: \(\text{lcm}(a, 1) = a\). The number \(1\) has the empty prime multiset, so \(\max(\text{any exponent}, 0) = \text{any exponent}\).


Why Monoids Matter

Every prefix sum is a monoid scan. Every segment tree node stores a monoid product. Every DP transition applies a monoid operation.

Once you recognize the monoid, you get three things for free:

1. Prefix arrays. Build prefix[i] = prefix[i-1] ⊕ a[i]. Works for sum, product, GCD, max — any monoid. A prefix sum over an array of integers is a monoid scan over \((\mathbb{Z}, +, 0)\). Replace \(+\) with \(\gcd\) and you get prefix GCDs. Replace it with \(\max\) and you get prefix maximums. Same code structure, different combine function.

Trace it. Array \([3, 7, 2, 5]\) under the additive monoid:

\[\text{prefix}[0] = 3, \quad \text{prefix}[1] = 3 + 7 = 10, \quad \text{prefix}[2] = 10 + 2 = 12, \quad \text{prefix}[3] = 12 + 5 = 17\]

Now the same array under the max monoid (identity \(= -\infty\)):

\[\text{prefix}[0] = 3, \quad \text{prefix}[1] = \max(3, 7) = 7, \quad \text{prefix}[2] = \max(7, 2) = 7, \quad \text{prefix}[3] = \max(7, 5) = 7\]

Both are the same algorithm. The only difference is the binary operation and its identity.

Now consider the min monoid (identity \(= +\infty\)). A segment tree built over an array with the \(\min\) operation answers "what's the minimum value in range \([l, r]\)?" in \(O(\log n)\). That's Range Minimum Query — one of the most common data structure problems — and it falls out for free once you recognize min as a monoid.

2. Range queries via segment trees. A segment tree over \(n\) elements with a monoid operation answers range queries in \(O(\log n)\). Each internal node stores the monoid product of its children. To query a range, you decompose it into \(O(\log n)\) nodes and combine their stored products. The monoid structure is what makes segment trees work — associativity lets you combine sub-ranges in any order and get the correct answer.

When updates happen over ranges instead of single elements, lazy propagation batches them. The idea is simple: instead of touching every element, tag the container itself.

Imagine you have a set of 1000 numbers, and someone says: "Add 5 to every number." You could loop through all 1000 elements and add 5 to each. But what if this happens 1000 times before anyone actually looks at a specific element? You'd do 1,000,000 additions for no reason — nobody's reading the values yet.

The lazy approach: don't touch the individual elements. Instead, write down "+5" on a sticky note attached to the whole set. If someone later says "add 3 to everything," update the sticky note to "+8." Only when someone asks "what's the value of element 42?" do you actually compute: stored value of element 42, plus whatever the sticky note says.

Making it concrete. Think of the set as having a "boss" — a representative element that holds the lazy value on behalf of everyone.

  • Update: "Add 5 to everything." → Tell the boss: lazy += 5. No element gets touched. \(O(1)\).
  • Query: "What's the value of element \(x\)?" → Return \(\text{stored}(x) + \text{boss.lazy}\). \(O(1)\).

The real cost gets shifted to the moment you actually need a value. If you do \(Q\) updates and only \(1\) query, you've saved a factor of \(N\) on every update.

In a segment tree, this works hierarchically. A segment tree breaks an array into \(O(\log N)\) hierarchical segments. When you do a range update — say, "add 5 to every element in positions 3 through 7" — some segments are fully contained in your range. For those, you just stamp the lazy value on the segment node. No need to recurse into children. The lazy value only "pushes down" to children when a query forces you to look inside a node that has a pending update. Since any range decomposes into at most \(O(\log N)\) segments, each update costs \(O(\log N)\) instead of \(O(N)\).

The mental model: treat groups of elements as single entities whenever you can. Only break them apart when you absolutely must.

3. Fast exponentiation. Matrix exponentiation for DP transitions works because matrices under multiplication form a monoid. You'll see this in the Chain chapter.

4. Small-to-large merging. You need to merge two sets. One has 100 elements. The other has 10,000. Which one do you pour into the other?

Pour the small one into the big one. Moving 100 elements is much cheaper than moving 10,000. But the part that's not obvious: this rule gives you a logarithmic guarantee across the entire algorithm, not just for one merge.

The proof (it's short). Every time an element gets moved, it lands in a set that's at least twice as big as the one it left. (Because you always pour small into large — so the receiving set is at least as big as the sending set, and after the merge it's even bigger.) How many times can an element double before it's in a set of size \(N\)? At most \(\log_2(N)\) times. So each of the \(N\) elements moves at most \(\log_2(N)\) times total. Total work across all merges: \(O(N \log N)\).

If you merged large into small instead — pouring 10,000 elements into a 100-element set — you'd potentially move \(O(N)\) elements per merge. Do that \(N\) times and you're at \(O(N^2)\).

// Merge sets[v] into sets[u]. Result lives in sets[u].
if (sets[u].size() < sets[v].size())
    swap(sets[u], sets[v]);  // make sure u is the bigger one

for (auto& elem : sets[v])
    sets[u].insert(elem);    // pour small into large

sets[v].clear();

The swap is \(O(1)\) — it just swaps internal pointers. The loop costs \(O(|\text{small set}|)\). And by the proof above, the total cost across all merges is \(O(N \log N)\).

Where you'll see this: DSU (union by size/rank is exactly this), tree problems (each node maintains a set of distinct colors in its subtree — during DFS, merge child sets into parent, always pouring smaller into larger), and segment tree merging.

Concrete example: suppose you have an array \([12, 18, 24, 36]\) and you need the GCD of every prefix.

\[\text{prefixGCD}[0] = 12$$ $$\text{prefixGCD}[1] = \gcd(12, 18) = 6$$ $$\text{prefixGCD}[2] = \gcd(6, 24) = 6$$ $$\text{prefixGCD}[3] = \gcd(6, 36) = 6\]

That's a monoid scan over the GCD monoid. Same pattern as prefix sums, just a different operation. The monoid structure guarantees it works.


Sets Connect to DP

You've now seen that numbers are multisets and operations on those multisets form monoids. Here's the bridge to the rest of the course: the same "set" thinking solves DP problems.

Most students learn DP as a grab bag of tricks — knapsack, LIS, interval DP — without seeing what connects them. The connection is sets. Every DP state describes a collection (which items you've considered, which cities you've visited, which amount you've reached). Every DP transition modifies that collection. Once you see DP through the set lens, designing states becomes mechanical instead of magical.

This section gives you a concrete method — the Noun Technique — plus two construction patterns that cover nearly every DP problem you'll encounter.

The Noun Technique

When you face a DP problem and have no idea what the state should be, do this: read the problem statement and pull out every noun — every thing being counted, measured, limited, or tracked. Then ask: which of these nouns change as I make decisions? The ones that change are your state dimensions.

Example: Coin Change. "Find the minimum number of coins to make amount \(N\)."

Nouns: coins, amount. The coin set is fixed — you can always use any coin. The amount changes with each coin you pick. So the state is one-dimensional:

\[dp[\text{amount}] = \text{minimum coins to reach this amount}\]
Amount 0 1 2 3 4 5 6
Min coins 0 1 2 1 2 1 2

For amount \(6\) with coins \(\{1, 3, 5\}\):

\[dp[6] = \min(dp[6-1]+1,\; dp[6-3]+1,\; dp[6-5]+1) = \min(dp[5]+1,\; dp[3]+1,\; dp[1]+1) = \min(2, 2, 2) = 2\]

Notice what's NOT in the state: which coins you used, the order you used them in, how many of each denomination. None of that affects your future options. Only the remaining amount matters. That's what "summary" means — you compress all the irrelevant detail down to the one number that determines what happens next.

Example: Cheapest Flights Within K Stops. "Find the cheapest flight from city \(A\) to city \(B\) with at most \(K\) stops."

Nouns: city, cost, stops. Now filter:

  • City changes (you fly somewhere). Does it affect your future options? Yes — which city you're in determines which flights you can take next.
  • Stops changes (you use one per flight). Does it affect your future options? Yes — if you've used all \(K\) stops, you can't fly anymore.
  • Cost changes (you pay for the ticket). Does it affect your future options? No. Whether you spent $100 or $500 getting here, you have the same flights available. Cost is what you're minimizing — it's the VALUE, not a state dimension.

So the state is:

\[dp[\text{city}][\text{stops\_used}] = \text{minimum cost to reach this city using this many stops}\]

Two nouns that change your future options become two state dimensions. The noun you're optimizing becomes the value. That's the whole method.

Practice this on every new problem. Read the statement, underline the nouns, ask "which change?" and "which determines my future options?" Within a minute you'll have a candidate state. It won't always be optimal, but it gives you a starting point far better than staring at a blank screen.

Value vs. State: The Critical Distinction

The thing being OPTIMIZED — minimum cost, maximum profit, number of ways — is the VALUE you store at each state. It does not go in the index. If you're minimizing cost, don't write \(dp[\text{city}][\text{cost}]\). Write \(dp[\text{city}] = \text{min cost}\). Putting the optimized quantity in the index blows up your state space and is almost always wrong.

Here's the trap in action. Suppose costs range from \(0\) to \(10^6\). If cost is a state dimension, you need \(10^6\) entries per city. If cost is the stored value, you need one entry per city. The difference between TLE and AC is often just this: did you put the right thing in the index?

Two Ways to Build a Collection

Every DP transition does one of two things:

Add one element. Take your existing solution and extend it by one item. \(dp[i]\) depends on \(dp[i-1]\) plus whatever the current element contributes.

  • Knapsack: you've considered items \(1\) through \(i-1\). Now decide whether to add item \(i\). If you take it, the new state includes item \(i\)'s weight and value. If you skip, the state is unchanged.
  • Coin change: you have a solution for amount \(a\). Adding one coin of value \(c\) gives a solution for amount \(a + c\).
  • LIS: you have a longest increasing subsequence ending at some value. Adding the next element either extends it or doesn't.

The transition pattern: \(dp[i] = f(dp[i-1], \text{current element})\).

Merge two solutions. Combine two independent partial solutions. \(dp[L][R]\) is computed from \(dp[L][M]\) and \(dp[M+1][R]\). This is how interval DP and divide-and-conquer work. A segment tree does exactly this — the answer for a range is the monoid product of two children's answers.

Example: matrix chain multiplication. You want the cheapest way to parenthesize \(M_L \times M_{L+1} \times \cdots \times M_R\). For every possible split point \(M\):

\[dp[L][R] = \min_{M} \big( dp[L][M] + dp[M+1][R] + \text{cost of multiplying the two resulting matrices} \big)\]

You solve \([L, M]\) and \([M+1, R]\) independently, then pay the cost of one final multiplication. The sub-problems are independent because how you parenthesize the left half doesn't affect the right half. That independence is what makes the merge valid.

Instead of staring at a DP table and guessing transitions, ask: "Am I adding one element to my set, or merging two sub-solutions?" That reframing often makes the transition formula obvious.

Bitmask DP: Sets as Integers

When your collection is small — up to about \(20\) elements — you can represent the entire set as a single integer. Bit \(i\) is \(1\) if element \(i\) is in the set, \(0\) otherwise. The set \(\{0, 2, 4\}\) becomes binary \(10101 = 21\). Every subset maps to a unique integer from \(0\) to \(2^N - 1\), and those integers become DP indices.

Full treatment in the next lesson (Bits Are Sets). The point here: bitmask DP is the set pillar taken literally. Union is OR. Intersection is AND. Adding an element is mask | (1 << j). Checking membership is mask & (1 << j). The same set operations from prime factorizations, just wearing a different disguise.

Quick example: the Traveling Salesman Problem asks you to visit all \(N\) cities at minimum cost. State: \(dp[\text{mask}][\text{last}]\) where mask encodes which cities you've visited and last is the most recent city. Transition: try adding each unvisited city \(j\) to the set via mask | (1 << j). The "adding one element" pattern from above, with the set stored as bits.

Why does this help? Brute force tries all \(N!\) permutations. Bitmask DP tries all \(2^N \cdot N\) state-city pairs. For \(N = 20\), that's \(\approx 2 \times 10^7\) instead of \(\approx 2.4 \times 10^{18}\). The savings come from recognizing that many different orderings of the same set of visited cities lead to the same future options.


The Code

Prime Factorization

map<int, int> factorize(int number) {
    map<int, int> primeExponents;
    for (int divisor = 2; (long long)divisor * divisor <= number; divisor++) {
        while (number % divisor == 0) {
            primeExponents[divisor]++;
            number /= divisor;
        }
    }
    if (number > 1) primeExponents[number]++;
    return primeExponents;
}

Trial division up to \(\sqrt{n}\). If anything remains after the loop, it's a prime factor greater than \(\sqrt{n}\) (there can be at most one).

GCD via Multiset Intersection

long long gcdFromFactors(map<int, int>& factorsA, map<int, int>& factorsB) {
    long long result = 1;
    for (auto& [prime, exponent] : factorsA) {
        if (factorsB.count(prime)) {
            int minExponent = min(exponent, factorsB[prime]);
            for (int i = 0; i < minExponent; i++) result *= prime;
        }
    }
    return result;
}

LCM via Multiset Union

long long lcmFromFactors(map<int, int>& factorsA, map<int, int>& factorsB) {
    map<int, int> merged = factorsA;
    for (auto& [prime, exponent] : factorsB) {
        merged[prime] = max(merged[prime], exponent);
    }
    long long result = 1;
    for (auto& [prime, exponent] : merged) {
        for (int i = 0; i < exponent; i++) result *= prime;
    }
    return result;
}

The Euclidean Algorithm (The Fast Way)

The multiset approach is \(O(\sqrt{n})\) for factorization. The Euclidean algorithm computes GCD in \(O(\log(\min(a,b)))\) without ever factoring anything:

int gcd(int a, int b) {
    while (b) {
        a %= b;
        swap(a, b);
    }
    return a;
}

Now you understand WHY gcd(a, 0) = a — it's not just a base case. \(0\) is the identity element of the GCD monoid. The algorithm terminates when it hits the identity.

And LCM from GCD:

long long lcm(long long a, long long b) {
    return a / gcd(a, b) * b;
}

Divide before multiplying to avoid overflow. This works because \(\gcd(a,b) \times \text{lcm}(a,b) = a \times b\).


Complexity

  • factorize(n): \(O(\sqrt{n})\) time, \(O(\log n)\) space (at most \(\log_2 n\) distinct primes).
  • gcdFromFactors, lcmFromFactors: \(O(k)\) where \(k\) is the number of distinct primes — but you already paid \(O(\sqrt{n})\) to factorize.
  • Euclidean gcd(a, b): \(O(\log(\min(a, b)))\) time, \(O(1)\) space. Almost always the right choice.

Use the multiset approach when you need the factorization itself (e.g., counting divisors, computing Euler's totient). Use Euclidean GCD when you just need the GCD value. In contests, you'll almost always use the Euclidean version — but understanding the multiset version is what lets you see WHY the Euclidean version works and when factorization-based approaches are worth the cost.


Common Mistakes

  • Overflow in LCM. a * b / gcd(a, b) overflows before dividing. Always compute a / gcd(a, b) * b instead.
  • Forgetting gcd(0, 0). Mathematically \(\gcd(0, 0)\) is undefined (every integer divides \(0\)). Most implementations return \(0\). Know what your library does.
  • Assuming subtraction is a monoid. It's not associative: \((10 - 3) - 2 \neq 10 - (3 - 2)\). Prefix sums work because addition is a monoid. "Prefix differences" don't compose the same way.
  • Putting the optimized value in the DP state index. If you're minimizing cost, \(dp[\text{city}][\text{cost}]\) is almost always wrong. Cost is what you store, not what you index by. The Noun Technique catches this: ask "does this noun affect my future options?" If not, it's the value, not a state dimension.
  • Thinking "add one element" when the problem needs "merge two solutions." If you're doing interval DP and your transition doesn't seem to work, you're probably trying to build left-to-right when you should be splitting and merging. Check: does the problem decompose into two independent halves?

Quick Recap

  • Every integer is a multiset of prime factors. The number \(1\) is the empty multiset.
  • \(\gcd\) = min exponents (intersection). \(\text{lcm}\) = max exponents (union). \(\gcd \times \text{lcm} = a \times b\).
  • A monoid is a set with an associative, closed operation and an identity element. Three properties, massive payoff.
  • Prefix sums, segment trees, and DP transitions all rely on the monoid structure — same pattern, different operation. Segment trees store monoid products at each node; lazy propagation delays pushing updates until needed.
  • The Euclidean algorithm computes GCD in \(O(\log n)\) without factoring. Its base case (\(\gcd(a, 0) = a\)) is the monoid identity at work.
  • To find a DP state: pull out the nouns, identify which change between decisions, and use those as dimensions. The optimized quantity is the value, not a dimension.
  • Two DP construction patterns: add one element (\(dp[i]\) from \(dp[i-1]\)) or merge two solutions (\(dp[L][R]\) from sub-ranges). Recognizing which pattern applies is half the battle.
  • Bitmask DP represents a set as an integer — union is OR, intersection is AND. Full treatment in the next lesson.

Problems

Problem Link Difficulty Focus
CSES — Common Divisors cses.fi/problemset/task/1081 Easy GCD warm-up
LC 1979 — Find GCD of Array leetcode.com/problems/find-greatest-common-divisor-of-array Easy Monoid scan over GCD
LC 2654 — Minimum Operations to Make All Elements Equal to 1 leetcode.com/problems/minimum-number-of-operations-to-make-all-array-elements-equal-to-1 Medium Subarray GCD properties
CSES — Sum of Divisors cses.fi/problemset/task/1082 Medium Factorization structure