Containers, Sets, and Operations

Here's a problem most beginners don't realize they have: they can use data structures, but they can't talk about what those data structures are doing. They say things like "I'm storing the answer so far" or "I keep track of stuff I've seen." Vague. Fuzzy. And when the problem gets harder, that fuzziness kills them.

This page gives you a precise vocabulary — borrowed from set theory — for describing what you're storing and how you're combining it. You don't need a math degree. You just need to name what you're already doing.

What Is a Container?

Strip away the fancy terminology. A container is anything that holds a collection of objects. That's it.

You already use containers constantly:

A visited array in BFS? That's a container. It holds "nodes I've seen."
A hash map in Two Sum? Container. It holds "values I've encountered so far."
A prefix sum array? Container. It holds "cumulative sums up to each index."
A DP table? Container. It holds "best answers for each state."

In C++, a container might be a vector, set, map, or linked list. Mathematically, it could be a set of indices, a multiset of values, a sequence. The specific implementation doesn't matter yet. What matters is the concept: you have a collection of things.

Why bother naming this? Because the first question in any problem isn't "should I use a segment tree?" It's "what am I collecting?" The data structure is a decision you make after you know what you're tracking. The concept of "I have a collection of X" comes first.

Ask yourself: what objects am I collecting, and what properties do they have? If you can answer that clearly, the right data structure usually becomes obvious.

Buckets: When Your Container Is Too Big

Sometimes your container has too much stuff in it. Processing every element one by one is too slow. So you split it.

A bucket is a piece of your container. You partition the whole collection into smaller chunks, handle each chunk separately, and combine the results. That's the entire idea.

You've seen this pattern before, even if nobody called it "bucketing":

Counting sort. You have an array of numbers. Instead of comparing elements pairwise, you create buckets — one for each possible value — and drop each element into its bucket. Then you read the buckets out in order. The "split into groups by value" step is bucketing.

Square root decomposition. You have an array of \(N\) elements and need to answer range queries. Split the array into roughly \(\sqrt{N}\) blocks. Each block stores a precomputed summary (like its sum or max). When a query spans full blocks, use the summary. When it partially overlaps a block, process those elements individually. Total work: \(O(\sqrt{N})\) per query instead of \(O(N)\).

Mo's algorithm. You have offline range queries. Sort them in a clever order (grouped by which \(\sqrt{N}\)-sized block the left endpoint falls in). Process queries in that order, and the total pointer movement across all queries drops to \(O((N + Q)\sqrt{N})\).

The common thread: your container is too big to process naively, so you split it into buckets and handle each bucket cheaply. Don't think of this as tied to any one algorithm. It's a general-purpose move: decompose to conquer.

Set Operations: Two Ways to Combine

You have two collections. You want to combine them. There are exactly two fundamental ways.

Union (\(A \cup B\)): take everything that's in \(A\), everything that's in \(B\), and throw it all into one pile. If something appears in both, it still only shows up once in the result.

Intersection (\(A \cap B\)): take only the things that appear in both \(A\) and \(B\). Everything else gets discarded.

Simple enough in the abstract. But here's where it gets interesting for competitive programming.

Bits Are Sets

If you represent a set as a binary number — bit \(i\) is 1 if element \(i\) is in the set, 0 if it isn't — then set operations become bitwise operations:

Union = OR. A | B gives you every bit that's set in either \(A\) or \(B\). That's union.
Intersection = AND. A & B gives you only the bits set in both. That's intersection.

This isn't a clever analogy. It's literally the same operation. Every bit manipulation problem is secretly a set problem wearing a disguise. When a problem says "find the OR of all elements in a subarray," it's really asking "what's the union of all these bit-sets?"

A Useful Fact: OR Can Only Grow

Here's something worth internalizing. Take any starting position in an array and start computing the OR as you extend rightward:

OR of A[3..3] = A[3]
OR of A[3..4] = A[3] | A[4]
OR of A[3..5] = A[3] | A[4] | A[5]
...

Each step, the OR either stays the same or gets bigger. It never shrinks. Why? Because OR only sets bits — it never clears them. Once a bit is on, it stays on.

And since a number has at most \(\log_2(V)\) bits (where \(V\) is the maximum value), the OR can change at most \(\log_2(V)\) times before every bit is set and it stops growing.

This means: from any fixed starting position, there are at most \(\log_2(V)\) distinct OR values as you extend the subarray. Even though there are \(O(N)\) possible ending positions, the number of distinct results is tiny — at most \(O(\log V)\).

Across all starting positions, that gives \(O(N \log V)\) distinct subarray OR values total. Way less than the \(O(N^2)\) subarrays. This bound shows up in many problems involving range OR queries.

How Sets Connect to DP

Here's the payoff. Most DP transitions are secretly set operations, and recognizing that makes DP much less mysterious.

You have some solution \(S_1\) — the best answer you've built so far using some subset of elements. You want to build a bigger, better solution. There are really only two moves:

Move 1: Add One Element

Take your existing solution and extend it by one thing. \(S_1 + \{x\}\).

This is the classic DP transition. Your dp[i] depends on dp[i-1] plus whatever the current element contributes. You're growing the set one element at a time.

Knapsack is a perfect example. You've considered items 1 through \(i-1\). Now you decide whether to add item \(i\) to your set. If you add it, the new state is "everything I had before, plus item \(i\)."

Move 2: Merge Two Solutions

Combine two independent solutions into one. \(S_1 \cup S_2\).

This is how divide-and-conquer works. Split the problem in half, solve each half independently, merge the results. Segment trees do exactly this — the answer for range \([L, R]\) is computed by merging the answers for \([L, M]\) and \([M+1, R]\).

Here's the mental shift: instead of asking "how does dp[i][j] transition to dp[i+1][j']?", ask "am I adding one element to my set, or merging two sub-solutions?" That reframing often makes the transition formula obvious.

Bitmask DP: Representing Sets as Integers

When your set is small — say, up to 20 elements — you can do something powerful: represent the entire set as a single integer.

Bit \(i\) of the integer is 1 if element \(i\) is in the set, 0 otherwise. The set \(\{0, 2, 4\}\) becomes the binary number 10101, which is 21 in decimal. The set \(\{1, 3\}\) becomes 01010 = 10.

This means every possible subset of \(N\) elements maps to a unique integer from \(0\) to \(2^N - 1\). And that means you can use these integers as indices in a DP table.

Walking Through TSP

The Traveling Salesman Problem: visit all \(N\) cities exactly once, minimize total distance. The brute-force approach tries all \(N!\) permutations. For \(N = 20\), that's about \(2.4 \times 10^{18}\) — completely hopeless.

But with bitmask DP, you can solve it in \(O(2^N \cdot N^2)\). For \(N = 20\), that's about \(10^{12} \cdot 400 \approx 4 \times 10^8\). Tight, but doable.

State: dp[mask][last] = shortest distance when you've visited exactly the cities in mask, and the most recent city visited is last.

Transition: From state (mask, last), try going to each unvisited city \(j\):

if (!(mask & (1 << j))) {           // is city j unvisited?
    new_mask = mask | (1 << j);      // add j to visited set
    dp[new_mask][j] = min(dp[new_mask][j],
                          dp[mask][last] + dist[last][j]);
}

Look at that transition. mask & (1 << j) checks if \(j\) is in the set (intersection with a singleton). mask | (1 << j) adds \(j\) to the set (union with a singleton). It's the same set operations from earlier, just written in bits.

Why the Ordering Works

There's a subtle but important point. When you loop over masks from 0 to \(2^N - 1\), you're automatically processing subsets before their supersets. Why? Because removing any element from a set always gives you a smaller integer. So by the time you reach mask, every subset of mask has already been computed.

You don't need to think about DP ordering at all. The integers handle it for you. That's elegant.

When to Reach for Bitmask DP

Three signals:

Small set size. \(N \le 20\) is the usual cutoff, since \(2^{20} \approx 10^6\) states, and you often have a factor of \(N\) or \(N^2\) per state.
"Visit all" or "use each exactly once" language. Permutation problems. Assignment problems. Hamiltonian paths.
Factorial blowup in brute force. If the naive approach is \(O(N!)\), bitmask DP often brings it down to \(O(2^N \cdot \text{poly}(N))\).

Lazy Operations: Do Less Work Now

Imagine you have a set of 1000 numbers, and someone says: "Add 5 to every number." You could loop through all 1000 elements and add 5 to each. But what if this happens 1000 times before anyone actually looks at a specific element? You'd do 1,000,000 additions for no reason — nobody's reading the values yet.

The lazy approach: don't touch the individual elements. Instead, write down "+5" on a sticky note attached to the whole set. If someone later says "add 3 to everything," update the sticky note to "+8." Only when someone asks "what's the value of element 42?" do you actually compute: stored value of element 42, plus whatever the sticky note says.

That's laziness. Not the bad kind. The efficient kind.

Making It Concrete

Think of the set as having a "boss" — a representative element that holds the lazy value on behalf of everyone.

Update: "Add 5 to everything." → Tell the boss: lazy += 5. No element gets touched. \(O(1)\).
Query: "What's the value of element \(x\)?" → Return stored\((x)\) + boss.lazy. \(O(1)\).

The real cost gets shifted to the moment you actually need a value. If you do \(Q\) updates and only \(1\) query, you've saved a factor of \(N\) on every update.

Lazy Propagation in Segment Trees

This is where lazy operations shine brightest. A segment tree breaks an array into \(O(\log N)\) hierarchical segments. When you do a range update — say, "add 5 to every element in positions 3 through 7" — some of those segments are fully contained in your range. For those, you just stamp the lazy value on the segment node. No need to recurse into children.

The lazy value only "pushes down" to children when a query forces you to look inside a node that has a pending update. Since any range decomposes into at most \(O(\log N)\) segments, each update costs \(O(\log N)\) instead of \(O(N)\).

The mental model: treat groups of elements as single entities whenever you can. Only break them apart when you absolutely must.

Small to Large Merging

You need to merge two sets. One has 100 elements. The other has 10,000. Which one do you pour into the other?

The answer is obvious when you think about it: pour the small one into the big one. Moving 100 elements is much cheaper than moving 10,000.

But here's the part that's not obvious: this rule gives you a logarithmic guarantee across the entire algorithm, not just for one merge.

The Proof (It's Short)

Every time an element gets moved, it lands in a set that's at least twice as big as the one it left. (Because you always pour small into large — so the receiving set is at least as big as the sending set, and after the merge it's even bigger.)

How many times can an element double before it's in a set of size \(N\)? At most \(\log_2(N)\) times.

So each of the \(N\) elements moves at most \(\log_2(N)\) times total. Total work across all merges: \(O(N \log N)\).

If you merged large into small instead — pouring 10,000 elements into a 100-element set — you'd potentially move \(O(N)\) elements per merge. Do that \(N\) times and you're at \(O(N^2)\).

The Pattern

// Merge sets[v] into sets[u]. Result lives in sets[u].
if (sets[u].size() < sets[v].size())
    swap(sets[u], sets[v]);  // make sure u is the bigger one

for (auto& elem : sets[v])
    sets[u].insert(elem);    // pour small into large

sets[v].clear();

The swap is \(O(1)\) — it just swaps internal pointers. The loop costs \(O(|\text{small set}|)\). And by the proof above, the total cost across all merges is \(O(N \log N)\).

Where You'll See This

Disjoint Set Union (DSU) — union by size/rank is exactly this. The smaller tree gets attached under the larger tree's root.
Tree problems — each node maintains a set (say, distinct colors in its subtree). During DFS, merge child sets into the parent. Always pour the smaller child into the larger one.
Segment tree merging — when combining two sparse segment trees, insert the smaller one's nodes into the larger one.

What's Next

This page gave you a vocabulary: containers, buckets, set operations, bitmasks, laziness, and the small-to-large trick. These aren't isolated concepts — they're the building blocks you'll use every time you design a DP state or pick a data structure.

The next page — Defining States: The Art of DP — puts these tools to work on the hardest part of dynamic programming: figuring out what your state actually is.