Skip to content

Sieve Variations

Pillar: Shape — "The segmented sieve exploits the square-root shape of the problem — only primes up to \(\sqrt{R}\) matter for the range \([L, R]\)."


The Problem with the Standard Sieve

The Sieve of Eratosthenes from L2 is fast — \(O(N \log \log N)\) — but it has a brutal limitation: it needs \(O(N)\) memory. A boolean array of size \(N\).

For \(N = 10^7\), that's about \(10\) MB. Fine.

For \(N = 10^9\), that's \(1\) GB. Painful.

For \(N = 10^{12}\), that's \(1\) TB. Impossible.

But here's the thing. Most problems don't ask you to find ALL primes up to \(10^{12}\). They ask: find all primes in a RANGE \([L, R]\) where \(R\) might be huge (\(10^{12}\)) but the range width \(R - L\) is small (say \(\leq 10^6\)). That's a completely different problem, and it has a much cheaper solution.

Consider a concrete example. You're given \(L = 10^{11}\) and \(R = 10^{11} + 10^6\). A standard sieve up to \(R\) would need \(100\) GB. The segmented sieve needs about \(1\) MB for the range plus a small primes list up to \(\sqrt{R} \approx 316{,}228\). That's the difference between "impossible" and "runs in milliseconds."

The other direction is also interesting. The standard sieve marks some composites more than once — \(30\) gets crossed out by \(2\), \(3\), AND \(5\). Can we build a sieve where every composite is marked exactly once? That's the linear sieve, and it gives us \(O(N)\) time with a powerful bonus: the smallest prime factor of every number, for free.


Segmented Sieve for Range \([L, R]\)

The key observation is the same one from L1 (primality testing): if \(n\) is composite, it has a prime factor \(\leq \sqrt{n}\). So to find primes in \([L, R]\), you only need primes up to \(\sqrt{R}\).

Step 1: Run a standard sieve to find all primes up to \(\sqrt{R}\). If \(R = 10^{12}\), then \(\sqrt{R} = 10^6\), and there are only \(78{,}498\) primes below \(10^6\). That's a tiny list.

Step 2: Create a boolean array of size \(R - L + 1\) (one slot per number in the range). For each small prime \(p\), find the first multiple of \(p\) that's \(\geq L\), and mark all multiples of \(p\) in \([L, R]\) as composite.

Finding that first multiple: you want the smallest \(m\) such that \(m \geq L\) and \(p \mid m\). That's:

\[m = \left\lceil \frac{L}{p} \right\rceil \times p\]

In integer arithmetic: ((low + prime - 1) / prime) * prime.

One detail: if the first multiple equals \(p\) itself (which happens when \(L \leq p\)), skip it — \(p\) is prime, not composite. Start from \(2p\) in that case.

Space: \(O(\sqrt{R})\) for the small primes sieve, plus \(O(R - L)\) for the range array. NOT \(O(R)\).

Trace: Sieving \([100, 120]\)

Small primes up to \(\sqrt{120} \approx 10.95\): \(\{2, 3, 5, 7\}\).

Start with all numbers \(100\) through \(120\) marked as "possibly prime."

Prime \(2\): First multiple of \(2\) in \([100, 120]\): \(\lceil 100/2 \rceil \times 2 = 100\). Mark \(100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120\).

Prime \(3\): First multiple: \(\lceil 100/3 \rceil \times 3 = 102\). Mark \(102, 105, 108, 111, 114, 117, 120\).

Prime \(5\): First multiple: \(\lceil 100/5 \rceil \times 5 = 100\). Mark \(100, 105, 110, 115, 120\).

Prime \(7\): First multiple: \(\lceil 100/7 \rceil \times 7 = 105\). Mark \(105, 112, 119\).

After all four primes, here's the state of every number in the range:

Number Marked by Status
\(100\) \(2, 5\) composite
\(101\) prime
\(102\) \(2, 3\) composite
\(103\) prime
\(104\) \(2\) composite
\(105\) \(3, 5, 7\) composite
\(106\) \(2\) composite
\(107\) prime
\(108\) \(2, 3\) composite
\(109\) prime
\(110\) \(2, 5\) composite
\(111\) \(3\) composite
\(112\) \(2, 7\) composite
\(113\) prime
\(114\) \(2, 3\) composite
\(115\) \(5\) composite
\(116\) \(2\) composite
\(117\) \(3\) composite
\(118\) \(2\) composite
\(119\) \(7\) composite
\(120\) \(2, 3, 5\) composite

Primes in \([100, 120]\): \(\{101, 103, 107, 109, 113\}\). Five primes. Correct.

Notice that \(119 = 7 \times 17\) was caught by prime \(7\), even though \(17\) is NOT in our small primes list. It didn't need to be — \(7\) is the smaller factor, and \(7 \leq \sqrt{120}\).

Also notice some composites were marked multiple times (\(105\) was hit by \(3\), \(5\), and \(7\)). That's fine — the segmented sieve doesn't aim for one mark per composite. Its goal is low memory, not minimal work per composite. The linear sieve, coming next, fixes that redundancy.


Linear Sieve (Euler's Sieve)

The standard sieve is \(O(N \log \log N)\). The segmented sieve handles huge ranges. But there's a third variant that achieves true \(O(N)\) by marking each composite exactly once — and as a bonus, it computes the smallest prime factor (SPF) of every number.

The idea: when you mark a composite, mark it using its smallest prime factor. If the composite is \(c = \text{SPF}(c) \times k\), then the number \(k\) is the one responsible for marking \(c\). No other number touches it.

The Algorithm

Iterate \(\text{current} = 2, 3, 4, \ldots, N\). Maintain a growing list of primes found so far.

For each \(\text{current}\):

  1. If \(\text{smallestPrimeFactor}[\text{current}]\) is unset, then \(\text{current}\) is prime. Set \(\text{smallestPrimeFactor}[\text{current}] = \text{current}\) and add it to the primes list.

  2. For each prime \(p\) in the primes list (in increasing order), mark the composite \(\text{current} \times p\):

  3. Set \(\text{smallestPrimeFactor}[\text{current} \times p] = p\).
  4. If \(p\) divides \(\text{current}\), stop. Do not continue to the next prime.
  5. Also stop if \(\text{current} \times p > N\) (out of range).

Why Stop When \(p\) Divides current?

This is the subtle part. Suppose \(\text{current} = 12\) and the primes list is \([2, 3, 5, 7, 11]\).

  • \(p = 2\): mark \(12 \times 2 = 24\), with \(\text{SPF}[24] = 2\). Now, does \(2\) divide \(12\)? Yes. Stop.

Why not continue to \(p = 3\) and mark \(12 \times 3 = 36\)? Because \(36 = 2 \times 18\), so \(\text{SPF}(36) = 2\), not \(3\). The composite \(36\) should be marked when \(\text{current} = 18\) and \(p = 2\). If we marked it now with \(p = 3\), we'd mark it twice — once now (wrong SPF) and once later (correct SPF).

The invariant: when we mark \(\text{current} \times p\), the smallest prime factor of that composite IS \(p\). This holds as long as \(p \leq \text{SPF}(\text{current})\). The moment \(p\) divides \(\text{current}\), we've reached \(p = \text{SPF}(\text{current})\), and the next prime in the list would be larger than \(\text{SPF}(\text{current})\), violating the invariant.

To make this concrete: \(\text{current} = 12\), \(\text{SPF}(12) = 2\). We mark \(12 \times 2 = 24\) with \(\text{SPF} = 2\). Correct, since \(24 = 2^3 \times 3\).

If we didn't stop and tried \(p = 3\): we'd mark \(12 \times 3 = 36\) with \(\text{SPF} = 3\). But \(36 = 2^2 \times 3^2\), so \(\text{SPF}(36) = 2\), not \(3\). Wrong. The composite \(36\) will be correctly marked later when \(\text{current} = 18\) and \(p = 2\): \(18 \times 2 = 36\), \(\text{SPF} = 2\). Right.

The stopping rule ensures every composite gets the correct SPF and gets marked exactly once.

Trace: Linear Sieve for \(2\) through \(20\)

Here's every step. The "Marks" column shows which composite gets marked and by which \((\text{current}, p)\) pair.

current Is prime? Inner loop: \((p, \text{composite})\) Stop reason
\(2\) Yes \((2, 4)\) \(2 \mid 2\) → stop
\(3\) Yes \((2, 6)\), \((3, 9)\) \(3 \mid 3\) → stop
\(4\) No, SPF=2 \((2, 8)\) \(2 \mid 4\) → stop
\(5\) Yes \((2, 10)\), \((3, 15)\), \((5, 25)\) \(25 > 20\) → stop
\(6\) No, SPF=2 \((2, 12)\) \(2 \mid 6\) → stop
\(7\) Yes \((2, 14)\), \((3, 21)\) \(21 > 20\) → stop
\(8\) No, SPF=2 \((2, 16)\) \(2 \mid 8\) → stop
\(9\) No, SPF=3 \((2, 18)\), \((3, 27)\) \(27 > 20\) → stop
\(10\) No, SPF=2 \((2, 20)\) \(2 \mid 10\) → stop
\(11\) Yes \((2, 22)\) \(22 > 20\) → stop
\(12\) No, SPF=2 \((2, 24)\) \(24 > 20\) → stop
\(13\)\(20\) ... all products \(> 20\) out of range

Now verify: every composite from \(4\) to \(20\) was marked exactly once.

Composite SPF Marked by \((\text{current}, p)\)
\(4\) \(2\) \((2, 2)\)
\(6\) \(2\) \((3, 2)\)
\(8\) \(2\) \((4, 2)\)
\(9\) \(3\) \((3, 3)\)
\(10\) \(2\) \((5, 2)\)
\(12\) \(2\) \((6, 2)\)
\(14\) \(2\) \((7, 2)\)
\(15\) \(3\) \((5, 3)\)
\(16\) \(2\) \((8, 2)\)
\(18\) \(2\) \((9, 2)\)
\(20\) \(2\) \((10, 2)\)

Every composite appears exactly once. Each is marked when \(\text{current} = c / \text{SPF}(c)\) and \(p = \text{SPF}(c)\). That's why the sieve is linear: \(N\) outer iterations, and the total number of inner markings across ALL iterations is at most \(N\) (one per composite).

Think of it this way. There are at most \(N\) composites in \([2, N]\). Each composite \(c\) has a unique pair \((\text{current}, p) = (c / \text{SPF}(c), \text{SPF}(c))\) that marks it. So the total work across all inner loops is bounded by the number of composites, which is at most \(N\). Add the \(N\) outer-loop iterations, and total time is \(O(N)\).

The Bonus: Multiplicative Functions

The SPF array doesn't just give you fast factorization (divide by \(\text{SPF}\) repeatedly, as in L2). It lets you compute ANY multiplicative function during the sieve itself.

A multiplicative function \(f\) satisfies \(f(ab) = f(a) \cdot f(b)\) when \(\gcd(a, b) = 1\). Examples: \(\tau(n)\) (number of divisors), \(\sigma(n)\) (sum of divisors), \(\phi(n)\) (Euler's totient), \(\mu(n)\) (Mobius function).

When you mark \(\text{current} \times p\) in the linear sieve, you know \(\text{SPF}(\text{current} \times p) = p\). Two cases:

  • If \(p \nmid \text{current}\): then \(\gcd(p, \text{current}) = 1\), so \(f(\text{current} \times p) = f(p) \cdot f(\text{current})\). You already have \(f(\text{current})\) and \(f(p)\) is trivial (it's a prime).
  • If \(p \mid \text{current}\): then \(\text{current}\) and \(p\) share a factor, so the multiplicative property doesn't directly apply. You need to know the highest power of \(p\) dividing \(\text{current}\). Track this with an auxiliary array \(\text{powerOfSPF}[\text{current}]\) that records \(p^k\) where \(p^k \| \text{current}\) (the exact power of the smallest prime dividing \(\text{current}\)). When \(p \mid \text{current}\), you can extract the \(p\)-part and use the multiplicative property on the coprime pieces.

Full treatment of these functions comes in Ch9. The point for now: the linear sieve is the foundation for computing \(\phi\), \(\mu\), \(\tau\), and \(\sigma\) for all \(n \leq N\) in \(O(N)\) total time — not \(O(N \sqrt{N})\) from factorizing each number individually.


The Code

Segmented Sieve

vector<long long> segmentedSieve(long long low, long long high) {
    int sqrtHigh = (int)sqrt((double)high);
    vector<bool> smallIsComposite(sqrtHigh + 1, false);
    vector<int> smallPrimes;
    for (int candidate = 2; candidate <= sqrtHigh; candidate++) {
        if (!smallIsComposite[candidate]) {
            smallPrimes.push_back(candidate);
            for (long long multiple = (long long)candidate * candidate;
                 multiple <= sqrtHigh; multiple += candidate)
                smallIsComposite[multiple] = true;
        }
    }

    long long rangeSize = high - low + 1;
    vector<bool> rangeIsComposite(rangeSize, false);
    for (int prime : smallPrimes) {
        long long firstMultiple = ((low + prime - 1) / prime) * prime;
        if (firstMultiple == prime) firstMultiple += prime;
        for (long long multiple = firstMultiple; multiple <= high; multiple += prime)
            rangeIsComposite[multiple - low] = true;
    }

    vector<long long> result;
    for (long long number = max(low, 2LL); number <= high; number++) {
        if (!rangeIsComposite[number - low])
            result.push_back(number);
    }
    return result;
}

The index trick: number \(x\) in the range \([L, R]\) maps to array index \(x - L\). This offset keeps the array small.

Linear Sieve (Euler's Sieve)

pair<vector<int>, vector<int>> linearSieve(int limit) {
    vector<int> smallestPrimeFactor(limit + 1, 0);
    vector<int> primes;

    for (int current = 2; current <= limit; current++) {
        if (smallestPrimeFactor[current] == 0) {
            smallestPrimeFactor[current] = current;
            primes.push_back(current);
        }
        for (int primeIndex = 0; primeIndex < (int)primes.size(); primeIndex++) {
            long long composite = (long long)current * primes[primeIndex];
            if (composite > limit) break;
            smallestPrimeFactor[composite] = primes[primeIndex];
            if (current % primes[primeIndex] == 0) break;
        }
    }

    return {primes, smallestPrimeFactor};
}

Returns both the primes list and the full SPF array. Use the SPF array for \(O(\log n)\) factorization (same technique as L2).


Complexity

  • Segmented sieve: \(O(\sqrt{R} \log \log \sqrt{R})\) for the small sieve, plus \(O((R - L) \log \log R)\) for marking the range. Space: \(O(\sqrt{R} + (R - L))\).
  • Linear sieve: \(O(N)\) time, \(O(N)\) space. Strictly better than standard sieve's \(O(N \log \log N)\), though in practice the constant factor is similar.

When to Use Which

Situation Best sieve Time Space
Primes up to \(10^7\) Standard (L2) \(O(N \log \log N)\) \(O(N)\)
Many factorizations for \(n \leq 10^6\) SPF sieve (L2) \(O(N)\) precompute, \(O(\log n)\) per query \(O(N)\)
Primes in \([L, R]\) with \(R\) up to \(10^{12}\) Segmented \(O(\sqrt{R} \log \log R + (R-L))\) \(O(\sqrt{R} + (R-L))\)
All \(\phi / \mu / \tau\) values for \(n \leq 10^7\) Linear \(O(N)\) \(O(N)\)

The standard sieve is the workhorse. The SPF sieve from L2 is for when you need fast factorization of many numbers. The segmented sieve is for when \(R\) is astronomically large but the range is narrow. The linear sieve is for when you need multiplicative function values for every number up to \(N\).

How to read the constraint and pick the right tool:

  • "Find all primes up to \(N\)" with \(N \leq 10^7\): Standard sieve. Simple, fast, well within memory.
  • "Given \(Q\) queries, factorize each \(n_i\)" with \(n_i \leq 10^6\): SPF sieve. Precompute once, answer each query in \(O(\log n)\).
  • "Find primes between \(L\) and \(R\)" with \(R \leq 10^{12}\) and \(R - L \leq 10^6\): Segmented sieve. The only option that fits in memory.
  • "Compute \(\phi(n)\) for all \(n \leq 10^7\)": Linear sieve. You need SPF to propagate multiplicative function values.

In contests, the standard sieve handles \(90\%\) of problems. The segmented sieve handles the "primes in a huge range" problems (SPOJ PRIME1 is the classic). The linear sieve is rarely necessary for speed alone — the constant-factor difference between \(O(N)\) and \(O(N \log \log N)\) is negligible for \(N \leq 10^7\) — but it's the cleanest way to compute multiplicative functions, and you'll appreciate it in Ch9.


Common Mistakes

  • Segmented sieve: computing the first multiple wrong. The formula low + (prime - low % prime) FAILS when low is divisible by prime — it gives low + prime instead of low. Use ((low + prime - 1) / prime) * prime instead. Trace it: if \(L = 100\) and \(p = 5\), then \(100 \% 5 = 0\), so 100 + (5 - 0) = 105. Wrong — the first multiple of \(5\) in \([100, 120]\) is \(100\), not \(105\). The ceiling formula gives \(\lceil 100/5 \rceil \times 5 = 20 \times 5 = 100\). Correct.

  • Linear sieve: breaking at the wrong time. The break condition if (current % primes[primeIndex] == 0) must come AFTER marking the composite, not before. If you check divisibility first and break without marking, you'll miss composites. The order is: mark, then check, then break.

  • Forgetting that the segmented sieve needs small primes first. You can't run step 2 without step 1. If you try to mark composites in \([L, R]\) without first sieving primes up to \(\sqrt{R}\), you have no primes to sieve with. This seems obvious, but in implementation it's easy to skip the small sieve when \(L\) is large and forget that \(\sqrt{R}\) is small.

  • Segmented sieve: not skipping the prime itself. When \(L \leq p\), the first multiple of \(p\) in \([L, R]\) might be \(p\) itself. You must not mark \(p\) as composite. The check if (firstMultiple == prime) firstMultiple += prime handles this.

  • Linear sieve: integer overflow. The product \(\text{current} \times p\) can exceed int range. Cast to long long before multiplying.


Quick Recap

  • The standard sieve needs \(O(N)\) memory — impossible when \(N = 10^{12}\).
  • The segmented sieve solves "primes in \([L, R]\)" using only \(O(\sqrt{R} + (R-L))\) space by sieving small primes first, then marking their multiples in the range.
  • The linear sieve (Euler's sieve) marks each composite exactly once via its smallest prime factor, achieving true \(O(N)\) time.
  • The linear sieve's key invariant: when processing \(\text{current}\), stop the inner loop the moment \(p\) divides \(\text{current}\), because continuing would mark composites with the wrong SPF.
  • The SPF array from the linear sieve enables \(O(N)\) computation of multiplicative functions (\(\tau\), \(\sigma\), \(\phi\), \(\mu\)) — the foundation for Ch9.

Problems

Problem Link Difficulty Focus
SPOJ PRIME1 — Prime Generator spoj.com/problems/PRIME1 Medium Classic segmented sieve
LC 204 — Count Primes leetcode.com/problems/count-primes Easy Try linear sieve for \(O(N)\)
Project Euler 10 — Summation of Primes projecteuler.net/problem=10 Easy Sum of primes below \(2 \times 10^6\)
CSES — Counting Divisors cses.fi/problemset/task/1713 Medium SPF sieve for fast factorization