Large Numbers: Miller-Rabin and Pollard's Rho
Pillar: Shape — "Miller-Rabin: a composite \(n\) MUST fail the Fermat test for some witness. The SHAPE of the failure pattern — whether the squaring sequence hits \(-1\) before reaching \(1\) — distinguishes primes from composites."
Pillar: Chain — "Pollard's Rho: Floyd's cycle detection on \(x \to x^2 + c \bmod n\) — a CHAIN that eventually cycles. When the cycle reveals a collision modulo a hidden factor, the factor pops out."
The Setup
In Ch2 L1, you built a factorize function that runs in \(O(\sqrt{n})\). For \(n \leq 10^{12}\), that's about \(10^6\) iterations — fast. But what happens when a problem hands you \(n = 998244353 \times 1000000007 = 998244358306749571\)?
That's roughly \(10^{18}\). The square root is about \(10^9\). Trial division needs a billion iterations per query. At \(10^8\) operations per second, that's 10 seconds — way too slow.
And this isn't a contrived scenario. Problems on Codeforces regularly give \(n \leq 10^{18}\) and ask you to factorize it, count its divisors, or check if it's prime. The sieve from Ch2 L2 is useless here — you'd need a sieve up to \(10^9\), which blows both time and memory.
You need two new tools: - A primality test that runs in \(O(\log^2 n)\) — not \(O(\sqrt{n})\). - A factorization algorithm that runs in expected \(O(n^{1/4})\) — not \(O(n^{1/2})\).
For \(n = 10^{18}\), that's \(10^{4.5} \approx 31623\) operations instead of \(10^9\). The difference between instant and impossible.
Miller-Rabin: Fast Primality Testing
The Fermat Foundation
Fermat's Little Theorem says: if \(p\) is prime and \(\gcd(a, p) = 1\), then
So if you pick some \(a\) and compute \(a^{n-1} \bmod n\), and the result is NOT \(1\), then \(n\) is definitely composite. The number \(a\) is called a witness to \(n\)'s compositeness.
The problem: some composites pass this test for every \(a\). These are called Carmichael numbers — \(561 = 3 \times 11 \times 17\) is the smallest. For every \(a\) coprime to \(561\), we get \(a^{560} \equiv 1 \pmod{561}\). The plain Fermat test can't catch them.
The Miller-Rabin Upgrade
Miller-Rabin adds a twist that catches Carmichael numbers. The key insight comes from the SHAPE of how \(a^{n-1} \bmod n\) is computed.
Write \(n - 1 = 2^s \times d\) where \(d\) is odd. Instead of jumping straight to \(a^{n-1}\), compute \(a^d \bmod n\) first, then square it \(s\) times:
For a genuine prime \(p\), this squaring sequence has a very specific shape. Consider the equation \(x^2 \equiv 1 \pmod{p}\). This factors as \((x-1)(x+1) \equiv 0 \pmod{p}\), and since \(p\) is prime, one of the factors must be divisible by \(p\). So the only square roots of \(1\) modulo a prime are \(1\) and \(-1\) (i.e., \(p-1\)).
This means: in the squaring chain, once you see \(1\), the value right before it MUST have been \(\pm 1\). Working backward, either: - \(a^d \equiv 1 \pmod{n}\) (the chain starts at \(1\) and stays there), OR - \(a^{2^r \cdot d} \equiv -1 \pmod{n}\) for some \(0 \leq r < s\) (the chain hits \(-1\) at some point, then the next square gives \(1\)).
If neither condition holds, \(n\) is composite — guaranteed. No Carmichael number escapes, because the test exploits the structure of square roots modulo primes, not just the final value.
Worked Example: Test \(n = 561\) with Witness \(a = 2\)
\(n - 1 = 560 = 2^4 \times 35\), so \(s = 4\), \(d = 35\).
Compute the squaring chain:
| Step | Value | Result \(\bmod 561\) |
|---|---|---|
| \(2^{35}\) | start | \(263\) |
| \(2^{70}\) | square | \(263^2 \bmod 561 = 166\) |
| \(2^{140}\) | square | \(166^2 \bmod 561 = 67\) |
| \(2^{280}\) | square | \(67^2 \bmod 561 = 1\) |
| \(2^{560}\) | square | \(1^2 \bmod 561 = 1\) |
The final value is \(1\) — Fermat's test passes. But look at step \(2^{280}\): the value is \(67\), and the next square gives \(1\). That means \(67^2 \equiv 1 \pmod{561}\), but \(67 \neq 1\) and \(67 \neq 560\) (i.e., \(67 \neq \pm 1\)). A nontrivial square root of \(1\) exists modulo \(561\). This is impossible for a prime.
Miller-Rabin catches \(561\) where plain Fermat fails. The shape of the squaring chain reveals the composite nature.
How Many Witnesses Do You Need?
For randomized Miller-Rabin, each random witness independently has at most a \(1/4\) probability of failing to detect a composite. So \(k\) random witnesses give error probability \(\leq 4^{-k}\).
But in competitive programming, you want deterministic results. It turns out that for specific ranges of \(n\), fixed small witness sets suffice:
| Range | Witnesses needed |
|---|---|
| \(n < 2{,}047\) | \(\{2\}\) |
| \(n < 1{,}373{,}653\) | \(\{2, 3\}\) |
| \(n < 3{,}215{,}031{,}751\) | \(\{2, 3, 5, 7\}\) |
| \(n < 3.3 \times 10^{24}\) | \(\{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37\}\) |
That last line covers every value you'll ever see in competitive programming. Twelve witnesses, each requiring \(O(\log n)\) multiplications — total cost \(O(12 \log^2 n)\). For \(n = 10^{18}\), that's about \(60 \times 12 = 720\) modular multiplications. Essentially free.
Pollard's Rho: Fast Factorization
Miller-Rabin tells you WHETHER \(n\) is prime. If it's not, you need to FIND a factor. That's Pollard's Rho.
The Birthday Paradox Insight
Suppose \(n = p \times q\) and you don't know \(p\) or \(q\). Pick random numbers \(x_1, x_2, \ldots\) modulo \(n\) and check \(\gcd(|x_i - x_j|, n)\) for all pairs. If \(x_i \equiv x_j \pmod{p}\) but \(x_i \not\equiv x_j \pmod{n}\), then \(\gcd(|x_i - x_j|, n)\) is a nontrivial factor of \(n\).
By the birthday paradox, you need about \(O(\sqrt{p})\) random values before two of them collide modulo \(p\). Since \(p \leq \sqrt{n}\) (at least one factor is \(\leq \sqrt{n}\)), this takes about \(O(n^{1/4})\) samples.
But checking all pairs is \(O(k^2)\), which ruins the birthday advantage. You need a way to detect a collision without checking every pair.
Floyd's Cycle Detection — The Chain
Instead of picking independent random values, build a pseudorandom chain using the iteration \(f(x) = x^2 + c \bmod n\) for some constant \(c\). Starting from \(x_0\), generate:
This sequence is deterministic and eventually cycles (since there are finitely many residues modulo \(n\)). The shape is the Greek letter \(\rho\) — a tail leading into a loop. That's the name.
Now consider the same sequence modulo the hidden factor \(p\). Since \(f(x) = x^2 + c\) is a function modulo \(n\), it's also a function modulo \(p\) (just reduce everything modulo \(p\)). The sequence modulo \(p\) cycles faster — there are only \(p\) possible values instead of \(n\). By the birthday paradox, the cycle length modulo \(p\) is \(O(\sqrt{p})\).
Floyd's cycle detection uses two pointers: a tortoise moving one step at a time and a hare moving two steps. When the tortoise is at \(x_i\) and the hare is at \(x_{2i}\), check \(\gcd(|x_i - x_{2i}|, n)\). If \(x_i \equiv x_{2i} \pmod{p}\) (they've collided in the cycle modulo \(p\)), then \(p \mid |x_i - x_{2i}|\) and the GCD reveals a factor.
Worked Example: Factor \(n = 8051\)
Use \(f(x) = x^2 + 1 \bmod 8051\), starting at \(x_0 = 2\).
| Step | Tortoise (\(x_i\)) | Hare (\(x_{2i}\)) | \(|x_i - x_{2i}|\) | \(\gcd(\ldots, 8051)\) | |:-----|:------------------|:-----------------|:-------------------|:----------------------| | \(1\) | \(f(2) = 5\) | \(f(f(2)) = 26\) | \(21\) | \(\gcd(21, 8051) = 7\) |
One step. The GCD gives \(7\), and indeed \(8051 = 7 \times 1150 + 1\)... let's verify: \(7 \times 1150 = 8050\), so \(8051 = 7 \times 1150 + 1\). That's not right. Actually \(8051 / 7 = 1150.14...\), so \(7 \nmid 8051\). Let me retrace.
\(f(2) = 4 + 1 = 5\). \(f(5) = 25 + 1 = 26\). \(f(26) = 676 + 1 = 677\). After step 1: tortoise \(= 5\), hare \(= 26\). \(\gcd(21, 8051) = 1\). Continue.
| Step | Tortoise | Hare | \(\gcd(|\text{diff}|, 8051)\) | |:-----|:---------|:-----|:----------------------------| | \(1\) | \(5\) | \(26\) | \(\gcd(21, 8051) = 1\) | | \(2\) | \(26\) | \(7474\) | \(\gcd(7448, 8051) = 1\) | | \(3\) | \(677\) | \(871\) | \(\gcd(194, 8051) = 97\) |
At step 3, \(\gcd(194, 8051) = 97\), and \(8051 = 97 \times 83\). Found both factors.
The tortoise took 3 steps. Trial division would have needed to scan up to \(\sqrt{8051} \approx 90\) before finding \(83\) (or reaching \(97\)). For this tiny example the savings are modest, but for \(n = 10^{18}\) the difference is between \(10^{4.5}\) and \(10^9\) operations.
Why Might It Fail?
If the tortoise and hare collide modulo \(n\) (not just modulo \(p\)), then \(\gcd(|x_i - x_{2i}|, n) = n\) — a trivial factor. This happens when the cycle modulo \(p\) and the cycle modulo \(q\) have the same length and align perfectly. The fix: pick a different \(c\) and try again. In practice, you rarely need more than 2-3 attempts.
Expected Complexity
The cycle modulo the smallest prime factor \(p\) has expected length \(O(\sqrt{p})\). Floyd's detection finds the cycle in \(O(\sqrt{p})\) steps, each costing \(O(\log n)\) for the GCD. Since \(p \leq n^{1/2}\), this is \(O(n^{1/4} \log n)\).
For \(n = 10^{18}\), that's about \(10^{4.5} \times 60 \approx 2 \times 10^6\) — comfortably fast.
The __int128 Problem
When you compute \(a \times b \bmod n\) where \(a, b, n\) can be up to \(10^{18}\), the intermediate product \(a \times b\) can reach \(10^{36}\). That overflows unsigned long long (max \(\approx 1.8 \times 10^{19}\)). You need 128-bit multiplication.
GCC provides __int128 (and unsigned __int128) which handles this natively. The key helper:
typedef unsigned long long ull;
typedef __int128 u128;
ull mulmod(ull base, ull multiplier, ull modulus) {
return (u128)base * multiplier % modulus;
}
Cast one operand to u128, the multiplication promotes to 128 bits, then take the remainder. This is a single function call — no binary multiplication loops needed.
Use mulmod everywhere inside Miller-Rabin and Pollard's Rho. The modular exponentiation becomes:
ull powmod(ull base, ull exponent, ull modulus) {
ull result = 1;
base %= modulus;
while (exponent > 0) {
if (exponent & 1) result = mulmod(result, base, modulus);
base = mulmod(base, base, modulus);
exponent >>= 1;
}
return result;
}
Identical structure to the modpow from Ch4 L2, except every multiplication goes through mulmod to avoid overflow.
The Combined Template
The strategy for factorizing any \(n \leq 10^{18}\):
- Base cases: If \(n = 1\), return empty. If \(n\) is even, pull out all factors of \(2\).
- Primality check: Run
isPrime(n)(deterministic Miller-Rabin). If prime, record \(n\) and return. - Find a factor: Run
pollardRho(n)to get a nontrivial factor \(d\). - Recurse: Factorize both \(d\) and \(n/d\).
The recursion depth is at most \(O(\log n)\) (each split produces strictly smaller numbers), and each Pollard's Rho call is \(O(n^{1/4})\) in the worst case. The total work is dominated by the first split.
The Code
Full Implementation
#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ull;
typedef __int128 u128;
ull mulmod(ull base, ull multiplier, ull modulus) {
return (u128)base * multiplier % modulus;
}
ull powmod(ull base, ull exponent, ull modulus) {
ull result = 1;
base %= modulus;
while (exponent > 0) {
if (exponent & 1) result = mulmod(result, base, modulus);
base = mulmod(base, base, modulus);
exponent >>= 1;
}
return result;
}
bool millerRabin(ull number, ull witness) {
if (number % witness == 0) return number == witness;
ull oddPart = number - 1;
int twoPower = 0;
while (oddPart % 2 == 0) {
oddPart /= 2;
twoPower++;
}
ull current = powmod(witness, oddPart, number);
if (current == 1 || current == number - 1) return true;
for (int round = 0; round < twoPower - 1; round++) {
current = mulmod(current, current, number);
if (current == number - 1) return true;
}
return false;
}
bool isPrime(ull number) {
if (number < 2) return false;
for (ull witness : {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37}) {
if (!millerRabin(number, witness)) return false;
}
return true;
}
ull pollardRho(ull number) {
if (number % 2 == 0) return 2;
mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());
while (true) {
ull offset = rng() % (number - 2) + 2;
ull tortoise = offset;
ull hare = offset;
ull divisor = 1;
while (divisor == 1) {
tortoise = mulmod(tortoise, tortoise, number) + offset;
if (tortoise >= number) tortoise -= number;
hare = mulmod(hare, hare, number) + offset;
if (hare >= number) hare -= number;
hare = mulmod(hare, hare, number) + offset;
if (hare >= number) hare -= number;
ull difference = tortoise > hare ? tortoise - hare : hare - tortoise;
divisor = __gcd(difference, number);
}
if (divisor != number) return divisor;
}
}
map<ull, int> factorize(ull number) {
map<ull, int> primeExponents;
if (number <= 1) return primeExponents;
queue<ull> workQueue;
workQueue.push(number);
while (!workQueue.empty()) {
ull current = workQueue.front();
workQueue.pop();
if (current == 1) continue;
if (isPrime(current)) {
primeExponents[current]++;
continue;
}
ull factor = pollardRho(current);
workQueue.push(factor);
workQueue.push(current / factor);
}
return primeExponents;
}
\(O(n^{1/4} \log n)\) expected time per factorization, \(O(\log n)\) space.
Walk through the code:
mulmod: 128-bit safe multiplication. Everya * b % min the entire template goes through this.powmod: Binary exponentiation. Nothing new except it callsmulmodinstead of using%directly.millerRabin: Single-witness test. Extracts the odd part \(d\) and the power of \(2\), computes \(a^d\), then squares up to \(s-1\) times looking for \(-1\). Returnstrueif \(n\) looks prime to this witness.isPrime: RunsmillerRabinfor all 12 deterministic witnesses. If ALL pass, \(n\) is prime (guaranteed for \(n < 3.3 \times 10^{24}\)).pollardRho: Floyd's cycle detection with random starting offset. Tries different offsets until a nontrivial GCD appears.factorize: BFS over the factor tree. Each composite gets split bypollardRho, each piece goes back into the queue. Primes get recorded. When the queue is empty, the full factorization is assembled.
Brent's Optimization (Optional Speedup)
The implementation above uses Floyd's classic two-pointer approach. Brent's improvement replaces this with a power-of-two stepping scheme that's about 24% faster in practice.
The idea: instead of advancing tortoise and hare simultaneously, keep the tortoise fixed and let the hare run for \(2^k\) steps at a time. After each power-of-two batch, move the tortoise to the hare's position and double the batch size.
An additional trick: accumulate the product of all \(|x_i - x_j|\) values and take a single GCD every (say) 128 steps. GCD is expensive relative to multiplication, so batching reduces the constant factor significantly.
In the template above, the basic Floyd version is already fast enough for all competitive programming purposes. Brent's is worth knowing if you hit tight time limits — swap out the inner loop and batch the GCDs.
Common Mistakes
-
Forgetting the
number == witnesscheck inmillerRabin. If \(n = 2\) and the witness is \(2\), then \(n \% 2 = 0\) — but \(n\) IS prime. Without the guardreturn number == witness, you'd declare \(2\) composite. The same applies to witnesses \(3, 5, 7, \ldots\) when tested against themselves. -
Overflow in
tortoise + offset. After computingmulmod(tortoise, tortoise, number), the result is in \([0, n)\). Addingoffset(also in \([0, n)\)) can reach \(2n - 2\), which fits inullsince \(n \leq 10^{18}\). But if you skip theif (tortoise >= number) tortoise -= numberadjustment, the tortoise drifts outside \([0, n)\) and the algorithm breaks silently. -
Using
offset = 0oroffset = 1in Pollard's Rho. With \(c = 0\), the map \(x \to x^2 \bmod n\) has fixed points at \(0\) and \(1\), so the chain degenerates. Random \(c \geq 2\) avoids these. The implementation draws from \([2, n)\). -
Not handling
pollardRhoreturning a non-prime.pollardRho(n)returns A factor, not necessarily a PRIME factor. You must recurse (or use the BFS queue) to split the factor further. Thefactorizefunction handles this — it only records a value whenisPrimeconfirms it.
Quick Recap
- Trial division is \(O(\sqrt{n})\) — fine for \(n \leq 10^{12}\), hopeless for \(n \leq 10^{18}\).
- Miller-Rabin tests primality in \(O(\log^2 n)\) by checking whether the squaring chain \(a^d, a^{2d}, \ldots, a^{n-1}\) has the shape required by a prime. With witnesses \(\{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37\}\), it's deterministic for all \(n < 3.3 \times 10^{24}\).
- Pollard's Rho finds a factor in expected \(O(n^{1/4})\) using Floyd's cycle detection on the pseudorandom chain \(x \to x^2 + c \bmod n\). When \(\gcd(|x_i - x_{2i}|, n) \notin \{1, n\}\), you've found a nontrivial factor.
__int128is essential for modular multiplication when \(n\) exceeds \(\sim 10^{9.5}\). Without it, \(a \times b\) overflowsull.- The template:
isPrimeto check,pollardRhoto split, BFS to assemble the full factorization. Handles any \(n \leq 10^{18}\).
Problems
| Problem | Link | Difficulty | Focus |
|---|---|---|---|
| CF 1514C — Product 1 Modulo N | codeforces.com/contest/1514/problem/C | 1600 | Primality and coprimality reasoning |
| CF 1445C — Division | codeforces.com/contest/1445/problem/C | 1500 | Factor large \(n\), find divisor satisfying constraint |
| CF 1549D — Integers Have Friends | codeforces.com/contest/1549/problem/D | 1800 | GCD on segments, large number handling |
| SPOJ — FACT0 (Integer Factorization) | spoj.com/problems/FACT0 | Hard | Direct application: factorize \(n \leq 10^{18}\) |
| CF 1468J — Road Reform | codeforces.com/contest/1468/problem/J | 1800 | MST + large value processing |
| Library Checker — Factorize | judge.yosupo.jp/problem/factorize | Hard | Stress test: factorize up to \(10^{18}\), tight TL |
This completes the Multiplicative Functions and Mobius chapter. You now have the full toolkit: divisor sums and swap tricks (L1), multiplicative functions and linear sieves (L2), Mobius inversion (L3), floor staircase optimization (L4), and large-number primality/factorization (L5).