Large Numbers: Miller-Rabin and Pollard's Rho

Pillar: Shape — "Miller-Rabin: a composite \(n\) MUST fail the Fermat test for some witness. The SHAPE of the failure pattern — whether the squaring sequence hits \(-1\) before reaching \(1\) — distinguishes primes from composites."

Pillar: Chain — "Pollard's Rho: Floyd's cycle detection on \(x \to x^2 + c \bmod n\) — a CHAIN that eventually cycles. When the cycle reveals a collision modulo a hidden factor, the factor pops out."

The Setup

In Ch2 L1, you built a factorize function that runs in \(O(\sqrt{n})\). For \(n \leq 10^{12}\), that's about \(10^6\) iterations — fast. But what happens when a problem hands you \(n = 998244353 \times 1000000007 = 998244358306749571\)?

That's roughly \(10^{18}\). The square root is about \(10^9\). Trial division needs a billion iterations per query. At \(10^8\) operations per second, that's 10 seconds — way too slow.

And this isn't a contrived scenario. Problems on Codeforces regularly give \(n \leq 10^{18}\) and ask you to factorize it, count its divisors, or check if it's prime. The sieve from Ch2 L2 is useless here — you'd need a sieve up to \(10^9\), which blows both time and memory.

You need two new tools: - A primality test that runs in \(O(\log^2 n)\) — not \(O(\sqrt{n})\). - A factorization algorithm that runs in expected \(O(n^{1/4})\) — not \(O(n^{1/2})\).

For \(n = 10^{18}\), that's \(10^{4.5} \approx 31623\) operations instead of \(10^9\). The difference between instant and impossible.

Miller-Rabin: Fast Primality Testing

The Fermat Foundation

Fermat's Little Theorem says: if \(p\) is prime and \(\gcd(a, p) = 1\), then

\[a^{p-1} \equiv 1 \pmod{p}\]

So if you pick some \(a\) and compute \(a^{n-1} \bmod n\), and the result is NOT \(1\), then \(n\) is definitely composite. The number \(a\) is called a witness to \(n\)'s compositeness.

The problem: some composites pass this test for every \(a\). These are called Carmichael numbers — \(561 = 3 \times 11 \times 17\) is the smallest. For every \(a\) coprime to \(561\), we get \(a^{560} \equiv 1 \pmod{561}\). The plain Fermat test can't catch them.

The Miller-Rabin Upgrade

Miller-Rabin adds a twist that catches Carmichael numbers. The key insight comes from the SHAPE of how \(a^{n-1} \bmod n\) is computed.

Write \(n - 1 = 2^s \times d\) where \(d\) is odd. Instead of jumping straight to \(a^{n-1}\), compute \(a^d \bmod n\) first, then square it \(s\) times:

\[a^d,\; a^{2d},\; a^{4d},\; \ldots,\; a^{2^s \cdot d} = a^{n-1}\]

For a genuine prime \(p\), this squaring sequence has a very specific shape. Consider the equation \(x^2 \equiv 1 \pmod{p}\). This factors as \((x-1)(x+1) \equiv 0 \pmod{p}\), and since \(p\) is prime, one of the factors must be divisible by \(p\). So the only square roots of \(1\) modulo a prime are \(1\) and \(-1\) (i.e., \(p-1\)).

This means: in the squaring chain, once you see \(1\), the value right before it MUST have been \(\pm 1\). Working backward, either: - \(a^d \equiv 1 \pmod{n}\) (the chain starts at \(1\) and stays there), OR - \(a^{2^r \cdot d} \equiv -1 \pmod{n}\) for some \(0 \leq r < s\) (the chain hits \(-1\) at some point, then the next square gives \(1\)).

If neither condition holds, \(n\) is composite — guaranteed. No Carmichael number escapes, because the test exploits the structure of square roots modulo primes, not just the final value.

Worked Example: Test \(n = 561\) with Witness \(a = 2\)

\(n - 1 = 560 = 2^4 \times 35\), so \(s = 4\), \(d = 35\).

Compute the squaring chain:

Step	Value	Result \(\bmod 561\)
\(2^{35}\)	start	\(263\)
\(2^{70}\)	square	\(263^2 \bmod 561 = 166\)
\(2^{140}\)	square	\(166^2 \bmod 561 = 67\)
\(2^{280}\)	square	\(67^2 \bmod 561 = 1\)
\(2^{560}\)	square	\(1^2 \bmod 561 = 1\)

The final value is \(1\) — Fermat's test passes. But look at step \(2^{280}\): the value is \(67\), and the next square gives \(1\). That means \(67^2 \equiv 1 \pmod{561}\), but \(67 \neq 1\) and \(67 \neq 560\) (i.e., \(67 \neq \pm 1\)). A nontrivial square root of \(1\) exists modulo \(561\). This is impossible for a prime.

Miller-Rabin catches \(561\) where plain Fermat fails. The shape of the squaring chain reveals the composite nature.

How Many Witnesses Do You Need?

For randomized Miller-Rabin, each random witness independently has at most a \(1/4\) probability of failing to detect a composite. So \(k\) random witnesses give error probability \(\leq 4^{-k}\).

But in competitive programming, you want deterministic results. It turns out that for specific ranges of \(n\), fixed small witness sets suffice:

Range	Witnesses needed
\(n < 2{,}047\)	\(\{2\}\)
\(n < 1{,}373{,}653\)	\(\{2, 3\}\)
\(n < 3{,}215{,}031{,}751\)	\(\{2, 3, 5, 7\}\)
\(n < 3.3 \times 10^{24}\)	\(\{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37\}\)

That last line covers every value you'll ever see in competitive programming. Twelve witnesses, each requiring \(O(\log n)\) multiplications — total cost \(O(12 \log^2 n)\). For \(n = 10^{18}\), that's about \(60 \times 12 = 720\) modular multiplications. Essentially free.

Pollard's Rho: Fast Factorization

Miller-Rabin tells you WHETHER \(n\) is prime. If it's not, you need to FIND a factor. That's Pollard's Rho.

The Birthday Paradox Insight

Suppose \(n = p \times q\) and you don't know \(p\) or \(q\). Pick random numbers \(x_1, x_2, \ldots\) modulo \(n\) and check \(\gcd(|x_i - x_j|, n)\) for all pairs. If \(x_i \equiv x_j \pmod{p}\) but \(x_i \not\equiv x_j \pmod{n}\), then \(\gcd(|x_i - x_j|, n)\) is a nontrivial factor of \(n\).

By the birthday paradox, you need about \(O(\sqrt{p})\) random values before two of them collide modulo \(p\). Since \(p \leq \sqrt{n}\) (at least one factor is \(\leq \sqrt{n}\)), this takes about \(O(n^{1/4})\) samples.

But checking all pairs is \(O(k^2)\), which ruins the birthday advantage. You need a way to detect a collision without checking every pair.

Floyd's Cycle Detection — The Chain

Instead of picking independent random values, build a pseudorandom chain using the iteration \(f(x) = x^2 + c \bmod n\) for some constant \(c\). Starting from \(x_0\), generate:

\[x_0 \to x_1 = f(x_0) \to x_2 = f(x_1) \to x_3 = f(x_2) \to \cdots\]

This sequence is deterministic and eventually cycles (since there are finitely many residues modulo \(n\)). The shape is the Greek letter \(\rho\) — a tail leading into a loop. That's the name.

Now consider the same sequence modulo the hidden factor \(p\). Since \(f(x) = x^2 + c\) is a function modulo \(n\), it's also a function modulo \(p\) (just reduce everything modulo \(p\)). The sequence modulo \(p\) cycles faster — there are only \(p\) possible values instead of \(n\). By the birthday paradox, the cycle length modulo \(p\) is \(O(\sqrt{p})\).

Floyd's cycle detection uses two pointers: a tortoise moving one step at a time and a hare moving two steps. When the tortoise is at \(x_i\) and the hare is at \(x_{2i}\), check \(\gcd(|x_i - x_{2i}|, n)\). If \(x_i \equiv x_{2i} \pmod{p}\) (they've collided in the cycle modulo \(p\)), then \(p \mid |x_i - x_{2i}|\) and the GCD reveals a factor.

Worked Example: Factor \(n = 8051\)

Use \(f(x) = x^2 + 1 \bmod 8051\), starting at \(x_0 = 2\).

| Step | Tortoise (\(x_i\)) | Hare (\(x_{2i}\)) | \(|x_i - x_{2i}|\) | \(\gcd(\ldots, 8051)\) | |:-----|:------------------|:-----------------|:-------------------|:----------------------| | \(1\) | \(f(2) = 5\) | \(f(f(2)) = 26\) | \(21\) | \(\gcd(21, 8051) = 7\) |

One step. The GCD gives \(7\), and indeed \(8051 = 7 \times 1150 + 1\)... let's verify: \(7 \times 1150 = 8050\), so \(8051 = 7 \times 1150 + 1\). That's not right. Actually \(8051 / 7 = 1150.14...\), so \(7 \nmid 8051\). Let me retrace.

\(f(2) = 4 + 1 = 5\). \(f(5) = 25 + 1 = 26\). \(f(26) = 676 + 1 = 677\). After step 1: tortoise \(= 5\), hare \(= 26\). \(\gcd(21, 8051) = 1\). Continue.

| Step | Tortoise | Hare | \(\gcd(|\text{diff}|, 8051)\) | |:-----|:---------|:-----|:----------------------------| | \(1\) | \(5\) | \(26\) | \(\gcd(21, 8051) = 1\) | | \(2\) | \(26\) | \(7474\) | \(\gcd(7448, 8051) = 1\) | | \(3\) | \(677\) | \(871\) | \(\gcd(194, 8051) = 97\) |

At step 3, \(\gcd(194, 8051) = 97\), and \(8051 = 97 \times 83\). Found both factors.

The tortoise took 3 steps. Trial division would have needed to scan up to \(\sqrt{8051} \approx 90\) before finding \(83\) (or reaching \(97\)). For this tiny example the savings are modest, but for \(n = 10^{18}\) the difference is between \(10^{4.5}\) and \(10^9\) operations.

Why Might It Fail?

If the tortoise and hare collide modulo \(n\) (not just modulo \(p\)), then \(\gcd(|x_i - x_{2i}|, n) = n\) — a trivial factor. This happens when the cycle modulo \(p\) and the cycle modulo \(q\) have the same length and align perfectly. The fix: pick a different \(c\) and try again. In practice, you rarely need more than 2-3 attempts.

Expected Complexity

The cycle modulo the smallest prime factor \(p\) has expected length \(O(\sqrt{p})\). Floyd's detection finds the cycle in \(O(\sqrt{p})\) steps, each costing \(O(\log n)\) for the GCD. Since \(p \leq n^{1/2}\), this is \(O(n^{1/4} \log n)\).

For \(n = 10^{18}\), that's about \(10^{4.5} \times 60 \approx 2 \times 10^6\) — comfortably fast.

The `__int128` Problem

When you compute \(a \times b \bmod n\) where \(a, b, n\) can be up to \(10^{18}\), the intermediate product \(a \times b\) can reach \(10^{36}\). That overflows unsigned long long (max \(\approx 1.8 \times 10^{19}\)). You need 128-bit multiplication.

GCC provides __int128 (and unsigned __int128) which handles this natively. The key helper:

typedef unsigned long long ull;
typedef __int128 u128;

ull mulmod(ull base, ull multiplier, ull modulus) {
    return (u128)base * multiplier % modulus;
}

Cast one operand to u128, the multiplication promotes to 128 bits, then take the remainder. This is a single function call — no binary multiplication loops needed.

Use mulmod everywhere inside Miller-Rabin and Pollard's Rho. The modular exponentiation becomes:

ull powmod(ull base, ull exponent, ull modulus) {
    ull result = 1;
    base %= modulus;
    while (exponent > 0) {
        if (exponent & 1) result = mulmod(result, base, modulus);
        base = mulmod(base, base, modulus);
        exponent >>= 1;
    }
    return result;
}

Identical structure to the modpow from Ch4 L2, except every multiplication goes through mulmod to avoid overflow.

The Combined Template

The strategy for factorizing any \(n \leq 10^{18}\):

Base cases: If \(n = 1\), return empty. If \(n\) is even, pull out all factors of \(2\).
Primality check: Run isPrime(n) (deterministic Miller-Rabin). If prime, record \(n\) and return.
Find a factor: Run pollardRho(n) to get a nontrivial factor \(d\).
Recurse: Factorize both \(d\) and \(n/d\).

The recursion depth is at most \(O(\log n)\) (each split produces strictly smaller numbers), and each Pollard's Rho call is \(O(n^{1/4})\) in the worst case. The total work is dominated by the first split.

The Code

Full Implementation

#include <bits/stdc++.h>
using namespace std;

typedef unsigned long long ull;
typedef __int128 u128;

ull mulmod(ull base, ull multiplier, ull modulus) {
    return (u128)base * multiplier % modulus;
}

ull powmod(ull base, ull exponent, ull modulus) {
    ull result = 1;
    base %= modulus;
    while (exponent > 0) {
        if (exponent & 1) result = mulmod(result, base, modulus);
        base = mulmod(base, base, modulus);
        exponent >>= 1;
    }
    return result;
}

bool millerRabin(ull number, ull witness) {
    if (number % witness == 0) return number == witness;
    ull oddPart = number - 1;
    int twoPower = 0;
    while (oddPart % 2 == 0) {
        oddPart /= 2;
        twoPower++;
    }
    ull current = powmod(witness, oddPart, number);
    if (current == 1 || current == number - 1) return true;
    for (int round = 0; round < twoPower - 1; round++) {
        current = mulmod(current, current, number);
        if (current == number - 1) return true;
    }
    return false;
}

bool isPrime(ull number) {
    if (number < 2) return false;
    for (ull witness : {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37}) {
        if (!millerRabin(number, witness)) return false;
    }
    return true;
}

ull pollardRho(ull number) {
    if (number % 2 == 0) return 2;
    mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());
    while (true) {
        ull offset = rng() % (number - 2) + 2;
        ull tortoise = offset;
        ull hare = offset;
        ull divisor = 1;
        while (divisor == 1) {
            tortoise = mulmod(tortoise, tortoise, number) + offset;
            if (tortoise >= number) tortoise -= number;
            hare = mulmod(hare, hare, number) + offset;
            if (hare >= number) hare -= number;
            hare = mulmod(hare, hare, number) + offset;
            if (hare >= number) hare -= number;
            ull difference = tortoise > hare ? tortoise - hare : hare - tortoise;
            divisor = __gcd(difference, number);
        }
        if (divisor != number) return divisor;
    }
}

map<ull, int> factorize(ull number) {
    map<ull, int> primeExponents;
    if (number <= 1) return primeExponents;
    queue<ull> workQueue;
    workQueue.push(number);
    while (!workQueue.empty()) {
        ull current = workQueue.front();
        workQueue.pop();
        if (current == 1) continue;
        if (isPrime(current)) {
            primeExponents[current]++;
            continue;
        }
        ull factor = pollardRho(current);
        workQueue.push(factor);
        workQueue.push(current / factor);
    }
    return primeExponents;
}

\(O(n^{1/4} \log n)\) expected time per factorization, \(O(\log n)\) space.

Walk through the code:

mulmod: 128-bit safe multiplication. Every a * b % m in the entire template goes through this.
powmod: Binary exponentiation. Nothing new except it calls mulmod instead of using % directly.
millerRabin: Single-witness test. Extracts the odd part \(d\) and the power of \(2\), computes \(a^d\), then squares up to \(s-1\) times looking for \(-1\). Returns true if \(n\) looks prime to this witness.
isPrime: Runs millerRabin for all 12 deterministic witnesses. If ALL pass, \(n\) is prime (guaranteed for \(n < 3.3 \times 10^{24}\)).
pollardRho: Floyd's cycle detection with random starting offset. Tries different offsets until a nontrivial GCD appears.
factorize: BFS over the factor tree. Each composite gets split by pollardRho, each piece goes back into the queue. Primes get recorded. When the queue is empty, the full factorization is assembled.

Brent's Optimization (Optional Speedup)

The implementation above uses Floyd's classic two-pointer approach. Brent's improvement replaces this with a power-of-two stepping scheme that's about 24% faster in practice.

The idea: instead of advancing tortoise and hare simultaneously, keep the tortoise fixed and let the hare run for \(2^k\) steps at a time. After each power-of-two batch, move the tortoise to the hare's position and double the batch size.

An additional trick: accumulate the product of all \(|x_i - x_j|\) values and take a single GCD every (say) 128 steps. GCD is expensive relative to multiplication, so batching reduces the constant factor significantly.

In the template above, the basic Floyd version is already fast enough for all competitive programming purposes. Brent's is worth knowing if you hit tight time limits — swap out the inner loop and batch the GCDs.

Common Mistakes

Forgetting the number == witness check in millerRabin. If \(n = 2\) and the witness is \(2\), then \(n \% 2 = 0\) — but \(n\) IS prime. Without the guard return number == witness, you'd declare \(2\) composite. The same applies to witnesses \(3, 5, 7, \ldots\) when tested against themselves.
Overflow in tortoise + offset. After computing mulmod(tortoise, tortoise, number), the result is in \([0, n)\). Adding offset (also in \([0, n)\)) can reach \(2n - 2\), which fits in ull since \(n \leq 10^{18}\). But if you skip the if (tortoise >= number) tortoise -= number adjustment, the tortoise drifts outside \([0, n)\) and the algorithm breaks silently.
Using offset = 0 or offset = 1 in Pollard's Rho. With \(c = 0\), the map \(x \to x^2 \bmod n\) has fixed points at \(0\) and \(1\), so the chain degenerates. Random \(c \geq 2\) avoids these. The implementation draws from \([2, n)\).
Not handling pollardRho returning a non-prime. pollardRho(n) returns A factor, not necessarily a PRIME factor. You must recurse (or use the BFS queue) to split the factor further. The factorize function handles this — it only records a value when isPrime confirms it.

Quick Recap

Trial division is \(O(\sqrt{n})\) — fine for \(n \leq 10^{12}\), hopeless for \(n \leq 10^{18}\).
Miller-Rabin tests primality in \(O(\log^2 n)\) by checking whether the squaring chain \(a^d, a^{2d}, \ldots, a^{n-1}\) has the shape required by a prime. With witnesses \(\{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37\}\), it's deterministic for all \(n < 3.3 \times 10^{24}\).
Pollard's Rho finds a factor in expected \(O(n^{1/4})\) using Floyd's cycle detection on the pseudorandom chain \(x \to x^2 + c \bmod n\). When \(\gcd(|x_i - x_{2i}|, n) \notin \{1, n\}\), you've found a nontrivial factor.
__int128 is essential for modular multiplication when \(n\) exceeds \(\sim 10^{9.5}\). Without it, \(a \times b\) overflows ull.
The template: isPrime to check, pollardRho to split, BFS to assemble the full factorization. Handles any \(n \leq 10^{18}\).

Problems

Problem	Link	Difficulty	Focus
CF 1514C — Product 1 Modulo N	codeforces.com/contest/1514/problem/C	1600	Primality and coprimality reasoning
CF 1445C — Division	codeforces.com/contest/1445/problem/C	1500	Factor large \(n\), find divisor satisfying constraint
CF 1549D — Integers Have Friends	codeforces.com/contest/1549/problem/D	1800	GCD on segments, large number handling
SPOJ — FACT0 (Integer Factorization)	spoj.com/problems/FACT0	Hard	Direct application: factorize \(n \leq 10^{18}\)
CF 1468J — Road Reform	codeforces.com/contest/1468/problem/J	1800	MST + large value processing
Library Checker — Factorize	judge.yosupo.jp/problem/factorize	Hard	Stress test: factorize up to \(10^{18}\), tight TL

This completes the Multiplicative Functions and Mobius chapter. You now have the full toolkit: divisor sums and swap tricks (L1), multiplicative functions and linear sieves (L2), Mobius inversion (L3), floor staircase optimization (L4), and large-number primality/factorization (L5).