FWHT and Bitwise Convolution

Pillar: SET --- "FWHT operates on arrays indexed by bitmasks --- each index IS a set, and the convolution combines sets through bitwise operations."

The Setup

NTT (Ch11 L4) handles additive convolution --- polynomial multiplication where indices add: \(C[k] = \sum_{i+j=k} A[i] \cdot B[j]\).

But many CP problems need bitwise convolution, where indices combine through XOR, OR, or AND:

\[C[k] = \sum_{i \oplus j = k} A[i] \cdot B[j] \qquad\text{(XOR convolution)}\]

\[C[k] = \sum_{i \mathbin{|} j = k} A[i] \cdot B[j] \qquad\text{(OR convolution)}\]

\[C[k] = \sum_{i \mathbin{\&} j = k} A[i] \cdot B[j] \qquad\text{(AND convolution)}\]

The brute-force nested loop for any of these is \(O(M^2)\). The Fast Walsh-Hadamard Transform (FWHT) does each in \(O(M \log M)\), using the same three-step strategy as FFT and NTT: forward transform, pointwise multiply, inverse transform.

The difference is in the butterfly operation inside the transform. NTT uses roots of unity. FWHT uses simple additions and subtractions determined by which bitwise operation you need.

The Three Butterfly Operations

FWHT is a divide-and-conquer algorithm. At each level, it splits the array into a left half \(L\) and a right half \(R\), recursively transforms both, then merges them with a butterfly. The butterfly depends on the bitwise operation:

XOR Convolution

\[\text{Forward:} \quad L' = L + R, \quad R' = L - R\]

\[\text{Inverse:} \quad \text{same butterfly, then divide everything by } M\]

The XOR transform matrix is its own inverse up to a factor of \(M\), so the inverse transform applies the same butterfly and divides by \(M\) at the end.

OR Convolution

\[\text{Forward:} \quad L' = L, \quad R' = L + R\]

\[\text{Inverse:} \quad L = L', \quad R = R' - L'\]

AND Convolution

\[\text{Forward:} \quad L' = L + R, \quad R' = R\]

\[\text{Inverse:} \quad L = L' - R', \quad R = R'\]

Worked Example: XOR Convolution

\(A = [1, 2, 3, 4]\), \(B = [5, 6, 7, 8]\), \(M = 4\).

Forward transform of \(A\):

Level 1 (\(\text{step} = 1\)): process pairs \((L, R) \to (L+R, L-R)\).

Pair \((A[0], A[1]) = (1, 2) \to (3, -1)\)
Pair \((A[2], A[3]) = (3, 4) \to (7, -1)\)

Level 2 (\(\text{step} = 2\)): process pairs across the two halves.

Pair \((3, 7) \to (10, -4)\)
Pair \((-1, -1) \to (-2, 0)\)

\[A' = [10, -2, -4, 0]\]

Forward transform of \(B\):

Level 1: \((5, 6) \to (11, -1)\), \((7, 8) \to (15, -1)\).

Level 2: \((11, 15) \to (26, -4)\), \((-1, -1) \to (-2, 0)\).

\[B' = [26, -2, -4, 0]\]

Pointwise multiply:

\[C' = [10 \cdot 26,\; (-2)(-2),\; (-4)(-4),\; 0 \cdot 0] = [260, 4, 16, 0]\]

Inverse transform of \(C'\): Apply the same butterfly, then divide by \(M = 4\).

Level 1: \((260, 4) \to (264, 256)\), \((16, 0) \to (16, 16)\).

Level 2: \((264, 16) \to (280, 248)\), \((256, 16) \to (272, 240)\).

Divide by \(4\):

\[C = [70, 68, 62, 60]\]

Verify \(C[0]\): \(i \oplus j = 0\) means \(i = j\).

\[C[0] = 1 \cdot 5 + 2 \cdot 6 + 3 \cdot 7 + 4 \cdot 8 = 5 + 12 + 21 + 32 = 70 \;\checkmark\]

Relationship to SOS DP

The OR forward transform is exactly Sum Over Subsets (SOS) DP:

\[G[\text{mask}] = \sum_{\text{sub} \subseteq \text{mask}} A[\text{sub}]\]

And the OR inverse transform is Mobius inversion:

\[A[\text{mask}] = \sum_{\text{sub} \subseteq \text{mask}} (-1)^{\text{popcount}(\text{mask}) - \text{popcount}(\text{sub})} \cdot G[\text{sub}]\]

When you see "run SOS DP then inclusion-exclusion," you're doing OR FWHT manually --- the forward transform accumulates subset sums, and the inverse transform undoes them.

This connects back to the Mobius inversion you saw in Ch9. The set-based Mobius function over the lattice of subsets is \(\mu(S, T) = (-1)^{|T| - |S|}\) when \(S \subseteq T\), matching the sign pattern in the inverse OR transform.

Implementation

All three FWHT variants follow the same structure. The only difference is the butterfly operation inside the innermost loop.

XOR convolution:

void fwhtXor(vector<long long>& values, bool inverse) {
    int length = values.size();
    for (int halfBlock = 1; halfBlock < length; halfBlock *= 2) {
        for (int blockStart = 0; blockStart < length; blockStart += 2 * halfBlock) {
            for (int offset = 0; offset < halfBlock; offset++) {
                long long left = values[blockStart + offset];
                long long right = values[blockStart + offset + halfBlock];
                values[blockStart + offset] = left + right;
                values[blockStart + offset + halfBlock] = left - right;
            }
        }
    }
    if (inverse) {
        for (long long& value : values) value /= length;
    }
}

OR convolution:

void fwhtOr(vector<long long>& values, bool inverse) {
    int length = values.size();
    for (int halfBlock = 1; halfBlock < length; halfBlock *= 2) {
        for (int blockStart = 0; blockStart < length; blockStart += 2 * halfBlock) {
            for (int offset = 0; offset < halfBlock; offset++) {
                if (!inverse)
                    values[blockStart + offset + halfBlock] += values[blockStart + offset];
                else
                    values[blockStart + offset + halfBlock] -= values[blockStart + offset];
            }
        }
    }
}

AND convolution:

void fwhtAnd(vector<long long>& values, bool inverse) {
    int length = values.size();
    for (int halfBlock = 1; halfBlock < length; halfBlock *= 2) {
        for (int blockStart = 0; blockStart < length; blockStart += 2 * halfBlock) {
            for (int offset = 0; offset < halfBlock; offset++) {
                if (!inverse)
                    values[blockStart + offset] += values[blockStart + offset + halfBlock];
                else
                    values[blockStart + offset] -= values[blockStart + offset + halfBlock];
            }
        }
    }
}

Modular Versions

For problems requiring answers modulo a prime, replace plain addition/subtraction with modular operations. The XOR inverse also needs a modular division (multiply by the modular inverse of the length).

void fwhtXorMod(vector<long long>& values, bool inverse, long long mod) {
    int length = values.size();
    for (int halfBlock = 1; halfBlock < length; halfBlock *= 2) {
        for (int blockStart = 0; blockStart < length; blockStart += 2 * halfBlock) {
            for (int offset = 0; offset < halfBlock; offset++) {
                long long left = values[blockStart + offset];
                long long right = values[blockStart + offset + halfBlock];
                values[blockStart + offset] = (left + right) % mod;
                values[blockStart + offset + halfBlock] = (left - right + mod) % mod;
            }
        }
    }
    if (inverse) {
        long long inverseLength = modpow(length, mod - 2, mod);
        for (long long& value : values) value = value * inverseLength % mod;
    }
}

void fwhtOrMod(vector<long long>& values, bool inverse, long long mod) {
    int length = values.size();
    for (int halfBlock = 1; halfBlock < length; halfBlock *= 2) {
        for (int blockStart = 0; blockStart < length; blockStart += 2 * halfBlock) {
            for (int offset = 0; offset < halfBlock; offset++) {
                if (!inverse)
                    values[blockStart + offset + halfBlock] = (values[blockStart + offset + halfBlock] + values[blockStart + offset]) % mod;
                else
                    values[blockStart + offset + halfBlock] = (values[blockStart + offset + halfBlock] - values[blockStart + offset] + mod) % mod;
            }
        }
    }
}

void fwhtAndMod(vector<long long>& values, bool inverse, long long mod) {
    int length = values.size();
    for (int halfBlock = 1; halfBlock < length; halfBlock *= 2) {
        for (int blockStart = 0; blockStart < length; blockStart += 2 * halfBlock) {
            for (int offset = 0; offset < halfBlock; offset++) {
                if (!inverse)
                    values[blockStart + offset] = (values[blockStart + offset] + values[blockStart + offset + halfBlock]) % mod;
                else
                    values[blockStart + offset] = (values[blockStart + offset] - values[blockStart + offset + halfBlock] + mod) % mod;
            }
        }
    }
}

Limitations

FWHT requires arrays of size \(2^B\). If \(B > 25\), memory becomes an issue (\(2^{25} \approx 33\)M entries).

FWHT only works with operations that have an algebraic inverse. Addition and subtraction are invertible, so sum-product convolutions work. But \(\min\) and \(\max\) have no inverse --- you can't "undo" a \(\min\) --- so problems requiring \(\min\)-convolution or \(\max\)-convolution cannot use FWHT.

Common Mistakes

Confusing the butterfly for XOR, OR, and AND. Each bitwise operation has a different merge rule. Using the XOR butterfly for an OR convolution gives wrong answers silently --- no runtime error, just wrong output.
Forgetting to divide by \(M\) in XOR inverse. The XOR forward and inverse transforms use the same butterfly, but the inverse must divide every element by \(M\) at the end. Omitting this division gives values that are \(M\) times too large.
Using FWHT on non-power-of-2 arrays. FWHT requires array size exactly \(2^B\). If your values range up to some arbitrary \(M\), pad with zeros to the next power of \(2\) before transforming.
Using FWHT for min/max convolution. There is no FWHT for \(\min\)-convolution or \(\max\)-convolution because \(\min\) has no algebraic inverse.

Quick Recap

FWHT is the bitwise analogue of FFT/NTT: it computes XOR, OR, and AND convolutions in \(O(M \log M)\).
The three-step pattern is the same: forward transform, pointwise multiply, inverse transform.
XOR butterfly: \((L, R) \to (L+R, L-R)\). OR butterfly: \((L, R) \to (L, L+R)\). AND butterfly: \((L, R) \to (L+R, R)\).
OR forward transform = SOS DP. OR inverse = Mobius inversion over subsets.
FWHT connects back to the Mobius inversion from Ch9 and the set-theoretic view from Ch1.