The Floor Staircase
Pillar: Shape — "\(\lfloor n/i \rfloor\) takes at most \(2\sqrt{n}\) distinct values — it's a staircase that flattens quickly. Process each flat step as a block."
The Setup
You need to compute \(\sum_{i=1}^{n} \lfloor n/i \rfloor\) for \(n = 10^{12}\).
A naive loop iterates \(10^{12}\) times. That's hopeless. But try printing \(\lfloor 20/i \rfloor\) for \(i = 1, 2, \ldots, 20\):
Notice the pattern. The values drop fast at first, then the same value repeats for long stretches. From \(i = 8\) to \(i = 10\), the floor is stuck at \(2\). From \(i = 11\) to \(i = 20\), it's stuck at \(1\). The sequence looks like a staircase — steep steps near the beginning, wide flat steps near the end.
If you could identify each flat step and process the entire step at once, you'd skip the repeated values entirely. The question is: how many steps are there?
The answer is surprisingly small. And once you see the structure, every "sum over \(\lfloor n/d \rfloor\)" problem — counting divisors, summing divisors, Mobius inversion — collapses to the same \(O(\sqrt{n})\) template.
The Key Observation
\(\lfloor n/i \rfloor\) takes at most \(2\sqrt{n}\) distinct values as \(i\) ranges over \([1, n]\).
This is the shape insight. The function isn't smooth — it's a staircase, and the staircase has \(O(\sqrt{n})\) steps, not \(O(n)\). Any sum over \(\lfloor n/i \rfloor\) that you can evaluate per-block reduces from \(O(n)\) to \(O(\sqrt{n})\).
For \(n = 10^{12}\), that's about \(2 \times 10^6\) steps instead of \(10^{12}\) iterations.
The Proof
Split \([1, n]\) into two ranges based on \(\sqrt{n}\).
Case 1: \(i \leq \sqrt{n}\). There are at most \(\lfloor\sqrt{n}\rfloor\) values of \(i\) in this range. Each value of \(i\) produces some value of \(\lfloor n/i \rfloor\), so there are at most \(\sqrt{n}\) distinct floor values from this range.
Case 2: \(i > \sqrt{n}\). When \(i > \sqrt{n}\), the floor value satisfies \(\lfloor n/i \rfloor \leq n/i < n/\sqrt{n} = \sqrt{n}\). So \(\lfloor n/i \rfloor\) is a non-negative integer strictly less than \(\sqrt{n}\). There are at most \(\lfloor\sqrt{n}\rfloor\) such integers, so at most \(\sqrt{n}\) distinct values from this range.
Combining both cases: the total number of distinct values of \(\lfloor n/i \rfloor\) across all \(i \in [1, n]\) is at most \(\sqrt{n} + \sqrt{n} = 2\sqrt{n}\).
Note that the bound is tight up to a constant. For \(n = 20\), we saw \(8\) distinct values, and \(2\sqrt{20} \approx 8.9\).
That's the entire proof. Short, clean, and it gives you an exact bound you can rely on for complexity analysis.
Why does this matter beyond a single sum? The floor staircase shows up whenever you swap summation order in a divisor sum. In Ch9 L1 you saw that \(\sum_{k=1}^{n} \tau(k) = \sum_{d=1}^{n} \lfloor n/d \rfloor\). In Ch9 L3, Mobius inversion requires evaluating \(\sum_{d=1}^{n} \mu(d) \cdot F(\lfloor n/d \rfloor)\) for some function \(F\). Both reduce to iterating over the same staircase. Master the block decomposition once, and it becomes a reusable primitive for every multiplicative function problem involving large \(n\).
Block Boundaries
Once you know the staircase has \(O(\sqrt{n})\) steps, you need to identify where each step starts and ends.
Suppose \(\lfloor n/i \rfloor = v\). Which values of \(i\) give this same floor value? The floor equals \(v\) exactly when \(v \leq n/i < v + 1\), which means \(n/(v+1) < i \leq n/v\). So the block of indices with floor value \(v\) ends at \(i = \lfloor n/v \rfloor\).
The key formula: if \(\lfloor n/i \rfloor = v\), then the largest index in this block is \(\lfloor n/v \rfloor\).
Convince yourself with an example. For \(n = 20\) and \(i = 7\): \(v = \lfloor 20/7 \rfloor = 2\). The block end is \(\lfloor 20/2 \rfloor = 10\). Check: \(\lfloor 20/8 \rfloor = 2\), \(\lfloor 20/9 \rfloor = 2\), \(\lfloor 20/10 \rfloor = 2\), \(\lfloor 20/11 \rfloor = 1 \neq 2\). So the block is \([7, 10]\), exactly as predicted.
Starting from \(i = 1\), you compute \(v = \lfloor n/i \rfloor\), find the block end \(\lfloor n/v \rfloor\), process the entire block, then jump to \(\lfloor n/v \rfloor + 1\) and repeat.
Trace Table: \(n = 20\)
Walk through every block. Start at \(i = 1\).
| Block | \(i_{\text{start}}\) | \(v = \lfloor 20/i_{\text{start}} \rfloor\) | \(i_{\text{end}} = \lfloor 20/v \rfloor\) | Block length | Values of \(i\) |
|---|---|---|---|---|---|
| \(1\) | \(1\) | \(20\) | \(1\) | \(1\) | \(\{1\}\) |
| \(2\) | \(2\) | \(10\) | \(2\) | \(1\) | \(\{2\}\) |
| \(3\) | \(3\) | \(6\) | \(3\) | \(1\) | \(\{3\}\) |
| \(4\) | \(4\) | \(5\) | \(4\) | \(1\) | \(\{4\}\) |
| \(5\) | \(5\) | \(4\) | \(5\) | \(1\) | \(\{5\}\) |
| \(6\) | \(6\) | \(3\) | \(6\) | \(1\) | \(\{6\}\) |
| \(7\) | \(7\) | \(2\) | \(10\) | \(4\) | \(\{7,8,9,10\}\) |
| \(8\) | \(11\) | \(1\) | \(20\) | \(10\) | \(\{11,12,\ldots,20\}\) |
\(8\) blocks total. Compare that to iterating all \(20\) values. For \(n = 20\), \(2\sqrt{20} \approx 8.9\), so \(8\) blocks is right at the bound.
Look at how the blocks behave. The first six blocks each contain a single index — the floor value drops with every step. Then block \(7\) covers four indices at once, and block \(8\) covers ten. The staircase is steep at the start and flat at the end. This is the universal shape: small indices produce large, rapidly changing floor values; large indices produce small floor values that repeat for long stretches.
Verify the sum \(\sum_{i=1}^{20} \lfloor 20/i \rfloor\) using blocks:
Cross-check with the direct sum: \(20 + 10 + 6 + 5 + 4 + 3 + 2 + 2 + 2 + 2 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 = 66\). Matches.
The Block-Processing Algorithm
The iteration template works for any sum of the form \(\sum_{i=1}^{n} f(i) \cdot g(\lfloor n/i \rfloor)\), as long as you can compute the aggregate of \(f\) over a contiguous range quickly. The two applications below both fit this form:
- \(\sum \lfloor n/i \rfloor\) has \(f(i) = 1\), \(g(v) = v\). Aggregate of \(f\) over a range: just the range length.
- \(\sum i \cdot \lfloor n/i \rfloor\) has \(f(i) = i\), \(g(v) = v\). Aggregate of \(f\) over a range: arithmetic series sum.
Algorithm:
- Set \(i_{\text{start}} = 1\).
- While \(i_{\text{start}} \leq n\):
- Compute \(v = \lfloor n / i_{\text{start}} \rfloor\).
- Compute \(i_{\text{end}} = \lfloor n / v \rfloor\).
- Process all indices from \(i_{\text{start}}\) to \(i_{\text{end}}\) at once (they all share floor value \(v\)).
- Set \(i_{\text{start}} = i_{\text{end}} + 1\).
Each iteration processes one block and advances past it. Since there are \(O(\sqrt{n})\) blocks, the total time is \(O(\sqrt{n})\) times the cost of processing one block.
One subtlety: the algorithm never needs to know \(\sqrt{n}\) explicitly. It doesn't compute \(\sqrt{n}\) or branch on whether \(i\) is above or below it. The block iteration naturally handles both regions. For small \(i\), each block has length \(1\) (the floor value changes every step). For large \(i\), blocks become wide. The algorithm adapts automatically — no special-casing required.
Application 1: \(\sum_{d=1}^{n} \lfloor n/d \rfloor\) in \(O(\sqrt{n})\)
This sum counts the total number of divisor pairs \((d, q)\) with \(d \cdot q \leq n\) — equivalently, it equals \(\sum_{k=1}^{n} \tau(k)\) where \(\tau(k)\) is the number of divisors of \(k\). That connection comes from swapping summation order (covered in Ch9 L1).
In each block, every index \(d\) from \(i_{\text{start}}\) to \(i_{\text{end}}\) contributes the same value \(v\). The block's contribution is \(v \times (i_{\text{end}} - i_{\text{start}} + 1)\).
For \(n = 10^6\), the naive loop takes \(10^6\) iterations. The block approach takes about \(2000\). For \(n = 10^{12}\), the speedup is six orders of magnitude: \(2 \times 10^6\) blocks instead of \(10^{12}\) iterations.
#include <bits/stdc++.h>
using namespace std;
int main() {
long long upperBound;
cin >> upperBound;
long long totalSum = 0;
long long blockStart = 1;
while (blockStart <= upperBound) {
long long floorValue = upperBound / blockStart;
long long blockEnd = upperBound / floorValue;
totalSum += floorValue * (blockEnd - blockStart + 1);
blockStart = blockEnd + 1;
}
cout << totalSum << "\n";
return 0;
}
\(O(\sqrt{n})\) time, \(O(1)\) space.
Application 2: CSES Sum of Divisors
Problem. Compute \(\sigma(1) + \sigma(2) + \cdots + \sigma(n)\) modulo \(10^9 + 7\), where \(\sigma(k)\) is the sum of divisors of \(k\). Constraints: \(n \leq 10^{12}\).
Rewrite by swapping summation order. Each divisor \(d\) contributes \(d\) to every multiple \(d, 2d, 3d, \ldots\) up to \(n\). There are \(\lfloor n/d \rfloor\) such multiples. So:
Now apply block decomposition. In a block where \(\lfloor n/d \rfloor = v\) for all \(d \in [i_{\text{start}}, i_{\text{end}}]\), the contribution is:
The sum of consecutive integers from \(a\) to \(b\) is \((a + b)(b - a + 1) / 2\). Compute this per block, multiply by the shared floor value, accumulate modulo \(10^9 + 7\).
Division by \(2\) under a modulus requires the modular inverse. Since \(10^9 + 7\) is prime, the inverse of \(2\) is \(2^{10^9 + 5} \bmod (10^9 + 7) = 500000004\).
Trace for \(n = 20\). The answer should be \(\sigma(1) + \sigma(2) + \cdots + \sigma(20) = 1 + 3 + 4 + 7 + 6 + 12 + 8 + 15 + 13 + 18 + 12 + 28 + 14 + 24 + 24 + 31 + 18 + 39 + 20 + 42 = 339\).
Using blocks: for block \(7\) (\(d \in [7, 10]\), \(v = 2\)), the range sum is \(7 + 8 + 9 + 10 = 34\), and the contribution is \(2 \times 34 = 68\). For block \(8\) (\(d \in [11, 20]\), \(v = 1\)), the range sum is \(11 + 12 + \cdots + 20 = 155\), and the contribution is \(1 \times 155 = 155\). Sum all eight blocks: \(20 + 20 + 18 + 20 + 20 + 18 + 68 + 155 = 339\). Matches.
The Code
Block iteration template
#include <bits/stdc++.h>
using namespace std;
int main() {
long long upperBound;
cin >> upperBound;
long long totalSum = 0;
long long blockStart = 1;
while (blockStart <= upperBound) {
long long floorValue = upperBound / blockStart;
long long blockEnd = upperBound / floorValue;
totalSum += floorValue * (blockEnd - blockStart + 1);
blockStart = blockEnd + 1;
}
cout << totalSum << "\n";
return 0;
}
CSES Sum of Divisors
#include <bits/stdc++.h>
using namespace std;
int main() {
long long upperBound;
cin >> upperBound;
long long modulus = 1000000007;
long long inverseTwo = 500000004;
long long totalSum = 0;
long long blockStart = 1;
while (blockStart <= upperBound) {
long long floorValue = upperBound / blockStart;
long long blockEnd = upperBound / floorValue;
long long rangeEnd = blockEnd % modulus;
long long rangeStart = (blockStart - 1) % modulus;
long long hi = rangeEnd * ((rangeEnd + 1) % modulus) % modulus;
long long lo = rangeStart * ((rangeStart + 1) % modulus) % modulus;
long long rangeSum = (hi - lo + 2 * modulus) % modulus * inverseTwo % modulus;
totalSum = (totalSum + floorValue % modulus * rangeSum) % modulus;
blockStart = blockEnd + 1;
}
cout << totalSum << "\n";
return 0;
}
\(O(\sqrt{n})\) time, \(O(1)\) space. For \(n = 10^{12}\), this runs in roughly \(2 \times 10^6\) iterations.
The modular arithmetic in the range sum deserves a closer look. You're computing \(\sum_{d=a}^{b} d = b(b+1)/2 - (a-1)a/2\) modulo \(10^9 + 7\). Each of \(b\), \(b+1\), \(a-1\), and \(a\) can be up to \(10^{12}\), so you must reduce modulo \(10^9 + 7\) before multiplying any two terms. The subtraction can go negative — adding \(2 \times (10^9 + 7)\) before taking the final modulus ensures the result stays non-negative.
Complexity
\(O(\sqrt{n})\) time, \(O(1)\) space. The number of blocks is at most \(2\sqrt{n}\), and each block is processed in \(O(1)\). No precomputation, no auxiliary arrays — just a single while loop.
Common Mistakes
-
Off-by-one in block boundaries. The block end is \(\lfloor n / v \rfloor\), not \(\lfloor n / v \rfloor - 1\). If you subtract \(1\), you'll skip indices and get wrong answers. Trace \(n = 20\) by hand to build confidence in the formula.
-
Overflow in the range sum. When computing \((i_{\text{start}} + i_{\text{end}})(i_{\text{end}} - i_{\text{start}} + 1) / 2\) modulo \(10^9 + 7\), you must reduce each factor before multiplying. Multiplying two numbers near \(10^{12}\) overflows
long long. Take \(\bmod\) at every intermediate step. A common pattern: compute \(a \% m\), compute \(b \% m\), then multiply, then take \(\% m\) again. -
Forgetting to advance past the block. After processing a block, set \(i_{\text{start}} = i_{\text{end}} + 1\). If you accidentally set \(i_{\text{start}} = i_{\text{end}}\), you'll loop forever on the same block.
-
Using
intinstead oflong longfor \(n\) up to \(10^{12}\). The block boundaries and floor values all needlong long. A singleintvariable in the computation chain silently truncates and produces wrong answers.
Quick Recap
- \(\lfloor n/i \rfloor\) takes at most \(2\sqrt{n}\) distinct values. The proof splits into \(i \leq \sqrt{n}\) (at most \(\sqrt{n}\) values of \(i\)) and \(i > \sqrt{n}\) (floor value \(< \sqrt{n}\)).
- Each contiguous block of indices sharing the same floor value \(v\) ends at \(i = \lfloor n/v \rfloor\).
- Block iteration processes each step of the staircase in \(O(1)\), giving \(O(\sqrt{n})\) total.
- \(\sum_{d=1}^{n} \lfloor n/d \rfloor\) computes in \(O(\sqrt{n})\) by summing \(v \times \text{block length}\) per block.
- CSES Sum of Divisors reduces to \(\sum d \cdot \lfloor n/d \rfloor\), where the \(d\)-sum per block is an arithmetic series.
- The block iteration pattern generalizes: any \(\sum f(i) \cdot g(\lfloor n/i \rfloor)\) becomes \(O(\sqrt{n})\) if \(\sum f\) over a range is \(O(1)\).
- Modular arithmetic in range sums requires reducing every factor before multiplication and handling negative remainders.
Problems
| Problem | Link | Difficulty | Focus |
|---|---|---|---|
| CSES — Sum of Divisors | cses.fi/problemset/task/1082 | Hard | Block decomposition with arithmetic series; modular arithmetic with inverse of \(2\) |
| CSES — Divisor Analysis (sum query) | cses.fi/problemset/task/2182 | Hard | Applying divisor-sum formulas from prime factorization; tests multiplicative function reasoning alongside floor staircase intuition |
| CF 1485C — Floor and Mod | codeforces.com/problemset/problem/1485/C | Medium | Enumerate \(\lfloor a/b \rfloor\) values; count valid \((a, b)\) pairs per block |