Extended Euclidean and Bezout's Identity
Pillar: Fix — "Fix \(\gcd(a,b)\). The equation \(ax + by = \gcd(a,b)\) has solutions — Extended Euclidean finds them."
Pillar: Shape — "The algorithm traces a staircase of \((a,b)\) pairs, shrinking at each step. The coefficients climb back up as we unwind."
The Setup
You know from Ch3 L1 that \(\gcd(252, 105) = 21\). The Euclidean algorithm told you that. But here's a question that algorithm doesn't answer: can you write \(21\) as a combination of \(252\) and \(105\)? That is, can you find integers \(x\) and \(y\) such that
Try to guess. It's not obvious that such \(x, y\) even exist — after all, \(252\) and \(105\) are both much bigger than \(21\). You'd need \(x\) or \(y\) to be negative. And there's no obvious way to find them by staring at the numbers.
The answer turns out to be \(x = -2, y = 5\), because \(252 \times (-2) + 105 \times 5 = -504 + 525 = 21\). That's one solution. There are infinitely many others.
The Extended Euclidean Algorithm finds these coefficients mechanically, as a byproduct of computing the GCD itself. It's the same recursion you already know, with bookkeeping bolted on.
Why care? Because these coefficients unlock modular inverses (Ch4), solve linear Diophantine equations (next lesson), and crack the Chinese Remainder Theorem (Ch5). Every time you need to "undo" a multiplication modulo some number, the Extended Euclidean is the tool that does it.
Bezout's Identity
The key theorem is this:
Bezout's Identity. For any integers \(a\) and \(b\) (not both zero), there exist integers \(x\) and \(y\) such that \(ax + by = \gcd(a, b)\).
This is NOT obvious. The claim is that you can always express \(\gcd(a,b)\) as a linear combination of \(a\) and \(b\) with integer coefficients. No matter how large \(a\) and \(b\) are, no matter how small their GCD is, the right \(x\) and \(y\) exist.
The converse direction is easier to see. Any number of the form \(ax + by\) must be divisible by \(\gcd(a,b)\), because \(\gcd(a,b) \mid a\) and \(\gcd(a,b) \mid b\), so \(\gcd(a,b) \mid (ax + by)\). That means the smallest positive value of \(ax + by\) is at least \(\gcd(a,b)\). Bezout's Identity says it's exactly \(\gcd(a,b)\) — you can achieve the lower bound.
The proof is the algorithm itself. The Extended Euclidean Algorithm constructs explicit \(x, y\) satisfying the equation. No hand-waving needed — the code IS the proof.
The Extended Euclidean Algorithm
Review: Standard Euclidean
The standard Euclidean algorithm computes \(\gcd(a, b)\) by repeated division:
until \(b = 0\), at which point \(\gcd(a, 0) = a\).
Extending It
The Extended version runs the same recursion but carries back two coefficients \(x, y\) at each step, so that the identity \(ax + by = \gcd(a, b)\) holds at every level.
Base case. When \(b = 0\): \(\gcd(a, 0) = a\). We need \(a \cdot x + 0 \cdot y = a\). Set \(x = 1, y = 0\). Done.
Recursive step. Suppose the recursive call to \(\gcd(b, \; a \bmod b)\) has already returned coefficients \(x'\) and \(y'\) satisfying:
Now substitute \(a \bmod b = a - \lfloor a/b \rfloor \cdot b\):
Distribute and regroup by \(a\) and \(b\):
Reading off the coefficients of \(a\) and \(b\):
That's the entire algorithm. Same recursion as before, one extra line of arithmetic on the way back up.
Full Trace: Extended GCD of \((252, 105)\)
This is the centerpiece. Follow every step — first going down (computing the GCD), then climbing back up (recovering the coefficients).
Forward Pass: Computing the GCD
| Step | \(a\) | \(b\) | \(q = \lfloor a/b \rfloor\) | \(r = a \bmod b\) | Next call |
|---|---|---|---|---|---|
| 1 | \(252\) | \(105\) | \(2\) | \(42\) | \(\gcd(105, 42)\) |
| 2 | \(105\) | \(42\) | \(2\) | \(21\) | \(\gcd(42, 21)\) |
| 3 | \(42\) | \(21\) | \(2\) | \(0\) | \(\gcd(21, 0)\) |
| 4 | \(21\) | \(0\) | — | — | Base case: return \(21\) |
The GCD is \(21\). Standard stuff. Now the interesting part.
Backward Pass: Unwinding the Coefficients
We climb back up from the base case, computing \(x\) and \(y\) at each step.
Step 4 (base case). \(\gcd(21, 0) = 21\). Set \(x = 1, y = 0\).
Check: \(21 \cdot 1 + 0 \cdot 0 = 21\). Correct.
Step 3. We returned from \(\gcd(42, 21)\) with \(x' = 1, y' = 0\) (from step 4, but note: the recursive call was \(\gcd(21, 0)\), which returned \(x' = 1, y' = 0\) for the equation \(21 \cdot x' + 0 \cdot y' = 21\)).
Wait — let's be precise. At step 3, we called \(\gcd(b, a \bmod b) = \gcd(21, 0)\), which returned \(x' = 1, y' = 0\).
Now apply the update formulas with \(a = 42, b = 21, q = 2\):
Check: \(42 \cdot 0 + 21 \cdot 1 = 21\). Correct.
Step 2. The recursive call \(\gcd(42, 21)\) returned \(x' = 0, y' = 1\).
Apply the update with \(a = 105, b = 42, q = 2\):
Check: \(105 \cdot 1 + 42 \cdot (-2) = 105 - 84 = 21\). Correct.
Step 1. The recursive call \(\gcd(105, 42)\) returned \(x' = 1, y' = -2\).
Apply the update with \(a = 252, b = 105, q = 2\):
Check: \(252 \cdot (-2) + 105 \cdot 5 = -504 + 525 = 21\). Correct.
Summary Table
| Step | \(a\) | \(b\) | \(q\) | \(r\) | \(x\) | \(y\) | Verification: \(ax + by\) |
|---|---|---|---|---|---|---|---|
| 4 (base) | \(21\) | \(0\) | — | — | \(1\) | \(0\) | \(21 \cdot 1 + 0 \cdot 0 = 21\) |
| 3 | \(42\) | \(21\) | \(2\) | \(0\) | \(0\) | \(1\) | \(42 \cdot 0 + 21 \cdot 1 = 21\) |
| 2 | \(105\) | \(42\) | \(2\) | \(21\) | \(1\) | \(-2\) | \(105 \cdot 1 + 42 \cdot (-2) = 21\) |
| 1 | \(252\) | \(105\) | \(2\) | \(42\) | \(-2\) | \(5\) | \(252 \cdot (-2) + 105 \cdot 5 = 21\) |
Every row checks out. The coefficients get built from the bottom up, one substitution at a time. The final answer: \(x = -2, y = 5\).
Geometric View: The Staircase
Picture a \(252 \times 105\) rectangle. You want to tile it with the largest possible square. That square has side length \(\gcd(252, 105) = 21\).
The Euclidean algorithm traces a staircase through this rectangle:
- Fit \(\lfloor 252 / 105 \rfloor = 2\) squares of size \(105 \times 105\). Remainder: a \(42 \times 105\) strip.
- Rotate. Fit \(\lfloor 105 / 42 \rfloor = 2\) squares of size \(42 \times 42\). Remainder: a \(21 \times 42\) strip.
- Rotate. Fit \(\lfloor 42 / 21 \rfloor = 2\) squares of size \(21 \times 21\). Remainder: \(0\). Done.
Each step subtracts \(q\) copies of the smaller dimension from the larger one. The path zigzags — horizontal cut, vertical cut, horizontal, vertical — like a staircase descending to the corner.
The Extended Euclidean unwinds this staircase. At the bottom, the \(21 \times 21\) tile has trivial coordinates: \((1, 0)\). Each step back up translates those coordinates into the coordinate system of the previous, larger rectangle. By the time you reach the top, you have the full representation \(252 \cdot (-2) + 105 \cdot 5 = 21\) — the coefficients encode the path through the staircase.
The Water Jug Problem
You have two jugs: one holds \(a\) liters, the other holds \(b\) liters. You can fill either jug completely, empty either jug completely, or pour from one into the other until the source is empty or the destination is full. Can you measure exactly \(c\) liters?
Answer: Yes if and only if \(\gcd(a, b) \mid c\) (and \(c \leq \max(a, b)\), assuming you're measuring in one jug).
Why? Every state you can reach is of the form \(ax + by\) for some integers \(x, y\) (where \(x\) counts how many times you filled/emptied jug \(a\), and \(y\) counts for jug \(b\) — signs handle direction). By Bezout's Identity, the set of achievable values \(\{ax + by : x, y \in \mathbb{Z}\}\) is exactly the set of multiples of \(\gcd(a, b)\).
Concrete example. Jugs of capacity \(5\) and \(3\). \(\gcd(5, 3) = 1\), so you can measure any integer from \(0\) to \(5\).
Want to measure \(4\) liters? Extended Euclidean gives \(5 \cdot 2 + 3 \cdot (-3) = 1\), so \(5 \cdot 8 + 3 \cdot (-12) = 4\). The signs tell you the sequence of pours: filling the \(5\)-jug (positive coefficient) and emptying the \(3\)-jug (negative coefficient). The actual pour sequence requires translating the coefficients into fill/pour/empty operations, but the existence proof comes directly from Bezout.
Want to measure \(2\) liters with jugs of capacity \(6\) and \(4\)? \(\gcd(6, 4) = 2\) and \(2 \mid 2\), so yes. Extended Euclidean: \(6 \cdot 1 + 4 \cdot (-1) = 2\). Fill the \(6\)-jug, pour into the \(4\)-jug, and \(2\) liters remain.
Want \(3\) liters with jugs of \(6\) and \(4\)? \(\gcd(6, 4) = 2\) and \(2 \nmid 3\). Impossible. No sequence of pours will ever produce \(3\) liters. The proof is the divisibility argument: every reachable quantity is a multiple of \(2\).
The Full Family of Solutions
The Extended Euclidean gives you ONE solution \((x_0, y_0)\). But the equation \(ax + by = \gcd(a, b)\) has infinitely many solutions. All of them.
Let \(g = \gcd(a, b)\). If \((x_0, y_0)\) is one solution, then every solution has the form:
Why this works. Substitute into \(ax + by\):
The added terms cancel perfectly. Adding \(b/g\) to \(x\) and subtracting \(a/g\) from \(y\) keeps the total unchanged — the two adjustments are equal and opposite.
Why these are ALL the solutions. Suppose \((x_1, y_1)\) is any other solution: \(ax_1 + by_1 = g\). Subtract the original equation: \(a(x_1 - x_0) + b(y_1 - y_0) = 0\), which gives \(a(x_1 - x_0) = -b(y_1 - y_0)\). Dividing both sides by \(g\): \((a/g)(x_1 - x_0) = -(b/g)(y_1 - y_0)\). Since \(\gcd(a/g, b/g) = 1\), it follows that \((b/g) \mid (x_1 - x_0)\). So \(x_1 - x_0 = (b/g) \cdot t\) for some integer \(t\), and \(y_1 - y_0 = -(a/g) \cdot t\).
Example. For \(252x + 105y = 21\) with \((x_0, y_0) = (-2, 5)\) and \(g = 21\):
| \(t\) | \(x\) | \(y\) | Check: \(252x + 105y\) |
|---|---|---|---|
| \(-1\) | \(-7\) | \(17\) | \(252(-7) + 105(17) = -1764 + 1785 = 21\) |
| \(0\) | \(-2\) | \(5\) | \(252(-2) + 105(5) = -504 + 525 = 21\) |
| \(1\) | \(3\) | \(-7\) | \(252(3) + 105(-7) = 756 - 735 = 21\) |
| \(2\) | \(8\) | \(-19\) | \(252(8) + 105(-19) = 2016 - 1995 = 21\) |
Every row works. The solutions form an arithmetic progression in both \(x\) and \(y\), parameterized by \(t\).
Negative Solutions and Adjusting \(t\)
The Extended Euclidean often returns a solution where one coefficient is negative. That's perfectly fine mathematically — but sometimes a problem demands \(x > 0\) or \(y > 0\) (or both, if the equation is \(ax + by = c\) and \(a, b, c > 0\) — you might want the smallest positive solution).
From the family \(x = x_0 + (b/g) \cdot t\), you can solve for the range of \(t\) that makes \(x\) positive:
Similarly for \(y > 0\):
If you need BOTH \(x > 0\) and \(y > 0\), you need:
If no integer \(t\) falls in this range, there's no solution with both coefficients positive. This happens — for instance, \(252x + 105y = 21\) with both \(x, y > 0\) is impossible, because \(252 + 105 = 357 > 21\). You can't add two positive multiples of \(252\) and \(105\) and get something as small as \(21\).
The practical pattern. When a problem asks for the smallest non-negative \(x\), compute \(x_0 \bmod (b/g)\). That gives the unique \(x\) in the range \([0, b/g)\), and from that you recover \(y\).
The Code
Recursive Implementation
long long extendedGCD(long long first, long long second, long long &coeffFirst, long long &coeffSecond) {
if (second == 0) {
coeffFirst = 1;
coeffSecond = 0;
return first;
}
long long prevCoeffFirst, prevCoeffSecond;
long long divisor = extendedGCD(second, first % second, prevCoeffFirst, prevCoeffSecond);
coeffFirst = prevCoeffSecond;
coeffSecond = prevCoeffFirst - (first / second) * prevCoeffSecond;
return divisor;
}
\(O(\log(\min(a, b)))\) time, \(O(\log(\min(a, b)))\) space (recursion stack).
The function returns the GCD and sets coeffFirst, coeffSecond through reference parameters. After the call, the invariant first * coeffFirst + second * coeffSecond == divisor holds.
Trace through the code with \((252, 105)\):
- Call
extendedGCD(252, 105, ...)— recurse with \((105, 42)\). - Call
extendedGCD(105, 42, ...)— recurse with \((42, 21)\). - Call
extendedGCD(42, 21, ...)— recurse with \((21, 0)\). - Call
extendedGCD(21, 0, ...)— base case, return \((21, x=1, y=0)\). - Unwind step 3:
coeffFirst = 0,coeffSecond = 1 - 2*0 = 1. Returns \((21, 0, 1)\). - Unwind step 2:
coeffFirst = 1,coeffSecond = 0 - 2*1 = -2. Returns \((21, 1, -2)\). - Unwind step 1:
coeffFirst = -2,coeffSecond = 1 - 2*(-2) = 5. Returns \((21, -2, 5)\).
Final: \(252 \cdot (-2) + 105 \cdot 5 = 21\).
Iterative Implementation
For those who prefer no recursion — same algorithm, explicit loop:
long long extendedGCDIterative(long long first, long long second, long long &coeffFirst, long long &coeffSecond) {
long long oldR = first, currentR = second;
long long oldX = 1, currentX = 0;
long long oldY = 0, currentY = 1;
while (currentR != 0) {
long long quotient = oldR / currentR;
long long tempR = currentR;
currentR = oldR - quotient * currentR;
oldR = tempR;
long long tempX = currentX;
currentX = oldX - quotient * currentX;
oldX = tempX;
long long tempY = currentY;
currentY = oldY - quotient * currentY;
oldY = tempY;
}
coeffFirst = oldX;
coeffSecond = oldY;
return oldR;
}
\(O(\log(\min(a, b)))\) time, \(O(1)\) space.
The iterative version maintains two rows of the "extended" table at all times: the "old" row and the "current" row. Each iteration applies the quotient to update all three quantities (\(r\), \(x\), \(y\)) simultaneously. When currentR hits zero, oldR is the GCD and oldX, oldY are the coefficients.
Common Mistakes
-
Swapping the coefficient update. The correct formulas are \(x = y'\) and \(y = x' - q \cdot y'\). Writing \(x = x'\) and \(y = y' - q \cdot x'\) is wrong — the prime variables come from the RECURSIVE call (which swapped \(a\) and \(b\)), so \(x'\) corresponds to the coefficient of \(b\), not \(a\). If your code returns the wrong sign or wrong values, this swap is almost always the bug.
-
Forgetting that \(x, y\) can be negative. Bezout coefficients are NOT guaranteed positive. For \(252x + 105y = 21\), the algorithm returns \(x = -2\). If your code stores the result in an unsigned type, you get garbage. Always use signed types (
long long, notunsigned long long). -
Off-by-one when adjusting \(t\) for positive solutions. When you need the smallest non-negative \(x\), the formula is \(x_0 \bmod (b/g)\). But C++
%can return negative values when \(x_0 < 0\). Use((x0 % step) + step) % stepto guarantee a non-negative result. Getting this wrong is one of the most common sources of WA on Diophantine problems.
Quick Recap
- Bezout's Identity: For any \(a, b\) not both zero, integers \(x, y\) exist with \(ax + by = \gcd(a, b)\).
- Extended Euclidean = standard Euclidean + coefficient bookkeeping on the way back up. Same \(O(\log(\min(a,b)))\) complexity.
- Coefficient update: \(x = y'\), \(y = x' - \lfloor a/b \rfloor \cdot y'\). This is the only formula you need to memorize.
- All solutions: from one solution \((x_0, y_0)\), the full family is \(x = x_0 + (b/g)t\), \(y = y_0 - (a/g)t\).
- Water jug test: \(c\) liters is measurable with jugs of \(a\) and \(b\) if and only if \(\gcd(a,b) \mid c\).
Problems
| Problem | Link | Difficulty | Focus |
|---|---|---|---|
| CF 1244C — The Football Season | codeforces.com/problemset/problem/1244/C | 1600 | Finding non-negative solutions to a linear equation via Extended GCD |
| LC 365 — Water and Jug Problem | leetcode.com/problems/water-and-jug-problem | Medium | Direct application of Bezout — check if \(\gcd(a,b) \mid c\) |
Next lesson: Linear Diophantine Equations — generalizing from \(ax + by = \gcd(a,b)\) to \(ax + by = c\), counting solutions in a range, and handling the case where \(c\) is not a multiple of the GCD.