Brainstorming Tasks: Binary Search and Variations
1. Is it necessary for the array to be sorted for binary search?
Yes, absolutely. This is the single most important requirement for binary search.
To understand why, think of binary search like looking for a name in a physical phone book.
-
Sorted (Phone Book): If you want to find "Smith", you can open the book to the middle (say, the "M" section). You instantly know "Smith" must be in the second half of the book (N-Z) because 'S' comes after 'M'. You have just "thrown away" half the book (A-M) in a single step. This is only possible because the names are sorted.
-
Unsorted (Random Pile of Papers): If you had the same list of names, but they were all shuffled randomly in a big pile, opening to a random page (finding "Miller") tells you nothing about where "Smith" is. It could be anywhere before or after "Miller". You have no choice but to check every single page one by one (a linear scan).
This "sorted" property is called monotonicity. It means the values are consistently increasing (or decreasing). This guarantee is what gives binary search its power: it allows you to confidently discard half of the search space at every step, which is why it's so fast (\(O(\log n)\)).
2. Can we apply binary search on a non-sorted array?
This depends entirely on the array's hidden structure or how you define "binary search."
-
On a truly random array: No. For the reasons above, it's impossible. Using a standard binary search, you can find zero elements with 100% accuracy.
-
On a rotated sorted array (e.g.,
[4, 5, 6, 7, 1, 2, 3]): Yes, with a modification. This is a classic interview problem. You don't search for the value directly. Instead, you use yourmidpoint to figure out which half of the array is "normal" (still sorted). Based on that, you can safely decide which half must contain your target, and you continue the search in that half. -
On an unsorted array, redefining "Binary Searchable": If the problem is "which elements are guaranteed to be found regardless of pivot choice?", then yes.
An element
Xcan fail to be found if: 1. A pivot to its left is larger thanX, causing the search to discard the right side. 2. A pivot to its right is smaller thanX, causing the search to discard the left side.Therefore, for an element
A[i]to be "binary searchable," it must be immune to these failures. The Golden Rule: An elementA[i]is searchable only if it is greater than every element to its left AND smaller than every element to its right.This means
A[i]must be the maximum of its prefix and the minimum of its suffix.\(O(n)\) Solution: 1. Create a
prefix_maxarray in one pass (left-to-right). 2. Create asuffix_minarray in a second pass (right-to-left). 3. In a third pass, count every elementA[i]whereA[i] == prefix_max[i]ANDA[i] == suffix_min[i].
3. Can you prove that any 2-element case [x, x+1]... behaves identically?
Yes. This is a mathematical property of integer division, not the data itself. The infinite loop happens when mid is calculated in a way that doesn't shrink the search space.
Let's prove it for the general 2-element case where r = l + 1 (e.g., l=4, r=5 or l=6, r=7).
-
The State: The loop
while(l < r)is running. We havel = kandr = k + 1. The conditionk < k + 1is true. -
The "Floor" Midpoint: We use the most common
midcalculation:mid = (l + r) / 2mid = (k + (k + 1)) / 2mid = (2k + 1) / 2mid = k + 0.5 -
The "Gotcha" (Integer Division): In C++, Java, Python, etc., integer division truncates (cuts off) the decimal. So,
k + 0.5becomes justk. This meansmid = l. -
The Infinite Loop:
- You are in the 2-element range
[k, k+1]. - You calculate
mid = l. - If your update logic for the "false" case (go right) is
l = mid, you just executedl = l. - Your new range is... still
[k, k+1]. The search space did not shrink. - The loop condition
while(l < r)is still true, and you repeat the process forever.
- You are in the 2-element range
This behavior is identical for any adjacent indices [k, k+1] because the math (k + (k+1)) / 2 always truncates to k.
4. What if my predicate is non-monotonic, like 000...100...111...?
A standard binary search will fail. It requires a perfectly monotonic (000...111) pattern. It might "find" the first 1 at the glitch, but then incorrectly discard the real answer (the final block of 1s) when it searches to the left and finds 0s again.
However, you can fix this by "patching" the predicate.
- The Problem: The
1in the...100...section is a "glitch" or a "false positive" because it's followed by a0. - The Solution: Create a new, "clean" predicate function,
P_clean(mid), that handles this.- If
midis the last element, just returnA[mid] == 1. - Check for the glitch: If
A[mid] == 1ANDA[mid+1] == 0, returnfalse. (This treats the "glitch"1as a0). - Otherwise, just return the real value:
A[mid] == 1.
- If
- The Result: This wrapper function transforms your "dirty" predicate
[0, 0, 1, 0, 0, 1, 1, 1]into a perfectly monotonic one:[0, 0, 0, 0, 0, 1, 1, 1]. - Now, you can run a standard binary search (like
lower_bound) onP_clean(mid), and it will correctly find the true first1.
5. Is it necessary to use ternary search for a unimodal graph?
No! You can use a modified binary search that is often simpler and faster.
A unimodal graph is one with a single peak (like a hill: [1, 3, 5, 4, 2]).
- Ternary Search finds the peak by comparing two midpoints (
f(m1)vs.f(m2)). -
Modified Binary Search can find it by checking the slope at a single midpoint.
-
We are looking for the peak, which is the only index
midwhereA[mid] > A[mid + 1]. - Calculate
mid. - Compare
A[mid]with its right-hand neighbor,A[mid + 1].- Case 1:
A[mid] < A[mid + 1]We are on the left side of the hill (slope is positive, going up). The peak must be to our right. Update:l = mid + 1. - Case 2:
A[mid] > A[mid + 1]We are on the right side of the hill (slope is negative, going down). The peak is eithermiditself or to its left. Update:r = mid.
- Case 1:
- The loop
while(l < r)will finish withl(orr) pointing at the index of the peak. This works in \(O(\log n)\).
6. Can you find some use case for nested ternary search and binary search?
Yes, this is common when you need to optimize or search across two (or more) dimensions.
-
Nested Binary Search:
- Problem: Find the \(K\)-th smallest number in a (virtual) \(N \times M\) multiplication table (where
table[i][j] = i * j). - Outer Search: Binary search for the answer
X(in the value range[1, N*M]). - Inner Search (Predicate): To check if
Xis a valid answer, we must count how many table elements are<= X. We can do this by iterating through each rowiand using a second binary search (or a simple calculationmin(M, X / i)) to find how many elements in that row are<= X. We sum these counts and compare toK.
- Problem: Find the \(K\)-th smallest number in a (virtual) \(N \times M\) multiplication table (where
-
Nested Ternary Search:
- Problem: Find the brightest point
(x, y)on a 3D hillf(x, y)that is unimodal in both directions. - Outer Search: Ternary search for the optimal
xcoordinate. - Inner Search (Predicate): To compare two
xvalues (m1xandm2x), we must find the best possible value at each. To do this, we run an inner ternary search onyto find the peak of the 1D functiong(y) = f(m1x, y). The outer search compares the peaks found by the inner searches.
- Problem: Find the brightest point
7. Find the missing element using binary search
Yes, this is a classic problem.
- Problem: You have a sorted array
Athat should contain[1, 2, 3, 4, 5, 6]but is missing one:A = [1, 2, 3, 5, 6]. -
The Logic: In a "perfect" array, the element at
A[i](0-based index) should be equal toi + 1. The missing number creates a "break" in this pattern.A[0] == 0 + 1?1 == 1. True.A[1] == 1 + 1?2 == 2. True.A[2] == 2 + 1?3 == 3. True.A[3] == 3 + 1?5 == 4. False.A[4] == 4 + 1?6 == 5. False.
-
The Solution: This creates a monotonic predicate
P(i) = (A[i] == i + 1), which looks like[True, True, True, False, False]. We just need to find the firstFalse.- Binary search on the indices
i. - If
A[mid] == mid + 1(predicate is "true"), the break must be to the right:l = mid + 1. - If
A[mid] != mid + 1(predicate is "false"), the break is at or beforemid:r = mid. - The loop
while(l < r)will terminate withl(orr) pointing to the first index that is "false".
- Binary search on the indices
- The Answer: The loop ends at
l = 3. The missing number is the one that should have been at this index:l + 1. The missing number is3 + 1 = 4.
8. Hidden Information Processing
Sometimes, hidden properties of the data allow for powerful modifications to binary search.
Q1: Find a number in a sorted sequence of adjacent multiples of x (with repeats). How to make binary search faster?
- Problem: Array
A = [10, 10, 15, 15, 15, 20, 25, 25].x=5. Target20. - Hidden Info: The values in the array are sorted and form an arithmetic progression. A standard binary search
mid = (l+r)/2ignores this; it assumes the indices are uniformly distributed, not the values. - The Solution: Interpolation Search This is a "smarter" binary search. Instead of always checking the middle index, it makes an educated guess about where the target should be based on its value.
- How it works:
- If
A[l] = 10,A[r] = 25, andtarget = 20, you guess that20is not in the middle of the index range, but about 2/3 of the way through the value range. - The new
midis calculated proportionally:mid = l + ( (target - A[l]) / (A[r] - A[l]) ) * (r - l)
- If
- Speed:
- Best Case: If the values are uniformly distributed (e.g.,
[2, 4, 6, 8, 10]), this search is incredibly fast: \(O(\log \log n)\). - Worst Case: If the values are not uniform (e.g.,
[1, 2, 3, 4, 1000]), the guess will be bad, and the search degrades to \(O(n)\).
- Best Case: If the values are uniformly distributed (e.g.,
Q2: How can you find the median or k-th smallest element of two sorted arrays using binary search?
- Problem: Find the 5th smallest element (median) from
A = [1, 5, 9]andB = [2, 3, 6, 7]. (The merged array is[1, 2, 3, 5, 6, 7, 9], so the answer is6). - Hidden Info: You cannot binary search the values. But you can binary search for the partition point.
- The Logic:
- We need to find a total of
kelements. - Let's try to take
part_Aelements fromAandpart_Belements fromB, such thatpart_A + part_B = k. - If we find the perfect partition, then every element in the "left half" (the
kelements) is smaller than every element in the "right half". - We only need to check the four boundary elements:
l_A(max of A's left)r_A(min of A's right)l_B(max of B's left)r_B(min of B's right)
- The partition is perfect if
l_A <= r_BANDl_B <= r_A.
- We need to find a total of
- The Algorithm:
- We binary search for the correct
part_A(from0tom, the size ofA). part_Bis automatically set tok - part_A.- If
l_A > r_B:part_Ais too big. We must take fewer elements fromA. Move the search left:r = part_A - 1. - If
l_B > r_A:part_Ais too small. We must take more elements fromA. Move the search right:l = part_A + 1. - If Perfect: The answer is
max(l_A, l_B).
- We binary search for the correct
- Speed: This search runs in \(O(\log(\min(m, n)))\) time, which is extremely fast.