Algorithms
Binary Search: Halving the Search Space
Цели урока
- Implement binary search without errors
- Understand invariants and avoid off-by-one
- Use lower_bound/upper_bound
- Apply binary search the answer pattern
Предварительные знания
- Linear search
- Sorted arrays
In 1946, John Mauchly described binary search. The first correct implementation only appeared in 1962. 90% of programmers write binary search with bugs on the first try. Let's be in the 10%.
- git bisect - finding the commit with a bug
- Databases - B-tree indexes
- Network protocols - TCP congestion control
- Games - collision detection (spatial partitioning)
History of Binary Search
Binary search was described in 1946, but the first correct implementation for all array sizes only appeared in 1962 (Lehmer's paper). The overflow bug in `(left + right) / 2` was discovered in Java only in 2006 - nine years after the JDK was released!
Idea: Split in Half
**Binary search** works on sorted data. At each step we discard half.
| n | Linear (avg) | Binary (worst) |
|---|---|---|
| 100 | 50 | 7 |
| 1,000 | 500 | 10 |
| 1,000,000 | 500,000 | 20 |
| 1,000,000,000 | 500,000,000 | 30 |
**log₂(10⁹) ≈ 30**. Among a billion elements we find the target in 30 comparisons!
What is the maximum number of comparisons needed for binary search in an array of 1024 elements?
Implementation: Watch the Boundaries!
**Classic implementation:**
**Why `left + (right - left) // 2`?** When left and right are large, their sum may overflow (in C/C++, Java). Not a problem in Python, but a good habit.
**Recursive version:**
**Iterative version is preferred:** O(1) memory instead of O(log n) for the recursion stack.
What happens if you write `while left < right` instead of `while left <= right`?
Off-by-one: Classic Mistakes
**Binary search is one of the most error-prone algorithms.** The first correct implementation was published only 16 years after its invention!
**Loop invariant** - a way to prove correctness:
Why can `left = mid` (without +1) cause an infinite loop?
Variations: lower_bound, upper_bound
**Often you need not the element itself, but a boundary:**
- **lower_bound(x)** - first element ≥ x
- **upper_bound(x)** - first element > x
- **bisect_left** and **bisect_right** in Python
**C++ STL:** `std::lower_bound`, `std::upper_bound` work identically.
arr = [1, 2, 2, 2, 3]. How many twos? (using bisect)
Applications of Binary Search
**Binary search applies not only to arrays!** Anywhere there is monotonicity - binary search can be used.
**Pattern 'Binary Search the Answer':**
**LeetCode:** ~100 problems are solved with binary search. The key phrase is 'minimum/maximum that satisfies condition'.
Task: find the minimum ship capacity to deliver all packages in D days. This is:
Binary Search
- O(log n) for sorted data
- Watch the boundaries: <=, mid±1, overflow
- lower_bound/upper_bound for range queries
- Binary search the answer - a broadly applicable pattern
- bisect module in Python, STL in C++
Related Topics
Binary search is the foundation of many algorithms and data structures.
- Interpolation Search — O(log log n) for uniform data
- BST — Binary search in a tree
- Divide & Conquer — The halving paradigm
Связанные уроки
- alg-09-linear-search — Binary search is an improvement over linear search for sorted data: O(log n) instead of O(n) by discarding half the range each step
- alg-11-interpolation — Interpolation search is the next step: it uses values to guess the position, O(log log n) for uniform data
- ds-11-bst — A BST is binary search embodied as a data structure: the tree invariant equals the binary search invariant
- alg-19-divide-conquer — Binary search is the simplest example of Divide & Conquer: we cut the search space in half at every step
- alg-05-merge-sort — Effective binary search needs sorted data - Merge Sort is a reliable way to get it
- aie-10-vector-databases