Algorithms

Interpolation Search: Smarter Than Binary

Цели урока

  • Understand the idea of interpolation in search
  • Implement interpolation search
  • Know when it is better/worse than binary search

Предварительные знания

  • Binary search
  • Binary Search

When you look up a word in a dictionary, you don't open it in the middle. You estimate roughly where the word should be. Interpolation search does the same thing - and sometimes beats binary search by 6x!

  • Search by timestamp in time-series databases
  • Indexes with numeric keys
  • Scientific data (measurements at equal intervals)

William Peterson and interpolation search at IBM

Interpolation search was described by William Wesley Peterson in 1957, in his paper Addressing for Random-Access Storage in the IBM Journal of Research and Development. Peterson worked on storing data on disks and drums, where reaching an arbitrary record was expensive, and he wanted a way to guess a key's location more accurately than plain halving. The idea itself came from ledgers and tables: when values are spread evenly, the key itself indicates roughly where to look, the way a word is found in a dictionary. Peterson showed that for uniformly distributed data this kind of search needs on the order of log log n probes on average instead of binary search's log n. Theorists later sharpened the analysis: Yao and Yao proved in 1976 that O(log log n) is the asymptotically optimal bound for searching uniform data. The method comes with a hard caveat, though: on skewed or adversarial distributions interpolation search degrades to O(n), worse than dependable binary search. That is why it rarely shows up in production libraries, yet its core idea came back directly in the learned indexes of 2018, where a trained model predicts a key's position.

Idea: Guess the Position

**Consider a phone book:** searching for 'Patterson' means opening it about 2/3 of the way in, not at the middle. Why? Because 'P' is closer to the end of the alphabet.

**Interpolation search** does the same thing - it guesses the position based on the value:

**Formula:** assumes a linear distribution and calculates where target should be.

For arr = [100, 200, 300, 400, 500], target = 350. Interpolation search starts at:

Implementation

**Important checks:** `arr[left] <= target <= arr[right]` and division by zero when `arr[left] == arr[right]`.

Why is the check `arr[left] <= target <= arr[right]` needed?

Complexity Analysis

CaseInterpolationBinary
Best (uniform)O(1)O(1)
Average (uniform)O(log log n)O(log n)
WorstO(n)O(log n)

**Why O(log log n)?**

**Worst case O(n):** if data is distributed non-uniformly (e.g., [1, 2, 3, 1000000]), interpolation makes poor guesses, and each step only advances by 1.

On what data does interpolation search degrade to O(n)?

When to Use

**Use interpolation search when:**

  • Data is **uniformly distributed** (or close to it)
  • Distribution is **known with confidence** (no outliers)
  • n is very large (the difference between log log n and log n matters)
  • Comparison is cheap, but arithmetic is not a problem

**Use binary search when:**

  • Distribution is **unknown or non-uniform**
  • There are **outliers**
  • A **guaranteed O(log n) complexity** is required
  • Code should be **simple and reliable**

**In practice**, binary search is used in 99% of cases. Interpolation search is exotic - for specific tasks with large, uniform data.

For searching logs by timestamp (which grow uniformly), better to use:

Interpolation Search

  • Guesses position based on value
  • O(log log n) for uniform data
  • O(n) for non-uniform data - worse than binary!
  • Use only when confident about the distribution
  • In practice, binary search is more reliable

Related Topics

Interpolation search is an example of an adaptive algorithm.

  • Binary Search — A more reliable alternative
  • Exponential Search — For unbounded arrays

Связанные уроки

  • alg-10-binary-search — Interpolation search replaces the fixed midpoint with a smart formula - O(log log n) instead of O(log n) on uniform data
  • alg-09-linear-search — On non-uniform data interpolation degrades to O(n) - worse than binary, so it's useful to understand all three algorithms
  • ds-11-bst — B-tree indexes in databases use a similar idea: knowing the key distribution lets you predict the position
  • stat-09-regression
Interpolation Search: Smarter Than Binary

0

1

Sign In