Algorithms

Interpolation Search: Smarter Than Binary

Цели урока

Understand the idea of interpolation in search
Implement interpolation search
Know when it is better/worse than binary search

Предварительные знания

Binary search

Binary Search

When you look up a word in a dictionary, you don't open it in the middle. You estimate roughly where the word should be. Interpolation search does the same thing - and sometimes beats binary search by 6x!

Search by timestamp in time-series databases
Indexes with numeric keys
Scientific data (measurements at equal intervals)

William Peterson and interpolation search at IBM

Interpolation search was described by William Wesley Peterson in 1957, in his paper Addressing for Random-Access Storage in the IBM Journal of Research and Development. Peterson worked on storing data on disks and drums, where reaching an arbitrary record was expensive, and he wanted a way to guess a key's location more accurately than plain halving. The idea itself came from ledgers and tables: when values are spread evenly, the key itself indicates roughly where to look, the way a word is found in a dictionary. Peterson showed that for uniformly distributed data this kind of search needs on the order of log log n probes on average instead of binary search's log n. Theorists later sharpened the analysis: Yao and Yao proved in 1976 that O(log log n) is the asymptotically optimal bound for searching uniform data. The method comes with a hard caveat, though: on skewed or adversarial distributions interpolation search degrades to O(n), worse than dependable binary search. That is why it rarely shows up in production libraries, yet its core idea came back directly in the learned indexes of 2018, where a trained model predicts a key's position.

Idea: Guess the Position

**Consider a phone book:** searching for 'Patterson' means opening it about 2/3 of the way in, not at the middle. Why? Because 'P' is closer to the end of the alphabet.

**Interpolation search** does the same thing - it guesses the position based on the value:

**Formula:** assumes a linear distribution and calculates where target should be.

For arr = [100, 200, 300, 400, 500], target = 350. Interpolation search starts at:

Implementation

**Important checks:** `arr[left] <= target <= arr[right]` and division by zero when `arr[left] == arr[right]`.

Why is the check `arr[left] <= target <= arr[right]` needed?

Complexity Analysis

Case	Interpolation	Binary
Best (uniform)	O(1)	O(1)
Average (uniform)	O(log log n)	O(log n)
Worst	O(n)	O(log n)

**Why O(log log n)?**

**Worst case O(n):** if data is distributed non-uniformly (e.g., [1, 2, 3, 1000000]), interpolation makes poor guesses, and each step only advances by 1.

On what data does interpolation search degrade to O(n)?

When to Use

**Use interpolation search when:**

Data is **uniformly distributed** (or close to it)
Distribution is **known with confidence** (no outliers)
n is very large (the difference between log log n and log n matters)
Comparison is cheap, but arithmetic is not a problem

**Use binary search when:**

Distribution is **unknown or non-uniform**
There are **outliers**
A **guaranteed O(log n) complexity** is required
Code should be **simple and reliable**

**In practice**, binary search is used in 99% of cases. Interpolation search is exotic - for specific tasks with large, uniform data.

For searching logs by timestamp (which grow uniformly), better to use:

Interpolation Search

Guesses position based on value
O(log log n) for uniform data
O(n) for non-uniform data - worse than binary!
Use only when confident about the distribution
In practice, binary search is more reliable

Связанные уроки

alg-10-binary-search — Interpolation search replaces the fixed midpoint with a smart formula - O(log log n) instead of O(log n) on uniform data
alg-09-linear-search — On non-uniform data interpolation degrades to O(n) - worse than binary, so it's useful to understand all three algorithms
ds-11-bst — B-tree indexes in databases use a similar idea: knowing the key distribution lets you predict the position
stat-09-regression

Idea: Guess the Position

**Consider a phone book:** searching for 'Patterson' means opening it about 2/3 of the way in, not at the middle. Why? Because 'P' is closer to the end of the alphabet.

**Interpolation search** does the same thing - it guesses the position based on the value:

**Formula:** assumes a linear distribution and calculates where target should be.

For arr = [100, 200, 300, 400, 500], target = 350. Interpolation search starts at:

Complexity Analysis

Case

Interpolation

Binary

Best (uniform)

O(1)

Average (uniform)

O(log log n)

O(log n)

Worst

O(n)

O(log n)

**Why O(log log n)?**

**Worst case O(n):** if data is distributed non-uniformly (e.g., [1, 2, 3, 1000000]), interpolation makes poor guesses, and each step only advances by 1.

On what data does interpolation search degrade to O(n)?

When to Use

**Use interpolation search when:**

Data is **uniformly distributed** (or close to it)

Distribution is **known with confidence** (no outliers)

n is very large (the difference between log log n and log n matters)

Comparison is cheap, but arithmetic is not a problem

**Use binary search when:**

Distribution is **unknown or non-uniform**

There are **outliers**

A **guaranteed O(log n) complexity** is required

Code should be **simple and reliable**

**In practice**, binary search is used in 99% of cases. Interpolation search is exotic - for specific tasks with large, uniform data.

For searching logs by timestamp (which grow uniformly), better to use:

Связанные уроки

alg-10-binary-search — Interpolation search replaces the fixed midpoint with a smart formula - O(log log n) instead of O(log n) on uniform data

alg-09-linear-search — On non-uniform data interpolation degrades to O(n) - worse than binary, so it's useful to understand all three algorithms

ds-11-bst — B-tree indexes in databases use a similar idea: knowing the key distribution lets you predict the position

stat-09-regression

Interpolation Search: Smarter Than Binary

Цели урока

Предварительные знания

William Peterson and interpolation search at IBM

Idea: Guess the Position

Implementation

Complexity Analysis

When to Use

Interpolation Search

Related Topics

Связанные уроки

Interpolation Search: Smarter Than Binary

Цели урока

Предварительные знания

William Peterson and interpolation search at IBM

Idea: Guess the Position

Implementation

Complexity Analysis

When to Use

Interpolation Search

Related Topics

Связанные уроки