Data Structures

Collision Resolution: Chaining vs Open Addressing

Цели урока

  • Understand why collisions are inevitable
  • Master chaining
  • Study open addressing and probing strategies
  • Learn to choose the right method for the task

Предварительные знания

  • Hash tables: basics
  • Hash Tables: O(1) Access

Two guests arrive at a hotel and both are assigned room 42. What should the manager do? Kick one out? Put one in the hallway? Or come up with a smart system?

  • Python dict - open addressing in action
  • Java HashMap - chaining with trees
  • Redis - open addressing for speed
  • Databases - different strategies for indexes

Half a century of collision-resolution ideas

Separate chaining was already proposed by Hans Peter Luhn in the same 1953 IBM memo where he first described hashing. Open addressing with linear probing was worked out around 1954 by IBM engineers building the 701 machine, among them Gene Amdahl, Elaine Boehme, and Nathaniel Rochester. The idea is simple: if a slot is taken, walk forward through the table until an empty one turns up. Analyzing linear probing became a classic problem, studied in depth by Donald Knuth in Volume 3 of The Art of Computer Programming. Later schemes grew cleverer. In 1985 Pedro Celis introduced Robin Hood hashing, where an element far from its ideal slot displaces one that is closer, evening out probe lengths. In 2001 Rasmus Pagh and Flemming Rodler described cuckoo hashing with worst-case O(1) lookup: each key has two candidate slots, and on a conflict the resident is evicted to its alternate slot, much as a cuckoo pushes other eggs out of the nest.

Why Collisions Are Inevitable

**Pigeonhole Principle:** with 11 pigeons and 10 holes, at least one hole must contain 2 pigeons.

Similarly: if possible keys are infinite but table slots are finite - collisions are **guaranteed**.

**The question is not "will there be collisions?"** but **"how do we handle them?"**

Why are collisions in a hash table inevitable?

Chaining

**Idea:** each slot stores not one element but a **linked list** of all elements with the same hash.

  • Simple to implement
  • Deletion is trivial
  • The table never overflows
  • Downside: extra memory for pointers
  • Downside: poor cache locality (elements are scattered)

In chaining, what does each slot of the table store?

Open Addressing

**Idea:** if a slot is occupied - probe for the next free slot within the table itself.

No auxiliary structures - all data lives directly in the array.

  • Better cache locality (data is contiguous in memory)
  • Less memory (no pointers)
  • Downside: deletion is harder (requires DELETED tombstone)
  • Downside: clustering at high load factors

Open addressing is always better than chaining

Each method has its strengths: chaining is simpler and more stable, open addressing is more memory- and cache-efficient

Python dict uses open addressing, Java HashMap uses chaining. Both work successfully in production.

What is the key difference between open addressing and chaining?

Probing Strategies

How exactly do we find the next slot on a collision?

1. Linear Probing

2. Quadratic Probing

3. Double Hashing

What is the problem with linear probing?

Chaining vs Open Addressing: What to Choose?

CriterionChainingOpen Addressing
Load FactorCan exceed 1Must stay < 1
MemoryMore (pointers)Less
Cache localityPoorGood
DeletionSimpleComplex (tombstones)
ImplementationSimplerMore complex
At high LFStableDegrades sharply

**Python dict** uses open addressing with double hashing. **Java HashMap** uses chaining (switching to a tree for long chains).

**Practical rule:** for high-performance systems with controlled Load Factor, choose open addressing. For general use - chaining is simpler and more robust.

Which method does Python dict use?

Summary

  • Collisions are inevitable (Pigeonhole Principle)
  • Chaining: each slot is a list of elements
  • Open Addressing: probe for the next free slot
  • Linear < Quadratic < Double hashing in quality
  • Choice of method depends on memory and performance requirements

Next

Hash tables internals are now clear - time to apply them in practice!

  • Two Sum and Other Hits — Classic hash table problems

Вопросы для размышления

  • Under what conditions is chaining more effective than open addressing, and vice versa?
  • How does the choice of hash function affect collision distribution in real-world data?
  • Why is load factor a critical parameter when selecting a collision resolution strategy?

Связанные уроки

  • ds-06-hash-intro — Hash table basics before collision resolution
  • ds-08-hash-applications — Collision strategy choice affects application performance
  • ds-02-linked-lists — Chaining uses linked lists to store colliding elements
  • ds-01-arrays — Open addressing probes within the array itself
  • crypto-19-hash-functions
Collision Resolution: Chaining vs Open Addressing

0

1

Sign In