Qdrant - Vector Database

HNSW: How the Index Works

Search over 1M vectors in 1-5ms - that's not magic. That's HNSW: a graph where any vector is reachable in O(log N) steps. Understanding its parameters means knowing how to tune recall and speed for a given task.

**RAG with high recall:** ef_construct=200, hnsw_ef=128 - won't miss relevant context for the LLM
**Real-time recommendations:** m=16, hnsw_ef=32 - 1-2ms per query, 94% recall is sufficient
**Medical data:** m=32, exact: true for final verification - accuracy beats speed

Предварительные знания

First Search: Search API

HNSW: Navigable Small World

**HNSW (Hierarchical Navigable Small World)** is an Approximate Nearest Neighbor (ANN) algorithm. It's a multi-layer graph: upper layers are 'express highways' with few nodes and long links; the bottom layer is a full graph containing all vectors with short links.

**'Small world'** is a concept from sociology: any two people are connected through ~6 handshakes. HNSW applies the same principle - from any point, any other can be reached quickly by following 'long' links in upper layers and refining through 'short' links in lower layers.

**HNSW is ANN, not exact search.** It finds approximate nearest neighbors, not guaranteed exact ones. Recall (fraction of correct results) is typically 95-99%, which is enough for most tasks. For 100% recall - use `exact: true`, but that's brute-force.

An HNSW search returns 9 out of 10 correct results (recall = 90%). A colleague suggests switching to `exact: true` for 100% recall in production. What is the right response?

Parameters: m, ef_construct, ef

**Three key HNSW parameters** determine the quality and speed of the index. Understanding each is essential.

Parameter	Where set	Default	What it controls
m	hnsw_config at collection creation	16	Number of links per node in the graph. Higher = better recall, more memory
ef_construct	hnsw_config at collection creation	100	Queue size when building the index. Higher = better graph, slower indexing
ef (hnsw_ef)	params at search time	depends on ef_construct	Queue size during search. Higher = better recall, slower queries

**ef_construct cannot be changed after collection creation** - the collection must be recreated. The `m` parameter is also fixed. **hnsw_ef at search time** - can be changed per query. That's why starting with moderate m/ef_construct and tuning quality through hnsw_ef is the recommended approach.

A collection is created with m=16, ef_construct=100. Search recall is 94%, and 98% is the target. What is the correct next step?

Tuning: memory, speed, quality

**The main HNSW trade-off:** recall quality ↔ search speed ↔ memory usage. Three corners of a triangle - improving one usually hurts another.

Scenario	m	ef_construct	hnsw_ef	Characteristics
Fast search (latency-first)	8	50	32	Minimum memory, maximum speed, recall ~90%
Balanced (default)	16	100	64	Good recall ~97%, moderate speed and memory
High recall	32	200	128	Recall ~99%, +2x memory, moderately slower
Maximum quality	64	400	256	Recall ~99.9%, expensive on memory and speed

**Practical advice:** start with defaults m=16, ef_construct=100. Measure recall on real data by comparing against `exact: true`. If recall < 95% - first try increasing hnsw_ef at search time. If even hnsw_ef=512 doesn't help - recreate with m=32, ef_construct=200.

Setting maximum m=64, ef_construct=400 'just to be safe'

Start with defaults (m=16, ef_construct=100) and increase only when measured recall is insufficient

m=64 vs m=16: graph memory grows 4x. ef_construct=400 vs 100: indexing slows ~4x. Yet recall on typical tasks with defaults is already 95-97%. Oversized parameters are pure resource waste

A collection has 5M vectors (1536-dim). Measured recall@10 = 96% at hnsw_ef=64. The product requires 99%+. What is the optimal approach?

Key Ideas

**HNSW** - multi-layer graph, search in O(log N). Upper layers = fast approximation, bottom layer = precise refinement
**m** (default: 16) - node connectivity in the graph. Higher = better recall, more memory. Fixed at creation
**ef_construct** (default: 100) - graph build quality. Higher = better graph, slower indexing. Fixed at creation
**hnsw_ef** at search time - can change per query. This is the main recall tuning lever without recreating the collection
**Tuning order:** first increase hnsw_ef, then m and ef_construct. Measure recall by comparing against exact: true

What's next

HNSW handles vector search. But filtering by data fields - category, date, geolocation - is also often required.

Payload Indexes — Speeding up payload field filtering - works alongside HNSW
Vector Quantization — 4-32x memory compression with minimal recall loss
Distance Metrics — How to choose the right metric for a given task

Вопросы для размышления

In what tasks is 95% recall sufficient, and in which is 99%+ required? Give examples from real systems.
Why can't ef_construct be changed after collection creation? How does this relate to how HNSW works?
When RAM is constrained - which parameter should be reduced first: m or ef_construct?

Связанные уроки

alg-12-bfs