Qdrant - Vector Database
HNSW: How the Index Works
Search over 1M vectors in 1-5ms - that's not magic. That's HNSW: a graph where any vector is reachable in O(log N) steps. Understanding its parameters means knowing how to tune recall and speed for a given task.
- **RAG with high recall:** ef_construct=200, hnsw_ef=128 - won't miss relevant context for the LLM
- **Real-time recommendations:** m=16, hnsw_ef=32 - 1-2ms per query, 94% recall is sufficient
- **Medical data:** m=32, exact: true for final verification - accuracy beats speed
Предварительные знания
HNSW: Navigable Small World
**HNSW (Hierarchical Navigable Small World)** is an Approximate Nearest Neighbor (ANN) algorithm. It's a multi-layer graph: upper layers are 'express highways' with few nodes and long links; the bottom layer is a full graph containing all vectors with short links.
**'Small world'** is a concept from sociology: any two people are connected through ~6 handshakes. HNSW applies the same principle - from any point, any other can be reached quickly by following 'long' links in upper layers and refining through 'short' links in lower layers.
**HNSW is ANN, not exact search.** It finds approximate nearest neighbors, not guaranteed exact ones. Recall (fraction of correct results) is typically 95-99%, which is enough for most tasks. For 100% recall - use `exact: true`, but that's brute-force.
An HNSW search returns 9 out of 10 correct results (recall = 90%). A colleague suggests switching to `exact: true` for 100% recall in production. What is the right response?
Parameters: m, ef_construct, ef
**Three key HNSW parameters** determine the quality and speed of the index. Understanding each is essential.
| Parameter | Where set | Default | What it controls |
|---|---|---|---|
| m | hnsw_config at collection creation | 16 | Number of links per node in the graph. Higher = better recall, more memory |
| ef_construct | hnsw_config at collection creation | 100 | Queue size when building the index. Higher = better graph, slower indexing |
| ef (hnsw_ef) | params at search time | depends on ef_construct | Queue size during search. Higher = better recall, slower queries |
**ef_construct cannot be changed after collection creation** - the collection must be recreated. The `m` parameter is also fixed. **hnsw_ef at search time** - can be changed per query. That's why starting with moderate m/ef_construct and tuning quality through hnsw_ef is the recommended approach.
A collection is created with m=16, ef_construct=100. Search recall is 94%, and 98% is the target. What is the correct next step?
Tuning: memory, speed, quality
**The main HNSW trade-off:** recall quality ↔ search speed ↔ memory usage. Three corners of a triangle - improving one usually hurts another.
| Scenario | m | ef_construct | hnsw_ef | Characteristics |
|---|---|---|---|---|
| Fast search (latency-first) | 8 | 50 | 32 | Minimum memory, maximum speed, recall ~90% |
| Balanced (default) | 16 | 100 | 64 | Good recall ~97%, moderate speed and memory |
| High recall | 32 | 200 | 128 | Recall ~99%, +2x memory, moderately slower |
| Maximum quality | 64 | 400 | 256 | Recall ~99.9%, expensive on memory and speed |
**Practical advice:** start with defaults m=16, ef_construct=100. Measure recall on real data by comparing against `exact: true`. If recall < 95% - first try increasing hnsw_ef at search time. If even hnsw_ef=512 doesn't help - recreate with m=32, ef_construct=200.
Setting maximum m=64, ef_construct=400 'just to be safe'
Start with defaults (m=16, ef_construct=100) and increase only when measured recall is insufficient
m=64 vs m=16: graph memory grows 4x. ef_construct=400 vs 100: indexing slows ~4x. Yet recall on typical tasks with defaults is already 95-97%. Oversized parameters are pure resource waste
A collection has 5M vectors (1536-dim). Measured recall@10 = 96% at hnsw_ef=64. The product requires 99%+. What is the optimal approach?
Key Ideas
- **HNSW** - multi-layer graph, search in O(log N). Upper layers = fast approximation, bottom layer = precise refinement
- **m** (default: 16) - node connectivity in the graph. Higher = better recall, more memory. Fixed at creation
- **ef_construct** (default: 100) - graph build quality. Higher = better graph, slower indexing. Fixed at creation
- **hnsw_ef** at search time - can change per query. This is the main recall tuning lever without recreating the collection
- **Tuning order:** first increase hnsw_ef, then m and ef_construct. Measure recall by comparing against exact: true
What's next
HNSW handles vector search. But filtering by data fields - category, date, geolocation - is also often required.
- Payload Indexes — Speeding up payload field filtering - works alongside HNSW
- Vector Quantization — 4-32x memory compression with minimal recall loss
- Distance Metrics — How to choose the right metric for a given task
Вопросы для размышления
- In what tasks is 95% recall sufficient, and in which is 99%+ required? Give examples from real systems.
- Why can't ef_construct be changed after collection creation? How does this relate to how HNSW works?
- When RAM is constrained - which parameter should be reduced first: m or ef_construct?