Qdrant - Vector Database

ACORN, Inline Storage, and Gridstore

Qdrant v1.16 and v1.17 bring more than new API features: they change how data is stored and how the HNSW graph is traversed. Understanding these changes lets you squeeze maximum performance out of your hardware.

Multi-tenant RAG: 10M documents, 100K users - ACORN keeps recall on target
On-disk collection on NVMe: Inline Storage cuts disk reads by 16x
Real-time indexing: Gridstore eliminates latency spikes from RocksDB compaction

Prerequisites

ACORN: Restoring Recall Under Restrictive Filters

The classic problem with filtered HNSW search: when a filter is very restrictive (e.g., `user_id = 'alice'`), most graph neighbors are filtered out. The traversal stalls because few links lead to points that pass the filter, and recall drops sharply, to roughly 60-70%.

**ACORN** (Approximate Closest-to-Object Reachability Navigation), added in Qdrant v1.17, solves this problem. The idea: when a direct neighbor in the graph is filtered out, ACORN checks **2-hop neighbors** - the neighbors of neighbors. This significantly expands the set of reachable points using the same HNSW graph structure.
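The 2-hop idea can be sketched on a toy adjacency list. This is a minimal illustration of the principle described above, not Qdrant's implementation: the `Graph` type, the `filtered_candidates` function, and the sample data are hypothetical, and distance scoring is left out entirely.

```python
from typing import Callable, Dict, List, Set

Graph = Dict[int, List[int]]  # node id -> neighbor ids (one graph layer, toy form)


def filtered_candidates(
    graph: Graph, node: int, passes_filter: Callable[[int], bool]
) -> Set[int]:
    """Return filter-passing candidates reachable from `node`.

    Direct neighbors that pass the filter are kept. For each direct neighbor
    that fails the filter, its own neighbors (2 hops from `node`) are checked.
    """
    candidates: Set[int] = set()
    for neighbor in graph.get(node, []):
        if passes_filter(neighbor):
            candidates.add(neighbor)
            continue
        # Neighbor is filtered out: look one hop further instead of giving up.
        for second_hop in graph.get(neighbor, []):
            if second_hop != node and passes_filter(second_hop):
                candidates.add(second_hop)
    return candidates


# Toy graph: node 0 links to 1 and 2; only nodes 4 and 5 pass the filter.
graph = {0: [1, 2], 1: [0, 3, 4], 2: [0, 5], 3: [1], 4: [1], 5: [2]}
allowed = {4, 5}

one_hop = {n for n in graph[0] if n in allowed}
two_hop = filtered_candidates(graph, 0, lambda n: n in allowed)
print(one_hop)  # set()
print(two_hop)  # {4, 5}
```

With 1-hop navigation alone, the search from node 0 has nowhere to go. Looking through the filtered-out neighbors reaches both matching points without changing the graph.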

| Scenario | Without ACORN | With ACORN | Latency overhead |
|---|---|---|---|
| No filter | ~99% recall | ~99% recall | 0% |
| Soft filter (50% of points) | ~95% recall | ~98% recall | +10-20% |
| Tight filter (1% of points) | ~60% recall | ~95% recall | +2-10x |
| Very tight filter (0.1% of points) | ~30% recall | ~90% recall | +5-20x |
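To check figures like those in the table on your own data, compare the approximate results against an exact (brute-force) search. The sketch below assumes the standard recall@k definition; the lesson does not say how its numbers were measured. The `recall_at_k` helper and the point IDs are made up for illustration.

```python
from typing import Sequence


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int], k: int) -> float:
    """Fraction of the true top-k neighbors that the approximate search returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    found = set(approx_ids[:k]) & truth
    return len(found) / len(truth) if truth else 0.0


exact = [11, 42, 7, 93, 58, 3, 76, 21, 64, 19]   # ground truth from brute-force search
approx = [11, 42, 7, 93, 58, 3, 80, 81, 82, 83]  # filtered search missed 4 of 10
print(recall_at_k(approx, exact, k=10))  # 0.6
```

Averaging this value over a representative set of queries and filters gives a recall number comparable to the rows above.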

ACORN is significantly slower under very restrictive filters: 5-20x slower than unfiltered search. This is a recall-versus-latency trade-off. For latency-critical cases, consider a **tenant-based** architecture with separate collections per user.

How does ACORN restore recall under restrictive filters in HNSW?

Inline Storage: Vectors Inside Graph Nodes

Before Qdrant v1.16, searching an on-disk collection required **two operations** per HNSW node visit:

1. Reading the graph structure (neighbor links) - one random I/O
2. Reading the point's vector from a separate file - another random I/O

For a deep HNSW search (ef=128), this means ~256 random disk reads. On an HDD - catastrophic. On NVMe SSD - still slow.
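The arithmetic behind the ~256 figure is easy to reproduce. The sketch below follows the text in treating `ef` as the number of visited nodes. The per-read latencies are round, assumed values chosen only to show the order of magnitude, not measurements, and the reads are assumed to be issued one after another.

```python
def random_reads(visited_nodes: int, reads_per_node: int = 2) -> int:
    """Random reads for a search when links and vector live in separate files."""
    return visited_nodes * reads_per_node


def total_latency_ms(reads: int, latency_us_per_read: int) -> float:
    """Total wait if the reads are dependent and issued one after another."""
    return reads * latency_us_per_read / 1000


EF = 128                # search depth from the text, used as the visited-node count
NVME_READ_US = 100      # assumed round figure, not a measurement
HDD_READ_US = 10_000    # assumed round figure, not a measurement

reads = random_reads(EF)
print(reads)                                  # 256
print(total_latency_ms(reads, NVME_READ_US))  # 25.6
print(total_latency_ms(reads, HDD_READ_US))   # 2560.0
```

Under these assumptions, a single query spends tens of milliseconds on I/O on NVMe and seconds on an HDD, which is why the number of reads per node matters.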

**Inline Storage** (v1.16+): quantized vectors are stored directly inside HNSW graph nodes, in the same data block. Reading a node = reading both the graph structure and the vector in one operation. In memmap (on-disk) search, work that previously took 32 random reads now takes 2 page reads, a 16x reduction.
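The idea of keeping links and quantized vector in one block can be illustrated with a fixed-size binary record. This is a toy layout, not Qdrant's on-disk format: the record structure, the constants `M` and `DIM`, and the helper names are all assumptions made for the example.

```python
import struct

M = 4    # neighbor links per node (toy value)
DIM = 8  # int8-quantized vector dimensions (toy value)

# One fixed-size record per node: M uint32 neighbor ids, then DIM int8 components.
RECORD = struct.Struct(f"<{M}I{DIM}b")


def pack_node(neighbors, quantized) -> bytes:
    """Serialize one node's links and quantized vector into a single record."""
    return RECORD.pack(*neighbors, *quantized)


def read_node(blob: bytes, node_id: int):
    """One contiguous read returns both the links and the quantized vector."""
    offset = node_id * RECORD.size
    fields = RECORD.unpack_from(blob, offset)
    return list(fields[:M]), list(fields[M:])


blob = pack_node([1, 2, 3, 4], [5, -3, 0, 7, 1, 1, -8, 2]) + pack_node(
    [0, 2, 3, 4], [4, 4, -1, 0, 9, -2, 3, 6]
)
links, vector = read_node(blob, 1)
print(RECORD.size)  # 24
print(links)        # [0, 2, 3, 4]
print(vector)       # [4, 4, -1, 0, 9, -2, 3, 6]
```

Because the record is small and contiguous, links and vector land in the same page. Quantization keeps the record compact enough for this to work, which is why the two features go together.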

**When Inline Storage gives maximum benefit:**

- on_disk collections (vectors on disk + graph on disk)
- Collections larger than server RAM
- NVMe SSD (not HDD - latency is too high there even with Inline)
- Quantization enabled (int8 or binary)

In which scenario does Inline Storage provide the greatest performance improvement?

Gridstore: Replacing RocksDB for Payload and Sparse

Before v1.16, Qdrant used **RocksDB** for payload and sparse vector storage. RocksDB is built on an LSM-tree: it absorbs writes quickly, but it has write amplification (each byte is written multiple times during compaction) and unpredictable latency spikes during compaction.
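Write amplification is the ratio of bytes physically written to bytes the application asked to write. The toy model below counts it for a simplified tiered LSM, where every `fanout` runs on a level are merged into one run on the next level. It is a sketch for intuition only: the function name and parameters are hypothetical, the write-ahead log is ignored, and RocksDB's real compaction strategies are more elaborate.

```python
def lsm_write_amplification(flushes: int, fanout: int = 4, flush_bytes: int = 1) -> float:
    """Bytes written to disk divided by bytes written by the user (toy tiered LSM)."""
    levels = [[]]  # levels[i] holds the sizes of the sorted runs on level i
    written = 0
    for _ in range(flushes):
        written += flush_bytes           # flushing the memtable writes the data once
        levels[0].append(flush_bytes)
        level = 0
        while len(levels[level]) >= fanout:
            merged = sum(levels[level])
            levels[level] = []
            written += merged            # compaction rewrites every byte of the merged runs
            if level + 1 == len(levels):
                levels.append([])
            levels[level + 1].append(merged)
            level += 1
    return written / (flushes * flush_bytes)


print(lsm_write_amplification(3))   # 1.0 (no compaction has happened yet)
print(lsm_write_amplification(64))  # 4.0 (each byte flushed once, then rewritten 3 times)
```

The ratio grows with the number of levels the data passes through, and the merges happen in bursts, which is where the compaction-related latency spikes come from.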

**Gridstore** (v1.16+) is a custom storage engine from the Qdrant team, replacing RocksDB. Key benefits:

- **Less write amplification**: no LSM compaction
- **Stable latency**: no latency spikes from background tasks
- **Better scan**: sequential access during scroll/filter is faster
- **Less CPU**: no background compaction threads

**When to use all three together** (ACORN + Inline Storage + Gridstore): This is the recommended configuration for **production large-scale deployment** on Qdrant v1.17+:

- Large collection (>10M points), does not fit in RAM
- Multi-tenant architecture with per-user tenant filters
- Write-heavy: continuous upsert of new documents
- Recall requirements ≥95% even under restrictive filters

What main problem with RocksDB does Gridstore solve?

Summary

ACORN (v1.17+): 2-hop HNSW navigation restores recall from ~60% to ~95% under restrictive filters
Inline Storage (v1.16+): quantized vector stored inside the graph node - disk reads during on-disk search drop by 16x
Gridstore (v1.16+): RocksDB replacement without LSM compaction - stable latency, less write amplification
Higher m in HNSW (32 instead of 16) improves ACORN - more 2-hop paths through the graph
All three together - the recommended configuration for large-scale on-disk multi-tenant collections
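The point about `m` in the summary can be made concrete with a rough upper bound. The sketch treats `m` as the number of links per node on one layer and ignores overlapping links, so real graphs reach fewer distinct points than this; the `two_hop_upper_bound` helper is hypothetical.

```python
def two_hop_upper_bound(links_per_node: int) -> int:
    """Upper bound on distinct points within 2 hops, ignoring overlapping links."""
    one_hop = links_per_node
    two_hop = links_per_node * links_per_node
    return one_hop + two_hop


for m in (16, 32):
    print(m, two_hop_upper_bound(m))  # prints "16 272", then "32 1056"
```

Doubling `m` roughly quadruples the 2-hop neighborhood, at the cost of a larger graph and more work per visited node.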

What's Next

You have completed the full advanced Qdrant course. Here are directions for further study.

Performance Tuning — Practical parameters for tuning Qdrant under load
Quantization — Deep understanding of quantization - the foundation for Inline Storage
HNSW — HNSW algorithm details - the foundation for understanding ACORN

Questions for Reflection

At what collection size would you enable on_disk for vectors? What memmap_threshold would you choose?
How would you measure the actual performance gain from Inline Storage in your system? What metrics would you watch?
If your service grew rapidly from 1M to 50M documents - in what order would you apply ACORN, Inline Storage, and Gridstore?

Related Lessons

db-09-indexes-btree
