Qdrant - Vector Database

Qdrant: What It Is and Why You Need It

Midjourney generates 15 million images a day. Each one is a vector of 1024 float32 values. Finding 'similar' among 100 million in 10 ms - that's not `LIKE '%query%'`. That's HNSW (Hierarchical Navigable Small World). Qdrant is written in Rust, handles 100k+ RPS, and payload filtering runs at the HNSW traversal level. Pinecone charges $0.096/h per managed pod. Qdrant self-hosted is free. For a busy service, that gap runs into thousands of dollars a month.

**Notion AI Search** - 10M+ users search notes by meaning, not keywords; per-workspace filtering at the HNSW level
**GitHub Copilot** - code embeddings at dim=1536, finding similar code in milliseconds across a full codebase
**Spotify** - ANN over 100 million tracks for song recommendations; embedding + payload filter by genre and market
**Cloudflare WAF** - vector embeddings for HTTP traffic anomaly detection in real time

Why ordinary databases fall short

SQL has WHERE, LIKE, and full-text search. All of them match characters or words, so semantic similarity is beyond their reach: a query for 'car' will not find a document that only says 'automobile'.

**The solution: embeddings.** A language model converts text into a numeric vector of 1536 float32 values - where semantic similarity equals geometric proximity. Documents about machine learning cluster together in this 1536-dimensional space, regardless of the exact words used. GitHub Copilot uses dim=1536 for exactly this: finding similar code by meaning, not by syntax.
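
To make "semantic similarity equals geometric proximity" concrete, here is a minimal sketch using cosine similarity. The three 4-dimensional vectors are invented for illustration; real embeddings would come from a model and have far more dimensions (1536 in the text above).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings": two documents about machine learning, one about cooking.
ml_doc_1 = [0.9, 0.8, 0.1, 0.0]
ml_doc_2 = [0.8, 0.9, 0.2, 0.1]
cooking_doc = [0.1, 0.0, 0.9, 0.8]

sim_related = cosine_similarity(ml_doc_1, ml_doc_2)
sim_unrelated = cosine_similarity(ml_doc_1, cooking_doc)

print(f"ML vs ML:      {sim_related:.3f}")
print(f"ML vs cooking: {sim_unrelated:.3f}")
```

The two machine-learning vectors point in nearly the same direction, so their score is close to 1, while the cooking vector scores much lower. Vector search is this comparison repeated at scale.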

**The scale problem.** For 1 million documents, brute-force search (comparing the query against every stored vector) requires about 1.5 billion multiply-add operations per query (1M vectors × 1536 dimensions). At roughly 5 seconds per query, a single node cannot sustain even 1 RPS. A vector database answers the same query in 5-20 ms via a specialized index. Spotify runs ANN over 100 million tracks for song recommendations exactly this way.

Approach	100K documents	1M documents	10M documents
Brute-force	~500ms	~5s	~50s
HNSW (vector DB)	~5ms	~15ms	~30ms
Speedup	100x	333x	1666x
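
The brute-force row can be sketched in a few lines. This is an illustrative pure-Python version, not how any production system implements it; the function names are made up for this example. It scores every stored vector against the query, which is why the cost grows linearly with collection size.

```python
import math

def brute_force_top_k(query, vectors, k):
    """Score every stored vector against the query and return the indexes of the k best.

    Cost per query: len(vectors) * len(query) multiply-adds.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    scored = [(cosine(query, v), idx) for idx, v in enumerate(vectors)]
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]

def multiply_adds_per_query(num_vectors, dim):
    """Number of multiply-add operations for one dot-product pass over the collection."""
    return num_vectors * dim

# The figure quoted in the text: 1M documents at dim=1536.
ops = multiply_adds_per_query(1_000_000, 1536)
print(f"{ops:,} multiply-adds per query")

# Tiny usage example with three 2-dimensional vectors.
stored = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]
print(brute_force_top_k([1.0, 0.0], stored, k=2))
```

An index such as HNSW avoids the full pass by visiting only a small part of the collection, at the price of approximate results.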

**A vector database** is not a replacement for PostgreSQL - it's a specialized tool for one job: storing vectors and finding the nearest ones in milliseconds. It's often used alongside Postgres: primary data there, embeddings in Qdrant.

Myth: A vector database replaces PostgreSQL

Reality: A vector database complements a relational DB - it doesn't replace it

Qdrant has no transactions, no JOINs, no constraint checks. The standard architecture: core data in Postgres, embeddings in Qdrant, IDs linking them together. Uber uses exactly this pattern: embedding + payload filter by city, restaurant data in the relational DB.
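
The "IDs linking them together" pattern might look roughly like the sketch below. Everything here is an assumption made for illustration: SQLite from the standard library stands in for Postgres, the `restaurants` table and its rows are invented, and `vector_hits` is a hand-written list standing in for what a vector search would return.

```python
import sqlite3

# Stand-in for the relational database (Postgres in the text).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE restaurants (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
db.executemany(
    "INSERT INTO restaurants VALUES (?, ?, ?)",
    [(1, "Luigi's", "Rome"), (2, "Sakura", "Tokyo"), (3, "Trattoria Roma", "Rome")],
)

# Hypothetical vector-search result: (point ID, similarity score), best match first.
# In a real setup this list would come from the vector database.
vector_hits = [(3, 0.91), (1, 0.87)]

def hydrate(hits):
    """Fetch the full rows for the hit IDs, preserving the search ranking."""
    if not hits:
        return []
    ids = [point_id for point_id, _ in hits]
    placeholders = ",".join("?" for _ in ids)
    rows = db.execute(
        f"SELECT id, name, city FROM restaurants WHERE id IN ({placeholders})", ids
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # SQL does not guarantee row order, so re-apply the order of the hits.
    return [(by_id[pid], score) for pid, score in hits if pid in by_id]

results = hydrate(vector_hits)
for row, score in results:
    print(row, score)
```

The vector store only needs the ID (plus whatever payload is used for filtering); the relational database remains the source of truth for everything else.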

Why is full-text search (LIKE, tsvector) unsuitable for semantic search?

Qdrant Architecture: from point to cluster

**Qdrant** is an open-source vector database written in Rust (2021). Rust is not marketing - it means no GC pauses, no GIL, predictable p99 latency. Qdrant's HNSW index is 3-5x faster than popular Python implementations at comparable recall. That's why Qdrant handles 100k+ RPS where a Python equivalent starts to choke.

**Core Qdrant concepts:**

Concept	SQL equivalent	Description
Collection	Table	Namespace for vectors of the same dimensionality
Point	Row	A unit of data: ID + vector + payload
Vector	Indexed column	Numeric array (embedding) used for search
Payload	JSON column	Arbitrary metadata: text, numbers, tags
Segment	Partition	Internal collection subdivision for parallelism
Shard	Shard	Unit of distribution in a cluster
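
As a rough illustration of how these concepts map onto requests, the sketch below builds the JSON bodies for creating a collection and upserting one point through Qdrant's REST API. Nothing is sent over the network. The collection name `documents`, the 4-dimensional toy vector, and the payload fields are assumptions for this example; `make_point` is a hypothetical helper, not part of any client library.

```python
import json

COLLECTION = "documents"  # hypothetical collection name
DIM = 4                   # toy dimensionality; the text uses 1536

# Body for: PUT /collections/documents
# A collection fixes the vector size and the distance metric for all its points.
create_collection_body = {"vectors": {"size": DIM, "distance": "Cosine"}}

def make_point(point_id, vector, payload):
    """Build one point (ID + vector + payload), checking the collection's dimensionality."""
    if len(vector) != DIM:
        raise ValueError(f"expected {DIM} dimensions, got {len(vector)}")
    return {"id": point_id, "vector": vector, "payload": payload}

# Body for: PUT /collections/documents/points
upsert_body = {
    "points": [
        make_point(1, [0.9, 0.8, 0.1, 0.0], {"category": "legal", "title": "NDA template"}),
    ]
}

print(json.dumps(create_collection_body))
print(json.dumps(upsert_body, indent=2))
```

Segments and shards do not appear in these bodies: they are managed by Qdrant itself rather than specified per point.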

**Segments** are the key idea inside Qdrant. A collection is split into multiple segments, each with its own HNSW index. Writes go to the active segment, reads come from indexed segments. The optimizer rebuilds indexes in the background. No global locks - that's why Qdrant can be written to and queried simultaneously without latency spikes.

Payload filtering in Qdrant operates at the HNSW traversal level - not after. This means a search for 'documents similar to this one in category legal' walks only the relevant nodes in the graph. It doesn't filter 100 candidates at the end. That's a fundamental difference in recall for narrow filters.
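
A filtered query of this kind could be expressed as the request body below. This is a sketch that assumes the `points/search` REST endpoint, a collection named `documents`, and a payload field called `category`; the query vector is a 4-dimensional placeholder. The body is only built and printed, not sent.

```python
import json

# Placeholder query embedding; a real one would come from the same model as the stored vectors.
query_vector = [0.9, 0.8, 0.1, 0.0]

# Body for: POST /collections/documents/points/search
search_body = {
    "vector": query_vector,
    # The filter travels with the query, so it can be applied while the graph is walked.
    "filter": {
        "must": [
            {"key": "category", "match": {"value": "legal"}},
        ]
    },
    "limit": 10,
    "with_payload": True,
}

print(json.dumps(search_body, indent=2))
```

The point to notice is that the filter is part of the search request itself, not a second step applied by the application to the returned results.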

Myth: Segments in Qdrant are the same thing as shards in a cluster

Reality: Segments are inside a single node for parallelism; shards are the unit of distribution across nodes

Segments live within one Qdrant instance: one segment writes, another reads. Shards are the cluster level: different shards live on different machines. Confusing them means misdiagnosing latency and capacity issues.

Why does Qdrant split a collection into segments?

Qdrant vs alternatives: choosing a vector database

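Pinecone charges $0.096/h per managed pod. Qdrant self-hosted has no license fee; you pay only for your own hardware. At 10 pods running around the clock for a busy RAG service, the pod fees alone come to about $700 a month (10 × $0.096 × 730 h), and larger deployments reach thousands. Choosing a vector database is not just a technical decision.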

Database	Language	Filtering	License	When to choose
Qdrant	Rust	No recall degradation	Apache 2.0	Production, complex filters, self-hosted
Pinecone	Closed	Metadata filtering	SaaS only	Managed, quick start, no DevOps
Weaviate	Go	GraphQL filters	BSD-3	GraphQL API, schemas, modular
pgvector	C	Full SQL	PostgreSQL	Already on Postgres, <1M vectors
ChromaDB	Python	Basic	Apache 2.0	Prototypes, local development
Milvus	Go/C++	Attribute filtering	Apache 2.0	Billions of vectors, Kubernetes

**Qdrant's key advantage: filtered search without recall compromise.** Most vector databases degrade quality when filters are applied - the index covers all data, results are filtered post-hoc. Qdrant uses filterable HNSW: the search only walks nodes that pass the filter. Notion AI with 10M+ users applies this for per-workspace filtering.

**pgvector** is a solid starting point: under 500K vectors, Postgres already in the stack, no separate DevOps needed. For production with millions of documents, complex filters, and latency requirements under 20ms - Qdrant. Cloudflare WAF uses vector embeddings for anomaly detection at a scale where pgvector simply isn't an option.

**Who uses Qdrant in production?** Microsoft (Semantic Kernel), Notion (AI search), Dust.tt, and dozens of YC startups. Qdrant Cloud is the managed version with EU-hosted servers (important for GDPR).

Myth: Qdrant is faster than competitors only because of Rust

Reality: The speed comes from the filterable HNSW algorithm, not just the language

Rust gives predictable latency without GC pauses. But the main advantage is payload filtering built into HNSW traversal. With a narrow filter (2% of data), post-filtering drops recall to 30-50%. Qdrant maintains recall regardless of filter selectivity - that's algorithmic, not hardware.
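
The recall loss from post-filtering can be reproduced with a toy experiment. The sketch below is not HNSW: exact search stands in for the index, and the data is synthetic (5000 points evenly spaced on a unit circle, every 50th tagged `legal`, so the filter matches 2%). It only shows the mechanism: if the index returns 100 nearest neighbours and the filter is applied afterwards, most of the matching documents never make it into the candidate set.

```python
import math

N = 5000          # total points
K = 10            # results wanted
CANDIDATES = 100  # nearest neighbours returned before post-filtering

# Synthetic data: unit vectors on a circle; every 50th point is "legal" (2% of the data).
points = []
for i in range(N):
    angle = 2 * math.pi * i / N
    points.append({
        "id": i,
        "vector": (math.cos(angle), math.sin(angle)),
        "category": "legal" if i % 50 == 0 else "other",
    })

query = points[N // 2]["vector"]

def similarity(a, b):
    # Dot product equals cosine similarity here because all vectors have unit length.
    return a[0] * b[0] + a[1] * b[1]

def top_ids(candidates, k):
    ranked = sorted(candidates, key=lambda p: similarity(query, p["vector"]), reverse=True)
    return [p["id"] for p in ranked[:k]]

legal = [p for p in points if p["category"] == "legal"]

# Ground truth: the K nearest points that satisfy the filter.
truth = set(top_ids(legal, K))

# Post-filtering: take the nearest CANDIDATES overall, then drop non-matching ones.
nearest_overall = set(top_ids(points, CANDIDATES))
post_filtered = [p["id"] for p in legal if p["id"] in nearest_overall][:K]

# Filter-aware search: only points that pass the filter are ever considered.
filter_aware = top_ids(legal, K)

def recall(found):
    return len(truth & set(found)) / K

post_recall = recall(post_filtered)
aware_recall = recall(filter_aware)
print(f"post-filter recall@{K}:  {post_recall:.2f}")   # 0.20 in this toy setup
print(f"filter-aware recall@{K}: {aware_recall:.2f}")  # 1.00 by construction
```

The exact figures are properties of this synthetic setup, not a benchmark of any database. With a 2% filter only about 2 of the 100 candidates match, so a post-filtering system has to fetch far more candidates or accept missing results.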

Your app searches for similar documents only in the 'legal' category out of 50 possible categories. Why will Qdrant handle this better than a competitor using post-filtering?

Key Ideas

**SQL doesn't understand meaning** - LIKE and tsvector need exact words; embeddings encode meaning as numbers, so semantically similar texts land near each other in vector space
**HNSW is not just faster - it scales differently** - brute-force over 1M documents takes about 5 seconds, HNSW about 15 ms; the gap grows with scale because brute-force cost is linear in collection size
**Qdrant = Point (ID + vector + payload)** - the minimal unit; a collection holds thousands of these with an HNSW index on top
**Filterable HNSW** - the filter is applied during graph traversal, not after the search; recall doesn't drop with narrow filters (2% of data still gives accurate top-10)
**Pinecone vs Qdrant** - the difference isn't only technical: $0.096/h/pod vs free self-hosted at the same recall

What's next

Theory without practice is nothing. Next lesson: spin up Qdrant and make the first request.

Installation and First Run — Launch Docker, explore the Web UI, make the first REST request
Embeddings (AI Engineering) — Where the vectors we put into Qdrant actually come from

Questions for reflection

Which tasks in a typical project could benefit from semantic search instead of full-text search?
With 100K documents and PostgreSQL already in place, is switching to Qdrant worthwhile? What factors tip the balance?
Why is Rust an important architectural decision for a vector database, and not just marketing?

Related lessons

qd-02-install — Practice: install Qdrant and make the first request after the theory is clear.
ml-01-intro — Embeddings are the foundation of vector search; ML intuition helps understand the semantic space.
ir-01 — Information retrieval is the classical foundation that vector search extends from keyword matching to semantic similarity.
ds-01-intro — Shards in Qdrant follow standard distributed-systems architecture: independent nodes, each holding segments with their own HNSW indexes.
alg-01-big-o — HNSW search is roughly O(log n) in the number of stored vectors n, while brute-force is O(n); understanding algorithmic complexity explains why the speedup grows to 1666x at 10M documents.
db-01-intro