Qdrant - Vector Database

First Search: Search API

The data is indexed. Now comes the interesting part: searching millions of vectors in 10-15 ms. But a search with the wrong parameters can return irrelevant results, so the goal is to make it accurate.

**RAG (Retrieval-Augmented Generation):** finding context for an LLM - score_threshold 0.75 cuts out noise
**Semantic search:** like Elasticsearch, but by meaning - limit: 10, score_threshold: 0.7
**Recommendations:** 'similar articles' - search by the vector of an existing document

Prerequisites

Points, Vectors, Payloads

Search API: the basic request

**Searching in Qdrant** means finding the K nearest points to a query vector. The query vector is the embedding of the search input (text, image, anything). Qdrant returns points sorted from most to least similar; for Cosine and Dot this means descending score.

**Score** is a similarity measure that depends on the chosen metric. For Cosine it ranges from -1 to 1, where 1 means the vectors point in the same direction. Typical values are model-dependent: with many text models, similar documents score 0.85-0.99 and unrelated ones score below 0.5.
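To make "K nearest points sorted by score" concrete, here is a brute-force sketch in plain Python: score every point against the query with Cosine and keep the best `limit`. This is the exact answer that the HNSW index approximates; the function name `exact_search` and the toy 2-D vectors are illustrative assumptions, not the qdrant-client API.

```python
import math

def cosine(a, b):
    """Cosine similarity: a·b / (|a|·|b|), in [-1, 1]."""
    product = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return product / (norm_a * norm_b)

def exact_search(points, query, limit):
    """Brute-force top-K: score every point, sort by descending score, keep `limit`."""
    scored = [(point_id, cosine(vector, query)) for point_id, vector in points.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]

# Toy 2-D "embeddings"; real ones have hundreds or thousands of dimensions.
points = {
    1: [1.0, 0.0],   # same direction as the query
    2: [0.7, 0.7],   # 45 degrees away
    3: [-1.0, 0.0],  # opposite direction
}
results = exact_search(points, query=[2.0, 0.0], limit=2)
for point_id, score in results:
    print(point_id, round(score, 3))
```

Point 1 scores 1.0 even though the query is twice as long, because Cosine only looks at direction. Point 3 would score -1.0 and is cut off by `limit=2`.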

**Use the same embedding model** for indexing and searching. If documents were indexed with `text-embedding-3-small`, search queries must go through that same model. Do not mix models: each model produces its own vector space, so vectors from different models are not comparable even when their dimensions match.

Documents were indexed with OpenAI text-embedding-3-small (1536d). For search, the plan is to use the older text-embedding-ada-002 model (also 1536d). Will this work?

Distance metrics: Cosine, Dot, Euclid

**The distance metric** determines how similarity between two vectors is calculated. It's chosen at collection creation and cannot be changed. Different embedding models are optimized for different metrics.
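Because the metric is fixed when the collection is created, it lives in the collection's vector parameters. A minimal sketch, assuming Qdrant's REST endpoint `PUT /collections/{collection_name}` with a `vectors` object holding `size` and `distance`; the helper `create_collection_body` is hypothetical and only builds the JSON body, it does not talk to Qdrant.

```python
import json

# Distance names accepted in Qdrant's vector params.
ALLOWED_DISTANCES = {"Cosine", "Dot", "Euclid", "Manhattan"}

def create_collection_body(size: int, distance: str) -> str:
    """Hypothetical helper: JSON body for PUT /collections/{collection_name}.
    The distance chosen here stays fixed for the lifetime of the collection."""
    if distance not in ALLOWED_DISTANCES:
        raise ValueError(f"unknown distance: {distance!r}")
    return json.dumps({"vectors": {"size": size, "distance": distance}})

# text-embedding-3-small produces 1536-dimensional vectors.
print(create_collection_body(1536, "Cosine"))
```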

| Metric | Formula | Score range | When to use |
|---|---|---|---|
| Cosine | cos(a,b) = a·b / (\|a\|·\|b\|) | [-1, 1], higher = better | Text embeddings (OpenAI, Cohere, Sentence Transformers) |
| Dot Product | a·b = Σ(aᵢ × bᵢ) | (-∞, +∞), higher = better | Normalized vectors, ColBERT, SPLADE |
| Euclid | √Σ(aᵢ - bᵢ)² | [0, +∞), lower = better | Coordinates, physical measurements, specialized models |
| Manhattan | Σ\|aᵢ - bᵢ\| | [0, +∞), lower = better | Rarely used in NLP, specialized tasks |
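The four formulas from the table, written out in plain Python as a sketch (not Qdrant's optimized implementation). Note the direction of each: Cosine and Dot are similarities where higher is better, while Euclid and Manhattan are distances where lower is better.

```python
import math

def cosine(a, b):
    """Cosine similarity: higher = more similar, range [-1, 1]."""
    product = sum(x * y for x, y in zip(a, b))
    return product / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def dot(a, b):
    """Dot product: higher = more similar, unbounded."""
    return sum(x * y for x, y in zip(a, b))

def euclid(a, b):
    """Euclidean (L2) distance: lower = more similar, range [0, +inf)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Manhattan (L1) distance: lower = more similar, range [0, +inf)."""
    return sum(abs(x - y) for x, y in zip(a, b))

a, b = [1.0, 2.0], [3.0, 4.0]
for name, metric in [("Cosine", cosine), ("Dot", dot), ("Euclid", euclid), ("Manhattan", manhattan)]:
    print(name, round(metric(a, b), 4))
```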

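**Cosine vs Dot Product:** if vectors are normalized (|v|=1), the two metrics give identical scores and therefore the same ranking. OpenAI embeddings are normalized, so both work. Cosine is more intuitive (it measures an angle); Dot Product skips the division by norms. In Qdrant the speed difference is negligible anyway, because Cosine is implemented as a dot product over vectors that are normalized at upload time.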
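A small check of the claim above with made-up 2-D vectors: once every vector is normalized, ranking by Dot Product matches ranking by Cosine. Without normalization, raw Dot Product lets a long vector win despite a worse angle.

```python
import math

def dot(a, b):
    """Dot product."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity: dot product divided by both norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def normalize(v):
    """Scale v to unit length (|v| = 1)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

# Made-up document vectors of different lengths.
docs = {"a": [3.0, 4.0], "b": [10.0, 1.0], "c": [1.0, 3.0]}
query = normalize([1.0, 1.0])
unit_docs = {k: normalize(v) for k, v in docs.items()}

by_cosine = sorted(docs, key=lambda k: cosine(query, docs[k]), reverse=True)
by_dot_normalized = sorted(unit_docs, key=lambda k: dot(query, unit_docs[k]), reverse=True)
by_dot_raw = sorted(docs, key=lambda k: dot(query, docs[k]), reverse=True)

print(by_cosine)          # ['a', 'c', 'b']: ranking by angle
print(by_dot_normalized)  # ['a', 'c', 'b']: same ranking once vectors are unit length
print(by_dot_raw)         # ['b', 'a', 'c']: the long vector "b" jumps to the top
```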

OpenAI text-embedding-3-small is being used. The docs say embeddings are normalized (|v|=1). Which metrics will give the same ranking?

Search parameters: limit, threshold, hnsw_ef

**Search parameters** allow balancing accuracy, speed, and the number of results returned.

**`score_threshold`** is a critical production parameter. Without it, Qdrant returns the top-K results even when they are completely unrelated to the query. With a threshold, results less similar than the threshold are dropped, so an off-topic query can return few or no results.
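A sketch of what `score_threshold` changes, applied to made-up Cosine scores: without a threshold an off-topic query still fills every `limit` slot; with 0.75 it returns nothing. The helper `search_with_threshold` is hypothetical. It assumes higher = better (Cosine/Dot); for Euclid or Manhattan the comparison would have to flip.

```python
def search_with_threshold(scored, limit, score_threshold=None):
    """Hypothetical helper. `scored` holds (point_id, score) pairs where higher = better.
    Returns at most `limit` results, best first, dropping any below the threshold."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    if score_threshold is not None:
        ranked = [item for item in ranked if item[1] >= score_threshold]
    return ranked[:limit]

# Made-up scores for an off-topic query ("what day is today?") and an on-topic one.
off_topic = [(1, 0.41), (2, 0.38), (3, 0.35), (4, 0.30), (5, 0.22), (6, 0.19)]
on_topic = [(7, 0.91), (8, 0.83), (9, 0.62), (10, 0.44)]

print(len(search_with_threshold(off_topic, limit=5)))                   # 5: top-K is always filled
print(search_with_threshold(off_topic, limit=5, score_threshold=0.75))  # []
print(search_with_threshold(on_topic, limit=5, score_threshold=0.75))   # [(7, 0.91), (8, 0.83)]
```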

**How to pick score_threshold:** run 20-30 test queries and look at the scores of correct and incorrect results. The boundary between the two groups is the threshold. It tends to be stable for a given model and data domain.
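A sketch of the calibration step, assuming the scores of correct and incorrect results from the test queries have already been collected by hand (the numbers below are made up). It places the threshold halfway between the lowest correct score and the highest incorrect one, and refuses when the two groups overlap. `pick_threshold` is a hypothetical helper.

```python
def pick_threshold(correct_scores, incorrect_scores):
    """Hypothetical helper: put the threshold halfway between the lowest score of a
    correct result and the highest score of an incorrect one."""
    lowest_correct = min(correct_scores)
    highest_incorrect = max(incorrect_scores)
    if lowest_correct <= highest_incorrect:
        # The groups overlap, so no single cut-off separates them cleanly.
        raise ValueError(f"scores overlap: {highest_incorrect} >= {lowest_correct}")
    return (lowest_correct + highest_incorrect) / 2

# Made-up scores collected from a batch of test queries.
correct = [0.91, 0.88, 0.84, 0.79]
incorrect = [0.52, 0.61, 0.68, 0.45]
print(round(pick_threshold(correct, incorrect), 3))  # 0.735
```

An overlap is itself a useful signal: it suggests the model or the data cannot separate these queries by score alone.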

In a RAG app, search always returns 5 results - even for irrelevant queries like 'what day is today?'. What is the fix?

Results: payload, vectors, and pagination

**Managing results** - what gets returned in the response, how to paginate, and how to search by specific payload fields.

**`with_payload: {include: ['field1', 'field2']}`** is recommended in production. Large payloads inflate the response, so when only `title` and `url` are needed, request just those fields.
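For reference, a minimal sketch of a complete search request as a Python dict serialized to JSON. It assumes Qdrant's REST search endpoint `POST /collections/{collection_name}/points/search` with `vector`, `limit`, `score_threshold`, `with_payload`, `with_vector` and `params.hnsw_ef` fields; check the API reference for the Qdrant version in use. The query vector is a 4-dimensional placeholder, not a real embedding, and `build_search_body` is hypothetical.

```python
import json

def build_search_body(vector, limit=10, score_threshold=0.75, hnsw_ef=None,
                      payload_fields=("title", "url")):
    """Hypothetical helper: JSON body for POST /collections/{collection_name}/points/search."""
    body = {
        "vector": list(vector),
        "limit": limit,
        "score_threshold": score_threshold,
        # Return only the listed payload fields instead of the whole payload.
        "with_payload": {"include": list(payload_fields)},
        # Stored vectors are rarely needed in the response.
        "with_vector": False,
    }
    if hnsw_ef is not None:
        # Larger hnsw_ef = more accurate but slower search.
        body["params"] = {"hnsw_ef": hnsw_ef}
    return json.dumps(body)

print(build_search_body([0.2, 0.1, 0.9, 0.7], hnsw_ef=128))
```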

**Avoid** large offsets for pagination, e.g. `offset: 1000, limit: 10`.

Deep pagination in vector search has no efficient solution. It is better to return more results upfront (top-50) and paginate on the client side.

With offset=1000, Qdrant has to compute the top 1010 results and return only the last 10, so the cost grows with the offset. Vector search is not designed for deep pagination. Alternatives: use scroll() to traverse the whole collection (its order is not by similarity), or limit pagination to 2-3 pages.
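A toy sketch contrasting the two approaches. Here the full ranking is simply a precomputed list of IDs; in a real engine, `offset_page` would still have to search for `offset + limit` candidates before discarding most of them. Both function names are hypothetical.

```python
def offset_page(ranked_ids, offset, limit):
    """What offset pagination implies: find the best offset + limit results,
    then throw away the first `offset` of them."""
    # A real engine must search for offset + limit candidates before slicing.
    top = ranked_ids[: offset + limit]
    return top[offset:]

def client_side_pages(top_results, page_size):
    """Fetch the top results once (e.g. top-50), then slice pages on the client."""
    return [top_results[i : i + page_size] for i in range(0, len(top_results), page_size)]

ranked_ids = list(range(5000))  # stand-in for point IDs already sorted by score
print(offset_page(ranked_ids, offset=1000, limit=10))
pages = client_side_pages(ranked_ids[:50], page_size=10)
print(len(pages), pages[-1])
```

One top-50 request covers five pages of 10, and paging through them costs no further searches.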

Ten semantic searches must be run in parallel in a single request. Which method should be used?

Key Ideas

**search(collection, {vector, limit})** - the basic API. The query vector must come from the same model used during indexing
**Score** - similarity measure. For Cosine: 1 = same direction; with many text models, >0.8 = very similar, <0.5 = irrelevant
**score_threshold** - mandatory in production. Without it, search returns K results even for irrelevant queries
**Metrics:** Cosine/Dot for text embeddings, Euclid for coordinates. Always follow the model's documentation
**hnsw_ef** - accuracy vs speed. Default (64) is a good balance. `exact: true` only for testing
Remember the hook? 10-15ms at millions of vectors - that's HNSW + the right parameters

What's next

Basic search is working. Next level: understand how HNSW achieves this in milliseconds.

HNSW: how the index works. Understand m, ef_construct and ef to tune search quality
Payload filtering: must/should/must_not filters to find documents with the right metadata
Hybrid Search: combining semantic and keyword search for better results

Questions for reflection

How should a developer choose score_threshold for a specific data domain? What experiment would determine the right value?
If search returns 0 results with threshold 0.8 but 10 with threshold 0.6 - what does that reveal about embedding quality or data distribution?
Where is Qdrant semantic search better than Elasticsearch full-text search? Where is it worse?

Related lessons

alg-10-binary-search