Qdrant - Vector Database

Native BM25 and Full-Text Search

Running Elasticsearch for keyword search and Qdrant for semantic search? That means two databases, double the deployment work, and constant synchronization. With native BM25 in Qdrant, it's one service.

Replacing Elasticsearch + vector DB with a single Qdrant instance in a RAG system
Autocomplete with prefix tokenizer without a separate service
Multilingual search with Unicode-aware tokenization for 100+ languages

Prerequisites

Native BM25: Qdrant Computes It Itself

Starting with v1.16, Qdrant supports **native BM25**: it computes sparse BM25 vectors directly from the text payload, without external libraries such as FastEmbed or a separate Elasticsearch cluster. Create a `text` index on a payload field, and the collection gains lexical search capability.

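The difference between a **text filter** and a **BM25 sparse index**:

- `match: {text: 'query'}` - a filter (boolean: the document either contains the query terms or it does not), no ranking
- BM25 sparse vector - ranking by TF-IDF-style weights, with IDF computed across the whole collection

Both mechanisms use a text index, but they produce different results: the filter returns an unordered set of matches, while the BM25 query returns documents ordered by relevance score.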
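To make the filter-versus-ranking distinction concrete, here is a minimal pure-Python sketch. It does not call Qdrant; the `tokenize` helper, the `k1` and `b` defaults, the all-terms-must-match filter semantics, and the tiny `DOCS` corpus are illustrative assumptions, not Qdrant's internal implementation.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplified stand-in for a lowercasing word tokenizer (assumption).
    return re.findall(r"\w+", text.lower())


def text_filter(docs: dict[int, str], query: str) -> set[int]:
    """Boolean match: keep documents that contain every query token. No scores."""
    wanted = set(tokenize(query))
    return {doc_id for doc_id, text in docs.items() if wanted <= set(tokenize(text))}


def bm25_rank(docs: dict[int, str], query: str,
              k1: float = 1.2, b: float = 0.75) -> list[tuple[int, float]]:
    """Score documents with the classic BM25 formula and sort by score."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for t in tokens.values() for term in set(t))
    scores: dict[int, float] = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # IDF is computed over the whole collection, not per document.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical three-document collection.
DOCS = {
    1: "vector search with qdrant",
    2: "qdrant qdrant qdrant benchmarks",
    3: "keyword search in elasticsearch",
}

matches = text_filter(DOCS, "qdrant")   # unordered set of matching ids
ranked = bm25_rank(DOCS, "qdrant")      # ids ordered by relevance score
```

Both calls find the same two documents, but only `bm25_rank` says which one is the better match: the document that repeats the term scores higher.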

Tokenizer	Description	When to use
word	Split by words (spaces, punctuation)	Standard English text
whitespace	Split on spaces only, no punctuation splitting	Code, URLs, IDs
prefix	word + all prefixes of each token	Autocomplete, prefix search
multilingual	Unicode-aware tokenizer for multilingual text	Russian, Arabic, CJK languages
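The behaviour of the first three tokenizers can be approximated in a few lines of Python. This is a rough sketch under simplifying assumptions: the function names are hypothetical, the regular expression only imitates punctuation splitting, and the `multilingual` tokenizer is left out because it relies on language-specific segmentation that a regex cannot reproduce.

```python
import re


def word_tokens(text: str) -> list[str]:
    # Split on anything that is not a letter, digit, or underscore.
    return [t for t in re.split(r"\W+", text) if t]


def whitespace_tokens(text: str) -> list[str]:
    # Split on whitespace only; punctuation stays inside the tokens.
    return text.split()


def prefix_tokens(text: str, min_len: int = 1) -> list[str]:
    # Word tokens plus every prefix of each token, starting at min_len characters.
    out: list[str] = []
    for word in word_tokens(text):
        out.extend(word[:end] for end in range(min_len, len(word) + 1))
    return out


sample = "GET /api/v1/users?id=42"
as_words = word_tokens(sample)             # URL is broken into fragments
as_whitespace = whitespace_tokens(sample)  # URL survives as one token
as_prefixes = prefix_tokens("hello")
```

The sample shows why `whitespace` suits code, URLs, and IDs: the path stays searchable as a single token instead of being split at every slash.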

How does native BM25 in Qdrant differ from a text filter using `match: {text: 'query'}`?

Full-Text Search and Tokenization

After creating a text index, Qdrant supports two types of text search: **filter-based** (via payload filter) and **BM25 ranking** (via sparse vector query). Let's look at both and learn how to combine them.

**ASCII folding** and **stemming**: a text index with `lowercase: true` normalizes casing automatically. For ASCII folding (é → e, ñ → n) and stemming, use the `multilingual` tokenizer. The prefix tokenizer creates sub-tokens: the word `hello` becomes the tokens `h`, `he`, `hel`, `hell`, `hello`, which enables prefix search without wildcard queries.
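As an illustration of what lowercasing and ASCII folding do to a token, the sketch below uses Python's standard `unicodedata` module. It is an approximation of the idea, not Qdrant's implementation: the `ascii_fold` and `normalize` names are hypothetical, and letters that have no Unicode decomposition (for example `ø` or `ß`) are left unchanged by this approach.

```python
import unicodedata


def ascii_fold(text: str) -> str:
    # Decompose accented characters (é -> e + combining accent),
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize(text: str) -> str:
    # Folding followed by lowercasing, so "Café" and "cafe" index identically.
    return ascii_fold(text).lower()


folded = normalize("Café Señor")
```

With this normalization applied at both index and query time, a query typed without accents still matches the accented document text.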

For autocomplete, use the `prefix` tokenizer with a filter rather than a BM25 query. The prefix filter is faster and more precise for this use case: `{key: 'title', match: {text: 'mach'}}` finds all documents whose `title` contains a word starting with 'mach'.
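The mechanics behind prefix-based autocomplete can be sketched with a small inverted index over prefix tokens. Everything here is a simplified model: `build_prefix_index`, `autocomplete`, and the `TITLES` data are hypothetical, and Qdrant's actual index structure may differ.

```python
import re
from collections import defaultdict


def build_prefix_index(titles: dict[int, str], min_len: int = 1) -> dict[str, set[int]]:
    # Map every prefix of every lowercased word to the ids of documents containing it.
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, title in titles.items():
        for word in re.findall(r"\w+", title.lower()):
            for end in range(min_len, len(word) + 1):
                index[word[:end]].add(doc_id)
    return index


def autocomplete(index: dict[str, set[int]], typed: str) -> set[int]:
    # A plain lookup per typed token: no wildcard scan over the vocabulary.
    tokens = re.findall(r"\w+", typed.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical titles.
TITLES = {
    1: "Machine Learning Basics",
    2: "State Machines in Go",
    3: "Matching Algorithms",
}
INDEX = build_prefix_index(TITLES)
suggestions = autocomplete(INDEX, "mach")
```

Note that the match is per token: "State Machines in Go" is returned for `mach` even though the title itself does not begin with those letters. The trade-off is index size, since every word contributes one entry per prefix; raising `min_len` reduces that cost.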

Which tokenizer is best suited for autocomplete functionality?

Full Hybrid Search Without Elasticsearch

The traditional hybrid search architecture: **Elasticsearch** (lexical) + **vector DB** (semantic). With native BM25 in Qdrant, the entire system fits into one service. This simplifies deployment, reduces latency, and eliminates the data synchronization problem between two databases.
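The summary of this lesson names RRF (Reciprocal Rank Fusion) as the step that merges the dense and BM25 result lists. The following sketch shows the fusion logic on its own, in plain Python. The constant `k = 60` is the value commonly used in the RRF literature and is an assumption here, as are the example id lists; the constant Qdrant uses internally may differ.

```python
def rrf_fuse(rankings: list[list[int]], k: int = 60) -> list[tuple[int, float]]:
    """Merge several ranked id lists; each list contributes 1 / (k + rank) per id."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical result ids, best first, from the two retrieval branches.
dense_hits = [7, 3, 9]   # semantic (dense vector) search
bm25_hits = [3, 5, 7]    # lexical (BM25) search

fused = rrf_fuse([dense_hits, bm25_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

RRF only looks at ranks, not raw scores, which is why it can combine cosine similarities and BM25 scores without any score calibration. Documents found by both branches (3 and 7 here) rise above documents found by only one.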

**Native BM25 vs FastEmbed BM42**: with native BM25, no external embedder is needed for the sparse part; Qdrant computes everything from the payload. FastEmbed BM42 generates attention-weighted sparse vectors on the client. For most cases, native BM25 is simpler and sufficiently accurate. BM42 can offer better quality, but it requires running an embedder outside Qdrant.

Do you need to manually compute the BM25 sparse vector before upserting when using native BM25 in Qdrant v1.16+?

Summary

Native BM25 in Qdrant v1.16+: create a text payload index - the sparse vector is computed automatically
Text filter (`match: {text}`) - boolean match; BM25 query - TF-IDF ranking
4 tokenizers: word (standard), whitespace (code), prefix (autocomplete), multilingual (Unicode)
Full hybrid search = dense (semantic) + BM25 (lexical) + RRF fusion - all in one Qdrant instance
Native BM25 is simpler to deploy than FastEmbed BM42; BM42 gives better quality at the cost of an external embedder

What's Next

Native BM25 covers the lexical side. Now let's add advanced exploration on top.

Discovery Search - Exploration on top of hybrid search: context-guided retrieval
FastEmbed BM42 - Alternative to native BM25: attention-weighted sparse embeddings
Payload Index - Fundamentals: how payload indexes work in Qdrant

Questions for Reflection

In what scenario would you keep Elasticsearch alongside Qdrant rather than replacing it?
How would you implement multilingual autocomplete (EN + FR) with minimal resources?
What would change in your RAG system architecture if you switched from Elasticsearch + Qdrant to a single Qdrant with native BM25?

Related Lessons

alg-10-binary-search
