Qdrant - Vector Database
Native BM25 and Full-Text Search
Running Elasticsearch for keyword search and Qdrant for semantic search? Two databases, double the deployment, constant synchronization. With native BM25 in Qdrant - it's one service.
- Replacing Elasticsearch + vector DB with a single Qdrant instance in a RAG system
- Autocomplete with prefix tokenizer without a separate service
- Multilingual search with Unicode-aware tokenization for 100+ languages
Prerequisites
Native BM25: Qdrant Computes It Itself
Qdrant v1.16+ introduced **native BM25**: Qdrant computes sparse BM25 vectors directly from the text payload, with no external library such as FastEmbed and no separate Elasticsearch cluster. Create a `text` index on a payload field, and the collection gains lexical search capability.
The difference between a **text filter** and a **BM25 sparse index**:
- `match: {text: 'query'}` - a filter (boolean: contains or not), no ranking
- BM25 sparse vector - ranking by TF-IDF weights with IDF computed across the whole collection

Both mechanisms use a text index, but they produce different results.
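The contrast between a boolean match and a ranked BM25 score can be sketched in plain Python. This is a local illustration only and does not call Qdrant. It assumes the classic Okapi BM25 formula with `k1=1.2` and `b=0.75`, and a simple lowercase word tokenizer. The function names `text_filter` and `bm25_rank` are hypothetical, and Qdrant's internal scoring may differ in detail.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on non-word characters (rough stand-in for a word tokenizer)."""
    return re.findall(r"\w+", text.lower())

def text_filter(docs, query):
    """Boolean match: indices of documents containing every query token, unranked."""
    wanted = set(tokenize(query))
    return [i for i, doc in enumerate(docs) if wanted <= set(tokenize(doc))]

def bm25_rank(docs, query, k1=1.2, b=0.75):
    """Okapi BM25: (index, score) pairs sorted by descending score."""
    tokenized = [tokenize(doc) for doc in docs]
    n = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scored = []
    for i, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scored.append((i, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

docs = [
    "vector search with qdrant",
    "qdrant qdrant qdrant hybrid search",
    "keyword search in elasticsearch",
]
print(text_filter(docs, "qdrant"))  # membership only: which documents match
print(bm25_rank(docs, "qdrant"))    # same documents, now ordered by score
```

Both calls select the same two documents, but only the BM25 call orders them: the document that repeats the term more often scores higher, which a filter cannot express.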
| Tokenizer | Description | When to use |
|---|---|---|
| word | Split by words (spaces, punctuation) | Standard English text |
| whitespace | Split on spaces only, no punctuation splitting | Code, URLs, IDs |
| prefix | word + all prefixes of each token | Autocomplete, prefix search |
| multilingual | Unicode-aware tokenizer for multilingual text | Russian, Arabic, CJK languages |
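As a rough sketch of how a tokenizer from the table is selected, the helper below builds the JSON body for a payload index request (`PUT /collections/{collection_name}/index`). The keys `field_name`, `field_schema`, `min_token_len`, and `max_token_len` follow my understanding of Qdrant's REST API, so check them against the reference for your version. The helper name `text_index_body` and the field `title` are hypothetical.

```python
import json

# Tokenizer names as listed in the table above.
TOKENIZERS = {"word", "whitespace", "prefix", "multilingual"}

def text_index_body(field_name, tokenizer="word", lowercase=True,
                    min_token_len=None, max_token_len=None):
    """Build a request body for creating a full-text payload index."""
    if tokenizer not in TOKENIZERS:
        raise ValueError(f"unknown tokenizer: {tokenizer!r}")
    schema = {"type": "text", "tokenizer": tokenizer, "lowercase": lowercase}
    # Token length bounds are optional; include them only when given.
    if min_token_len is not None:
        schema["min_token_len"] = min_token_len
    if max_token_len is not None:
        schema["max_token_len"] = max_token_len
    return {"field_name": field_name, "field_schema": schema}

body = text_index_body("title", tokenizer="prefix", min_token_len=2, max_token_len=10)
print(json.dumps(body, indent=2))
```

Keeping the body as a plain dictionary makes it easy to send with any HTTP client or to compare against what an official SDK produces.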
How does native BM25 in Qdrant differ from a text filter using `match: {text: 'query'}`?
Full-Text Search and Tokenization
After creating a text index, Qdrant supports two types of text search: **filter-based** (via payload filter) and **BM25 ranking** (via sparse vector query). Let's look at both and learn how to combine them.
**ASCII folding** and **stemming**: a text index with `lowercase: true` normalizes casing automatically. For ASCII folding (é → e, ñ → n) and stemming, use the `multilingual` tokenizer. The `prefix` tokenizer emits every prefix of each word as its own token: `hello` → `h`, `he`, `hel`, `hell`, `hello`. This enables prefix search without wildcard queries.
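The prefix expansion described above can be reproduced locally in a few lines. This is a minimal sketch, not Qdrant's implementation. The function name `prefix_tokens` and its `min_len` / `max_len` parameters are hypothetical stand-ins for whatever length bounds the index is configured with.

```python
import re

def prefix_tokens(text, min_len=1, max_len=20):
    """Return every prefix (within the length bounds) of every word in the text."""
    tokens = []
    for word in re.findall(r"\w+", text.lower()):
        for end in range(min_len, min(len(word), max_len) + 1):
            tokens.append(word[:end])
    return tokens

print(prefix_tokens("Hello"))
# ['h', 'he', 'hel', 'hell', 'hello']
```

The token count grows with word length, so a prefix index is larger than a word index over the same text. Raising `min_len` is one way to trade very short completions for a smaller index.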
For autocomplete, use the `prefix` tokenizer with a filter rather than a BM25 query. The prefix filter is faster and more precise for this use case: with a `prefix` index on `title`, `{key: 'title', match: {text: 'mach'}}` finds every document whose title contains a word starting with 'mach'.
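A sketch of what such an autocomplete lookup might look like as a request body, together with a local approximation of which titles it would return. The body shape (`filter.must` with a `match.text` condition, sent to a scroll endpoint such as `POST /collections/{collection_name}/points/scroll`) reflects my understanding of Qdrant's REST API and should be verified for your version. `autocomplete_request`, `would_match`, and the sample titles are hypothetical.

```python
import json

def autocomplete_request(field, typed, limit=10):
    """Build a filter-only request body: no vector query, so no ranking."""
    return {
        "filter": {"must": [{"key": field, "match": {"text": typed}}]},
        "limit": limit,
        "with_payload": True,
    }

def would_match(title, typed):
    """Local approximation of a prefix index: does any word start with the typed text?"""
    return any(word.startswith(typed.lower()) for word in title.lower().split())

titles = ["Machine Learning Basics", "The Time Machine", "Mathematics"]
print(json.dumps(autocomplete_request("title", "mach")))
print([t for t in titles if would_match(t, "mach")])
# ['Machine Learning Basics', 'The Time Machine']
```

Because prefixes are generated per word, a title can match on any of its words, not only the first one.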
Which tokenizer is best suited for autocomplete functionality?
Full Hybrid Search Without Elasticsearch
The traditional hybrid search architecture pairs **Elasticsearch** (lexical) with a **vector DB** (semantic). With native BM25 in Qdrant, the entire system fits into one service. This simplifies deployment, can reduce latency, and eliminates the data synchronization problem between two databases.
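In a hybrid setup the lexical and semantic result lists still have to be merged. This lesson's summary names Reciprocal Rank Fusion (RRF) for that step, and the idea can be sketched in plain Python. The constant `k=60` is the value commonly used in the RRF literature and is an assumption here, since the constant Qdrant uses internally may differ. The function name `rrf` and the document ids are hypothetical.

```python
from collections import defaultdict

def rrf(rankings, k=60):
    """Fuse ranked lists: each document earns 1 / (k + rank) per list it appears in."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

dense = ["a", "b", "c"]    # semantic (dense vector) results, best first
lexical = ["b", "d", "a"]  # lexical (BM25) results, best first
print(rrf([dense, lexical]))
# ['b', 'a', 'd', 'c']
```

RRF uses only rank positions, so BM25 scores and cosine similarities never need to be put on a common scale. Documents found by both retrievers rise to the top.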
**Native BM25 vs FastEmbed BM42**: with native BM25, no external embedder is needed for the sparse part, because Qdrant computes everything from the payload. FastEmbed BM42 generates attention-weighted sparse vectors on the client. For most cases, native BM25 is simpler and sufficiently accurate. BM42 can offer better quality, but it requires running an external embedder.
Do you need to manually compute the BM25 sparse vector before upserting when using native BM25 in Qdrant v1.16+?
Summary
- Native BM25 in Qdrant v1.16+: create a text payload index - the sparse vector is computed automatically
- Text filter (`match: {text}`) - boolean match; BM25 query - TF-IDF ranking
- 4 tokenizers: word (standard), whitespace (code), prefix (autocomplete), multilingual (Unicode)
- Full hybrid search = dense (semantic) + BM25 (lexical) + RRF fusion - all in one Qdrant instance
- Native BM25 is simpler to deploy than FastEmbed BM42; BM42 gives better quality at the cost of an external embedder
What's Next
Native BM25 covers the lexical side. Now let's add advanced exploration on top.
- Discovery Search - Exploration on top of hybrid search: context-guided retrieval
- FastEmbed BM42 - Alternative to native BM25: attention-weighted sparse embeddings
- Payload Index - Fundamentals: how payload indexes work in Qdrant
Questions for Reflection
- In what scenario would you keep Elasticsearch alongside Qdrant rather than replacing it?
- How would you implement multilingual autocomplete (EN + FR) with minimal resources?
- What would change in your RAG system architecture if you switched from Elasticsearch + Qdrant to a single Qdrant with native BM25?