Qdrant - Vector Database
Result Grouping
RAG without grouping is like a Google search where all top 10 results are from the same website. searchGroups fixes this elegantly: 'best 5 documents with best 2 chunks each' - one query, the right context.
- **RAG systems:** 4 documents × 3 chunks = 12 context fragments without duplication, with metadata
- **News search:** one article per outlet in the top results - diversity in results
- **Job search:** 1-2 jobs per company - no single company dominates the feed
Prerequisites
The problem: many chunks from the same document
**Standard RAG pipeline:** a document is split into chunks, and each chunk is indexed as a separate point. As a result, the top 10 search results may contain 7 chunks from the same document. The LLM context fills up with near-duplicate text from one source, and other relevant documents are pushed out.
**Workarounds without searchGroups:** (1) deduplicate at the application level, but then you need something like limit=100 and end up discarding around 90 results; (2) store whole documents, but then you lose chunk-level precision. **searchGroups** solves this at the Qdrant level.
| Approach | Pros | Cons |
|---|---|---|
| Plain search + dedup in code | Simple implementation | High limit → extra work, unpredictable recall |
| Store only full documents (no chunks) | No duplication | Poor recall on long documents |
| searchGroups | Built-in deduplication, N best chunks per document | Needs a payload index on the group_by field to stay fast |
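The first row of the table can be illustrated with a minimal sketch of code-side deduplication. The `ChunkHit` shape and the sample scores are hypothetical, chosen only to show how a dominant document can leave you short of distinct results even after filtering.

```typescript
// Hypothetical shape of one search hit; only the fields used below.
interface ChunkHit {
  id: number;
  score: number;
  payload: { document_id: string; text: string };
}

// Keep the best-scoring chunk per document until `maxDocs` documents are found.
// Assumes `hits` is sorted by descending score, as search results are.
function dedupByDocument(hits: ChunkHit[], maxDocs: number): ChunkHit[] {
  const seen: Record<string, boolean> = {};
  const kept: ChunkHit[] = [];
  for (const hit of hits) {
    if (kept.length >= maxDocs) break;
    const docId = hit.payload.document_id;
    if (seen[docId]) continue; // a better chunk of this document is already kept
    seen[docId] = true;
    kept.push(hit);
  }
  return kept;
}

// Toy data: 4 of the top 5 hits come from document "a".
const sampleHits: ChunkHit[] = [
  { id: 1, score: 0.95, payload: { document_id: "a", text: "a1" } },
  { id: 2, score: 0.93, payload: { document_id: "a", text: "a2" } },
  { id: 3, score: 0.9, payload: { document_id: "a", text: "a3" } },
  { id: 4, score: 0.88, payload: { document_id: "b", text: "b1" } },
  { id: 5, score: 0.85, payload: { document_id: "a", text: "a4" } },
];

// We asked for 3 documents, but the fetched window only contains 2.
const dedupedHits = dedupByDocument(sampleHits, 3);
console.log(dedupedHits.map((h) => h.payload.document_id)); // [ 'a', 'b' ]
```

To get the third document, the application would have to raise the search limit and retry, which is the "high limit, extra work" cost listed in the table.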
A RAG system returns relevant results, but the LLM sees the same document 5 times out of 10. What does this cause?
searchGroups API: group_by and group_size
**searchGroups** is a Qdrant search method that groups results by the value of a payload field. It returns the top N groups (`limit`), each with its top M points (`group_size`): in RAG terms, the top-N documents with the top-M chunks each.
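As a rough sketch, the request for "best 5 documents with best 2 chunks each" could be assembled as below. The helper name `buildGroupsRequest` and the 4-dimensional query vector are illustrative assumptions; the field names follow the parameters described in this lesson. The resulting object is what would be sent as the body of a grouped search (for example `POST /collections/chunks/points/search/groups` over REST) or passed to the client call this lesson refers to as searchGroups.

```typescript
// Body of a grouped search request, restricted to the parameters named in this lesson.
interface GroupsRequestBody {
  vector: number[];
  group_by: string; // payload field whose value defines a group
  group_size: number; // best points (chunks) kept per group
  limit: number; // number of groups (documents) returned
  with_payload: boolean;
}

// Hypothetical helper: "topDocs documents with chunksPerDoc chunks each".
function buildGroupsRequest(
  queryVector: number[],
  topDocs: number,
  chunksPerDoc: number
): GroupsRequestBody {
  return {
    vector: queryVector,
    group_by: "document_id",
    group_size: chunksPerDoc,
    limit: topDocs,
    with_payload: true,
  };
}

// Toy query vector; a real one comes from your embedding model.
const groupsRequest = buildGroupsRequest([0.1, 0.2, 0.3, 0.4], 5, 2);
console.log(JSON.stringify(groupsRequest, null, 2));
```

Note that `limit` counts groups here, not points, which is the main difference from a plain search request.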
**Index the `group_by` field!** Create a payload index with: `qdrant.createPayloadIndex('chunks', { field_name: 'document_id', field_schema: 'keyword' })`. Without an index, Qdrant falls back to a full scan, which is slow on large collections.
searchGroups with group_by='document_id', group_size=2, limit=5. The collection has 3 documents, each with 10 chunks. What is the maximum number of points returned?
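The arithmetic behind this kind of question can be sketched as follows. The helper `maxPointsReturned` is hypothetical and computes only an upper bound: a real query may return fewer points if some documents have no sufficiently similar chunks.

```typescript
// Upper bound on points returned by a grouped search.
// `chunkCounts` holds the number of chunks stored for each document.
function maxPointsReturned(
  limit: number,
  groupSize: number,
  chunkCounts: number[]
): number {
  // Each group contributes at most `groupSize` points, and no more than it has.
  const perGroup = chunkCounts.map((count) => Math.min(groupSize, count));
  // At most `limit` groups come back; take the largest contributions for the bound.
  perGroup.sort((a, b) => b - a);
  return perGroup.slice(0, limit).reduce((sum, n) => sum + n, 0);
}

// 3 documents with 10 chunks each, group_size=2, limit=5:
// only 3 groups exist, so the bound is 3 × 2.
console.log(maxPointsReturned(5, 2, [10, 10, 10])); // 6

// When limit is the tighter constraint: 2 groups of up to 3 chunks.
console.log(maxPointsReturned(2, 3, [10, 1, 5])); // 6
```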
with_lookup: enriching from another collection
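**with_lookup** is a searchGroups parameter that automatically fetches a payload for each group from another collection. The group's `group_by` value is matched against point IDs in the lookup collection, so each document must be stored under the same ID (an unsigned integer or a UUID) that its chunks carry in `document_id`. This is the 'chunk collection + document collection' pattern.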
**Why two collections?** Chunk payloads hold only the chunk text and the `document_id`. Full document metadata (title, URL, author, date, tags) lives in a separate document collection, so a changed title or URL is a single update instead of one per chunk. When searching, Qdrant automatically enriches groups with data from the document collection.
**'documents' collection without a real vector:** if documents is only needed for lookup (not for search), you can create it with a minimal vector (size: 1, distance: 'Cosine') and use it purely as a metadata store. This is a valid pattern.
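A minimal sketch of the lookup pattern, assuming the `with_lookup` object takes a collection name plus payload and vector selectors, and that each returned group carries a `lookup` entry next to its `hits`. The type names, the helper names, and the sample groups are hypothetical.

```typescript
// Which collection to enrich from, and which payload fields to bring back.
interface LookupSpec {
  collection: string;
  with_payload: string[];
  with_vectors: boolean;
}

interface GroupsWithLookupBody {
  vector: number[];
  group_by: string;
  group_size: number;
  limit: number;
  with_lookup: LookupSpec;
}

// Grouped search over chunks, enriched from the 'documents' collection.
function buildLookupRequest(queryVector: number[]): GroupsWithLookupBody {
  return {
    vector: queryVector,
    group_by: "document_id",
    group_size: 2,
    limit: 5,
    with_lookup: {
      collection: "documents",
      with_payload: ["title", "url"], // only the metadata the UI or prompt needs
      with_vectors: false, // the placeholder vector is of no use to the caller
    },
  };
}

// Assumed shape of one returned group; `lookup` is absent if no document matched.
interface LookedUpGroup {
  id: number | string;
  hits: { id: number | string; score: number }[];
  lookup?: { id: number | string; payload?: { title?: string; url?: string } };
}

// Fall back to the raw group id when the lookup found nothing.
function groupLabel(group: LookedUpGroup): string {
  const title = group.lookup?.payload?.title;
  return title !== undefined ? title : `document ${group.id}`;
}

const lookupRequest = buildLookupRequest([0.1, 0.2, 0.3, 0.4]);
const sampleGroups: LookedUpGroup[] = [
  { id: 1, hits: [{ id: 101, score: 0.9 }], lookup: { id: 1, payload: { title: "Install guide" } } },
  { id: 2, hits: [{ id: 205, score: 0.8 }] },
];
console.log(lookupRequest.with_lookup.collection); // documents
console.log(sampleGroups.map(groupLabel)); // [ 'Install guide', 'document 2' ]
```

Handling the missing-lookup case explicitly is worthwhile: a chunk whose document was deleted or never written would otherwise surface as an untitled result.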
Why store document metadata (title, URL) in a separate 'documents' collection instead of in each chunk's payload?
RAG pattern: chunk search + document grouping
**Full RAG pipeline with searchGroups** - putting it all together: chunk search, grouping by document, metadata lookup, and building context for the LLM.
**group_size recommendations:** for RAG, group_size=2-3 gives the LLM more context per document than a single chunk. For a 'one result per document' UI, use group_size=1. For maximum document-level recall with deduplication, use group_size=1 with limit=20.
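The last step of the pipeline, turning grouped results into LLM context, might look like the sketch below. It assumes each group exposes `hits` with a `text` payload field and an optional `lookup` payload with `title` and `url`; the `RagGroup` type, the numbering format, and the sample data are illustrative choices, not a prescribed layout.

```typescript
// Assumed shape of a grouped result after lookup enrichment.
interface RagGroup {
  id: number | string;
  hits: { score: number; payload: { text: string } }[];
  lookup?: { payload?: { title?: string; url?: string } };
}

// Render one numbered block per document: a header line, then its chunks in rank order.
function buildContext(groups: RagGroup[]): string {
  return groups
    .map((group, i) => {
      const title = group.lookup?.payload?.title ?? `Document ${group.id}`;
      const url = group.lookup?.payload?.url;
      const header = url ? `[${i + 1}] ${title} (${url})` : `[${i + 1}] ${title}`;
      const body = group.hits.map((hit) => hit.payload.text).join("\n");
      return `${header}\n${body}`;
    })
    .join("\n\n");
}

// Toy result: two documents, the second without lookup metadata.
const ragGroups: RagGroup[] = [
  {
    id: 1,
    hits: [
      { score: 0.9, payload: { text: "Chunk A1" } },
      { score: 0.8, payload: { text: "Chunk A2" } },
    ],
    lookup: { payload: { title: "Guide", url: "https://example.com/guide" } },
  },
  { id: 2, hits: [{ score: 0.7, payload: { text: "Chunk B1" } }] },
];

const ragContext = buildContext(ragGroups);
console.log(ragContext);
```

Numbering the blocks gives the LLM a stable handle for citing sources in its answer.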
**Common mistake:** using plain search with limit=100 and code-side deduplication instead of searchGroups.
**Better:** let searchGroups handle deduplication at the Qdrant level, which is more efficient and gives better recall.
**Why:** with limit=100 you transfer 100 points over the network and filter them in code, while searchGroups returns only the groups you asked for, so there is less traffic. And if a single document dominates the top 100, deduplication can still leave you with fewer distinct documents than you need, so diversity suffers.
For RAG: group_by='document_id', group_size=1, limit=10 vs group_size=3, limit=4. Which is better with a limited LLM context window (4096 tokens)?
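One way to reason about this trade-off is to estimate the token budget of each option. The figure of 300 tokens per chunk is purely an assumption for illustration; measure your own chunks with your tokenizer before drawing conclusions.

```typescript
// Estimated tokens consumed by retrieved context: groups × chunks per group × chunk size.
function contextTokens(
  groupSize: number,
  limit: number,
  tokensPerChunk: number
): number {
  return groupSize * limit * tokensPerChunk;
}

const windowTokens = 4096;
const assumedTokensPerChunk = 300; // assumption, not a measured value

// Option A: 10 documents, 1 chunk each (breadth).
const optionA = contextTokens(1, 10, assumedTokensPerChunk);
// Option B: 4 documents, 3 chunks each (depth).
const optionB = contextTokens(3, 4, assumedTokensPerChunk);

// Tokens left for the system prompt, the question, and the answer.
console.log(windowTokens - optionA); // 1096
console.log(windowTokens - optionB); // 496
```

Under this assumption, option A covers more documents and leaves more room for the prompt and the answer, while option B gives deeper coverage of fewer documents at the cost of a tighter remaining budget.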
Key Ideas
- **searchGroups** solves the problem of duplicate chunks from one document in results
- **group_by** - payload field to group by; **group_size** - best chunks per group; **limit** - number of groups
- **with_lookup** - automatically enriches groups with data from another collection
- **Two-collection pattern:** chunks (search) + documents (metadata) - normalization and easy updates
- **Payload index** on the group_by field is required for performance
- Remember the hook? One query, N documents, M chunks - that's production RAG
What's next
Grouping is working. Next step - finding 'similar' content without an explicit query vector.
- Recommendations API — Find similar documents by example
- Hybrid Search — Add hybrid search to grouping
- Filtering — Combine filters with group by
Questions for reflection
- How do you determine the optimal group_size for your RAG system? What metrics should you track?
- When is the two-collection pattern (chunks + documents) justified, and when does it just add complexity?
- How does searchGroups affect score_threshold - should you revisit it?
Related lessons
- qd-13-filters — Group By is often combined with filters for aggregation
- qd-12-multi-vector — Multi-vector search + group by is a common production combination
- pg-08-aggregation — SQL GROUP BY is the same aggregation semantics
- pg-11-window — Window functions rank within partitions - similar to group_by in vector space
- db-05-sql-basics