Qdrant - Vector Database

Result Grouping

RAG without grouping is like a Google search where all top 10 results are from the same website. searchGroups fixes this elegantly: 'best 5 documents with best 2 chunks each' - one query, the right context.

  • **RAG systems:** 4 documents × 3 chunks = 12 context fragments without duplication, with metadata
  • **News search:** one article per outlet in the top results - diversity in results
  • **Candidate search:** 1-2 jobs per company - no single company dominates the feed

Prerequisites

  • Filtering: Filter API

The problem: many chunks from the same document

**Standard RAG pipeline:** a document is split into chunks → each chunk is indexed as a separate point. When searching, the top-10 results may contain 7 chunks from the same document. The LLM then receives largely redundant context, and other relevant documents are pushed out of the results.

**Workarounds without grouping:** (1) deduplicate at the application level, but then you need something like limit=100 and discard around 90 results; (2) store whole documents, but then you lose chunk-level precision. **searchGroups** solves this at the Qdrant level.

| Approach | Pros | Cons |
| --- | --- | --- |
| Plain search + dedup in code | Simple implementation | High limit → extra work, unpredictable recall |
| Store only full documents (no chunks) | No duplication | Poor recall on long documents |
| searchGroups | Built-in deduplication, N best chunks per document | Requires a payload index on the group_by field |
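
To make the first approach concrete, here is a minimal sketch of code-side deduplication on toy data. The `Hit` shape and the `dedupByDocument` helper are hypothetical names introduced for illustration; the scores and document ids are made up.

```typescript
// Hypothetical shape of a search hit; only the fields needed here.
interface Hit {
  id: number;
  score: number;
  documentId: string;
}

// Keep at most `perDoc` hits per document and at most `maxDocs` documents.
// Assumes `hits` is already sorted by descending score.
function dedupByDocument(hits: Hit[], perDoc: number, maxDocs: number): Map<string, Hit[]> {
  const groups = new Map<string, Hit[]>();
  for (const hit of hits) {
    const group = groups.get(hit.documentId);
    if (group === undefined) {
      if (groups.size >= maxDocs) continue; // enough documents already
      groups.set(hit.documentId, [hit]);
    } else if (group.length < perDoc) {
      group.push(hit);
    }
  }
  return groups;
}

// Toy data: 7 of the top 10 hits come from document "a".
const docIds = ["a", "a", "b", "a", "a", "a", "c", "a", "a", "b"];
const topTen: Hit[] = docIds.map((documentId, i) => ({ id: i, score: 1 - i * 0.05, documentId }));

const grouped = dedupByDocument(topTen, 2, 5);
const kept = Array.from(grouped.values()).reduce((n, g) => n + g.length, 0);
console.log(`documents: ${grouped.size}, kept: ${kept}, discarded: ${topTen.length - kept}`);
// prints "documents: 3, kept: 5, discarded: 5"
```

The request asked for up to 5 documents, but the 10 fetched hits only cover 3. Closing that gap means raising the limit and discarding even more hits, which is the cost listed in the table.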

A RAG system returns relevant results, but the LLM sees the same document 5 times out of 10. What does this cause?

searchGroups API: group_by and group_size

**searchGroups** is a special Qdrant method that groups results by a payload field value. It returns up to `limit` groups (documents), each with up to `group_size` best-scoring chunks.

**A payload index is required for performance.** The `group_by` field should be indexed. Create the index with: `qdrant.createPayloadIndex('chunks', { field_name: 'document_id', field_schema: 'keyword' })`. Without an index, Qdrant has to do a full scan, which is slow.
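
A minimal sketch of the call described above, assuming a client object that exposes `createPayloadIndex` and a `searchGroups` method as this lesson names it. The interfaces are hand-written stand-ins, not official typings. In the official JavaScript client the grouping method may carry a different name (its documentation appears to call it `searchPointGroups`), so check the client version you use. `buildGroupsRequest` and `topChunksPerDocument` are hypothetical helper names.

```typescript
// Hand-written stand-in types (assumptions, not the official client typings).
interface GroupsRequest {
  vector: number[];
  group_by: string;   // payload field to group by
  group_size: number; // max chunks per group
  limit: number;      // max number of groups
  with_payload: boolean;
}

interface GroupsResponse {
  groups: Array<{
    id: string | number; // the group_by value
    hits: Array<{ id: string | number; score: number; payload?: Record<string, unknown> }>;
  }>;
}

interface GroupingClient {
  createPayloadIndex(
    collection: string,
    args: { field_name: string; field_schema: "keyword" | "integer" },
  ): Promise<unknown>;
  // Method name follows this lesson; the real client may name it differently.
  searchGroups(collection: string, request: GroupsRequest): Promise<GroupsResponse>;
}

// Pure helper: "best `documents` documents with best `chunksPerDocument` chunks each".
function buildGroupsRequest(vector: number[], documents: number, chunksPerDocument: number): GroupsRequest {
  return {
    vector,
    group_by: "document_id",
    group_size: chunksPerDocument,
    limit: documents,
    with_payload: true,
  };
}

async function topChunksPerDocument(client: GroupingClient, queryVector: number[]): Promise<GroupsResponse> {
  // Index creation is normally a one-time setup step; it is shown here only
  // to make the dependency on the index explicit.
  await client.createPayloadIndex("chunks", { field_name: "document_id", field_schema: "keyword" });
  return client.searchGroups("chunks", buildGroupsRequest(queryVector, 5, 2));
}
```

Keeping the request builder separate from the network call makes the grouping parameters easy to unit-test without a running Qdrant instance.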

searchGroups with group_by='document_id', group_size=2, limit=5. The collection has 3 documents, each with 10 chunks. What is the maximum number of points returned?
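
The arithmetic behind this question can be sketched as a small helper. It assumes every group holds the same number of points; `maxGroupedPoints` is a hypothetical name, not a Qdrant function.

```typescript
// Upper bound on the number of points a grouped search can return.
// `distinctGroups` and `pointsPerGroup` describe the collection, not the request.
function maxGroupedPoints(
  limit: number,
  groupSize: number,
  distinctGroups: number,
  pointsPerGroup: number,
): number {
  const groups = Math.min(limit, distinctGroups);           // no more groups than exist
  const hitsPerGroup = Math.min(groupSize, pointsPerGroup); // no more hits than a group has
  return groups * hitsPerGroup;
}

console.log(maxGroupedPoints(5, 2, 3, 10)); // prints 6 (3 groups × 2 hits)
```

With only 3 documents in the collection, `limit=5` cannot be filled, so the number of existing groups is the binding constraint.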

with_lookup: enriching from another collection

**with_lookup** is a searchGroups parameter that automatically fetches payload for each group from another collection. This is the 'chunk collection + document collection' pattern.

**Why two collections?** Chunks contain only the chunk text. Full document metadata (title, URL, author, date, tags) lives in a separate document collection. When searching, Qdrant automatically enriches groups with data from the document collection.

**'documents' collection without a real vector:** if documents is only needed for lookup (not for search), you can create it with a minimal vector (size: 1, distance: 'Cosine') and use it purely as a metadata store. This is a valid pattern.
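
The two-collection pattern can be sketched as plain request objects. This is a hedged illustration: the types are hand-written, the helper names `buildLookupRequest` and `toDocumentPoint` are hypothetical, and the `title` and `url` payload fields are taken from the metadata listed above. As the Qdrant documentation describes it, the lookup matches each group id against a point id in the lookup collection, so the sketch assumes `document_id` values are valid point ids in `documents`; verify this for your version.

```typescript
// Hand-written request shape (an assumption, not the official client typings).
interface LookupGroupsRequest {
  vector: number[];
  group_by: string;
  group_size: number;
  limit: number;
  with_lookup: {
    collection: string;       // collection to fetch group metadata from
    with_payload: string[];   // payload fields to copy into each group
    with_vectors: boolean;    // the placeholder vector is never needed
  };
}

function buildLookupRequest(vector: number[]): LookupGroupsRequest {
  return {
    vector,
    group_by: "document_id",
    group_size: 2,
    limit: 5,
    with_lookup: { collection: "documents", with_payload: ["title", "url"], with_vectors: false },
  };
}

// Config for a metadata-only 'documents' collection with a minimal vector.
const documentsCollectionConfig = { vectors: { size: 1, distance: "Cosine" as const } };

// A document point: its id must equal the document_id stored in chunk payloads.
// The vector [1] is a non-zero placeholder and carries no meaning.
function toDocumentPoint(id: number, meta: { title: string; url: string }) {
  return { id, vector: [1], payload: meta };
}

console.log(JSON.stringify(buildLookupRequest([0.1, 0.2]).with_lookup));
```

Updating a title or URL then touches one point in `documents` instead of every chunk of that document.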

Why store document metadata (title, URL) in a separate 'documents' collection instead of in each chunk's payload?

RAG pattern: chunk search + document grouping

**Full RAG pipeline with searchGroups** - putting it all together: chunk search, grouping by document, metadata lookup, and building context for the LLM.
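
The last step of the pipeline, turning grouped results into LLM context, might look like the sketch below. It assumes a response in which each group carries its `hits` and an optional `lookup` record, that chunk payloads store the text under a `text` field, and that document payloads have `title` and `url`. These field names and the `buildContext` helper are assumptions for illustration.

```typescript
// Assumed shape of one group in a grouped search response.
interface DocGroup {
  id: string | number;
  hits: Array<{ id: string | number; score: number; payload?: { text?: string } }>;
  lookup?: { id: string | number; payload?: { title?: string; url?: string } };
}

// Render each document as a numbered header followed by its best chunks.
function buildContext(groups: DocGroup[]): string {
  return groups
    .map((group, i) => {
      const title = group.lookup?.payload?.title ?? `document ${group.id}`;
      const url = group.lookup?.payload?.url;
      const header = url ? `[${i + 1}] ${title} (${url})` : `[${i + 1}] ${title}`;
      const chunks = group.hits
        .map((hit) => hit.payload?.text ?? "")
        .filter((text) => text.length > 0);
      return [header, ...chunks].join("\n");
    })
    .join("\n\n");
}

// Toy response: two documents, the second without lookup metadata.
const sampleGroups: DocGroup[] = [
  {
    id: 1,
    hits: [
      { id: 10, score: 0.9, payload: { text: "Chunk A1" } },
      { id: 11, score: 0.8, payload: { text: "Chunk A2" } },
    ],
    lookup: { id: 1, payload: { title: "Guide A", url: "https://example.com/a" } },
  },
  { id: 2, hits: [{ id: 20, score: 0.7, payload: { text: "Chunk B1" } }] },
];

console.log(buildContext(sampleGroups));
```

Numbering the documents in the context gives the LLM stable markers to cite, and the fallback header keeps the output usable when a lookup record is missing.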

**group_size recommendations:** for RAG, group_size=2-3 gives more context per document than a single chunk. For a 'one result per document' UI, use group_size=1. For maximum recall with deduplication, use group_size=1 with limit=20.

**Common mistake:** using plain search with limit=100 and code-side deduplication instead of searchGroups.

**Better:** let searchGroups handle deduplication at the Qdrant level. It is more efficient and gives more predictable recall.

**Why:** with limit=100 you transfer 100 points over the network and filter them in code, while searchGroups returns only the groups you asked for, so there is less traffic. On top of that, the 100 raw results can still be dominated by one document, so after deduplication you may end up with fewer distinct documents than you need and lose diversity.

For RAG: group_by='document_id', group_size=1, limit=10 vs group_size=3, limit=4. Which is better with a limited LLM context window (4096 tokens)?

Key Ideas

  • **searchGroups** solves the problem of duplicate chunks from one document in results
  • **group_by** - payload field to group by; **group_size** - best chunks per group; **limit** - number of groups
  • **with_lookup** - automatically enriches groups with data from another collection
  • **Two-collection pattern:** chunks (search) + documents (metadata) - normalization and easy updates
  • **Payload index** on the group_by field is required for performance
  • Remember the hook? One query, N documents, M chunks - that's production RAG

What's next

Grouping is working. Next step - finding 'similar' content without an explicit query vector.

  • Recommendations API — Find similar documents by example
  • Hybrid Search — Add hybrid search to grouping
  • Filtering — Combine filters with group by

Questions for reflection

  • How do you determine the optimal group_size for your RAG system? What metrics should you track?
  • When is the two-collection pattern (chunks + documents) justified, and when does it just add complexity?
  • How does searchGroups affect score_threshold - should you revisit it?

Related lessons

  • qd-13-filters — Group By is often combined with filters for aggregation
  • qd-12-multi-vector — Multi-vector search + group by is a common production combination
  • pg-08-aggregation — SQL GROUP BY is the same aggregation semantics
  • pg-11-window — Window functions rank within partitions - similar to group_by in vector space
  • db-05-sql-basics