Qdrant - Vector Database

Payload Indexes

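Take a collection of 1M documents where semantic search runs in 5 ms. Add a filter for 'published articles from 2024 only' and the query takes 300 ms, because Qdrant has to do a full payload scan. Creating a payload index on the filtered fields with `createPayloadIndex` brings it back to about 5 ms.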

  • **E-commerce:** filters by category + price range + availability - all three fields must be indexed
  • **News aggregator:** filtering by publication date (datetime) + language (keyword) + topic (keyword) - standard setup
  • **Geo service:** searching for similar places within a 5 km radius - a geo index is mandatory

Prerequisites

  • Points, Vectors, Payloads

Why payload indexes exist

A **payload index** is a data structure that lets Qdrant quickly filter points by payload field values. Without an index, every filtering query requires a **full scan** - iterating over every point in the collection.
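The difference between the two access paths can be sketched with a toy model. This is not how Qdrant stores data internally; the `Point` type and both helper functions are hypothetical and only illustrate why a prebuilt value-to-ids structure avoids touching every point.

```typescript
// Toy model only: illustrates full scan vs. index lookup, not Qdrant internals.
type Point = { id: number; payload: Record<string, string> };

// Full scan: inspect the payload of every point, O(N) work per query.
function fullScan(points: Point[], field: string, value: string): number[] {
  const hits: number[] = [];
  for (const p of points) {
    if (p.payload[field] === value) hits.push(p.id);
  }
  return hits;
}

// Keyword-style index: value -> list of point ids, built once up front.
function buildKeywordIndex(points: Point[], field: string): Record<string, number[]> {
  const index: Record<string, number[]> = {};
  for (const p of points) {
    const v: string | undefined = p.payload[field];
    if (v === undefined) continue; // points without the field are not indexed
    if (index[v]) index[v].push(p.id);
    else index[v] = [p.id];
  }
  return index;
}

const points: Point[] = [
  { id: 1, payload: { status: "active" } },
  { id: 2, payload: { status: "archived" } },
  { id: 3, payload: { status: "active" } },
];

const statusIndex = buildKeywordIndex(points, "status");
const viaScan = fullScan(points, "status", "active"); // walks all three points
const viaIndex = statusIndex["active"] ?? [];         // one lookup, same ids
```

Both paths return the same ids; only the amount of work per query differs, and that gap grows with the size of the collection.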

**When to create indexes:** for any field used regularly in filters. Qdrant does not index payload fields automatically; indexes are created explicitly through the API, so rarely queried fields can stay unindexed.

**Index creation does not block requests.** Creating an index on a populated collection triggers a background build. Until the build finishes, filters fall back to a full scan (slow); afterwards they use the index (fast). Progress can be tracked via the `/collections/{name}` API.
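As a minimal sketch, the index-creation call can be expressed as a plain REST request. It assumes Qdrant's `PUT /collections/{collection_name}/index` endpoint with `field_name` and `field_schema` in the body; the helper names `createIndexRequest` and `collectionInfoRequest` and the collection name `articles` are hypothetical, and the code only builds request descriptions rather than sending them.

```typescript
type FieldSchema = "keyword" | "integer" | "float" | "geo" | "text" | "datetime";

interface HttpRequest {
  method: "PUT" | "GET";
  path: string;
  body?: { field_name: string; field_schema: FieldSchema };
}

// Describes the request that asks Qdrant to build a payload index.
// wait=false: do not hold the HTTP call open until the build has finished.
function createIndexRequest(collection: string, field: string, schema: FieldSchema): HttpRequest {
  return {
    method: "PUT",
    path: `/collections/${encodeURIComponent(collection)}/index?wait=false`,
    body: { field_name: field, field_schema: schema },
  };
}

// Describes the collection-info request used to check on the collection afterwards.
function collectionInfoRequest(collection: string): HttpRequest {
  return { method: "GET", path: `/collections/${encodeURIComponent(collection)}` };
}

const indexStatus = createIndexRequest("articles", "status", "keyword");
const checkProgress = collectionInfoRequest("articles");
```

Each descriptor can be sent with any HTTP client against the Qdrant base URL; the JS client's `createPayloadIndex` wraps the same endpoint.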

Collection: 500k points. A filter on 'status' (values: 'active', 'archived') is used in 80% of queries. No index exists. What happens?

Index types: keyword, integer, float, geo, text, datetime

**This lesson covers 6 payload index types**, each optimized for its data type and query pattern. (Qdrant also provides `bool` and `uuid` index types, which are not listed in the table below.)

| Index type | Data type | Supported filters | Example fields |
|---|---|---|---|
| keyword | string | match, is_null, is_empty | category, status, language, tag |
| integer | int64 | match, range (gte/lte/gt/lt) | year, user_id, view_count, priority |
| float | float64 | range (gte/lte/gt/lt) | price, score, latitude, longitude |
| geo | { lon, lat } | geo_bounding_box, geo_radius, geo_polygon | location, coordinates |
| text | string (full-text) | match.text (full-text search) | description, content, title |
| datetime | RFC3339 string | range (gte/lte/gt/lt) | created_at, published_at, expires_at |
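To make the filter column concrete, here is one filter condition per index type, written as the JSON objects Qdrant's filter syntax expects. The field names and values are illustrative assumptions, not taken from a real collection.

```typescript
// One condition per index type; all keys and values are made-up examples.
const keywordCondition = { key: "language", match: { value: "en" } };
const integerCondition = { key: "year", range: { gte: 2020, lte: 2024 } };
const floatCondition = { key: "price", range: { gt: 10.0, lt: 99.5 } };
const geoCondition = {
  key: "location",
  geo_radius: { center: { lon: 13.405, lat: 52.52 }, radius: 5000 }, // radius in meters
};
const textCondition = { key: "description", match: { text: "wireless headphones" } };
const datetimeCondition = {
  key: "published_at",
  range: { gte: "2024-01-01T00:00:00Z", lt: "2025-01-01T00:00:00Z" }, // RFC3339 bounds
};

// Conditions are combined in a filter; every clause under `must` has to hold.
const newsFilter = { must: [keywordCondition, datetimeCondition] };
const shopFilter = { must: [floatCondition, textCondition] };
```

Each condition benefits from an index of the matching type on its `key`; a condition on an unindexed field still works, but is evaluated by scanning payloads.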

Task: search restaurants by cuisine (semantically, via vector), within a 3 km radius, with rating >= 4.5, currently open (is_open = true). Which indexes are needed?

Creating indexes: a practical example

**Full workflow:** creating a collection, adding points, creating indexes, filtered search. Indexes are created after adding data - Qdrant builds them in the background.
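The workflow can be sketched as an ordered list of REST requests. This is a sketch under assumptions: the collection name `articles`, the tiny 4-dimensional vectors, and the payload values are invented for illustration, and the requests are only described here, not sent.

```typescript
interface Step {
  method: "PUT" | "POST";
  path: string;
  body: unknown;
}

const collection = "articles"; // hypothetical collection name
const base = `/collections/${collection}`;

const workflow: Step[] = [
  // 1. Create the collection (4 dimensions only to keep the example small).
  { method: "PUT", path: base, body: { vectors: { size: 4, distance: "Cosine" } } },
  // 2. Add points with their payload.
  {
    method: "PUT",
    path: `${base}/points?wait=true`,
    body: {
      points: [
        { id: 1, vector: [0.1, 0.2, 0.3, 0.4], payload: { status: "published", published_at: "2024-03-01T00:00:00Z" } },
        { id: 2, vector: [0.4, 0.3, 0.2, 0.1], payload: { status: "draft", published_at: "2023-11-15T00:00:00Z" } },
      ],
    },
  },
  // 3. Create one payload index per filtered field.
  { method: "PUT", path: `${base}/index`, body: { field_name: "status", field_schema: "keyword" } },
  { method: "PUT", path: `${base}/index`, body: { field_name: "published_at", field_schema: "datetime" } },
  // 4. Vector search restricted by a payload filter.
  {
    method: "POST",
    path: `${base}/points/search`,
    body: {
      vector: [0.1, 0.2, 0.3, 0.4],
      filter: {
        must: [
          { key: "status", match: { value: "published" } },
          { key: "published_at", range: { gte: "2024-01-01T00:00:00Z" } },
        ],
      },
      limit: 3,
    },
  },
];
```

Sending the steps in order against a Qdrant instance (by default the REST API listens on port 6333) reproduces the workflow; the client methods such as `createPayloadIndex` wrap the same requests.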

**Index only what is needed.** Each index takes additional memory (~50-200 MB per 1M points) and slows down writes. A good rule: index fields that appear in filters in more than 20% of queries. Let rarely used filters fall back to a full scan.

**Mistake:** creating indexes on all payload fields 'just to be safe'.

**Better:** index only the fields used in filters. Excess indexes cost memory and write speed and bring no benefit for search.

**Why:** every index must be updated on every write (upsert), so with 20 indexes instead of 5 each upsert does roughly 4x as much index maintenance. Each index also uses RAM. Rule: index a field only if it appears in filters.
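The 20% rule of thumb is easy to apply mechanically once filter usage has been counted. The sketch below assumes such counts are available from query logs; the function name `fieldsWorthIndexing` and the sample numbers are hypothetical.

```typescript
// Returns the fields whose share of filtered queries exceeds the threshold.
// filterCounts: field name -> number of sampled queries that filtered on it.
function fieldsWorthIndexing(
  filterCounts: Record<string, number>,
  totalQueries: number,
  threshold: number = 0.2,
): string[] {
  if (totalQueries <= 0) return [];
  return Object.keys(filterCounts)
    .filter((field) => filterCounts[field] / totalQueries > threshold)
    .sort();
}

// Made-up sample: 1000 queries inspected.
const toIndex = fieldsWorthIndexing(
  { status: 800, category: 350, author_id: 200, internal_note: 12 },
  1000,
);
// status (80%) and category (35%) pass the rule;
// author_id (exactly 20%) and internal_note (1.2%) do not.
```

The threshold is a starting point rather than a hard limit: a rarely used filter on a very large collection can still justify an index if its latency matters.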

1M points were added, then an index on 'category' was created. At that moment a query with a category filter arrives. What happens?

Key Ideas

  • **Without index = full scan:** every filter iterates over all N points, O(N). With an index, lookups cost roughly O(log N)
  • **6 types:** keyword (categories), integer/float (numbers/ranges), geo (geolocation), text (full-text), datetime (time)
  • **createPayloadIndex** can be called after the data is loaded; the index is built online and doesn't block queries
  • **Index conservatively:** only fields used in filters. Each extra index = memory + slower writes
  • **Check indexes:** `getCollection` → `payload_schema`
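The last point can be turned into a small check. The sketch assumes the collection info returned by `getCollection` contains a `payload_schema` object keyed by field name, with a `data_type` and a `points` count per indexed field; the sample values and the helper `missingIndexes` are hypothetical.

```typescript
interface PayloadIndexInfo {
  data_type: string;
  points: number; // number of points covered by the index
}

interface CollectionInfo {
  payload_schema: Record<string, PayloadIndexInfo>;
}

// Returns the filter fields that have no entry in payload_schema.
function missingIndexes(info: CollectionInfo, filterFields: string[]): string[] {
  return filterFields.filter(
    (field) => !Object.prototype.hasOwnProperty.call(info.payload_schema, field),
  );
}

// Sample shaped like a collection-info result; the numbers are made up.
const sample: CollectionInfo = {
  payload_schema: {
    status: { data_type: "keyword", points: 500000 },
    published_at: { data_type: "datetime", points: 498200 },
  },
};

const missing = missingIndexes(sample, ["status", "published_at", "language"]);
// Only "language" is reported: it is used in filters but has no index yet.
```

Running a check like this against the fields the application actually filters on is a cheap way to catch a forgotten index before it shows up as slow queries.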

What's next

Payload indexes speed up filtering. The next step - sparse vectors for lexical search, which works alongside semantic search.

  • Sparse Vectors: BM42 and SPLADE — Lexical search complements semantic - together this is Hybrid Search
  • Vector Quantization — Memory compression - important when many indexes take up RAM
  • HNSW: How the Index Works — HNSW + payload indexes work together for filtered vector search

Questions for reflection

  • How does Qdrant decide whether to use a payload index or HNSW first in filtered vector search? What happens when a filter is highly selective?
  • The collection stores JSON with nested objects (article.author.country). How is an index created on a nested field?
  • If 95% of queries filter by 'is_active: true' but only 10% of points are active - is it still worth creating a keyword index?

Related lessons

  • db-09-indexes-btree