Qdrant - Vector Database

Payload Indexes

1M documents. Semantic search runs in 5 ms. But add a filter for 'published articles from 2024 only' and the query takes 300 ms. The reason: a full payload scan. One `createPayloadIndex` call brings it back to 5 ms.

**E-commerce:** filters by category + price range + availability; all three fields must be indexed
**News aggregator:** filtering by publication date (datetime) + language (keyword) + topic (keyword) is the standard setup
**Geo service:** searching for similar places within a 5 km radius; a geo index is mandatory

Prerequisites

Points, Vectors, Payloads

Why payload indexes exist

A **payload index** is a data structure that lets Qdrant quickly filter points by payload field values. Without an index, every filtering query requires a **full scan** - iterating over every point in the collection.
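The difference between a full scan and an index lookup can be sketched with a toy model. This is only an illustration, not Qdrant's internal implementation: the `Point` type, `fullScan`, and `buildKeywordIndex` are hypothetical names, and the index is a plain map from field value to point ids.

```typescript
// Toy model only: not how Qdrant stores payloads internally.
type Point = { id: number; payload: Record<string, string> };

// Full scan: inspect the payload of every point, O(N) per query.
function fullScan(points: Point[], field: string, value: string): number[] {
  const ids: number[] = [];
  for (const p of points) {
    if (p.payload[field] === value) ids.push(p.id);
  }
  return ids;
}

// Keyword-style index: map each field value to the ids that carry it.
// Built once, then every query is a single map lookup.
function buildKeywordIndex(points: Point[], field: string): Map<string, number[]> {
  const index = new Map<string, number[]>();
  for (const p of points) {
    const value = p.payload[field];
    if (value === undefined) continue; // point has no such field
    const ids = index.get(value);
    if (ids) ids.push(p.id);
    else index.set(value, [p.id]);
  }
  return index;
}

const points: Point[] = [
  { id: 1, payload: { status: "active" } },
  { id: 2, payload: { status: "archived" } },
  { id: 3, payload: { status: "active" } },
];

const statusIndex = buildKeywordIndex(points, "status");
const viaScan = fullScan(points, "status", "active"); // visits all 3 points
const viaIndex = statusIndex.get("active") ?? [];     // one lookup
```

Both paths return the same ids; the index trades extra memory and build time for a lookup that no longer grows with the collection size.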

**When to create indexes:** for any field used in regular filters. Qdrant does not index payload fields automatically; indexes are created explicitly through the API, so rarely queried fields stay unindexed.

**Index creation does not block requests.** Creating an index on a populated collection triggers a background build. Until the build finishes, filters fall back to a full scan (slow); afterwards they use the index (fast). Progress can be tracked via the `/collections/{name}` API.
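A minimal sketch of how a collection info response might be inspected. The `CollectionInfoLike` type is a hypothetical, simplified subset of what `GET /collections/{name}` returns; the `status` and `payload_schema` field names are assumptions to verify against your Qdrant version, and `describeIndex` is an illustrative helper, not a client method.

```typescript
// Assumed subset of the collection info (the JS client exposes it via getCollection).
type CollectionInfoLike = {
  status: string; // assumed: "green" when no optimization is running
  payload_schema: Record<string, { data_type: string; points?: number }>;
};

// Summarize whether a field is indexed and whether the collection is still busy.
function describeIndex(info: CollectionInfoLike, field: string): string {
  const entry = info.payload_schema[field];
  if (entry === undefined) {
    return `${field}: no index, filters fall back to a full scan`;
  }
  const busy = info.status !== "green" ? " (collection still optimizing)" : "";
  return `${field}: ${entry.data_type} index${busy}`;
}

// Hand-written sample standing in for a real response.
const sample: CollectionInfoLike = {
  status: "yellow",
  payload_schema: { category: { data_type: "keyword", points: 1000000 } },
};

const categoryState = describeIndex(sample, "category");
const statusState = describeIndex(sample, "status");
```

With a real client, the `sample` object would be replaced by the awaited result of `getCollection` for the collection in question.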

Collection: 500k points. A filter on 'status' (values: 'active', 'archived') is used in 80% of queries. No index exists. What happens?

Index types: keyword, integer, float, geo, text, datetime

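**This lesson covers 6 payload index types**, each optimized for its data type and query pattern. Qdrant also offers other types, such as `bool` and `uuid`, which are outside this lesson's scope.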

| Index type | Data type | Supported filters | Example fields |
|---|---|---|---|
| keyword | string | match, is_null, is_empty | category, status, language, tag |
| integer | int64 | match, range (gte/lte/gt/lt) | year, user_id, view_count, priority |
| float | float64 | range (gte/lte/gt/lt) | price, score, latitude, longitude |
| geo | { lon, lat } | geo_bounding_box, geo_radius, geo_polygon | location, coordinates |
| text | string (full-text) | match.text (full-text search) | description, content, title |
| datetime | RFC3339 string | range (gte/lte/gt/lt) | created_at, published_at, expires_at |
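To make the filter column concrete, here is a sketch of filter conditions written as plain objects in the shape of Qdrant's filter JSON. The `Condition` and `Filter` types are simplified and hypothetical, the field names and values are made up, and `indexTypeFor` is an illustrative helper that maps a condition back to the index type from the table.

```typescript
// Simplified, hypothetical typing of a few Qdrant filter conditions.
type RangeBound = number | string; // numbers, or RFC3339 strings for datetime fields
type Condition =
  | { key: string; match: { value: string | number } }
  | { key: string; match: { text: string } }
  | { key: string; range: { gte?: RangeBound; lte?: RangeBound; gt?: RangeBound; lt?: RangeBound } }
  | { key: string; geo_radius: { center: { lon: number; lat: number }; radius: number } };

type Filter = { must: Condition[] };

// News aggregator: language + topic (keyword) and publication date (datetime).
const newsFilter: Filter = {
  must: [
    { key: "language", match: { value: "en" } },
    { key: "topic", match: { value: "science" } },
    { key: "published_at", range: { gte: "2024-01-01T00:00:00Z", lt: "2025-01-01T00:00:00Z" } },
  ],
};

// Geo service: places within 5 km (radius is in meters) plus a full-text match.
const nearbyFilter: Filter = {
  must: [
    { key: "location", geo_radius: { center: { lon: 13.4, lat: 52.52 }, radius: 5000 } },
    { key: "description", match: { text: "rooftop terrace" } },
  ],
};

// Which index type from the table a condition relies on.
function indexTypeFor(c: Condition): string {
  if ("geo_radius" in c) return "geo";
  if ("range" in c) {
    const bound = c.range.gte ?? c.range.lte ?? c.range.gt ?? c.range.lt;
    return typeof bound === "string" ? "datetime" : "integer or float";
  }
  return "text" in c.match ? "text" : "keyword or integer";
}

const neededForNews = newsFilter.must.map(indexTypeFor);
const neededNearby = nearbyFilter.must.map(indexTypeFor);
```

Each entry in `must` names one payload field, so the list of conditions doubles as a checklist of fields that need an index.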

Task: search restaurants by cuisine (semantically, via vector), within a 3 km radius, with rating >= 4.5, currently open (is_open = true). Which indexes are needed?

Creating indexes: a practical example

**Full workflow:** creating a collection, adding points, creating indexes, filtered search. Indexes are created after adding data - Qdrant builds them in the background.
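The workflow can be sketched as an ordered list of client calls. Method names and argument shapes are assumed to follow `@qdrant/js-client-rest` and should be checked against the installed version; the collection name, vectors, and payload fields are made up. The steps are expressed as data so that the order is explicit and the sketch runs without a server; `ClientLike`, `buildWorkflow`, and `run` are hypothetical helpers.

```typescript
type Method = "createCollection" | "upsert" | "createPayloadIndex" | "search";
type Step = { method: Method; args: unknown };

// Minimal client surface used by the runner; a real client instance is
// expected (not guaranteed) to satisfy it.
type ClientLike = Record<Method, (collection: string, args: any) => Promise<unknown>>;

function buildWorkflow(): Step[] {
  return [
    // 1. Create the collection (4-dimensional vectors for brevity).
    { method: "createCollection", args: { vectors: { size: 4, distance: "Cosine" } } },
    // 2. Add points with payloads.
    {
      method: "upsert",
      args: {
        wait: true,
        points: [
          { id: 1, vector: [0.1, 0.2, 0.3, 0.4], payload: { category: "news", published_at: "2024-03-01T00:00:00Z" } },
          { id: 2, vector: [0.4, 0.3, 0.2, 0.1], payload: { category: "blog", published_at: "2023-07-15T00:00:00Z" } },
        ],
      },
    },
    // 3. Create one index per filtered field; Qdrant builds them in the background.
    { method: "createPayloadIndex", args: { field_name: "category", field_schema: "keyword" } },
    { method: "createPayloadIndex", args: { field_name: "published_at", field_schema: "datetime" } },
    // 4. Filtered vector search.
    {
      method: "search",
      args: {
        vector: [0.1, 0.2, 0.3, 0.4],
        limit: 3,
        filter: {
          must: [
            { key: "category", match: { value: "news" } },
            { key: "published_at", range: { gte: "2024-01-01T00:00:00Z" } },
          ],
        },
      },
    },
  ];
}

// Execute the steps in order against any client that matches ClientLike.
async function run(client: ClientLike, collection: string, steps: Step[]): Promise<void> {
  for (const step of steps) {
    await client[step.method](collection, step.args);
  }
}

const workflow = buildWorkflow();
```

Against a live server, `run(client, "articles", workflow)` would issue the calls one after another, where `"articles"` is a placeholder collection name.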

**Index only what is needed.** Each index takes additional memory (~50-200 MB per 1M points) and slows down writes. A good rule: index fields that appear in filters in more than 20% of queries. Leave rare filters to a full scan.
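The 20% rule can be applied mechanically once filter usage is known. In this sketch the usage shares are invented numbers standing in for statistics from query logs, and `fieldsToIndex` is a hypothetical helper.

```typescript
// Share of queries that filter on each field (hypothetical numbers).
const filterUsage: Record<string, number> = {
  category: 0.8,
  price: 0.45,
  supplier_id: 0.03,
  internal_note: 0.0,
};

// Keep fields whose filter share exceeds the threshold (20% by default).
function fieldsToIndex(usage: Record<string, number>, threshold = 0.2): string[] {
  return Object.entries(usage)
    .filter(([, share]) => share > threshold)
    .map(([field]) => field)
    .sort();
}

const chosen = fieldsToIndex(filterUsage); // ["category", "price"]
```

The threshold is a tuning knob rather than a fixed constant: a write-heavy collection may justify a higher cutoff, a read-heavy one a lower cutoff.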

**Mistake:** creating indexes on all payload fields 'just to be safe'.

**Better:** index only filter fields. Excess indexes cost memory and write speed, with zero benefit for search.

**Why:** every index must be maintained on every write (upsert). With 20 indexes instead of 5, each upsert does four times the index maintenance, and each index uses RAM. Rule: index a field only if it appears in filters.

1M points were added, then an index on 'category' was created. At that moment a query with a category filter arrives. What happens?

Key Ideas

**Without index = full scan:** every filter iterates over all points, O(N). With an index the lookup is roughly O(log N) or better, depending on the index type
**6 types:** keyword (categories), integer/float (numbers/ranges), geo (geolocation), text (full-text), datetime (time)
**createPayloadIndex** can be called after the data is loaded; the build runs online and doesn't block queries
**Index conservatively:** only fields used in filters. Each extra index = memory + slower writes
**Check indexes:** `getCollection` → `payload_schema`

What's next

Payload indexes speed up filtering. The next step is sparse vectors for lexical search, which work alongside semantic search.

Sparse Vectors: BM42 and SPLADE — Lexical search complements semantic - together this is Hybrid Search
Vector Quantization — Memory compression - important when many indexes take up RAM
HNSW: How the Index Works — HNSW + payload indexes work together for filtered vector search

Questions for reflection

How does Qdrant decide whether to use a payload index or HNSW first in filtered vector search? What happens when a filter is highly selective?
The collection stores JSON with nested objects (article.author.country). How is an index created on a nested field?
If 95% of queries filter by 'is_active: true' but only 10% of points are active - is it still worth creating a keyword index?

Related lessons

db-09-indexes-btree
