Qdrant - Vector Database

Filtering: Filter API

Vector search finds what is semantically close. But a request like 'show me only published English articles created after 2024 with a price under 1000' is a filter. Combining the two returns results that are both close in meaning and guaranteed to satisfy hard constraints.

  • **E-commerce:** 'red sneakers' (vector) + size=42 AND brand='Nike' AND in_stock=true (filter)
  • **Knowledge base:** 'how to set up nginx' (vector) + category='devops' AND language='en' (filter)
  • **Geolocation:** 'Italian restaurant' (vector) + geo_radius=2km AND rating>4 AND open_now=true (filter)

Prerequisites

  • Payload Indexes

Filter structure: must / should / must_not

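**Filter API** in Qdrant is a way to restrict search by payload. It works like SQL WHERE, but for vector search. Qdrant does not simply filter before or after the vector search: its query planner picks a strategy automatically, either fetching matching points through a payload index and scoring them exactly, or applying the filter while it traverses the HNSW graph.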

| Operator | SQL equivalent | Logic |
|---|---|---|
| `must` | AND | All conditions must be satisfied |
| `should` | OR | At least one condition must be satisfied |
| `must_not` | NOT | None of the conditions may be satisfied |
| `min_should` | OR with a minimum | At least N of M conditions must be satisfied |
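To make the clause semantics concrete, here is a minimal Python sketch of how `must`, `should`, `must_not`, and `min_should` combine. It is not Qdrant's implementation: the function names (`match_condition`, `check`, `evaluate`) are hypothetical, filters are plain dicts in the JSON shape of Qdrant's REST API, and only exact-match conditions are modeled. All clauses present in one filter are combined with AND.

```python
from typing import Any

# Hypothetical, simplified evaluator: NOT Qdrant's implementation.
# Filters are dicts in the JSON shape of Qdrant's REST filter API;
# only exact-match field conditions are modeled.
CLAUSES = ("must", "should", "must_not", "min_should")


def match_condition(payload: dict[str, Any], cond: dict[str, Any]) -> bool:
    """{"key": ..., "match": {"value": ...}}: exact match."""
    value = payload.get(cond["key"])
    expected = cond["match"]["value"]
    # An array field matches if any of its elements matches.
    if isinstance(value, list):
        return expected in value
    return value == expected


def check(payload: dict[str, Any], cond: dict[str, Any]) -> bool:
    # A condition can itself be a whole filter (clauses can be nested).
    if any(clause in cond for clause in CLAUSES):
        return evaluate(payload, cond)
    return match_condition(payload, cond)


def evaluate(payload: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Every clause present in the filter must hold (clauses are ANDed)."""
    if not all(check(payload, c) for c in flt.get("must", [])):  # AND
        return False
    should = flt.get("should", [])
    if should and not any(check(payload, c) for c in should):  # OR
        return False
    if any(check(payload, c) for c in flt.get("must_not", [])):  # NOT
        return False
    min_should = flt.get("min_should")
    if min_should is not None:  # at least N of M
        hits = sum(check(payload, c) for c in min_should["conditions"])
        if hits < min_should["min_count"]:
            return False
    return True


doc = {"status": "published", "language": "en", "tags": ["db", "search"]}
flt = {
    "must": [
        {"key": "status", "match": {"value": "published"}},
        {"key": "language", "match": {"value": "en"}},
    ],
    "must_not": [{"key": "tags", "match": {"value": "draft"}}],
    "min_should": {
        "conditions": [
            {"key": "tags", "match": {"value": "db"}},
            {"key": "tags", "match": {"value": "search"}},
            {"key": "tags", "match": {"value": "ml"}},
        ],
        "min_count": 2,
    },
}
print(evaluate(doc, flt))  # True
```

Here `min_count: 2` is met because two of the three tag conditions hold, while the `must_not` condition does not.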

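**Performance:** filtering is fast only when the filtered field has a payload index. Without an index, Qdrant has to load and check the payload of every candidate point, which degrades toward a full O(N) scan; with an index, matching points are found by lookup (about O(log N) for range-style indexes). The index also gives the query planner the cardinality estimate it uses to pick a strategy. Always create indexes for fields you filter by (lesson qd-08).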
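The cost difference can be illustrated with a toy comparison between a full scan and two simple index structures: a value-to-IDs map for a keyword field and a sorted list searched with binary search for a numeric range. These structures are assumptions chosen for illustration; Qdrant's actual payload indexes are more elaborate.

```python
import bisect
from collections import defaultdict

# Toy illustration only; Qdrant's real payload indexes are more elaborate.
points = {
    1: {"category": "tech", "price": 300},
    2: {"category": "science", "price": 1200},
    3: {"category": "tech", "price": 900},
    4: {"category": "art", "price": 50},
}


def full_scan(field, value):
    # O(N): read and compare the payload of every point.
    return {pid for pid, p in points.items() if p.get(field) == value}


# Keyword-style index: value -> set of point IDs, built once (hash lookup).
keyword_index = defaultdict(set)
for pid, p in points.items():
    keyword_index[p["category"]].add(pid)

# Range-style index: (price, id) pairs kept sorted, so bounds are found
# by binary search in O(log N).
price_index = sorted((p["price"], pid) for pid, p in points.items())


def price_range(gte, lt):
    # (bound, -inf) sorts before every real (bound, id) pair.
    lo = bisect.bisect_left(price_index, (gte, float("-inf")))
    hi = bisect.bisect_left(price_index, (lt, float("-inf")))
    return {pid for _, pid in price_index[lo:hi]}


print(sorted(full_scan("category", "tech")))  # [1, 3]
print(sorted(keyword_index["tech"]))          # [1, 3]
print(sorted(price_range(gte=100, lt=1000)))  # [1, 3]
```

All three return the same IDs; the difference is that the scan touches every point, while the indexes jump straight to the answer.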

You need to find documents: (category='tech' AND language='en') OR (category='science' AND language='ru'). How do you express this in the Filter API?

Field conditions: Match, Range, Geo, and more

**Field conditions** check specific values of payload fields. Qdrant supports exact match, match against a list, exclusion lists, numeric and datetime ranges, geo filters, and null/empty checks.

| Condition | Syntax | Description |
|---|---|---|
| MatchValue | `match: { value: 'en' }` | Exact match against a single value |
| MatchAny | `match: { any: ['en', 'ru'] }` | Matches any value in the list (IN) |
| MatchExcept | `match: { except: ['deleted'] }` | Matches none of the values in the list (NOT IN) |
| Range | `range: { gte: 0, lte: 100 }` | Numeric range: `gt`, `gte`, `lt`, `lte` |
| DatetimeRange | `range: { gte: '2024-01-01T00:00:00Z' }` | Date range (ISO 8601) |
| IsNull | `is_null: { key: 'field' }` | Field exists and its value is null |
| IsEmpty | `is_empty: { key: 'tags' }` | Field is absent, null, or an empty array |
| GeoBoundingBox | `geo_bounding_box: { top_left, bottom_right }` | Rectangular geographic area |
| GeoRadius | `geo_radius: { center, radius }` | Circular area around `center`; `radius` in meters |
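The following sketch evaluates several of these conditions against a single payload dict, using the REST JSON shape. It is a simplified, hypothetical evaluator (`check_field` and `haversine_m` are illustrative names, not Qdrant APIs): it follows the rule that an array field matches when any element matches, treats a missing or null field as matching no value condition (an assumption, notably for MatchExcept), and computes geo distance with the haversine formula on a spherical Earth, which may differ slightly from Qdrant's own calculation.

```python
import math
from typing import Any

# Hypothetical, simplified evaluator for some Qdrant field conditions
# (REST JSON shape). Not Qdrant's code; datetime parsing, nested paths,
# and other condition types are left out.
_MISSING = object()  # sentinel for "key not in payload"
_RANGE_OPS = {
    "gt": lambda v, b: v > b,
    "gte": lambda v, b: v >= b,
    "lt": lambda v, b: v < b,
    "lte": lambda v, b: v <= b,
}


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters on a spherical Earth (R = 6371 km)."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def check_field(payload: dict[str, Any], cond: dict[str, Any]) -> bool:
    if "is_null" in cond:  # field exists and its value is null
        key = cond["is_null"]["key"]
        return key in payload and payload[key] is None
    if "is_empty" in cond:  # field absent, null, or []
        return payload.get(cond["is_empty"]["key"]) in (None, [])

    value = payload.get(cond["key"], _MISSING)
    if value is _MISSING or value is None:
        return False  # assumption: missing/null fields match no value condition
    # An array field matches if ANY element satisfies the condition.
    values = value if isinstance(value, list) else [value]

    if "match" in cond:
        m = cond["match"]
        if "value" in m:  # MatchValue
            return m["value"] in values
        if "any" in m:  # MatchAny (IN)
            return any(v in m["any"] for v in values)
        if "except" in m:  # MatchExcept (NOT IN)
            return any(v not in m["except"] for v in values)
    if "range" in cond:  # Range; null bounds are ignored
        bounds = {op: b for op, b in cond["range"].items() if b is not None}
        return any(all(_RANGE_OPS[op](v, b) for op, b in bounds.items()) for v in values)
    if "geo_radius" in cond:  # GeoRadius; radius in meters
        g = cond["geo_radius"]
        c = g["center"]
        return any(haversine_m(v["lat"], v["lon"], c["lat"], c["lon"]) <= g["radius"] for v in values)
    raise ValueError(f"unsupported condition: {cond}")


shop = {
    "tags": ["sale", "new"],
    "price": 750,
    "deleted_at": None,
    "location": {"lat": 52.52, "lon": 13.405},
}
center = {"lat": 52.5163, "lon": 13.3777}
print(check_field(shop, {"key": "tags", "match": {"any": ["featured", "new"]}}))  # True
print(check_field(shop, {"key": "price", "range": {"gte": 100, "lt": 1000}}))     # True
print(check_field(shop, {"is_null": {"key": "deleted_at"}}))                      # True
print(check_field(shop, {"is_empty": {"key": "reviews"}}))                        # True
print(check_field(shop, {"key": "location",
                         "geo_radius": {"center": center, "radius": 2500}}))     # True
```

Note how IsNull and IsEmpty differ: `deleted_at` exists with a null value, while `reviews` is absent, so only IsEmpty matches the latter.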

You need products with BOTH tags ['sale', 'featured'] and price between 500 and 2000. How do you write the filter correctly?

Nested filters and nested objects

**Nested filters** let you filter by fields inside arrays of nested objects, which is especially useful for documents with complex payload structures. For a plain nested object (not an array), a dot-path key such as `address.city` is enough.

**When to use nested:** when the payload contains an array of objects and you want ONE element of the array to satisfy several conditions simultaneously. Without nested, each condition may be satisfied by a different array element. The syntax is `nested: { key: 'orders', filter: { must: [...] } }`.
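A small sketch shows the difference on an example payload. `flat_must` models two ordinary conditions on `reviews[].rating` and `reviews[].verified`, each of which may be satisfied by a different array element; `nested_must` models a `nested` filter, where one element must satisfy all conditions. Both functions are hypothetical and model only `match` and `range.gt` checks.

```python
from typing import Any

# Hypothetical illustration; only "match value" and "range gt" are modeled.
payload = {
    "reviews": [
        {"rating": 5, "verified": False},
        {"rating": 2, "verified": True},
    ]
}


def element_ok(elem: dict[str, Any], cond: dict[str, Any]) -> bool:
    value = elem.get(cond["key"])
    if "match" in cond:
        return value == cond["match"]["value"]
    if "range" in cond:
        return value is not None and value > cond["range"]["gt"]
    raise ValueError(f"unsupported condition: {cond}")


def flat_must(payload: dict[str, Any], array_key: str, conds: list[dict[str, Any]]) -> bool:
    # Like must: [{key: "reviews[].rating", ...}, {key: "reviews[].verified", ...}]
    # Each condition passes if ANY element matches, possibly a different one.
    items = payload.get(array_key, [])
    return all(any(element_ok(e, c) for e in items) for c in conds)


def nested_must(payload: dict[str, Any], array_key: str, conds: list[dict[str, Any]]) -> bool:
    # Like {"nested": {"key": "reviews", "filter": {"must": [...]}}}:
    # ONE element has to satisfy every condition at once.
    items = payload.get(array_key, [])
    return any(all(element_ok(e, c) for c in conds) for e in items)


conds = [
    {"key": "rating", "range": {"gt": 3}},
    {"key": "verified", "match": {"value": True}},
]
print(flat_must(payload, "reviews", conds))    # True: the two conditions match different reviews
print(nested_must(payload, "reviews", conds))  # False: no single review satisfies both
```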

Payload: `orders: [{status: 'shipped', amount: 50}, {status: 'pending', amount: 200}]`. Need to find records with at least one order that has status='shipped' AND amount>100. What does the nested filter return?

Filter performance

**Filter performance** is critical in production. Qdrant's query planner picks a filtering strategy automatically, based on the estimated cardinality of the filter (how many points are expected to pass it) and on which indexes are available.

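| Strategy | When applied | How it works |
|---|---|---|
| Payload index + exact search | Filter is selective: the estimated matches fall below a threshold (`full_scan_threshold`) | Fetch the matching points from the payload index, then score only that subset exactly (no ANN) |
| Filterable HNSW | Filter matches many points: the estimate is above the threshold | Traverse the HNSW graph and check the filter on candidates during the search, instead of filtering the results afterwards |
| Full scan | Small segment with no HNSW index built yet | Check and score every point |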

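**Selectivity and strategy:** if the filter is very selective (e.g., category='rare-topic' matches only 0.1% of points), Qdrant fetches that small subset through the payload index and scores it exactly, skipping ANN altogether; this is fast because the subset is small, and the results are exact. If the filter matches a large share of points, Qdrant runs the HNSW search and checks the filter during graph traversal rather than filtering afterwards, so it does not end up with fewer results than requested. Good indexes + accurate cardinality estimates = fast filtered queries.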
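Qdrant's query planner chooses between fetching matches through the payload index and scoring them exactly, searching the HNSW graph with the filter applied during traversal, and a plain full scan. A toy sketch of that decision follows, under stated assumptions: `choose_strategy` and `exact_search` are hypothetical names, the threshold here is a simple point count (Qdrant's real `full_scan_threshold` is configured in kilobytes of vector data and applied per segment), and `exact_search` is an illustrative brute-force dot-product scorer for the selective branch.

```python
# Toy sketch of cardinality-based strategy choice; a simplification, not
# Qdrant's planner. The threshold is a hypothetical point count, whereas
# Qdrant's full_scan_threshold is set in kilobytes and applied per segment.
def choose_strategy(estimated_matches: int, has_payload_index: bool,
                    has_hnsw: bool, threshold: int = 10_000) -> str:
    if not has_hnsw:
        # Small segment without a vector index: brute force over every point.
        return "full scan"
    if has_payload_index and estimated_matches < threshold:
        # Few matches: fetch them via the payload index, then score exactly.
        return "payload index + exact search"
    # Many matches (or, in this sketch, no index to enumerate them):
    # check the filter while traversing the HNSW graph.
    return "filterable HNSW"


def exact_search(query: list[float], vectors: dict[int, list[float]],
                 candidate_ids: list[int], k: int) -> list[int]:
    """Brute-force dot-product scoring restricted to the filtered candidates."""
    scored = [(sum(q * x for q, x in zip(query, vectors[i])), i) for i in candidate_ids]
    return [i for _, i in sorted(scored, reverse=True)[:k]]


print(choose_strategy(1_000, has_payload_index=True, has_hnsw=True))    # payload index + exact search
print(choose_strategy(200_000, has_payload_index=True, has_hnsw=True))  # filterable HNSW
vectors = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.7, 0.7]}
print(exact_search([1.0, 0.0], vectors, candidate_ids=[2, 3], k=1))     # [3]
```

Point 1 would score highest overall, but it is not among the filtered candidates, so the exact search over the subset returns point 3.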

**Mistake:** adding filters on every payload field for 'flexible' search without creating indexes first.

**Better:** create payload indexes only for fields that are actually used in production filters.

Each payload index consumes memory (roughly 50-200MB per 1M points for a keyword field), so index only what you need. Conversely, a filter on an unindexed field gives Qdrant no index to use, so it checks payloads point by point and the query becomes slow.
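One way to follow this advice is to derive the list of fields to index from the filters your application actually sends. The helper below is hypothetical, not part of any Qdrant client: it walks filters in the REST JSON shape (for example, ones collected from query logs) and counts every payload key they reference, reporting keys inside a `nested` filter in the `array[].field` path form used for array elements.

```python
from collections import Counter
from typing import Any

# Hypothetical helper, not part of any Qdrant client: list the payload keys
# referenced by filters in the REST JSON shape, e.g. filters taken from logs.
CLAUSES = ("must", "should", "must_not")


def filter_keys(flt: dict[str, Any]) -> list[str]:
    conditions = [c for clause in CLAUSES for c in flt.get(clause, [])]
    if "min_should" in flt:
        conditions += flt["min_should"]["conditions"]
    keys: list[str] = []
    for cond in conditions:
        keys.extend(condition_keys(cond))
    return keys


def condition_keys(cond: dict[str, Any]) -> list[str]:
    if "nested" in cond:
        # Keys inside a nested filter are relative to the array field,
        # so report them in the "array[].field" path form.
        prefix = cond["nested"]["key"]
        return [f"{prefix}[].{k}" for k in filter_keys(cond["nested"]["filter"])]
    if any(clause in cond for clause in (*CLAUSES, "min_should")):
        return filter_keys(cond)  # a whole filter used as a condition
    for special in ("is_null", "is_empty"):
        if special in cond:
            return [cond[special]["key"]]
    return [cond["key"]] if "key" in cond else []  # e.g. has_id has no key


logged_filters = [
    {"must": [{"key": "language", "match": {"value": "en"}},
              {"key": "price", "range": {"lt": 1000}}]},
    {"must": [{"key": "language", "match": {"value": "ru"}}],
     "must_not": [{"is_empty": {"key": "tags"}}]},
    {"must": [{"nested": {"key": "orders", "filter": {
        "must": [{"key": "status", "match": {"value": "shipped"}}]}}}]},
]
usage = Counter(k for f in logged_filters for k in filter_keys(f))
for key, count in usage.most_common():
    print(f"{key}: referenced {count} time(s)")
```

Fields that never show up in this count are candidates for staying unindexed.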

You have 1 million documents. Filter `category='python'` selects ~500,000 (50%). Which strategy does Qdrant choose and why is that correct?

Key Ideas

  • **must/should/must_not** - AND/OR/NOT for payload conditions
  • **Field conditions:** MatchValue, MatchAny, Range, IsNull, IsEmpty, GeoBoundingBox, GeoRadius
  • **Nested filter** - when you need one array element to satisfy multiple conditions at once
  • **Performance:** a payload index is required for fields used in filters
  • **Strategy:** Qdrant's query planner picks payload-index retrieval, filterable HNSW, or a full scan automatically, based on estimated cardinality
  • Remember the hook? The vector finds meaning, the filter enforces structural constraints - together they're unbeatable

What's next

Filtering is in place. The next step is grouping results for deduplication.

  • Result Grouping — searchGroups for deduplicating chunks by document
  • Hybrid Search — Add a filter to a hybrid search query
  • Payload Index — More on configuring indexes for performance

Questions for reflection

  • Which payload fields in your project need an index? How do you determine this from search requirements?
  • When does a nested filter make sense instead of a plain one?
  • How do you measure the impact of a filter on search performance in production?

Related lessons

  • qd-08-payload-index — Payload index is the foundation for field filtering
  • qd-11-hybrid-search — Filters are used inside hybrid search for pre-filtering
  • pg-06-select — SQL WHERE clause is the same as qdrant filter, different syntax
  • db-09-indexes-btree — B-tree index for filtering is analogous to payload index in Qdrant