Qdrant - Vector Database
Filtering: Filter API
Vector search finds what's semantically close. But 'show me only published English articles created after 2024 with price under 1000' - that's a filter. Together they deliver precise and intelligent search.
- **E-commerce:** 'red sneakers' (vector) + size=42 AND brand='Nike' AND in_stock=true (filter)
- **Knowledge base:** 'how to set up nginx' (vector) + category='devops' AND language='en' (filter)
- **Geolocation:** 'Italian restaurant' (vector) + geo_radius=2km AND rating>4 AND open_now=true (filter)
Prerequisites
Filter structure: must / should / must_not
**Filter API** in Qdrant restricts a search by payload values. It works like a SQL WHERE clause, but for vector search. Depending on how many points the filter matches, Qdrant applies it before the vector search or while the search runs, and it picks the strategy automatically.
| Operator | SQL equivalent | Logic |
|---|---|---|
| must | AND | All conditions must be satisfied |
| should | OR | At least one condition must be satisfied |
| must_not | NOT | None of the conditions may be satisfied |
| min_should | OR with minimum | At least N of M conditions must be satisfied |
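The clause logic in the table can be modeled in a few lines of plain Python. This is a toy illustration of the boolean semantics only, not Qdrant's implementation; the `matches` function, the predicate-style conditions, and the sample payload are all assumptions made for the example.

```python
from typing import Callable, Dict, Sequence

Payload = Dict[str, object]
Condition = Callable[[Payload], bool]


def matches(
    payload: Payload,
    must: Sequence[Condition] = (),
    should: Sequence[Condition] = (),
    must_not: Sequence[Condition] = (),
    min_should: Sequence[Condition] = (),
    min_count: int = 0,
) -> bool:
    """Toy model of the clause logic: AND, OR, NOT, and 'at least N of M'."""
    if not all(cond(payload) for cond in must):
        return False
    # In this toy model an omitted `should` clause imposes no constraint.
    if should and not any(cond(payload) for cond in should):
        return False
    if any(cond(payload) for cond in must_not):
        return False
    satisfied = sum(1 for cond in min_should if cond(payload))
    return satisfied >= min_count


# Hypothetical payload and conditions, for illustration only.
doc = {"category": "tech", "language": "en", "status": "published"}
is_tech = lambda p: p.get("category") == "tech"
is_en = lambda p: p.get("language") == "en"
is_draft = lambda p: p.get("status") == "draft"

print(matches(doc, must=[is_tech, is_en]))                          # AND
print(matches(doc, should=[is_draft, is_en]))                       # OR
print(matches(doc, must=[is_tech], must_not=[is_draft]))            # NOT
print(matches(doc, min_should=[is_tech, is_en, is_draft], min_count=2))
```

All four calls print `True` for this payload: two of the three `min_should` conditions hold, which meets `min_count=2`.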
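**Performance:** filtering is fast only when a payload index exists. Without an index, Qdrant has to check the payload of every candidate point, which is O(N) in the worst case. With an index, it looks up matching points directly (roughly O(log N) per lookup, depending on the index type). Always create indexes for the fields you filter by (lesson qd-08).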
You need to find documents: (category='tech' AND language='en') OR (category='science' AND language='ru'). How do you express this in the Filter API?
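One way to express this is a `should` clause whose items are themselves filters with `must`. The sketch below builds the filter as a plain Python dict in the shape of the REST request body; `field_equals` is a hypothetical helper, and sending the request to a server is left out.

```python
import json


def field_equals(key: str, value: str) -> dict:
    """Build an exact-match field condition (hypothetical helper)."""
    return {"key": key, "match": {"value": value}}


# (category='tech' AND language='en') OR (category='science' AND language='ru')
query_filter = {
    "should": [
        {"must": [field_equals("category", "tech"), field_equals("language", "en")]},
        {"must": [field_equals("category", "science"), field_equals("language", "ru")]},
    ]
}

print(json.dumps(query_filter, indent=2))
```

Each inner `must` group plays the role of one parenthesized AND expression, and the outer `should` joins the groups with OR.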
Field conditions: Match, Range, Geo, and more
**Field conditions** are specific value checks on payload fields. Qdrant supports: exact match, match from a list, ranges, geo filters, and null checks.
| Condition | Syntax | Description |
|---|---|---|
| MatchValue | match: { value: 'en' } | Exact match against a single value |
| MatchAny | match: { any: ['en', 'ru'] } | Match any value in the list (IN) |
| MatchExcept | match: { except: ['deleted'] } | Matches none of the values in the list (NOT IN) |
| Range | range: { gte: 0, lte: 100 } | Numeric range: gt, gte, lt, lte |
| DatetimeRange | range: { gte: '2024-01-01T00:00:00Z' } | Date range (ISO 8601) |
| IsNull | is_null: { key: 'field' } | Field exists and its value is null |
| IsEmpty | is_empty: { key: 'tags' } | Field is absent, null, or an empty array |
| GeoBoundingBox | geo_bounding_box: { top_left, bottom_right } | Rectangular geographic area |
| GeoRadius | geo_radius: { center, radius } | Circular area by radius (meters) |
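For reference, here is how the conditions in the table might look as REST-style JSON, written as plain Python dicts. The field names (`language`, `status`, `price`, `created_at`, `reviewed_by`, `tags`, `location`) and the coordinates are made up for the example.

```python
import json

conditions = {
    "match_value": {"key": "language", "match": {"value": "en"}},
    "match_any": {"key": "language", "match": {"any": ["en", "ru"]}},
    "match_except": {"key": "status", "match": {"except": ["deleted"]}},
    "range": {"key": "price", "range": {"gte": 0, "lte": 100}},
    "datetime_range": {"key": "created_at", "range": {"gte": "2024-01-01T00:00:00Z"}},
    # is_null / is_empty take the field name inside the condition body.
    "is_null": {"is_null": {"key": "reviewed_by"}},
    "is_empty": {"is_empty": {"key": "tags"}},
    "geo_bounding_box": {
        "key": "location",
        "geo_bounding_box": {
            "top_left": {"lon": 13.30, "lat": 52.60},
            "bottom_right": {"lon": 13.50, "lat": 52.40},
        },
    },
    "geo_radius": {
        "key": "location",
        # radius is given in meters
        "geo_radius": {"center": {"lon": 13.40, "lat": 52.52}, "radius": 2000.0},
    },
}

# Conditions take effect once they are placed in a clause such as `must`.
query_filter = {"must": [conditions["match_any"], conditions["range"]]}
print(json.dumps(query_filter, indent=2))
```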
You need products with BOTH tags ['sale', 'featured'] and price between 500 and 2000. How do you write the filter correctly?
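A possible answer, sketched as a REST-style filter dict: a match on an array field succeeds when any element equals the value, so requiring both tags takes two separate `must` conditions. A single `any: ['sale', 'featured']` would mean "either tag" instead. The field names `tags` and `price` and the inclusive bounds are assumptions.

```python
import json

query_filter = {
    "must": [
        # One condition per required tag: both must hold (AND).
        {"key": "tags", "match": {"value": "sale"}},
        {"key": "tags", "match": {"value": "featured"}},
        # Price between 500 and 2000, bounds assumed inclusive.
        {"key": "price", "range": {"gte": 500, "lte": 2000}},
    ]
}

print(json.dumps(query_filter, indent=2))
```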
Nested filters and nested objects
**Nested filters** let you filter by fields inside nested objects and arrays. Especially useful for documents with complex payload structures.
**When to use nested:** when the payload contains an array of objects and you want to check that ONE element of the array satisfies multiple conditions simultaneously. Without nested, conditions may match different array elements.
Payload: `orders: [{status: 'shipped', amount: 50}, {status: 'pending', amount: 200}]`. Need to find records with at least one order that has status='shipped' AND amount>100. What does the nested filter return?
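The difference can be checked by hand. The sketch below writes both filter shapes as REST-style dicts and then models their semantics in plain Python against the payload above; the two `any(...)` expressions are an illustration of the matching rules, not Qdrant code.

```python
payload = {
    "orders": [
        {"status": "shipped", "amount": 50},
        {"status": "pending", "amount": 200},
    ]
}

# Nested filter: both conditions are evaluated against the same array element.
nested_filter = {
    "must": [
        {
            "nested": {
                "key": "orders",
                "filter": {
                    "must": [
                        {"key": "status", "match": {"value": "shipped"}},
                        {"key": "amount", "range": {"gt": 100}},
                    ]
                },
            }
        }
    ]
}

# Flat filter: each condition may be satisfied by a different element.
flat_filter = {
    "must": [
        {"key": "orders[].status", "match": {"value": "shipped"}},
        {"key": "orders[].amount", "range": {"gt": 100}},
    ]
}

orders = payload["orders"]
nested_match = any(o["status"] == "shipped" and o["amount"] > 100 for o in orders)
flat_match = (any(o["status"] == "shipped" for o in orders)
              and any(o["amount"] > 100 for o in orders))

print(nested_match, flat_match)  # False True
```

The nested filter does not return this record: the shipped order has amount 50, and the order with amount 200 is pending. The flat version matches it anyway, because the two conditions are satisfied by different elements.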
Filter performance
**Filter performance** is critical in production. Qdrant automatically picks a filtering strategy based on selectivity (the fraction of points passing the filter) and collection size.
| Strategy | When applied | How it works |
|---|---|---|
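| Pre-filtering | Filter is highly selective (roughly < 5-10% of points; an indicative figure, not a fixed threshold) | Collect the matching points through the payload index → then score only that subset |
| Post-filtering | Filter is weakly selective (roughly > 50% of points) | Run ANN over the HNSW graph → check the filter on candidates as they are found |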
| Exact (full scan) | Very small collection or no index | Iterate over all points |
**Selectivity and strategy:** if the filter is very selective (e.g., category='rare-topic' - only 0.1% of points), Qdrant pre-filters and runs ANN only on that subset. This is faster than full ANN + post-filter. Good indexes + proper selectivity = fast filtered queries.
**Anti-pattern:** filtering on every payload field for 'flexible' search without creating indexes first.
**Better:** create payload indexes only for the fields that production filters actually use.
**Why:** each payload index consumes memory (as a rough estimate, 50-200 MB per 1M points for a keyword field; the actual cost depends on the values stored). Index only what you need, but do not filter on unindexed fields in production: that forces a full scan, which is slow.
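As a sketch of what "index only what you need" looks like in practice, the snippet below prepares one index-creation request per filtered field, using the REST route `PUT /collections/{collection_name}/index`. The collection name, the field list, and the `index_requests` helper are hypothetical, and the requests are only built, not sent.

```python
import json
from typing import Dict, List, Tuple

COLLECTION = "articles"  # hypothetical collection name

# Hypothetical fields that production filters actually use, with index types.
indexed_fields: Dict[str, str] = {
    "category": "keyword",
    "price": "float",
    "created_at": "datetime",
}


def index_requests(collection: str, fields: Dict[str, str]) -> List[Tuple[str, str, dict]]:
    """Return (method, path, body) triples, one per payload index to create."""
    return [
        ("PUT", f"/collections/{collection}/index",
         {"field_name": name, "field_schema": schema})
        for name, schema in fields.items()
    ]


requests_to_send = index_requests(COLLECTION, indexed_fields)
for method, path, body in requests_to_send:
    print(method, path, json.dumps(body))
```

Keeping the field list in one place makes it easy to review which indexes exist and to drop the ones no filter uses anymore.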
You have 1 million documents. Filter `category='python'` selects ~500,000 (50%). Which strategy does Qdrant choose and why is that correct?
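Selectivity here is simply the share of points that pass the filter. A minimal sketch of the arithmetic follows, together with the body one might send to the count endpoint (`POST /collections/{collection_name}/points/count`) to obtain the matching count; the `selectivity` helper is hypothetical, and the numbers are the ones from the question.

```python
def selectivity(matching_points: int, total_points: int) -> float:
    """Fraction of points that pass the filter (0.0 to 1.0)."""
    if total_points <= 0:
        raise ValueError("total_points must be positive")
    return matching_points / total_points


# Body for a count request restricted by the same filter as the search.
count_body = {
    "filter": {"must": [{"key": "category", "match": {"value": "python"}}]},
    "exact": True,
}

share = selectivity(500_000, 1_000_000)
print(f"{share:.0%}")  # 50%
```

Comparing this share against the rough ranges in the strategy table indicates which row the query falls into.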
Key Ideas
- **must/should/must_not** - AND/OR/NOT for payload conditions
- **Field conditions:** MatchValue, MatchAny, Range, IsNull, IsEmpty, GeoBoundingBox, GeoRadius
- **Nested filter** - when you need one array element to satisfy multiple conditions at once
- **Performance:** create a payload index for every field used in filters; without one, filtering falls back to a full scan
- **Strategy:** Qdrant picks pre/post-filtering automatically based on selectivity
- Remember the hook? The vector finds meaning, the filter enforces structural constraints - together they're unbeatable
What's next
Filtering is working. Next step - grouping results for deduplication.
- Result Grouping — searchGroups for deduplicating chunks by document
- Hybrid Search — Add a filter to a hybrid search query
- Payload Index — More on configuring indexes for performance
Questions for reflection
- Which payload fields in your project need an index? How do you determine this from search requirements?
- When does a nested filter make sense instead of a plain one?
- How do you measure the impact of a filter on search performance in production?
Related lessons
- qd-08-payload-index: the payload index is the foundation for field filtering
- qd-11-hybrid-search: filters are used inside hybrid search for pre-filtering
- pg-06-select: the SQL WHERE clause plays the same role as a Qdrant filter, with different syntax
- db-09-indexes-btree: a B-tree index for filtering is analogous to a payload index in Qdrant