Qdrant - Vector Database
Distributed Mode and Sharding
Your collection grew to 50M vectors - a single server can't keep up. Distributed mode lets you split data across nodes and scale horizontally. But the critical question isn't technical - did you configure shards correctly? One wrong choice at collection creation time and you'll need to rebuild everything from scratch.
- **Document search over 100M documents:** 10 nodes × 10 shards, search runs in parallel - latency matches a single-node setup with 10M docs
- **SaaS with 500 customers:** Custom sharding by tenant_id - data isolation, search touches only the relevant tenant's shards
- **Hot swap of failing hardware:** move_shard migrates data from a dying disk to a new node with zero downtime
Предварительные знания
Sharding: How Qdrant Distributes Data
**Distributed mode** allows Qdrant to operate as a multi-node cluster. Each collection is split into **shards** - independent data partitions. Each shard lives on one node (or is replicated across several).
**Shard count** is set when the collection is created and cannot be changed without a full reshard. Default is 1 shard. Rule of thumb: create as many shards as you have nodes, or a multiple of that number.
**Choosing shard count:** `shard_number = number_of_nodes` is a simple starting rule. If you plan to scale to 6 nodes - create 6 shards upfront. You cannot change the shard count without recreating the collection.
You have a 4-node cluster. The collection was created with shard_number: 2. How will the shards be distributed?
Custom Sharding for Multi-Tenancy
**Custom Sharding (shard_key)** lets you explicitly control which shard stores which points. The primary use case is **multi-tenancy**: isolating different tenants' data to specific shards for performance and operational simplicity.
**Custom Sharding vs Payload filtering for multi-tenancy:** A payload filter (`must: [{key: 'tenant_id', match: {value: 'alpha'}}]`) works but filters AFTER loading candidates from all shards - both alpha and beta shards are read. Custom sharding isolates at the storage level - beta shards are never touched. With 100+ tenants, custom sharding is dramatically more efficient.
A SaaS platform has 50 customers, each with ~100k documents. 5M documents total. How should multi-tenancy be organized?
Cluster Management: Nodes, Shard Rebalancing
**Cluster management** includes: adding nodes, moving shards between nodes, gracefully decommissioning a node. Qdrant provides a REST API for all operations - you can automate this via Kubernetes operators or custom scripts.
**You cannot decrease shard_number** for an existing collection. To merge shards you must create a new collection with a lower shard_number and migrate data via snapshot. Plan the shard count generously when creating the collection.
You added a 4th node to the cluster (was 3 nodes, 3 shards). How do you achieve even shard distribution?
Key Takeaways
- **Shards** are collection partitions distributed across nodes. Default: consistent hashing by point_id
- **shard_number** is set at collection creation and cannot change without recreation. Plan ahead
- **Custom Sharding** (shard_key) - explicit control: ideal for multi-tenancy, storage-level isolation
- **move_shard** - live shard migration between nodes with no downtime. Methods: stream_records (live), snapshot
- **Raft consensus** manages cluster metadata. Node addition requires a bootstrap URI
What's Next
Data is distributed across shards - now ensure fault tolerance through replication.
- Replication and Raft — Shards without replication are a single point of failure. Replication factor protects against node loss
- Cloud vs Self-Hosted — Distributed mode is self-hosted. Cloud manages shards for you automatically
- Monitoring — In distributed mode you need to monitor each node and shard health
Вопросы для размышления
- A collection has shard_number: 3 and a 6-node cluster. Each shard is replicated twice (replication_factor: 2). How many physical copies of each shard exist? On how many nodes can the collection function if 2 nodes fail?
- Custom sharding vs separate collections for multi-tenancy: when is each approach preferable? What are the operational trade-offs?
- Why can't shard_number be changed without recreating the collection? What technically prevents Qdrant from doing automatic rebalancing the way Elasticsearch does?