Qdrant - Vector Database
Monitoring and Metrics
Qdrant has been running in production for 2 weeks with no issues, and you feel confident everything is fine. But Grafana would have shown that memory grew 40% over the week, pending_optimizations spiked every night, and P99 latency doubled on Friday evening. Monitoring turns 'I think it's working' into 'I know it's working'.
- **Capacity planning:** a memory_usage trend in Grafana gave 2 weeks of warning before OOM, enough time to add quantization without an incident
- **Performance regression:** after a deploy, P99 latency jumped from 80ms to 400ms. The metrics pinpointed when the regression started, which led to the cause: the new version had changed the default hnsw_ef
- **Kubernetes autoscaling:** an HPA driven by the pending_optimizations metric adds Qdrant pods during peak indexing load
Prerequisites
Prometheus Metrics: What to Watch First
**Qdrant exports metrics** in Prometheus format at the `/metrics` endpoint: 50+ metrics covering collection state, optimization queues, RAM consumption, and request latency. This is the primary way to see what is happening inside a running node.
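You can inspect what the endpoint returns without a full Prometheus setup. The sketch below fetches `/metrics` and parses the text exposition format into a dictionary. It assumes Qdrant's default REST port 6333 and no API key. The parser is deliberately simplified: it skips `# HELP`/`# TYPE` comments and does not handle escaped characters inside label values. For real work, the `prometheus_client` library ships a complete parser (`prometheus_client.parser.text_string_to_metric_families`). `parse_metrics` and `fetch_metrics` are illustrative names, not part of any Qdrant client.

```python
import urllib.request


def parse_metrics(text: str) -> dict[str, list[tuple[dict[str, str], float]]]:
    """Parse Prometheus text format into {metric_name: [(labels, value), ...]}.

    Simplified: ignores comment lines and optional timestamps, and assumes
    label values contain no commas, escaped quotes, or '}' characters.
    """
    metrics: dict[str, list[tuple[dict[str, str], float]]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        if "{" in line:
            name, rest = line.split("{", 1)
            label_str, value_str = rest.rsplit("}", 1)
            labels = {}
            for pair in filter(None, label_str.split(",")):
                key, val = pair.split("=", 1)
                labels[key.strip()] = val.strip().strip('"')
        else:
            name, value_str = line.split(None, 1)
            labels = {}
        value = float(value_str.split()[0])  # first token is the value; a timestamp may follow
        metrics.setdefault(name.strip(), []).append((labels, value))
    return metrics


def fetch_metrics(base_url: str = "http://localhost:6333") -> dict:
    # Assumption: default REST port, no authentication configured.
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

For example, `fetch_metrics()["collections_total"]` returns the `(labels, value)` pairs for that metric, provided your Qdrant version exports it.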
| Metric | Type | Meaning | Alert threshold |
|---|---|---|---|
| app_info | gauge | Application info (always 1 while the process runs) | Series missing or scrape failing (`up == 0`) - process is down |
| collections_total | gauge | Number of collections | Unexpected change |
| collection_vectors_count | gauge | Vectors in collection | Stagnation during active indexing |
| rest_response_duration_seconds | histogram | REST request latency (P99 computed from buckets with `histogram_quantile`) | P99 > 500ms |
| grpc_response_duration_seconds | histogram | gRPC request latency (P99 computed from buckets with `histogram_quantile`) | P99 > 200ms |
| pending_optimizations | gauge | Segments waiting for optimization | > 10 for extended period |
| wal_sequence_number | gauge | Write-Ahead Log position | Stalls (not growing during writes) |
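A histogram does not store P99 directly. It exports cumulative `_bucket` series, each labeled with an upper bound `le`. The percentile is computed at query time, typically with PromQL such as `histogram_quantile(0.99, sum by (le) (rate(<metric>_bucket[5m])))`. The sketch below re-implements the linear interpolation that `histogram_quantile` performs, which shows why a reported P99 is only as precise as the bucket boundaries. It assumes positive, sorted bucket bounds ending with `+Inf`. The bucket counts in the usage note are made up for illustration.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper_bound,
    the last one being (math.inf, total_count). Assumes positive bounds.
    """
    if not 0 < q < 1:
        raise ValueError("q must be in (0, 1)")
    if len(buckets) < 2:
        raise ValueError("need at least one finite bucket plus +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Like Prometheus: report the highest finite bucket bound.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound
```

Take the buckets `[(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]`. The P95 is interpolated to 0.8125 s. The P99 falls into the `+Inf` bucket and is reported as 1.0 s, the highest finite bound, however slow those requests really were.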
The `pending_optimizations` metric shows 15 and hasn't decreased for 30 minutes. What does this mean and what should you do?
Health Checks: /health, /readyz, /livez
**Three health endpoints** in Qdrant serve different purposes: `/health` - basic availability, `/readyz` - ready to accept traffic (combines liveness and readiness), `/livez` - the node is alive. Kubernetes, load balancers, and monitoring systems all rely on them.
**The difference between readiness and liveness:** `livez` answers 'the node is running'. Use it for the livenessProbe: Kubernetes restarts the pod if it hangs. `readyz` answers 'the node is ready to accept traffic'. Use it for the readinessProbe: Kubernetes removes the pod from load balancing until it is ready. Without correct probes, a rolling update can send traffic to pods that are still starting or recovering, and clients see errors.
After a rolling update, a Qdrant node starts returning 200 on /livez but 503 on /readyz. Does Kubernetes route traffic to this node?
Grafana Dashboard: Setup and Key Alerts
**Grafana + Prometheus** is the standard monitoring stack for Qdrant. Qdrant provides an official Grafana dashboard (ID: 20650 on grafana.com). Setup takes 15 minutes.
**Key panels in the Grafana dashboard:** 1) 'Vectors indexed' - should grow during active indexing. 2) 'Search requests rate' - unusual spikes may indicate DDoS or client bugs. 3) 'Memory usage' - an upward trend warns you to add quantization or scale up. 4) 'Pending optimizations' - should trend toward 0 when idle.
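An alert such as 'pending_optimizations > 20 for 15 minutes' should fire only when the condition holds continuously. The `for:` field of a Prometheus alerting rule expresses exactly this. The sketch below applies the same idea to a list of `(timestamp, value)` samples. It is a simplified model with no handling of missed scrapes or staleness, and `sustained_breach` is a hypothetical helper, not part of Prometheus or Grafana.

```python
from typing import Sequence


def sustained_breach(samples: Sequence[tuple[float, float]], threshold: float,
                     hold_seconds: float) -> bool:
    """True if the latest run of samples above `threshold` lasted >= hold_seconds.

    samples: (unix_timestamp, value) pairs sorted by time.
    """
    breach_start = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts  # condition just became true
        else:
            breach_start = None  # condition cleared: reset the timer
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= hold_seconds
```

Usage for the pending optimizations alert: `sustained_breach(samples, threshold=20, hold_seconds=15 * 60)`. A short indexing burst resets the timer as soon as the value drops, so it does not page anyone.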
Grafana shows: `pending_optimizations` = 0, P99 of `rest_response_duration_seconds` = 2.3 seconds. The node is healthy per /readyz. What is most likely causing the high latency?
Key Takeaways
- **GET /metrics** - Prometheus format, 50+ metrics. Key ones: pending_optimizations, rest_response_duration_seconds, process_resident_memory_bytes
- **/livez** - node is alive (livenessProbe). **/readyz** - ready for traffic (readinessProbe). The distinction is critical for Kubernetes
- **Grafana dashboard ID 20650** - official Qdrant dashboard. Can be connected in about 15 minutes via a Prometheus data source
- **5 key alerts:** node down, P99 > 500ms, pending_optimizations > 20 (for 15m), RAM > 85%, dead shards
- **Telemetry** (`/telemetry`) - detailed per-collection and per-request statistics. Handy for debugging
What's Next
Monitoring revealed problems - now learn how to optimize Qdrant performance.
- Performance Tuning - metrics identify the bottleneck, optimization fixes it
- Replication - monitor dead shards and replica health
- Quantization - process_resident_memory_bytes trending up means it's time to enable quantization
Questions for Reflection
- What is the difference between /health, /livez, and /readyz? Give a concrete scenario where a node returns 200 on /livez but 503 on /readyz - what is happening at that moment?
- pending_optimizations = 0 but search is slow. Which other metrics would you check? Outline a latency diagnosis plan.
- How do you set up monitoring for a distributed Qdrant cluster (3 nodes)? Do you need separate Prometheus targets per node? How do you aggregate cluster-wide metrics?