Qdrant - Vector Database

Monitoring and Metrics

Qdrant has been running in production for two weeks with no visible issues, so you feel confident everything is fine. But a Grafana dashboard would have shown that memory grew 40% over the week, `pending_optimizations` spikes at night, and P99 latency doubled on Friday evening. Monitoring turns 'I think it's working' into 'I know it's working'.

**Plannable scaling:** a Grafana memory_usage trend gave 2 weeks of warning before OOM - enough time to add quantization without an incident
**Performance regression:** after a deploy, P99 latency jumped from 80ms to 400ms. Metrics showed: the new version changed the default hnsw_ef
**Kubernetes autoscaling:** HPA on the pending_optimizations metric adds nodes during peak indexing load

Prerequisites

Replication and Fault Tolerance

Prometheus Metrics: What to Watch First

**Qdrant exports metrics** in Prometheus format at the `/metrics` endpoint: 50+ metrics covering collection state, optimization queues, RAM consumption, and request latency. It is the most direct way to see what is happening inside a node.
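To make the format concrete, here is a minimal sketch of reading `/metrics` without a Prometheus server. It uses only the Python standard library. The base URL `http://localhost:6333` and the helper names `parse_prometheus_text` and `fetch_metrics` are assumptions for illustration, and the parser handles only the simple `name{labels} value [timestamp]` line shape.

```python
import urllib.request


def parse_prometheus_text(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition format into {series: value}.

    The key keeps the label set, e.g. 'app_info{name="qdrant"}'.
    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    """
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        closing = line.rfind("}")
        if closing != -1:
            # Label values may contain spaces, so split after the label set.
            series = line[: closing + 1]
            rest = line[closing + 1 :].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if not rest:
            continue  # malformed line: no value present
        # rest[0] is the value; an optional trailing timestamp is ignored.
        samples[series] = float(rest[0])
    return samples


def fetch_metrics(base_url: str = "http://localhost:6333") -> dict[str, float]:
    """Fetch and parse /metrics from one node (base_url is an assumption)."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=5) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))


# Hypothetical payload, using metric names from this lesson.
sample = """# HELP app_info information about qdrant server
# TYPE app_info gauge
app_info{name="qdrant",version="1.0.0"} 1
collections_total 3
pending_optimizations 15 1700000000000
"""
metrics = parse_prometheus_text(sample)
```

In production you would normally let Prometheus scrape the endpoint. A script like this is mostly useful for a quick check from a shell or a smoke test after a deploy.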

| Metric | Type | Meaning | Alert condition |
| --- | --- | --- | --- |
| app_info | gauge | Application info (always 1) | Series missing or scrape fails: the process is down |
| collections_total | gauge | Number of collections | Unexpected change |
| collection_vectors_count | gauge | Vectors in collection | Stagnation during active indexing |
| rest_response_duration_seconds_p99 | histogram | P99 latency of REST requests | > 500 ms |
| grpc_response_duration_seconds_p99 | histogram | P99 latency of gRPC requests | > 200 ms |
| pending_optimizations | gauge | Segments waiting for optimization | > 10 for an extended period |
| wal_sequence_number | gauge | Write-Ahead Log position | Stalls (not growing during writes) |
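Thresholds such as "> 10 for an extended period" only make sense when the breach is sustained, which is what the `for:` clause of a Prometheus alerting rule expresses. The sketch below shows that logic in plain Python. The class name `SustainedAlert`, the 30-minute hold time, and the sample readings are illustrative assumptions, not values from Qdrant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SustainedAlert:
    """Fire only after a value stays above `threshold` for `hold_seconds`."""

    threshold: float
    hold_seconds: float
    _breach_start: Optional[float] = None

    def observe(self, timestamp: float, value: float) -> bool:
        """Record one sample; return True if the alert is firing."""
        if value <= self.threshold:
            # Back under the threshold: forget the current breach.
            self._breach_start = None
            return False
        if self._breach_start is None:
            self._breach_start = timestamp
        return timestamp - self._breach_start >= self.hold_seconds


# Hypothetical pending_optimizations readings, one every 10 minutes.
alert = SustainedAlert(threshold=10, hold_seconds=30 * 60)
readings = [(0, 4), (600, 15), (1200, 15), (1800, 15), (2400, 15)]
fired = [alert.observe(t, v) for t, v in readings]
# The alert fires on the last sample only, 30 minutes after the first breach.
```

A short nightly spike never reaches the hold time, so it does not page anyone, while a queue that stays stuck does.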

The `pending_optimizations` metric shows 15 and hasn't decreased for 30 minutes. What does this mean and what should you do?

Health Checks: /health, /readyz, /livez

**Three health endpoints** in Qdrant serve different purposes: `/health` reports basic availability, `/readyz` reports that the node is ready to accept traffic (it combines liveness and readiness), and `/livez` reports that the node is alive. They are used by Kubernetes, load balancers, and monitoring systems.

**The difference between readiness and liveness:** `livez` answers 'the node is running', so use it for livenessProbe (Kubernetes restarts the pod if it hangs). `readyz` answers 'the node is ready to accept traffic', so use it for readinessProbe (Kubernetes removes the pod from load balancing until it's ready). Without correct probes, a rolling update can send requests to pods that are not ready yet, or restart pods that are still starting up.
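The probe semantics can be sketched as a small decision function. This is a simplified model, not Kubernetes code: it ignores `failureThreshold`, `periodSeconds`, and startup probes, and the function names `probe` and `kubernetes_action` are hypothetical. It does follow the Kubernetes rule that an HTTP probe succeeds on any status from 200 to 399.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of a GET, or 0 if the node is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 while the node is not ready
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, timeout, DNS failure


def probe_ok(status: int) -> bool:
    """Kubernetes treats HTTP 200-399 as a successful probe."""
    return 200 <= status < 400


def kubernetes_action(livez_status: int, readyz_status: int) -> str:
    """Map probe results to what Kubernetes would do with the pod."""
    if not probe_ok(livez_status):
        return "restart pod"
    if not probe_ok(readyz_status):
        return "keep pod running, remove from Service endpoints"
    return "route traffic"


# A node that is alive but not yet ready is kept running without traffic.
starting_up = kubernetes_action(200, 503)
```

To check a real node, you could call `kubernetes_action(probe(base + "/livez"), probe(base + "/readyz"))`, where `base` is your node's address.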

After a rolling update, a Qdrant node starts returning 200 on /livez but 503 on /readyz. Does Kubernetes route traffic to this node?

Grafana Dashboard: Setup and Key Alerts

**Grafana + Prometheus** is the standard monitoring stack for Qdrant. Qdrant provides an official Grafana dashboard (ID: 20650 on grafana.com). Setup takes 15 minutes.

**Key panels in the Grafana dashboard:** 1) 'Vectors indexed' - should grow during active indexing. 2) 'Search requests rate' - unusual spikes may indicate DDoS or client bugs. 3) 'Memory usage' - an upward trend warns you to add quantization or scale up. 4) 'Pending optimizations' - should trend toward 0 when idle.
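The 'Memory usage' trend can be turned into a rough number: how many days remain before memory reaches a limit. The sketch below fits a least-squares line to daily samples and extrapolates. It assumes growth is roughly linear, which real workloads may not satisfy, and the sample values (GiB per day) and the 20 GiB limit are made up for illustration.

```python
from typing import Optional


def linear_trend(samples: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares fit y = slope * x + intercept over (x, y) samples."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    if var_x == 0:
        raise ValueError("need samples at two or more distinct times")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def days_until_limit(
    samples: list[tuple[float, float]], limit: float
) -> Optional[float]:
    """Days after the last sample until the fitted line reaches `limit`.

    Returns None when usage is flat or shrinking (no crossing ahead).
    """
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    crossing_day = (limit - intercept) / slope
    return max(0.0, crossing_day - samples[-1][0])


# Hypothetical week of memory readings: 10 GiB growing by 0.5 GiB per day.
week = [(day, 10 + 0.5 * day) for day in range(8)]
remaining = days_until_limit(week, limit=20)  # roughly 13 days of headroom
```

In Grafana itself, the PromQL function `predict_linear` does a similar extrapolation directly on the time series.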

Grafana shows: `pending_optimizations` = 0, `rest_response_duration_seconds_p99` = 2.3 seconds. The node is healthy per /readyz. What is most likely causing the high latency?

Key Takeaways

**GET /metrics** - Prometheus format, 50+ metrics. Key ones: pending_optimizations, rest_response_duration_seconds, process_resident_memory_bytes
**/livez** - node is alive (livenessProbe). **/readyz** - ready for traffic (readinessProbe). The distinction is critical for Kubernetes
**Grafana dashboard ID 20650** - official Qdrant dashboard. Connectable in 15 minutes via Prometheus datasource
**5 key alerts:** node down, P99 > 500ms, pending_optimizations > 20 (for 15m), RAM > 85%, dead shards
**Telemetry** (`/telemetry`) - detailed per-collection and per-request statistics. Handy for debugging

What's Next

Monitoring revealed problems - now learn how to optimize Qdrant performance.

Performance Tuning — Metrics identify the bottleneck - optimization fixes it
Replication — Monitor dead shards and replica health
Quantization — process_resident_memory_bytes trending up - time to enable quantization

Questions for Reflection

What is the difference between /health, /livez, and /readyz? Give a concrete scenario where a node returns 200 on /livez but 503 on /readyz - what is happening at that moment?
pending_optimizations = 0 but search is slow. Which other metrics would you check? Outline a latency diagnosis plan.
How do you set up monitoring for a distributed Qdrant cluster (3 nodes)? Do you need separate Prometheus targets per node? How do you aggregate cluster-wide metrics?

Related Lessons

db-11-query-optimization