Qdrant - Vector Database

Snapshots: Backups & Recovery

A colleague accidentally ran a deletion script against 'test' - and deleted production. No backups. Recovering 50 million vectors took 18 hours. Snapshots are your insurance against scenarios like this.

  • **Pre-migration backup:** snapshot before upgrading Qdrant or changing collection parameters - rollback in minutes instead of hours
  • **Hourly S3 backups:** an automated cron job creates a snapshot every hour, uploads it to S3, and enforces a retention policy
  • **Staging refresh:** daily production snapshot → load to staging for testing with real data without any risk

Prerequisites

  • Qdrant Cloud vs Self-Hosted

What is a snapshot and when to use it

A **snapshot** in Qdrant is a consistent point-in-time copy of a collection (or the entire instance), saved as a `.snapshot` file. Snapshots use a copy-on-write mechanism: creation takes seconds regardless of collection size.

Qdrant supports two types:

  • **Collection snapshot** - a snapshot of a specific collection: vectors + payload + indexes
  • **Full snapshot** - a snapshot of the entire instance: all collections simultaneously

When to use snapshots:

  • **Before migration** - upgrading Qdrant, changing collection parameters
  • **Disaster recovery** - periodic backups to S3/GCS
  • **Data copying** - moving a collection between instances
  • **Testing** - snapshot production data for staging

**File format:** `.snapshot` is a tar archive with Qdrant's internal structure. Its contents are internal storage files rather than human-readable data, so the file is not meant to be unpacked and inspected by hand - it is designed for restoration via the Qdrant API.

**Size:** a snapshot is roughly as large as the on-disk size of the collection data (uncompressed).

**Location:** collection snapshots are stored in `/qdrant/snapshots/{collection_name}/`, full snapshots in `/qdrant/snapshots/`.


| Parameter | Collection snapshot | Full snapshot |
|---|---|---|
| Coverage | One collection | All collections |
| Endpoint | `POST /collections/{name}/snapshots` | `POST /snapshots` |
| Creation time | Seconds (CoW) | Seconds (CoW) |
| Recovery | Without stopping | Requires empty instance |
| Use case | Migration, single collection backup | Full instance backup |
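
As a minimal sketch of the two creation endpoints, the helper below builds the `POST` request for either a collection snapshot or a full snapshot using only Python's standard library. The function names, the base URL, and the assumed response shape (`{"result": {"name": ...}}`) are illustrative assumptions, not part of the lesson; authentication (for example an API key header) is omitted.

```python
import json
import urllib.request
from typing import Optional


def snapshot_request(base_url: str, collection: Optional[str] = None) -> urllib.request.Request:
    """Build the POST request that asks Qdrant to create a snapshot.

    With a collection name the request targets that collection;
    without one it targets the whole instance (full snapshot).
    """
    base = base_url.rstrip("/")
    path = f"/collections/{collection}/snapshots" if collection else "/snapshots"
    return urllib.request.Request(base + path, method="POST")


def create_snapshot(base_url: str, collection: Optional[str] = None) -> str:
    """Send the request and return the name of the created .snapshot file."""
    with urllib.request.urlopen(snapshot_request(base_url, collection)) as resp:
        body = json.load(resp)
    # Assumed response shape: {"result": {"name": "...", ...}, "status": "ok"}
    return body["result"]["name"]
```

Calling `create_snapshot("http://localhost:6333", "products")` against a running instance would return the file name to use in later download or recover calls; the host, port, and collection name here are placeholders.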

You are planning to upgrade Qdrant from v1.8 to v1.10 in production. What snapshot strategy should you follow?

Snapshot operations: download, upload, restore

After creating a snapshot you can **download** it, **restore** it on the same instance, or **upload** it to a new instance. Restoring a snapshot does not require stopping Qdrant - the collection is recreated from the snapshot, replacing existing data.
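
The download and restore operations can be sketched as follows. This assumes the REST paths `GET /collections/{name}/snapshots/{snapshot_name}` for download and `PUT /collections/{name}/snapshots/recover` with a JSON `location` field for restore; the helper names and example hosts are hypothetical, and authentication is left out.

```python
import json
import urllib.request
from urllib.parse import quote


def download_url(base_url: str, collection: str, snapshot_name: str) -> str:
    """URL from which a collection snapshot file can be fetched with GET."""
    base = base_url.rstrip("/")
    return f"{base}/collections/{quote(collection, safe='')}/snapshots/{quote(snapshot_name, safe='')}"


def recover_request(base_url: str, collection: str, location: str) -> urllib.request.Request:
    """Build the PUT request that restores a collection from a snapshot.

    `location` is either a URL the target Qdrant server can reach or a
    file path on that server's own filesystem.
    """
    base = base_url.rstrip("/")
    payload = json.dumps({"location": location}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/collections/{quote(collection, safe='')}/snapshots/recover",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def recover(base_url: str, collection: str, location: str) -> dict:
    """Send the recover request and return the parsed JSON response."""
    with urllib.request.urlopen(recover_request(base_url, collection, location)) as resp:
        return json.load(resp)
```

Because recovery replaces the existing data in the target collection, it is worth double-checking `base_url` and `collection` before calling `recover`.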

**CI/CD usage:** take a snapshot before deploying new code that changes payload structure or indexes. Workflow: `pre-deploy hook → createSnapshot → deploy → smoke tests → if fail: recoverFromSnapshot`. Snapshot creation takes seconds; rollback takes minutes instead of hours of manual recovery.
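
The workflow above can be sketched as a small orchestration function. The four callables are hypothetical hooks that a pipeline would supply (for example thin wrappers around the snapshot API); the sketch only captures the ordering and the rollback condition, and it restores data only, not the previously deployed code.

```python
from typing import Callable


def deploy_with_rollback(
    create_snapshot: Callable[[], str],
    deploy: Callable[[], None],
    smoke_tests: Callable[[], bool],
    recover_from_snapshot: Callable[[str], None],
) -> bool:
    """Snapshot, deploy, test, and restore the snapshot if anything fails.

    Returns True when the deploy passed its smoke tests, False when the
    data was rolled back to the pre-deploy snapshot.
    """
    snapshot_name = create_snapshot()  # pre-deploy hook
    try:
        deploy()
        ok = smoke_tests()
    except Exception:
        # A crash during deploy or testing is treated like a failed test.
        ok = False
    if not ok:
        recover_from_snapshot(snapshot_name)
    return ok
```

Passing the steps in as callables keeps the control flow testable without a running Qdrant instance.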

You want to move the 'products' collection from a dev server to staging. The staging server is accessible over HTTP but has no access to the dev server's filesystem. Which method should you use?

Production backup strategy: automation and S3

**Production backup** for Qdrant is built around three principles:

1. **Frequency** - snapshot every N hours via cron or a scheduled job
2. **External storage** - the snapshot is copied to S3/GCS (not kept on the same server)
3. **Retention policy** - keep the last N snapshots, delete older ones

**Important: snapshot vs WAL-based backup.** Qdrant does not provide streaming WAL replication to external storage. **Snapshot is the only built-in backup mechanism.** For an RPO of one hour or less, take a snapshot every 30–60 minutes.
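
The retention principle (keep the last N snapshots, delete older ones) can be sketched as a pure function. It assumes the backup job already has a list of `(name, creation_time)` pairs, for instance from an S3 listing; the function name and the input shape are illustrative, and the actual deletion call is left to the caller.

```python
from datetime import datetime
from typing import List, Tuple


def snapshots_to_delete(
    snapshots: List[Tuple[str, datetime]], keep_last: int
) -> List[str]:
    """Return the names of snapshots that fall outside the retention window.

    The newest `keep_last` snapshots are kept; everything older is
    returned, newest first, so the caller can delete it.
    """
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    newest_first = sorted(snapshots, key=lambda item: item[1], reverse=True)
    return [name for name, _ in newest_first[keep_last:]]
```

Separating the selection from the deletion makes the policy easy to dry-run before it is allowed to remove anything.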

**Snapshot vs continuous backup:** Qdrant does not support WAL-shipping (like PostgreSQL). Data written between snapshots is not backed up. With hourly snapshots, losing the server means losing up to 1 hour of data. For critical data: snapshot every 15–30 minutes + monitor backup success (alert if the last backup is > 2 hours old).
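
As a rough sketch of the monitoring rule and the data-loss arithmetic, the two helpers below work on plain timestamps. The function names are hypothetical, and the 2-hour default simply mirrors the alert threshold mentioned above.

```python
from datetime import datetime, timedelta


def backup_is_stale(
    last_backup: datetime,
    now: datetime,
    max_age: timedelta = timedelta(hours=2),
) -> bool:
    """True when the newest successful backup is older than max_age."""
    return now - last_backup > max_age


def data_loss_window(last_backup: datetime, failure_time: datetime) -> timedelta:
    """Span of writes that a restore from the last backup cannot bring back."""
    return failure_time - last_backup
```

For example, with a last backup at 12:00 and a failure at 12:40, `data_loss_window` returns 40 minutes: everything written in that interval is absent after the restore.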

'Snapshots are slow - it's better to use replication as a backup'

Snapshots are created in seconds (copy-on-write). Replication protects against node failure but is not a backup - if data is deleted, the replica also syncs the deletion.

Qdrant snapshots use snapshot isolation at the storage level (RocksDB). Physically creating a snapshot means creating hard links to segment files. This takes milliseconds even for a 100 GB collection. Replication protects against hardware failure of one node; snapshots protect against logical errors (accidental deletion, corrupted data, migration failure).

At 03:15 the Qdrant server crashed. The last successful backup was at 03:00. 5,000 points were written between 03:00 and 03:15. What happens when you restore from backup?

Summary

  • **Collection snapshot** - a snapshot of one collection; **Full snapshot** - the entire instance. Both are created in seconds (copy-on-write).
  • **Operations:** createSnapshot → download (fetch GET) → upload (POST /collections/{name}/snapshots/upload) → recover (PUT /collections/{name}/snapshots/recover) → deleteSnapshot
  • **Recovery:** recover accepts a URL or a local path on the Qdrant server; upload accepts a file directly
  • **Production strategy:** periodic snapshots → S3 → retention policy (keep last N backups)
  • **RPO** is determined by snapshot frequency: hourly snapshot = up to 1 hour of data loss on disaster

What's next

You've learned how to take backups. The next step is working efficiently with large volumes of data using scroll and batch operations.

  • Scroll and Batch operations — For exporting data before a snapshot or verifying data after recovery
  • Distributed Qdrant — Snapshot strategy changes in a distributed cluster - snapshots per shard
  • Cloud vs Self-hosted — In Qdrant Cloud backups are managed by the platform - snapshot API behaves differently

Questions for reflection

  • Your collection is 200 GB. A snapshot is created in seconds but downloading it takes 10 minutes. How would you organize backups with minimal impact on production traffic?
  • How would you verify the integrity of a snapshot after uploading it to S3? What metrics would you use to monitor backup success?
  • In Qdrant Cloud (managed), is the snapshot API available? How does the backup strategy differ for managed vs self-hosted?

Related lessons

  • db-03-acid