Databases
Partitioning
Cloudflare logs over 1 trillion DNS queries per month. Storing this in a single PostgreSQL table would make every retention cleanup a multi-hour DELETE operation, locking the table and generating terabytes of WAL. Instead, Cloudflare uses monthly range partitions: dropping last month's data is an instant DROP PARTITION that removes a file from disk - no row scans, no lock contention, no WAL.
- **Datadog / TimescaleDB**: trillions of metric data points stored as time-based hypertable chunks (partitions). Queries for recent metrics prune to the last few chunks, keeping p99 query latency in milliseconds.
- **Zalando**: pg_partman automates daily partition creation and retention for order event tables with billions of rows. Nightly maintenance takes seconds and requires no application changes.
- **Cloudflare DNS logs**: range partitioning by date allows instant partition drops for retention management on trillion-row datasets.
Partition Types
Partitioning divides a single large table into smaller physical segments (partitions) that share the same schema but store distinct subsets of rows. From the application's perspective, a single table is queried; the database engine transparently routes operations to the appropriate partition. Partitioning improves query performance through pruning, simplifies data lifecycle management, and enables faster bulk operations.
Cloudflare partitions its DNS query log tables by date, handling over 1 trillion DNS requests per month. Dropping a month of old data is an instant operation - DROP PARTITION - rather than a slow DELETE that would scan billions of rows and generate enormous WAL.
A table storing server logs must frequently drop data older than 90 days. Which partition strategy enables the fastest data deletion?
Range Partitioning in Practice
Range partitioning is the most common strategy for time-series and event data. Each partition covers a specific key range - typically a day, week, or month. New partitions must be created in advance; failing to create the next month's partition before data arrives causes insert failures.
Range partitions must be created in advance. If 'events_2025_03' does not exist when March data arrives and there is no DEFAULT partition, inserts fail with 'no partition of relation found for row'. Use pg_partman to automate partition creation and retention.
A range-partitioned table has no DEFAULT partition. A row with a timestamp outside all defined partition ranges is inserted. What happens?
Partition Pruning
Partition pruning is the query optimizer's ability to skip partitions that cannot contain relevant rows based on the WHERE clause. A query filtering by the partition key avoids scanning irrelevant partitions entirely. Without pruning (or when the partition key is not in the WHERE clause), all partitions are scanned - slower than a non-partitioned table.
Datadog stores trillions of metric data points using TimescaleDB (PostgreSQL extension) with time-based hypertable partitioning. Queries for recent metrics are pruned to the last few chunks, making p99 latency milliseconds instead of seconds on tables with hundreds of billions of rows.
A table is partitioned by created_at. The query WHERE user_id = 42 scans all partitions. What is the most effective fix?
pg_partman: Automated Partition Management
pg_partman is a PostgreSQL extension that automates partition creation, maintenance, and retention. It creates future partitions on a schedule, drops or detaches old partitions based on a retention policy, and handles the bookkeeping of partition sets. Without automation, DBAs manually create next-month's partition every month - a failure-prone process.
Zalando runs pg_partman on PostgreSQL clusters storing billions of order events. Partition maintenance runs nightly: it creates next month's partitions and drops data beyond the retention window. The entire operation takes seconds and requires no application changes.
pg_partman is configured with p_premake=4 and an interval of '1 month'. On January 15th, which partitions will be pre-created?
When to Use Partitioning
Partitioning is beneficial for specific workloads but adds overhead for general-purpose tables. The query planner must evaluate partition metadata for every query, even when pruning applies. Tables with fewer than 10-50 million rows rarely benefit from partitioning - indexes are more effective.
PostgreSQL cannot enforce a UNIQUE or PRIMARY KEY constraint across partitions unless the constraint includes the partition key. A global unique index on 'id' alone is not supported on partitioned tables. Plan for this early - retrofitting requires recreating the table.
A 5 million row table is partitioned by month for query performance. The DBA notices queries are now slower than before partitioning. What is the likely cause?
Key Ideas
- **Range partitioning** is optimal for time-series and log data: instant partition drops for retention, efficient range scans with pruning.
- **Partition pruning** eliminates irrelevant partitions when the WHERE clause filters on the partition key. Without the partition key in the filter, all partitions are scanned.
- **pg_partman** automates partition creation (p_premake future partitions) and retention (drop/detach old partitions). Essential for production use.
- **Partitioning makes sense** only above ~100M rows where pruning savings exceed partition metadata overhead. Indexes are more effective for smaller tables.
- **UNIQUE constraints** on partitioned tables must include the partition key - PostgreSQL cannot enforce cross-partition uniqueness on a non-partition key alone.
Related Topics
Partitioning is the step between indexing and full sharding:
- Sharding — Partitioning splits data within one node; sharding splits across multiple nodes. When a partitioned table outgrows a single node, sharding is the next step.
- Indexes and Query Optimization — Partition pruning works like an index on the partition key. For non-partition-key queries, secondary indexes within each partition are required.
- TimescaleDB — TimescaleDB extends PostgreSQL with automatic time-based partitioning (hypertables), compression, and continuous aggregates for time-series workloads.
Вопросы для размышления
- A partitioned table has a UNIQUE (id) constraint that PostgreSQL rejects. What schema change would make this constraint enforceable on a range-partitioned table?
- TimescaleDB compresses chunks (partitions) older than 7 days. What queries become slower after compression, and which remain fast?
- Cloudflare drops DNS log partitions older than 30 days instantly. What would happen to a long-running analytics query that started before the DROP PARTITION completed?