Real-Time Backend

Time-Series data

Grafana shows pretty graphs. Behind them sit specialized databases built for write rates of millions of points per second, a load that a plain PostgreSQL table struggles to sustain.

Tesla stores telemetry from millions of cars in InfluxDB (speed, battery charge, autopilot data): more than 1000 metrics per second from every car
Prometheus was created at SoundCloud in 2012 and became the Kubernetes monitoring standard: every pod exports /metrics, Prometheus scrapes and stores time-series
Exchanges and fintech use TimescaleDB to store NYSE/NASDAQ ticks: millions of events per second with SQL analytics on top
The Grafana + InfluxDB stack is deployed at thousands of companies for infrastructure monitoring: CPU, latency, error rate with retention policies and downsampling

What time-series data is

A time series is a sequence of values bound to timestamps. Each record answers the question "what happened and when". This is not just a table with a `created_at` column. In a time-series DB, time is the primary dimension that drives storage, indexing, and compression.

Hallmarks of time-series data: records are almost never updated (append-only); old data is downsampled or deleted (retention policy); queries are mostly time-range aggregations (avg, sum, percentile). General-purpose relational databases handle this pattern poorly at scale: a B-tree index on the timestamp grows with the table and eventually stops fitting in memory, which slows inserts, and storing billions of rows uncompressed is wasteful.

**Infrastructure metrics**: CPU, memory, service latency (Grafana + InfluxDB/Prometheus)
**IoT telemetry**: Tesla collects more than 1000 metrics from every car every second into InfluxDB
**Financial ticks**: NYSE/NASDAQ quotes, millions of events per second, stored for years
**APM traces**: span duration, error rate, throughput for every endpoint

How does a time-series DB fundamentally differ from a relational DB when storing metrics?

InfluxDB: the metrics store

InfluxDB is a specialized time-series DB built from scratch for metrics and events. Data is organized into measurements (table equivalents), tags (indexed string key-value pairs used for filtering and grouping), and fields (the measured values: floats, integers, strings, or booleans, which are not indexed). Storage uses TSM (Time-Structured Merge Tree), a variant of LSM trees optimized for time data with aggressive compression.
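To make the measurement/tags/fields split concrete, here is a minimal sketch that serializes one point into InfluxDB line protocol (`measurement,tags fields timestamp`). The helper names `escape_key`, `format_field`, and `to_line` are hypothetical, and the escaping is simplified; a real client library handles more edge cases.

```python
def escape_key(text: str) -> str:
    """Escape commas, equals signs, and spaces in tag keys, tag values, and field keys."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def format_field(value) -> str:
    """Render a field value using line-protocol type markers."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'  # strings are double-quoted


def to_line(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Build one line: tags are the indexed part, fields hold the measured values."""
    if not fields:
        raise ValueError("a point needs at least one field")
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(
        f",{escape_key(k)}={escape_key(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_key(k)}={format_field(v)}" for k, v in fields.items()
    )
    return f"{name}{tag_part} {field_part} {timestamp_ns}"


line = to_line(
    "cpu",
    {"host": "web-1", "region": "eu"},
    {"usage": 42.5, "cores": 8},
    1_700_000_000_000_000_000,
)
print(line)
```

The tags end up in the series key (`cpu,host=web-1,region=eu`), while `usage` and `cores` are plain values attached to that series at the given timestamp.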

Tesla uses InfluxDB to collect telemetry from its cars (speed, battery charge, component temperatures, autopilot data): more than 1000 metrics per car every second. With millions of cars that is petabytes of data. InfluxDB handles it through built-in downsampling (Continuous Queries in v1, Tasks in v2) and retention policies.

**Cardinality explosion** is the main InfluxDB trap. Tags are indexed, and the number of unique tag combinations is called series cardinality. Each unique combination is a separate series, so adding userId as a tag in a metric with a million users creates at least a million series in the index. That kills performance and memory. Rule: put only low-cardinality values that you filter or group by in tags (host, env, region). User identifiers belong in fields.
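The effect is easy to reproduce with a rough sketch that counts distinct (measurement, tag set) combinations. The function `series_cardinality` and the sample data are hypothetical; the point is only that the same number of data points produces very different series counts depending on what goes into tags.

```python
def series_cardinality(points) -> int:
    """Count distinct series: one per unique (measurement, tag set) combination."""
    return len({(name, frozenset(tags.items())) for name, tags in points})


hosts = ["web-1", "web-2", "web-3"]
envs = ["prod", "staging"]

# 6,000 points tagged only with low-cardinality values.
low = [
    ("requests", {"host": h, "env": e})
    for h in hosts
    for e in envs
    for _ in range(1000)
]

# The same 6,000 points, but with a per-user tag added.
high = [
    ("requests", {"host": h, "env": e, "user_id": str(u)})
    for h in hosts
    for e in envs
    for u in range(1000)
]

print(series_cardinality(low))   # 3 hosts * 2 envs = 6 series
print(series_cardinality(high))  # 3 * 2 * 1000 users = 6000 series
```

With a million users instead of a thousand, the second count grows by the same factor, which is exactly the explosion described above.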

A team added userId as a tag in an InfluxDB request metric. What happens with 1M active users?

TimescaleDB and Prometheus

TimescaleDB is a PostgreSQL extension for time-series data. A hypertable looks like an ordinary table but is automatically partitioned into time-based chunks, which enables partition pruning: a query scans only the chunks that overlap the requested time range instead of the full table. Full SQL, JOINs with regular tables, and other PostgreSQL extensions all work as in plain PostgreSQL.
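The pruning idea can be illustrated with a toy in-memory model. This is not how TimescaleDB is implemented; `ChunkedTable` and the one-day `CHUNK_SECONDS` are assumptions chosen to show why a time-range query touches only a fraction of the data.

```python
from collections import defaultdict

CHUNK_SECONDS = 86_400  # assumed chunk size for this sketch: one day


class ChunkedTable:
    """Toy model of a table partitioned into fixed-size time chunks."""

    def __init__(self):
        self.chunks = defaultdict(list)  # chunk index -> list of (ts, value)

    def insert(self, ts: int, value: float) -> None:
        self.chunks[ts // CHUNK_SECONDS].append((ts, value))

    def query(self, start: int, end: int):
        """Return rows with start <= ts < end and the number of chunks scanned."""
        first = start // CHUNK_SECONDS
        last = (end - 1) // CHUNK_SECONDS
        rows, scanned = [], 0
        for idx in range(first, last + 1):  # chunks outside the range are skipped
            if idx in self.chunks:
                scanned += 1
                rows.extend(r for r in self.chunks[idx] if start <= r[0] < end)
        return rows, scanned


table = ChunkedTable()
for hour in range(30 * 24):  # one row per hour for 30 days
    table.insert(hour * 3600, float(hour))

rows, scanned = table.query(10 * CHUNK_SECONDS, 11 * CHUNK_SECONDS)
print(len(rows), scanned, len(table.chunks))  # 24 rows from 1 of 30 chunks
```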

Prometheus is a pull-based monitoring system created at SoundCloud in 2012 (open-sourced in 2015). The server scrapes /metrics endpoints every N seconds and stores the samples in its own local TSDB. It is not designed for long-term retention: the default retention is 15 days. For long-term storage, data is shipped to an external backend such as Thanos, Cortex, or InfluxDB.
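What a scrape actually reads is plain text. The sketch below renders one counter in the Prometheus text exposition format; `render_metric` is a hypothetical helper, and real services would normally use an official client library rather than formatting the payload by hand.

```python
def escape_label_value(value: str) -> str:
    """Escape backslashes, double quotes, and newlines inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name: str, metric_type: str, samples) -> str:
    """Render one metric family; samples is a list of (labels dict, value)."""
    lines = [f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


body = render_metric(
    "http_requests_total",
    "counter",
    [
        ({"method": "get", "code": "200"}, 1027),
        ({"method": "get", "code": "500"}, 3),
    ],
)
print(body, end="")
```

A service would return this text from its /metrics endpoint, and the Prometheus server would attach a timestamp to each sample at scrape time.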

**TimescaleDB**: pick when you need SQL, JOINs with business tables, or you already have PostgreSQL infrastructure
**InfluxDB**: pick for pure metrics without JOINs, very high write throughput, and built-in downsampling
**Prometheus**: the standard for Kubernetes/microservices monitoring, not for long-term storage
**Financial data**: TimescaleDB is popular for storing market ticks: JOINs with instrument tables, SQL analytics

A team wants to store service metrics for 2 years and JOIN them with a users table for business analytics. What to pick?

Downsampling and retention

Downsampling replaces raw data with period aggregates. Grafana + InfluxDB for infrastructure monitoring typically stores: raw data for 7 days (1-second resolution), 5-minute aggregates for 30 days, hourly aggregates for 1 year. This lets you see yesterday's incident in detail (raw) and quarterly trends (hourly avg) at a reasonable storage footprint.
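As a rough sketch of the mechanics, the function below collapses raw `(timestamp, value)` points into fixed-width buckets and keeps one average per bucket. The name `downsample` and the synthetic input are hypothetical; real systems run the equivalent as a scheduled query and often keep min/max/count next to the average.

```python
from collections import defaultdict
from statistics import fmean


def downsample(points, bucket_seconds: int):
    """Replace raw (ts, value) points with one average per time bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - ts % bucket_seconds  # align to the bucket boundary
        buckets[bucket_start].append(value)
    return [(start, fmean(values)) for start, values in sorted(buckets.items())]


# Ten minutes of 1-second data: a sawtooth that cycles through 0..59.
raw = [(t, float(t % 60)) for t in range(600)]

five_minute = downsample(raw, 300)
print(len(raw), "->", len(five_minute))  # 600 -> 2
print(five_minute)                       # [(0, 29.5), (300, 29.5)]
```

The 300x reduction in point count is what makes long retention affordable; the cost is that the sawtooth shape is no longer visible in the aggregate.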

Retention policies delete data older than a configured age automatically. In InfluxDB v2 retention is a property of the bucket (its retention period), while Tasks handle the downsampling; in v1 it is a retention policy on the database. In TimescaleDB it is `add_retention_policy('metrics', INTERVAL '90 days')`. In Prometheus it is `--storage.tsdb.retention.time=15d` at startup. Without retention, a time-series store grows forever, and disk fills up within weeks at high throughput.
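Chunk-based stores can enforce retention cheaply by dropping whole chunks instead of deleting row by row. The sketch below models that idea under stated assumptions: `apply_retention` and the one-day `CHUNK_SECONDS` are hypothetical, and a chunk is dropped only when all of its time range is older than the cutoff.

```python
CHUNK_SECONDS = 86_400  # assumed chunk size for this sketch: one day


def apply_retention(chunks: dict, now: int, max_age_seconds: int) -> dict:
    """Keep only chunks that still contain data newer than the cutoff.

    chunks maps a chunk index to its rows; chunk i covers
    [i * CHUNK_SECONDS, (i + 1) * CHUNK_SECONDS).
    """
    cutoff = now - max_age_seconds
    return {
        idx: rows
        for idx, rows in chunks.items()
        if (idx + 1) * CHUNK_SECONDS > cutoff  # chunk end is newer than cutoff
    }


# 100 daily chunks, retention of 90 days, evaluated at the end of day 100.
chunks = {day: [(day * CHUNK_SECONDS, 1.0)] for day in range(100)}
kept = apply_retention(chunks, now=100 * CHUNK_SECONDS, max_age_seconds=90 * CHUNK_SECONDS)
print(len(kept), min(kept))  # 90 chunks remain, the oldest is day 10
```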

**Storage math:** 1 metric at 1-second resolution = 86,400 points/day. 1,000 metrics = 86.4M points/day. At 8 bytes per point (the value alone, ignoring timestamps and tags) that is ~690 MB/day uncompressed. TimescaleDB can deliver 90-95% compression on old chunks, so ~35-70 MB/day. A year totals roughly 12-25 GB instead of 250 GB.
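The same arithmetic as a small calculator, so the inputs can be swapped for your own numbers. The function name `daily_bytes` is hypothetical; 8 bytes per point and the 90-95% compression range are the figures from the paragraph above, not measurements.

```python
SECONDS_PER_DAY = 86_400


def daily_bytes(metrics: int, bytes_per_point: int, compression_percent: int = 0) -> int:
    """Bytes written per day at 1-second resolution after compression."""
    points_per_day = metrics * SECONDS_PER_DAY
    raw = points_per_day * bytes_per_point
    return raw * (100 - compression_percent) // 100


raw_day = daily_bytes(1_000, 8)          # uncompressed
worst_day = daily_bytes(1_000, 8, 90)    # 90% compression
best_day = daily_bytes(1_000, 8, 95)     # 95% compression

print(f"raw: {raw_day / 1e6:.1f} MB/day, {raw_day * 365 / 1e9:.1f} GB/year")
# raw: 691.2 MB/day, 252.3 GB/year
print(f"compressed: {best_day / 1e6:.1f}-{worst_day / 1e6:.1f} MB/day, "
      f"{best_day * 365 / 1e9:.1f}-{worst_day * 365 / 1e9:.1f} GB/year")
# compressed: 34.6-69.1 MB/day, 12.6-25.2 GB/year
```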

Myth: a time-series DB is only needed for infrastructure metrics

Reality: the time-series pattern fits anywhere data is bound to time: IoT, financial ticks, user events, logs with aggregation

Any data with a "what happened and when" pattern benefits from time-series optimizations. TimescaleDB stores market ticks; InfluxDB powers EV telemetry. The boundary is not the industry but the access pattern: append-only, time-range queries, aggregation.

A system stores 1-second metrics with no retention policy. In 3 months the disk fills up. What is the right approach?

Key takeaways

Time-series data is append-only with time as the primary dimension; specialized DBs (InfluxDB, TimescaleDB) store it far more compactly than plain PostgreSQL tables, with compression on the order of 90-95%
Cardinality explosion in InfluxDB: tags are indexed, so high-cardinality fields (userId, requestId) must be fields, not tags
Downsampling + retention = managed growth: raw data for 7 days for diagnostics, 5-minute aggregates for 30 days and hourly aggregates for 1 year for trends, auto-purge via retention policy
TimescaleDB when you need SQL and JOINs with business data; InfluxDB for pure metrics with high throughput; Prometheus for scrape-based microservice monitoring

Questions for reflection

Which data in your project is time-series in nature? Could it benefit from a specialized store?
If you had to add monitoring for 500 microservices with 1-year retention, would you pick Prometheus, InfluxDB, or TimescaleDB and why?
How does downsampling shift the trade-off between data precision and storage cost? Where does the reasonable line sit?

Related lessons

db-27-timeseries