Big Data

Real-Time Analytics

LinkedIn shows 'Who viewed your profile' to 300 million users. If every request scanned 2 trillion view events, the page would take hours to load. Apache Pinot answers in 100 ms - not because there is less data, but because the architecture was designed around exactly that query pattern.

**Cloudflare** processes 6 million HTTP requests per second in ClickHouse for real-time security analytics: traffic anomalies are detected in seconds instead of hours in traditional SIEMs
**Netflix** uses Apache Druid for streaming analytics: video quality metrics (buffering ratio, startup time) aggregate per minute and directly influence CDN routing decisions in real time
**Walmart** deployed Apache Pinot for real-time inventory analytics: shelf stock updates propagate in seconds, preventing overselling during peak load

Apache Druid

Apache Druid was built at Metamarkets in 2011 with a single goal: answer analytical queries in milliseconds on data that just arrived from Kafka. Traditional OLAP stores required waiting for batch loads - hours or days. Druid solved this with a lambda-like architecture: streaming ingestion into mutable realtime segments, which are then converted into immutable historical segments. Netflix, Airbnb, and LinkedIn use Druid for real-time dashboards across billions of events.

Druid architecture: **Realtime nodes** - ingest from Kafka/Kinesis in memory, index in real time. **Historical nodes** - immutable segments on disk, optimized for scanning. **Broker** - routes queries to the right nodes, merges responses. **Coordinator** - manages balancing and retention. Data is stored in columnar format with bitmap indexes - this allows scanning billions of rows in seconds through CPU SIMD instructions.

Druid uses HyperLogLog (HLL) for approximate count distinct. Why not exact count?

ClickHouse

ClickHouse was built at Yandex in 2009 for Yandex.Metrica - a web analytics system processing trillions of events. It went open source in 2016 and quickly became the standard for analytical workloads: faster than Druid on batch queries, simpler to operate, supports real SQL without restrictions. Cloudflare processes 6 million HTTP requests per second in ClickHouse for real-time security analytics.

Key ClickHouse features: **MergeTree engine** - the primary engine; data is stored sorted by the ORDER BY key, allowing range reads without full table scans. **Materialized Views** - automatically updated aggregates on INSERT: inserting into the main table triggers aggregation in the view. **ReplicatedMergeTree** - automatic replication via ZooKeeper/ClickHouse Keeper. **Vertical scaling** - ClickHouse performs better on large single servers (more CPU/RAM) than in distributed mode.

Why does ClickHouse store data sorted by the ORDER BY key?

Apache Pinot

Apache Pinot was built at LinkedIn in 2013 for a specific use case: 'Who viewed my profile in the last 24 hours' - a query made by 200 million users simultaneously, each seeing their own data. This is user-facing analytics - analytics embedded directly in the product, not an internal dashboard. Latency requirement: P99 below 100 ms at thousands of QPS. Neither ClickHouse nor Druid is optimized for this pattern at that scale.

What sets Pinot apart from Druid and ClickHouse: **StarTree index** - specialized index for queries with many simultaneous filters (userId + campaignId + dateRange). **Upsert support** - updating records in a real-time stream (Druid is append-only by default). **Multi-tenancy** - resource isolation between tables from different tenants. **Tiered storage** - hot data in memory/SSD, cold data in S3. LinkedIn, Uber, and Walmart use Pinot for user-facing product analytics.

For which scenario is Apache Pinot preferable to ClickHouse?

Materialized Views

A Materialized View is a precomputed query result stored as a table and automatically updated when source data changes. The paradox: ClickHouse without a Materialized View takes seconds to compute 'total sales for the year' across 100 million rows. With a properly configured Materialized View aggregating by minute, the same query takes milliseconds - because it aggregates 525,600 rows instead of 100 million.

Materialized View update strategies: **Incremental** (ClickHouse AggregatingMergeTree) - new data is aggregated on INSERT and merged with existing aggregates. **Streaming** (Kafka Streams, Flink) - aggregate is continuously recomputed over the event stream. **Scheduled refresh** (classic databases) - recomputed on schedule; data goes stale between refreshes. **On-demand** - recomputed on request (simplest, slowest). For real-time systems, the first two matter.

Materialized Views are always up to date - they automatically sync with source data

Freshness of a Materialized View depends on the update strategy: incremental and streaming are near real-time; scheduled refresh can lag by minutes or hours

Different systems use different strategies. ClickHouse updates incrementally on INSERT. PostgreSQL Materialized Views only update when REFRESH MATERIALIZED VIEW is explicitly called - data can be hours old.

A ClickHouse Materialized View with AggregatingMergeTree aggregates data by minute. What happens when new rows are inserted into the main table?

Key Ideas

**Druid** - lambda architecture with realtime and historical nodes; optimal for time series with rollup at ingestion; bitmap indexes enable sub-second scans of billions of rows
**ClickHouse** - MergeTree with sparse index and incremental Materialized Views; simpler to operate than Druid; the leader for engineering analytics and batch queries
**Pinot with StarTree** - the only option for user-facing analytics at thousands of QPS with P99 under 100 ms; multi-tenancy and upsert support distinguish it from Druid and ClickHouse

Вопросы для размышления

ClickHouse outperforms Druid on batch queries but falls short on high-concurrency user-facing queries. At what QPS and latency requirement does it make sense to migrate from ClickHouse to Pinot?
A Materialized View speeds up reads but adds write overhead: every INSERT triggers aggregate recomputation. At what read/write ratio does the Materialized View stop paying for itself?
Druid uses rollup at ingestion: multiple events are aggregated into a single row. This reduces storage but loses raw data. How do you decide what granularity to retain when business reporting requirements change over time?

Связанные уроки

stat-31-eda