Databases

Cassandra: Wide-Column for Petabyte-Scale Data

Netflix stores 36 petabytes of viewing data in Cassandra. With 200 million users, every profile request at peak hits Cassandra. It handles this load without a master node and without general-purpose ACID transactions.

**Apple iCloud**: 75,000+ Cassandra nodes storing data from 782 million devices.
**Discord**: 4 trillion messages in Cassandra; migrated to ScyllaDB in 2023 for lower latency.
**Uber**: trip history and GPS tracks in Cassandra - billions of writes per day.

Partition Key and Clustering Key

Cassandra's PRIMARY KEY consists of two parts: the partition key determines which node stores the data (consistent hashing), and the clustering key defines the sort order within a partition. This distinction is fundamental to all Cassandra data modeling.
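The two roles of the key can be sketched with a toy in-memory table. This is an illustration under stated assumptions, not how Cassandra is implemented: the node names are hypothetical, MD5 stands in for Cassandra's partitioner, and a simple modulo replaces the token ring.

```python
import bisect
import hashlib
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def node_for(partition_key: str) -> str:
    """Choose a node from a hash of the partition key (simplified: modulo, no ring)."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


class Table:
    """Toy table with PRIMARY KEY (partition_key, clustering_key)."""

    def __init__(self):
        # partition key -> list of (clustering_key, value), kept sorted by clustering key
        self.partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, value):
        rows = self.partitions[partition_key]
        keys = [ck for ck, _ in rows]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(rows) and rows[i][0] == clustering_key:
            rows[i] = (clustering_key, value)  # same primary key: overwrite (upsert)
        else:
            rows.insert(i, (clustering_key, value))

    def range_scan(self, partition_key, start, end):
        """Rows of one partition with start <= clustering_key < end."""
        rows = self.partitions.get(partition_key, [])
        keys = [ck for ck, _ in rows]
        return rows[bisect.bisect_left(keys, start):bisect.bisect_left(keys, end)]


messages = Table()
for message_id in (3, 1, 2):
    messages.insert("chat-1", message_id, f"message {message_id}")
print(node_for("chat-1"), messages.range_scan("chat-1", 2, 4))
```

All rows of `chat-1` live together on one node and come back in clustering-key order, which is why a range scan inside a partition is cheap while a query across partitions is not.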

Hot partitions are one of the main operational problems in Cassandra. If the partition key has low cardinality (e.g., status: 'active'/'inactive'), a few partitions receive all traffic. Best practice: choose a partition key with enough distinct values that traffic spreads evenly and no single partition outgrows the target size. A common guideline is to keep each partition under roughly 100 MB and 100,000 rows.
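One common way to keep a partition bounded is to add a time bucket to the partition key. The sketch below assumes a 10-day bucket width and hypothetical helper names; the right width depends on the write rate and row size of the real table.

```python
from datetime import datetime, timedelta, timezone

BUCKET_DAYS = 10  # assumed bucket width; tune so one bucket stays within the size target


def estimated_partition_mb(rows: int, avg_row_bytes: int) -> float:
    """Rough partition size in MB (1 MB = 1,000,000 bytes), ignoring storage overhead."""
    return rows * avg_row_bytes / 1_000_000


def bucket_for(ts: datetime) -> int:
    """Map a timestamp to a fixed-width time bucket number."""
    epoch_days = int(ts.timestamp()) // 86_400
    return epoch_days // BUCKET_DAYS


def partition_key(chat_id: str, ts: datetime) -> tuple:
    """Composite partition key: the chat plus the time bucket of the message."""
    return (chat_id, bucket_for(ts))


def buckets_between(start: datetime, end: datetime) -> list:
    """Buckets a reader must query to cover the interval from start to end."""
    return list(range(bucket_for(start), bucket_for(end) + 1))


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(partition_key("chat-42", now))
print(buckets_between(now - timedelta(days=25), now))
```

The trade-off is on the read side: a query spanning several buckets has to visit one partition per bucket, so the bucket width should match the typical read window.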

A message table has PRIMARY KEY (chat_id, message_id). What happens when a chat accumulates a million messages?

Gossip Protocol and Ring Architecture

Cassandra is a peer-to-peer cluster with no master. Nodes form a ring; each owns a range of tokens in the hash space. Data is replicated to RF consecutive nodes in the ring, where RF is the replication factor. Gossip protocol: every second, each node exchanges state with 1-3 randomly chosen nodes, so a change in cluster topology reaches all N nodes in O(log N) rounds.
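The logarithmic spread can be checked with a small simulation. This is a simplified push-only model with a fixed random seed, not Cassandra's actual gossip exchange (which is bidirectional and carries versioned state); the function name and parameters are illustrative.

```python
import random


def gossip_rounds(num_nodes: int, fanout: int = 3, seed: int = 0) -> int:
    """Rounds until every node has heard an update that starts on node 0."""
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < num_nodes:
        newly_informed = set()
        for _ in informed:
            # Each informed node pushes the update to `fanout` random nodes.
            peers = rng.sample(range(num_nodes), k=min(fanout, num_nodes))
            newly_informed.update(peers)
        informed |= newly_informed
        rounds += 1
    return rounds


for size in (10, 100, 1000):
    print(size, gossip_rounds(size))
```

Multiplying the cluster size by ten adds only a few rounds, because the set of informed nodes grows by a roughly constant factor each round until it saturates.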

Consistent hashing in Cassandra: when a node joins, only about 1/N of the data moves to it, where N is the new cluster size; all other data stays where it was. This is the key advantage over manually sharding a SQL database: horizontal scaling without reconfiguring the entire cluster.
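The 1/N property can be demonstrated with a minimal hash ring. The sketch assumes MD5 as the hash function, 64 virtual nodes per physical node, and hypothetical node names; Cassandra's own partitioner and token allocation differ in detail.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Position of a key or virtual node on the ring (64-bit slice of MD5)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.tokens = []  # sorted list of (token, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.tokens, (token(f"{node}#{i}"), node))

    def _start(self, key):
        # Index of the first ring entry at or after the key's token.
        return bisect.bisect_right(self.tokens, (token(key), ""))

    def owner(self, key):
        return self.tokens[self._start(key) % len(self.tokens)][1]

    def replicas(self, key, rf):
        """Walk clockwise and collect the first `rf` distinct nodes."""
        found = []
        start = self._start(key)
        for step in range(len(self.tokens)):
            node = self.tokens[(start + step) % len(self.tokens)][1]
            if node not in found:
                found.append(node)
            if len(found) == rf:
                break
        return found


def moved_keys(keys, nodes, new_node):
    """Keys whose owner changes when `new_node` joins, mapped to their new owner."""
    ring = Ring(nodes)
    before = {k: ring.owner(k) for k in keys}
    ring.add(new_node)
    return {k: ring.owner(k) for k in keys if ring.owner(k) != before[k]}


keys = [f"key-{i}" for i in range(5000)]
moved = moved_keys(keys, [f"node-{i}" for i in range(1, 6)], "node-6")
print(f"moved {len(moved) / len(keys):.1%} of keys, all to {set(moved.values())}")
```

Every key that moves lands on the new node, and keys owned by the other nodes are untouched. With plain `hash(key) % node_count` sharding, going from five to six nodes would instead remap most keys.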

A Cassandra cluster of 6 nodes with replication factor = 3. How many nodes can fail without losing data availability?

Consistency Levels

Cassandra sacrifices strict consistency for availability (AP in CAP). But the consistency level is configurable per query: ONE (fastest, eventually consistent), QUORUM (majority of replicas, balanced), ALL (all replicas, strongest consistency, highest latency). The rule: if W replicas must acknowledge a write and R replicas must answer a read, then W + R > RF (the replication factor) guarantees that every read overlaps at least one replica holding the latest write, which gives strong consistency.
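The overlap rule is simple arithmetic and can be written down directly. The helper names below are illustrative, and only the three levels named above are modeled.

```python
def quorum(rf: int) -> int:
    """Majority of the replicas for a given replication factor."""
    return rf // 2 + 1


def replicas_required(level: str, rf: int) -> int:
    """Replicas that must respond for the consistency levels ONE, QUORUM, ALL."""
    return {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}[level]


def is_strongly_consistent(write_cl: str, read_cl: str, rf: int) -> bool:
    """True when the write set and the read set must share at least one replica."""
    return replicas_required(write_cl, rf) + replicas_required(read_cl, rf) > rf


def tolerated_replica_failures(level: str, rf: int) -> int:
    """Replicas of one partition that can be down while the level is still reachable."""
    return rf - replicas_required(level, rf)


for write_cl, read_cl in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(write_cl, read_cl, is_strongly_consistent(write_cl, read_cl, rf=3))
```

With RF = 3, QUORUM needs 2 replicas, so QUORUM writes and reads overlap (2 + 2 > 3) and still tolerate one failed replica, whereas ONE plus ONE (1 + 1 = 2) can read from a replica that has not yet received the write.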

Cassandra LWTs (Lightweight Transactions) add compare-and-set operations: INSERT ... IF NOT EXISTS and UPDATE ... IF condition. They use Paxos consensus, which takes four round trips between the coordinator and the replicas, so they are roughly 4x slower than regular writes, but they guarantee that only one concurrent writer wins. Used for idempotency keys and unique constraints.

An application writes a payment with CONSISTENCY ONE and immediately reads it with CONSISTENCY ONE. Why can the result be 'not found'?

Compaction Strategies

Cassandra uses an LSM tree: writes go first to the memtable (in memory, backed by a commit log for durability), then are flushed to disk as immutable SSTables. Compaction merges SSTables, removes deleted data (tombstones), and applies TTL expiration. The choice of compaction strategy significantly affects performance.
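The write path can be sketched as a toy LSM store. This is a teaching model with assumed names: SSTables are plain dictionaries, there is no commit log or TTL, and compaction drops tombstones immediately, whereas Cassandra keeps them for a grace period (`gc_grace_seconds`) so that deletes can reach all replicas.

```python
TOMBSTONE = object()  # marker written in place of a deleted value


class LsmStore:
    def __init__(self, memtable_limit=3):
        self.memtable = {}
        self.sstables = []  # oldest first; each one is never modified after flush
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # A delete is just another write: it records a tombstone.
        self.put(key, TOMBSTONE)

    def flush(self):
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable, then SSTables newest to oldest.
        for table in [self.memtable] + self.sstables[::-1]:
            if key in table:
                value = table[key]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        """Merge all SSTables into one, keeping the newest value and dropping tombstones."""
        merged = {}
        for table in self.sstables:  # oldest first, so newer values overwrite older ones
            merged.update(table)
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.sstables = [live] if live else []


store = LsmStore(memtable_limit=2)
store.put("a", 1)
store.put("b", 2)
store.put("a", 10)
store.delete("b")
print(len(store.sstables), store.get("a"), store.get("b"))
store.compact()
print(store.sstables)
```

Reads get slower as SSTables pile up because more files may need to be checked, which is the cost that compaction pays down.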

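TWCS (TimeWindowCompactionStrategy) is designed for message- and event-style data that is written in time order with a TTL. SSTables are grouped by time window; once every row in a window's SSTable has passed its TTL, the entire SSTable is deleted in one operation, with no row-by-row tombstone processing. This is why TWCS keeps compaction overhead very low for time-bounded data.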
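The window-and-drop idea can be sketched in a few lines. This models only the grouping and expiry logic with hypothetical function names; timestamps are plain integers in seconds, and each window stands in for one SSTable.

```python
from collections import defaultdict

DAY = 86_400  # seconds


def assign_windows(writes, window_seconds):
    """Group (write_time, key) pairs into one simulated SSTable per time window."""
    windows = defaultdict(list)
    for write_time, key in writes:
        windows[write_time // window_seconds].append((write_time, key))
    return dict(windows)


def drop_expired_windows(windows, now, ttl_seconds):
    """Keep a window only while its newest row is still within the TTL."""
    return {
        window: rows
        for window, rows in windows.items()
        if max(write_time for write_time, _ in rows) + ttl_seconds > now
    }


# One metric point per day for 40 days, 1-day windows, 30-day TTL.
writes = [(day * DAY, f"metric-{day}") for day in range(40)]
windows = assign_windows(writes, window_seconds=DAY)
remaining = drop_expired_windows(windows, now=40 * DAY, ttl_seconds=30 * DAY)
print(len(windows), "windows written,", len(remaining), "still on disk")
```

Because a whole window expires together, dropping it needs no merge with other data. The model breaks down if old windows keep receiving late writes or rows use very different TTLs, since one live row keeps the entire SSTable on disk.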

A metrics table: data is written continuously, retained for 30 days, then deleted. Which compaction strategy?

Data Modeling in Cassandra

The primary rule of Cassandra data modeling: the schema is defined by queries, not by entities. Unlike SQL (normalize first, query later), Cassandra requires knowing the access patterns upfront and designing a table for each one.

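Cassandra does not support JOINs or subqueries. If a query requires a JOIN, the data must either be denormalized into a table built for that query or be combined in application code. This is a deliberate design choice rather than a missing feature: it forces explicit data access patterns, and a table designed for its query answers it from a single partition instead of through N+1 lookups.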
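Query-first modeling can be illustrated with an in-memory stand-in for two tables, one per access pattern. The class and field names are hypothetical; in Cassandra the second structure would be a table partitioned by user with the order time as a descending clustering key.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from itertools import takewhile


class OrderStore:
    """Each order is written twice, once per query the application needs."""

    def __init__(self):
        self.orders_by_id = {}                   # query: fetch one order by its id
        self.orders_by_user = defaultdict(list)  # query: a user's orders, newest first

    def place_order(self, order_id, user_id, placed_at, total):
        row = {"order_id": order_id, "user_id": user_id,
               "placed_at": placed_at, "total": total}
        self.orders_by_id[order_id] = row
        rows = self.orders_by_user[user_id]
        rows.append(row)
        rows.sort(key=lambda r: r["placed_at"], reverse=True)  # clustering order

    def order(self, order_id):
        return self.orders_by_id.get(order_id)

    def recent_orders(self, user_id, now, days=7):
        """Range scan over one partition: stop at the first row older than the cutoff."""
        cutoff = now - timedelta(days=days)
        rows = self.orders_by_user.get(user_id, [])
        return list(takewhile(lambda r: r["placed_at"] >= cutoff, rows))


now = datetime(2024, 6, 15, tzinfo=timezone.utc)
store = OrderStore()
store.place_order("o1", "u1", now - timedelta(days=10), 5.0)
store.place_order("o2", "u1", now - timedelta(days=3), 7.5)
store.place_order("o3", "u1", now - timedelta(days=1), 2.0)
print([row["order_id"] for row in store.recent_orders("u1", now)])
```

The duplicated write is the price of denormalization: reads touch exactly one partition, but the application has to keep both copies in step.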

Misconception: Cassandra is just a scalable MySQL

Cassandra is a structurally different system: no JOINs, no multi-row ACID transactions, no ad hoc queries. In exchange, write throughput scales close to linearly as nodes are added, and large clusters sustain more than 1M writes/sec.

The confusion arises because Cassandra uses SQL-like CQL syntax. But the data model, consistency model, and operational characteristics are completely different from relational databases.

In Cassandra, how should orders for a user in the last 7 days be retrieved?

Key Ideas

**Partition Key**: determines storage node; must have high cardinality. Partition size guideline: keep each partition under roughly 100,000 rows and 100 MB.
**Consistency Levels**: ONE (fastest, eventual), QUORUM (balanced), ALL (strongest). QUORUM+QUORUM guarantees strong consistency.
**Query-first modeling**: one table per query pattern. Denormalization is expected and correct.
**TWCS**: the right compaction strategy for time-series data with TTL - deletes entire SSTables instead of individual tombstones.

Questions for Reflection

A chat application has 10,000 users in one group chat sending 100 messages/sec. What Cassandra data model problems does this create, and how would you solve them?
When would you choose Cassandra over TimescaleDB for time-series data?
Cassandra provides only eventual consistency by default. Name three application-level techniques to handle read-after-write inconsistency.

Related Lessons

dist-15-consistent-hashing