Databases
Cassandra: Wide-Column for Petabyte-Scale Data
Netflix stores 36 petabytes of viewing data in Cassandra. When 200 million users open the app simultaneously, every profile request hits Cassandra. It handles this without a master, without transactions, and without a DBA on call.
- **Apple iCloud**: 75,000+ Cassandra nodes storing data from 782 million devices.
- **Discord**: 4 trillion messages in Cassandra; migrated to ScyllaDB in 2023 for lower latency.
- **Uber**: trip history and GPS tracks in Cassandra - billions of writes per day.
Partition Key and Clustering Key
Cassandra's PRIMARY KEY consists of two parts: the partition key determines which node stores the data (consistent hashing), and the clustering key defines the sort order within a partition. This distinction is fundamental to all Cassandra data modeling.
Hot partition is the primary operational problem in Cassandra. If the partition key has low cardinality (e.g., status: 'active'/'inactive'), a few partitions receive all traffic. Best practice: partition key cardinality should be close to the number of expected rows divided by the target partition size (100-500 MB max).
A message table has PRIMARY KEY (chat_id, message_id). What happens when a chat accumulates a million messages?
Gossip Protocol and Ring Architecture
Cassandra is a peer-to-peer cluster with no master. Nodes form a ring; each owns a range of hash space tokens. Data is replicated to N consecutive nodes in the ring (replication factor). Gossip protocol: each node exchanges state with 1-3 random neighbors every second, propagating cluster topology in O(log N) rounds.
Consistent hashing in Cassandra: when a node is added, only 1/N of the data migrates to the new node. This is the key advantage over SQL sharding - horizontal scaling without reconfiguring the entire cluster.
A Cassandra cluster of 6 nodes with replication factor = 3. How many nodes can fail without losing data availability?
Consistency Levels
Cassandra sacrifices strict consistency for availability (AP in CAP). But the consistency level is configurable per query: ONE (fastest, eventually consistent), QUORUM (majority of replicas, balanced), ALL (all replicas, strongest consistency, highest latency). The formula: write_CL + read_CL > replication_factor guarantees strong consistency.
Cassandra LWT (Lightweight Transactions) add compare-and-set operations: INSERT ... IF NOT EXISTS and UPDATE ... IF condition. They use Paxos consensus - 4x slower than regular writes but guarantee that only one concurrent writer wins. Used for idempotency keys and unique constraints.
An application writes a payment with CONSISTENCY ONE and immediately reads it with CONSISTENCY ONE. Why can the result be 'not found'?
Compaction Strategies
Cassandra uses LSM tree: writes go first to the memtable (memory), then are flushed to disk as immutable SSTables. Compaction merges SSTables, removes deleted data (tombstones), and applies TTL expiration. The choice of compaction strategy significantly affects performance.
Discord uses Cassandra with TWCS for message storage with TTL. When a time window expires, the entire SSTable is deleted in one operation - no tombstone processing needed. This is why TWCS achieves near-zero compaction overhead for time-bounded data.
A metrics table: data is written continuously, retained for 30 days, then deleted. Which compaction strategy?
Data Modeling in Cassandra
The primary rule of Cassandra data modeling: the schema is defined by queries, not by entities. Unlike SQL (normalize first, query later), Cassandra requires knowing the access patterns upfront and designing a table for each one.
Cassandra does not support JOINs or subqueries. If a query requires a JOIN, it must be implemented as separate tables or in application code. This is a feature, not a limitation: it forces explicit data access patterns and eliminates N+1 query problems.
Cassandra is just a scalable MySQL
Cassandra is a structurally different system: no JOINs, no transactions, no flexible queries. But it achieves 1M+ writes/sec per node and scales linearly by adding nodes.
The confusion arises because Cassandra uses SQL-like CQL syntax. But the data model, consistency model, and operational characteristics are completely different from relational databases.
In Cassandra, how should orders for a user in the last 7 days be retrieved?
Key Ideas
- **Partition Key**: determines storage node; must have high cardinality. Partition size limit: 100,000 rows or 100 MB.
- **Consistency Levels**: ONE (fastest, eventual), QUORUM (balanced), ALL (strongest). QUORUM+QUORUM guarantees strong consistency.
- **Query-first modeling**: one table per query pattern. Denormalization is expected and correct.
- **TWCS**: the right compaction strategy for time-series data with TTL - deletes entire SSTables instead of individual tombstones.
Related Topics
Cassandra sits at the intersection of distributed systems and data engineering.
- LSM Tree — Cassandra's storage engine is based on LSM tree - the same structure used by RocksDB and LevelDB.
- CAP Theorem — Cassandra is an AP system by default - available and partition-tolerant, sacrificing strict consistency.
- Time-Series Databases — Cassandra with TWCS is a production-grade time-series store for IoT and metrics.
Вопросы для размышления
- A chat application has 10,000 users in one group chat sending 100 messages/sec. What Cassandra data model problems does this create, and how would you solve them?
- When would you choose Cassandra over TimescaleDB for time-series data?
- Cassandra guarantees eventual consistency by default. Name three application-level techniques to handle read-after-write inconsistency.