Kafka: Architecture and Patterns
LinkedIn open-sourced Kafka in 2011 with a clear mission: move the activity of 175 million users through dozens of services without losing or duplicating events. Today Kafka processes more than 7 trillion messages per day at LinkedIn alone. At Uber, Netflix, and Airbnb, Kafka has become the nervous system of the real-time infrastructure. Behind that scale sit four straightforward concepts: topics, partitions, consumer groups, and delivery semantics.
- **LinkedIn** runs Kafka for the activity stream: 7+ trillion messages per day, a single event bus connecting 1,000+ microservices
- **Uber** builds its real-time demand map via Kafka: GPS events partitioned by geohash and processed with sub-minute latency
- **Confluent Cloud** manages millions of topics for enterprise customers: Schema Registry plus Kafka Connect turns Kafka into a central data integration hub
Topics and the Commit Log
In 2010, LinkedIn faced a specific problem: 300 services needed to receive user-activity events. With direct HTTP calls between services, the number of point-to-point connections grew roughly as n², and the system collapsed under any spike. The solution was a **distributed event log**: producers append to the tail, and consumers read independently at their own pace. That design became Kafka.
A **topic** is a named stream of messages - loosely analogous to a table in a database. Messages inside a topic are stored as an append-only log (strictly, one log per partition): new messages are added at the tail, and old ones are not deleted immediately, because retention is governed by time or size. Each message carries an **offset** - a monotonically increasing position number within its partition's log.
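To make the append-only log and its offsets concrete, here is a minimal in-memory sketch. It is a toy model, not Kafka's storage format or client API: the `CommitLog` class and its `append` and `read` methods are hypothetical names chosen for illustration, and the log stands in for a single partition.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CommitLog:
    """Toy append-only log for one partition (illustrative only, not Kafka's API)."""

    _records: List[bytes] = field(default_factory=list)

    def append(self, value: bytes) -> int:
        """Add a record at the tail and return the offset assigned to it."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Tuple[int, bytes]]:
        """Return up to max_records (offset, value) pairs starting at offset."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        chunk = self._records[offset:offset + max_records]
        return [(offset + i, value) for i, value in enumerate(chunk)]


log = CommitLog()
first = log.append(b"page_view")   # offset of the first record
second = log.append(b"click")      # offsets only ever grow
batch = log.read(0)                # reading does not modify the log
```

Records are never updated in place, and a reader addresses data purely by offset. That is what lets a consumer resume from wherever it last stopped.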
**Retention vs deletion:** Kafka does not delete messages on consumption (unlike a classic RabbitMQ queue, which removes a message once it has been acknowledged). Messages are removed only when the retention policy triggers, based on time or size. This allows multiple independent consumers to read the same topic and enables historical replay - critical for debugging and event sourcing.
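The difference between consuming and deleting can be sketched with a second toy model. Everything here is an assumption made for illustration: `RetainedLog`, its methods, and the consumer names `analytics` and `billing` are hypothetical, and timestamps are passed in explicitly so the example is deterministic. In a real cluster, retention is set through topic configuration such as `retention.ms` and `retention.bytes`, and the broker removes whole log segments rather than individual records.

```python
from collections import deque


class RetainedLog:
    """Toy log: reads never delete; only the retention policy trims the head."""

    def __init__(self, retention_seconds: float):
        self.retention_seconds = retention_seconds
        self._records = deque()   # (offset, timestamp, value), oldest first
        self._next_offset = 0

    def append(self, value, now):
        """Append a record stamped with the time `now`; return its offset."""
        offset = self._next_offset
        self._records.append((offset, now, value))
        self._next_offset += 1
        return offset

    def read_from(self, offset):
        """Return all retained (offset, value) pairs at or after `offset`."""
        return [(o, v) for o, _, v in self._records if o >= offset]

    def enforce_retention(self, now):
        """Drop records older than the retention window; return how many."""
        removed = 0
        while self._records and now - self._records[0][1] > self.retention_seconds:
            self._records.popleft()
            removed += 1
        return removed


log = RetainedLog(retention_seconds=60)
log.append("signup", now=0)
log.append("login", now=30)
log.append("purchase", now=50)

# Each consumer tracks its own position; the log itself holds no read state.
positions = {"analytics": 0, "billing": 0}

analytics_view = log.read_from(positions["analytics"])
positions["analytics"] = analytics_view[-1][0] + 1   # analytics has caught up

billing_view = log.read_from(positions["billing"])   # billing still sees everything

# Replay: rewinding a position re-reads history that is still retained.
positions["analytics"] = 0
replayed = log.read_from(positions["analytics"])

# Only retention removes data: at t=70 the record written at t=0 has expired.
removed = log.enforce_retention(now=70)
remaining = log.read_from(0)
```

Surviving records keep their original offsets after the head is trimmed. A consumer's stored position therefore stays meaningful as long as it points at or beyond the oldest retained record.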