Big Data

Kafka: Architecture and Patterns

LinkedIn open-sourced Kafka in 2011 with a clear mission: move the activity of 175 million users through dozens of services without losses or duplicates. Today Kafka processes more than 7 trillion messages per day at LinkedIn alone. At Uber, Netflix, and Airbnb, Kafka has become the nervous system of the real-time infrastructure. Behind that scale sit four straightforward concepts: topics, partitions, consumer groups, and delivery semantics.

**LinkedIn** runs Kafka for the activity stream: 7+ trillion messages per day, with a single event bus connecting 1,000+ microservices.
**Uber** builds its real-time demand map on Kafka: GPS events are partitioned by geohash and processed with sub-minute latency.
**Confluent Cloud** manages millions of topics for enterprise customers: Schema Registry and Kafka Connect turn Kafka into a central data integration hub.

Topics and the Commit Log

In 2010, LinkedIn faced a specific problem: 300 services needed to receive user-activity events. Direct HTTP calls between services meant the number of connections grew on the order of n² with the number of services, and the system buckled under any traffic spike. The solution was a **distributed event log**. Producers write to one end; consumers read independently at their own pace. That design became Kafka.
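The connection arithmetic can be made concrete with a small sketch. It assumes the worst case, in which every service calls every other service directly; the function names are hypothetical and exist only for this illustration.

```python
def point_to_point_links(n: int) -> int:
    """Directed links if each of n services calls every other service (assumed worst case)."""
    return n * (n - 1)


def shared_log_links(n: int) -> int:
    """Links if each service connects only to the shared log."""
    return n


n = 300
print(point_to_point_links(n))  # 89700
print(shared_log_links(n))      # 300
```

With the log in the middle, adding one more service adds one connection instead of up to 2n new ones.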

A **topic** is a named stream of messages - analogous to a table in a database. Messages inside a topic are stored as an append-only log: new ones are added at the tail, and old ones are not deleted immediately (retention is by time or size). Each message carries an **offset** - a monotonically increasing position number within the log. Strictly speaking, offsets are assigned per partition, so an offset identifies a message only within its partition of the topic.
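The append-only log and its offsets can be illustrated with a toy in-memory model. This is a minimal sketch, not the Kafka client API: `TopicLog`, `append`, and `read` are hypothetical names, and the model behaves like a topic with a single partition.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TopicLog:
    """Toy append-only log (hypothetical; models one partition, not real Kafka)."""
    name: str
    _records: List[bytes] = field(default_factory=list)

    def append(self, value: bytes) -> int:
        """Add a record at the tail and return the offset assigned to it."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Tuple[int, bytes]]:
        """Return up to max_records (offset, value) pairs starting at offset."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        chunk = self._records[offset:offset + max_records]
        return [(offset + i, value) for i, value in enumerate(chunk)]


log = TopicLog("page-views")
offsets = [log.append(v) for v in (b"view:home", b"view:jobs", b"view:feed")]
print(offsets)      # [0, 1, 2]
print(log.read(1))  # [(1, b'view:jobs'), (2, b'view:feed')]
```

Reading never mutates the log: calling `log.read(1)` twice returns the same records, which is what makes independent readers possible.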

**Retention vs deletion:** Kafka does not delete messages on consumption (unlike a classic RabbitMQ queue, which removes a message once it is acknowledged). Messages are removed only when the retention policy triggers (time or size). This allows multiple independent consumers to read one topic and enables historical replay - critical for debugging and event sourcing.
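The difference between consuming and deleting can be sketched with a toy log that keeps a separate read position per consumer and drops records only when a size limit is exceeded. Everything here is hypothetical (`RetainedLog`, `poll`, `seek`, and the consumer names are invented for this sketch). In real Kafka, retention is controlled by topic-level settings such as `retention.ms` and `retention.bytes`, and it is applied to whole log segments rather than record by record.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class RetainedLog:
    """Toy log with size-based retention and per-consumer positions (hypothetical)."""

    def __init__(self, max_records: int) -> None:
        self.max_records = max_records
        self.records: Deque[Tuple[int, str]] = deque()
        self.next_offset = 0
        self.positions: Dict[str, int] = {}

    def append(self, value: str) -> int:
        offset = self.next_offset
        self.records.append((offset, value))
        self.next_offset += 1
        # Retention, not consumption, is what removes old records.
        while len(self.records) > self.max_records:
            self.records.popleft()
        return offset

    def poll(self, consumer: str) -> List[Tuple[int, str]]:
        """Return the records this consumer has not seen; the log is left unchanged."""
        start = self.positions.get(consumer, 0)
        batch = [(o, v) for o, v in self.records if o >= start]
        self.positions[consumer] = self.next_offset
        return batch

    def seek(self, consumer: str, offset: int) -> None:
        """Move a consumer's position, for example backwards to replay history."""
        self.positions[consumer] = offset


log = RetainedLog(max_records=3)
for event in ("e0", "e1", "e2"):
    log.append(event)

billing = log.poll("billing")      # reads e0..e2
analytics = log.poll("analytics")  # reads the same records independently
log.seek("billing", 1)
replayed = log.poll("billing")     # e1 and e2 again
log.append("e3")                   # exceeds max_records, so e0 is dropped
oldest = log.records[0]            # (1, 'e1')
```

Both consumers see the full history because reading only advances their own position; the oldest record disappears only once the retention limit is hit.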
