Backend Transport

Apache Kafka: Event Streaming Platform

In 2010 LinkedIn had hundreds of services and dozens of data sources, and integration had become a graph of point-to-point ETL jobs that no one could hold in their head. Jay Kreps' team looked at the problem differently: what if data moved not through queues between services, but through a single append-only log that served as the source of truth? That is how Kafka was born. By 2025 it had become the foundation of stream processing at Netflix, Uber, Goldman Sachs, Pinterest, and Airbnb - wherever events matter as a continuous flow rather than as scattered messages. Understanding partitions, offsets, and exactly-once semantics is the key to designing modern event-driven systems.

  • **Netflix:** Kafka processes 7 trillion events per day - views, recommendations, billing - all flowing through topics
  • **Uber:** the Kafka-based event bus coordinates ride dispatch and real-time map updates; latencies are measured in milliseconds
  • **Goldman Sachs / JPMorgan:** transactional pipelines built on Kafka EOS provide accurate accounting and audit of financial events

Topics and Partitions: The Foundation of Kafka

A Kafka topic is not a message queue but a distributed append-only log. Each topic is split into N partitions, and each partition is stored on broker disks as a sequence of log segment files. Messages within a partition are ordered and immutable; across partitions, order is not guaranteed. This model is what separates Kafka from RabbitMQ: there, a message sits in a queue until it is delivered and removed; here, the log is a shared substrate that any number of readers can consume.

A partition is the unit of parallelism and replication. A topic with N partitions can be processed by up to N consumers of one group in parallel. Each partition is replicated across M brokers (the replication factor). The partition leader accepts writes; followers catch up. A keyed message is assigned to a partition by hash(key) % N; when no key is provided, the producer spreads records across partitions (round-robin in older clients, sticky batching in newer ones). The key is the mechanism that preserves order: all events for one user land in the same partition and are processed sequentially.
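
The key-to-partition mapping can be sketched in a few lines. This is a minimal simulation, not Kafka client code: the real default partitioner hashes the serialized key with murmur2, while the sketch uses CRC32 from the standard library as a stand-in. The function `partition_for`, the constant `NUM_PARTITIONS`, and the sample keys are hypothetical names chosen for the illustration.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 12  # assumed topic size for the illustration


def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition: hash(key) % N.

    CRC32 is a stand-in here; Kafka's default partitioner uses murmur2.
    """
    return zlib.crc32(key) % num_partitions


# Events for two users, interleaved in send order.
events = [
    (b"user-1", "login"),
    (b"user-2", "login"),
    (b"user-1", "add_to_cart"),
    (b"user-2", "logout"),
    (b"user-1", "checkout"),
]

# Each partition is modelled as an append-only list.
partitions = defaultdict(list)
for key, value in events:
    partitions[partition_for(key)].append((key, value))

# All events of one key share a partition, so their relative order survives.
user1_partition = partition_for(b"user-1")
user1_events = [v for k, v in partitions[user1_partition] if k == b"user-1"]
print(user1_events)  # ['login', 'add_to_cart', 'checkout']
```

Because the mapping depends on N, adding partitions to an existing topic changes where a given key lands, which is worth keeping in mind when per-key ordering matters.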

A topic has 12 partitions. Messages are sent without a key in round-robin fashion. What ordering does Kafka guarantee?

Producers and Consumers: Writing and Reading the Log

A producer is a client that sends messages to a topic. It chooses the partition via a partitioner (by key or custom logic) and routes the record to that partition's leader. The acks parameter sets durability: 0 - fire-and-forget, 1 - leader acknowledgement, all - acknowledgement from every ISR replica. A consumer is a client that reads messages from a topic. It is pull-based: the consumer decides when and how much to fetch. This distinguishes Kafka from push-based brokers - backpressure is built into the architecture itself.
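
The pull model can be illustrated with a toy in-memory partition. This is a sketch under stated assumptions, not a Kafka API: `PartitionLog`, `fetch`, and `max_records` are hypothetical names. The point is only that the reader chooses the offset and the batch size, so a slow consumer falls behind instead of being flooded.

```python
from typing import List


class PartitionLog:
    """Toy in-memory stand-in for one partition (hypothetical, not a Kafka API)."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, value: str) -> int:
        """Append a record and return its offset."""
        self._records.append(value)
        return len(self._records) - 1

    def fetch(self, offset: int, max_records: int) -> List[str]:
        """Return up to max_records starting at offset; the caller sets the pace."""
        return self._records[offset:offset + max_records]


log = PartitionLog()
for i in range(10):
    log.append(f"event-{i}")  # the producer has already written ten records

position = 0   # the consumer's own read position
batches = []
while True:
    # The consumer asks for data; nothing is pushed to it.
    batch = log.fetch(position, max_records=4)
    if not batch:
        break
    batches.append(batch)
    position += len(batch)

print([len(b) for b in batches])  # [4, 4, 2]
```

In the Java consumer, a setting such as max.poll.records plays roughly the role that `max_records` plays here.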

An idempotent producer (enable.idempotence=true) is assigned a producer ID and stamps every record batch with a per-partition sequence number. The broker uses the two to discard retried batches it has already written, but only within a single producer session. This is the baseline guarantee: each record is written exactly once to its partition even when a network timeout forces a retry. Batching is a separate optimization: the producer buffers messages for up to linger.ms milliseconds (or until batch.size bytes accumulate) and ships them in one request, amortizing network costs. Under production loads a single producer can exceed 100K msg/sec.
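
The deduplication idea can be sketched as follows. This is a simplified model, not broker code: it tracks a single highest sequence number per producer and works on individual records, whereas a real broker works on batches and keeps more bookkeeping. `PartitionWithDedup` and the producer ID `42` are hypothetical.

```python
from typing import Dict, List


class PartitionWithDedup:
    """Toy partition that drops retried records by (producer_id, sequence)."""

    def __init__(self) -> None:
        self.log: List[str] = []
        # producer_id -> highest sequence number already appended
        self._last_seq: Dict[int, int] = {}

    def append(self, producer_id: int, sequence: int, value: str) -> bool:
        """Append unless this sequence was already written; True if appended."""
        last = self._last_seq.get(producer_id, -1)
        if sequence <= last:
            return False  # duplicate from a retry: acknowledge, do not append
        if sequence != last + 1:
            raise ValueError("sequence gap: an earlier record is missing")
        self.log.append(value)
        self._last_seq[producer_id] = sequence
        return True


partition = PartitionWithDedup()
pid = 42  # hypothetical producer ID

partition.append(pid, 0, "order-created")
partition.append(pid, 1, "order-paid")
# The ack for sequence 1 is lost in a network timeout, so the producer retries.
retry_appended = partition.append(pid, 1, "order-paid")
partition.append(pid, 2, "order-shipped")

print(retry_appended)  # False
print(partition.log)   # ['order-created', 'order-paid', 'order-shipped']
```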

Why does a producer use enable.idempotence=true together with acks=all?

Consumer Groups: Horizontal Read Scaling

A Consumer Group is a mechanism that unites N instances of the same service to process a topic in parallel. Kafka automatically distributes partitions across group members: each partition is assigned to exactly one consumer inside the group. This gives a precise scaling unit: if a topic has 12 partitions, a group of 12 consumers processes them in parallel. Add a 13th and it sits idle, because there are no partitions left to assign. If a consumer drops out, the group coordinator (a broker) triggers a rebalance and its partitions are reassigned to the remaining members.
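
The 12-partitions, 13-consumers case can be reproduced with a small assignment function. This is a sketch that resembles a round-robin strategy; it is not the code of any Kafka assignor, and `assign_round_robin` and the member names are hypothetical.

```python
from typing import Dict, List


def assign_round_robin(partitions: List[int], members: List[str]) -> Dict[str, List[int]]:
    """Give each partition to exactly one member, cycling through sorted members."""
    assignment: Dict[str, List[int]] = {m: [] for m in sorted(members)}
    ordered = list(assignment)
    if not ordered:
        return assignment
    for i, p in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(p)
    return assignment


partitions = list(range(12))
members = [f"consumer-{i:02d}" for i in range(13)]

assignment = assign_round_robin(partitions, members)
idle = [m for m, ps in assignment.items() if not ps]
print(idle)  # ['consumer-12']

# One member leaves: a rebalance recomputes the assignment for the survivors.
survivors = [m for m in members if m != "consumer-03"]
rebalanced = assign_round_robin(partitions, survivors)
print([m for m, ps in rebalanced.items() if not ps])  # []
```

After the rebalance the previously idle instance picks up a partition, which is why running one or two spare instances is a common way to shorten recovery.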

A group is identified by its group.id. The broker stores per-group committed offsets for each partition in the internal topic __consumer_offsets. On restart a consumer resumes from the stored offset. Different groups read the same topic independently: this is the classic pub/sub pattern - one 'transactions' topic is consumed by the billing group, the analytics group, and the fraud-detector group. Each maintains its own offsets and sees every message.
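
Per-group offsets can be modelled with a dictionary keyed by group and partition. This is a toy stand-in for __consumer_offsets, not how the broker stores them; `poll_and_commit` is a hypothetical helper, and starting from offset 0 when nothing is committed is an assumption that corresponds to reading from the earliest record.

```python
from typing import Dict, List, Tuple

# One partition of the 'transactions' topic.
log: List[str] = ["tx-0", "tx-1", "tx-2", "tx-3", "tx-4"]

# Stand-in for __consumer_offsets: (group.id, partition) -> next offset to read.
committed: Dict[Tuple[str, int], int] = {}


def poll_and_commit(group_id: str, partition: int, max_records: int) -> List[str]:
    """Read from the group's committed offset, then commit the new position."""
    start = committed.get((group_id, partition), 0)
    batch = log[start:start + max_records]
    committed[(group_id, partition)] = start + len(batch)
    return batch


first_billing = poll_and_commit("billing", 0, 3)
analytics = poll_and_commit("analytics", 0, 5)   # unaffected by billing's progress
# A restarted billing instance resumes from the stored offset, not from the start.
second_billing = poll_and_commit("billing", 0, 3)

print(first_billing)   # ['tx-0', 'tx-1', 'tx-2']
print(analytics)       # ['tx-0', 'tx-1', 'tx-2', 'tx-3', 'tx-4']
print(second_billing)  # ['tx-3', 'tx-4']
```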

A topic has 8 partitions. The billing service runs as a Consumer Group of 12 instances. What happens?

Offsets and Retention: a Replayable Past

An offset is a monotonically increasing position number within a partition. Kafka does not delete a message after delivery - it stays in the log for the duration of retention. On the broker, retention is governed by two settings: log.retention.hours (time-based) and log.retention.bytes (size-based, applied per partition); topics can override them with retention.ms and retention.bytes. When either limit is exceeded, whole old segments are dropped at once. This means any consumer can re-read the past - replay events for backfills, schema migrations, or recovery of a downstream system. This is the key difference from a queue: the log is not transient transport but long-term history.
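
Segment-level retention and replay can be sketched together. This is a simplified model: timestamps are plain hour counters, the last segment is treated as active and always kept, and `Segment` and `apply_time_retention` are hypothetical names. The 168-hour window matches the default value of log.retention.hours.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    base_offset: int      # offset of the first record in the segment
    last_timestamp: int   # time of the newest record in the segment, in hours
    records: List[str]


def apply_time_retention(segments: List[Segment], now: int, retention_hours: int) -> List[Segment]:
    """Drop whole segments whose newest record is older than the retention window.

    The last (active) segment is always kept in this sketch.
    """
    kept = [s for s in segments[:-1] if now - s.last_timestamp <= retention_hours]
    return kept + segments[-1:]


segments = [
    Segment(0, 10, ["e0", "e1", "e2"]),
    Segment(3, 100, ["e3", "e4", "e5"]),
    Segment(6, 170, ["e6", "e7"]),
]

segments = apply_time_retention(segments, now=180, retention_hours=168)
earliest = segments[0].base_offset
print(earliest)  # 3

# Replay: a consumer rewinds to the earliest retained offset and re-reads the rest.
replayed = [r for s in segments for r in s.records]
print(replayed)  # ['e3', 'e4', 'e5', 'e6', 'e7']
```

Offsets are never renumbered: after the first segment is dropped, the earliest readable offset is 3, not 0.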

Log compaction is an alternative retention mode (cleanup.policy=compact) in which Kafka retains at least the latest record for each key. Older records with the same key are removed by a background cleaner. This turns a topic into a materialized view: a compacted 'users' topic holds the current state of every user. Streaming systems (Kafka Streams, Flink, ksqlDB) build tables on top of logs this way, and the same mechanism underpins Change Data Capture (CDC) pipelines.
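
The effect of compaction can be shown on a small keyed log. This is a sketch of the outcome only: it ignores delete markers and the fact that a real cleaner works segment by segment in the background. `compact` and the sample records are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, str]  # (offset, key, value)


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record per key, preserving offsets and order."""
    latest_offset: Dict[str, int] = {}
    for offset, key, _ in log:
        latest_offset[key] = offset
    return [rec for rec in log if latest_offset[rec[1]] == rec[0]]


users_log: List[Record] = [
    (0, "alice", "email=a@old.example"),
    (1, "bob", "email=b@old.example"),
    (2, "alice", "email=a@new.example"),
    (3, "carol", "email=c@new.example"),
    (4, "bob", "email=b@new.example"),
]

compacted = compact(users_log)
print([(o, k) for o, k, _ in compacted])  # [(2, 'alice'), (3, 'carol'), (4, 'bob')]

# Reading the compacted log front to back rebuilds the current state per key.
table = {key: value for _, key, value in compacted}
print(table["alice"])  # email=a@new.example
```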

A bug lived in the analytics processor in production for a week. Which Kafka property makes it possible to fix and recompute the data without involving producers?

Exactly-Once Semantics: Transactions and Idempotence

By default Kafka provides at-least-once delivery: a message may be delivered more than once under network errors. Exactly-Once Semantics (EOS), introduced in Kafka 0.11, combines the idempotent producer with the transactional API to guarantee that the processing result becomes visible exactly once. A Kafka transaction atomically combines the records written to output topics with the commit of the consumed input offsets, so a read-process-write step either applies in full or not at all. This is the foundation of stream processors: Kafka Streams and ksqlDB rely on EOS under the hood when exactly-once processing is enabled.

A transactional producer calls initTransactions() once at startup to register its transactional.id. Each transaction then begins with beginTransaction(); inside, send() and sendOffsetsToTransaction() are issued; it completes with commitTransaction() or abortTransaction(). The broker stores transaction state in the internal __transaction_state topic. Consumers with isolation.level=read_committed see only records from committed transactions - records from open or aborted transactions remain invisible to them. The cost of EOS is commit latency and additional round trips to the broker; throughput typically drops by roughly 10-30%, depending on how many records each transaction covers.
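
The all-or-nothing behaviour can be modelled in memory. This is a toy, not the Kafka client: `ToyTransactionalLog` and its methods are hypothetical and only loosely mirror the producer calls named above. A real read_committed consumer also stops at the first still-open transaction, which this sketch does not model.

```python
from typing import Dict, List, Tuple


class ToyTransactionalLog:
    """In-memory stand-in for an output partition plus consumer offsets."""

    def __init__(self) -> None:
        self.records: List[Tuple[int, str]] = []   # (transaction id, value)
        self.status: Dict[int, str] = {}           # id -> open/committed/aborted
        self.committed_offsets: Dict[str, int] = {}
        self._pending_offsets: Dict[int, Dict[str, int]] = {}
        self._next_txn = 0

    def begin_transaction(self) -> int:
        txn = self._next_txn
        self._next_txn += 1
        self.status[txn] = "open"
        self._pending_offsets[txn] = {}
        return txn

    def send(self, txn: int, value: str) -> None:
        # Written to the log right away, but not yet visible to readers.
        self.records.append((txn, value))

    def send_offsets(self, txn: int, group_id: str, offset: int) -> None:
        self._pending_offsets[txn][group_id] = offset

    def commit(self, txn: int) -> None:
        # Output records and input offsets become visible together.
        self.status[txn] = "committed"
        self.committed_offsets.update(self._pending_offsets.pop(txn))

    def abort(self, txn: int) -> None:
        self.status[txn] = "aborted"
        self._pending_offsets.pop(txn)

    def read_committed(self) -> List[str]:
        return [v for txn, v in self.records if self.status[txn] == "committed"]


out = ToyTransactionalLog()

t1 = out.begin_transaction()
out.send(t1, "total=100")
out.send_offsets(t1, "billing", 5)
out.commit(t1)

t2 = out.begin_transaction()
out.send(t2, "total=250")
out.send_offsets(t2, "billing", 9)
out.abort(t2)  # processing failed: neither the output nor the offset applies

print(out.read_committed())   # ['total=100']
print(out.committed_offsets)  # {'billing': 5}
```

Because the offset for the aborted transaction was never committed, the input records it covered are read and processed again, and only that second attempt can become visible.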

Myth: Kafka is just a 'fast RabbitMQ' for microservices.

Reality: Kafka is a distributed commit log: a topic is retained long-term, consumed independently by any number of groups, and supports replay. RabbitMQ is a queue broker: a message is delivered and then removed.

Everything important follows from the log model: replay of old events for backfills, materialized views via log compaction, a source of truth for CDC and event sourcing, and exactly-once semantics through transactions. Treating Kafka as 'just a queue' leaves most of that power unused.

What distinguishes Kafka Exactly-Once Semantics from ordinary at-least-once with an idempotent consumer?

Key Takeaways

  • **A topic is a distributed log of partitions.** A partition is the unit of parallelism, replication, and ordering; Kafka does not provide a global order across partitions.
  • **The producer picks a partition by key;** idempotence plus acks=all ensures each record is written exactly once to its partition despite retries and timeouts.
  • **A Consumer Group scales reads:** a partition belongs to exactly one consumer in the group; different groups consume the same topic independently.
  • **Offsets plus retention make the past replayable:** a group replays history by resetting its offsets (kafka-consumer-groups with --reset-offsets), and log compaction builds a materialized view per key.
  • **Exactly-Once Semantics via transactions:** read-process-write becomes atomic; the foundation of Kafka Streams and streaming ETL.

Related Topics

Kafka grew out of the ideas of message queues and distributed systems but rethought them around the log.

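  • RabbitMQ and AMQP - Comparison of models: RabbitMQ is a queue that removes messages once they are consumed; Kafka is a log with retention and replay
  • Distributed Systems: Replication - Partitions replicate leader-to-follower and track an in-sync replica set (ISR); as in Raft and Multi-Paxos, writes go through a single leader, but a Kafka write is committed once the full ISR has it rather than a majority quorum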

Questions for Reflection

  • LinkedIn's graph of point-to-point integrations became unmanageable. At what scale does it make sense to move from direct queues to a unified event bus like Kafka?
  • Exactly-Once in Kafka costs 10-30% of throughput. Which workloads justify that cost, and where is at-least-once with an idempotent consumer enough?
  • Log compaction turns a topic into a materialized view by key. Where is that more convenient than keeping the same table in Postgres and emitting change events?

Related Lessons

  • bt-11-messaging-intro - Messaging patterns are the foundation before Kafka
  • bt-12-rabbitmq - RabbitMQ is the push-based alternative
  • bt-14-kafka-deep - ISR internals and compaction, the next level
  • bt-17-event-driven - Kafka is the backbone of event-driven architecture
  • ds-03-consensus - Kafka relies on ZooKeeper or KRaft consensus for cluster metadata and leader election
  • bd-01 - Kafka is the standard ingestion layer for big data
  • dist-07-transactions