Backend Transport

Batching, Compression, and Zero-Copy

LinkedIn has reported running Kafka at over a trillion messages per day. That scale does not come from supercomputers but from standard hardware combined with batching, zero-copy transfer, and OS-level optimizations. A single broker can handle 1M messages/sec thanks to producer-side batching (linger.ms), sendfile(), and the OS page cache. Once the I/O stack is optimized, application code is rarely the bottleneck.

**Kafka** uses sendfile() (Java NIO FileChannel.transferTo) to serve consumer reads, including replay. Log bytes move from the page cache to the network socket without being copied into user space or the JVM heap, a key reason a single broker can exceed 1M messages/sec.
**Cloudflare** uses XDP/eBPF for DDoS mitigation, processing 26 million packets per second on a single CPU core and dropping attack traffic before it reaches the kernel network stack.
**AWS** ML training clusters use EFA (Elastic Fabric Adapter, an OS-bypass, RDMA-style interface) for gradient synchronization between GPUs: 1-2 microseconds of latency versus about 50 microseconds over TCP, which is critical when training LLMs across thousands of GPUs.
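Python has no Java NIO, but the same zero-copy path is reachable through `socket.sendfile()`, which uses `os.sendfile()` where the OS supports it. A minimal sketch, assuming a local socket pair stands in for the broker-to-consumer connection and a temporary file stands in for a log segment (both hypothetical setups, not Kafka's code):

```python
import os
import socket
import tempfile
import threading


def send_file(path: str, sock: socket.socket) -> int:
    """Send a whole file over a connected, blocking stream socket."""
    with open(path, "rb") as f:
        # socket.sendfile() uses os.sendfile() when the platform supports it,
        # so the kernel moves bytes from the page cache to the socket without
        # copying them into a Python buffer; otherwise it falls back to send().
        return sock.sendfile(f)


def recv_all(sock: socket.socket) -> bytes:
    """Read until the peer shuts down its write side."""
    chunks = []
    while chunk := sock.recv(65536):
        chunks.append(chunk)
    return b"".join(chunks)


def roundtrip(payload: bytes) -> tuple[int, bytes]:
    """Write payload to a temp file (stand-in for a log segment), send it, read it back."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    sender_sock, receiver_sock = socket.socketpair()
    result = {}

    def sender() -> None:
        result["sent"] = send_file(path, sender_sock)
        sender_sock.shutdown(socket.SHUT_WR)  # signal EOF to the reader

    thread = threading.Thread(target=sender)
    thread.start()
    try:
        received = recv_all(receiver_sock)
    finally:
        thread.join()
        sender_sock.close()
        receiver_sock.close()
        os.unlink(path)
    return result["sent"], received
```

On Linux the file bytes never pass through a Python `bytes` object on the sending side. Where `sendfile` is unusable for the socket type, `socket.sendfile()` silently falls back to ordinary `send()` calls: the result is identical, but the copy is no longer avoided.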

Batching and Linger

Batching groups multiple messages into a single network request. Instead of one request (with its syscalls and round trip) per message, the producer accumulates messages per partition and sends them together. In Kafka, linger.ms defines the maximum time the producer waits for more records before sending a batch that has not yet reached batch.size; a batch that fills up is sent immediately.
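The flush rule (send when full, or when the linger window expires) can be modeled in a few lines. This is a toy, single-partition sketch with hypothetical names (`BatchAccumulator`, `append`, `poll`), not Kafka's `RecordAccumulator`; time is passed in explicitly so the behavior is deterministic:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BatchAccumulator:
    """Toy model of producer-side batching; NOT Kafka's RecordAccumulator.

    A batch is sent as soon as it reaches batch_size bytes, or once its first
    record has waited linger_ms, whichever happens first.
    """
    batch_size: int
    linger_ms: float
    sent: List[Tuple[float, List[bytes]]] = field(default_factory=list)
    _batch: List[bytes] = field(default_factory=list)
    _batch_bytes: int = 0
    _opened_at: Optional[float] = None

    def append(self, now_ms: float, record: bytes) -> None:
        self.poll(now_ms)  # first flush a batch whose linger window already expired
        if self._opened_at is None:
            self._opened_at = now_ms  # the first record opens a new batch
        self._batch.append(record)
        self._batch_bytes += len(record)
        if self._batch_bytes >= self.batch_size:
            self._send(now_ms)  # a full batch goes out immediately

    def poll(self, now_ms: float) -> None:
        """Send the open batch if its linger window has elapsed by now_ms."""
        if self._opened_at is not None and now_ms - self._opened_at >= self.linger_ms:
            # The batch left when its window expired, not when we noticed.
            self._send(self._opened_at + self.linger_ms)

    def _send(self, at_ms: float) -> None:
        self.sent.append((at_ms, self._batch))
        self._batch, self._batch_bytes, self._opened_at = [], 0, None
```

Feeding 100-byte records every 1 ms for 100 ms with `linger_ms=20` and a 16 KB `batch_size`, then calling `poll(float("inf"))` to flush, yields five 20-record batches. With `linger_ms=0` each record goes out alone, and with `batch_size=1000` the size limit wins and produces ten 10-record batches.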

A common tuning for high-throughput Kafka producers is linger.ms=20 with batch.size=1MB (the default batch.size is 16 KB). The trade-off is explicit: throughput increases, and latency, including P99, grows by up to linger.ms. For interactive UIs, use linger.ms=0; for analytics pipelines, linger.ms=100 or more.

Consider a Kafka producer with linger.ms=20 versus linger.ms=0. What changes?
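A back-of-envelope answer for one partition under steady load: with linger.ms=0 each record tends to become its own request, while linger.ms=20 trades up to 20 ms of added latency for far fewer, larger requests. The helper below is hypothetical and uses a simplified model (constant arrival rate, no in-flight backpressure); the real producer also groups records that arrive while a previous request is still being sent, even with linger.ms=0.

```python
def estimate_linger_effect(rate_per_s: float, linger_ms: float,
                           record_bytes: int, batch_size: int) -> dict:
    """Rough steady-state estimate for one partition (toy model, not Kafka's logic).

    Assumes records arrive at a constant rate and a batch closes either when
    linger_ms expires or when it is full, whichever comes first.
    """
    fit_by_time = rate_per_s * linger_ms / 1000.0  # records arriving within one linger window
    fit_by_size = batch_size // record_bytes       # records that fit into batch_size
    records_per_batch = max(1.0, min(fit_by_time, fit_by_size))
    requests_per_s = rate_per_s / records_per_batch
    # The first record in a batch waits longest: the whole linger window,
    # unless the batch fills up earlier.
    max_added_ms = min(linger_ms, records_per_batch * 1000.0 / rate_per_s)
    return {
        "records_per_batch": records_per_batch,
        "requests_per_s": requests_per_s,
        "max_added_latency_ms": max_added_ms,
    }
```

At 1,000 records/s with 100-byte records and a 16 KB batch.size, linger.ms=20 gives about 20 records per batch and 50 requests/s instead of 1,000 requests/s, while the oldest record in each batch waits up to 20 ms.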

Compression: Algorithms and Trade-offs

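Compression reduces the bytes sent over the network at the cost of CPU. The right algorithm depends on the data type (text compresses well, while already-compressed or encrypted binary data barely shrinks), latency requirements, and whether the service is CPU-bound or network-bound. Kafka producers choose a codec through compression.type: gzip, snappy, lz4, or zstd.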
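The stdlib `zlib` module (DEFLATE, the algorithm inside gzip) is enough to see the trade-off, so this sketch uses it as a stand-in for the codec choice. The payloads are synthetic assumptions: repetitive JSON events for text, and random bytes for already-compressed or encrypted binary data.

```python
import json
import os
import time
import zlib


def compression_profile(data: bytes, level: int) -> tuple[float, float]:
    """Return (compression ratio, CPU seconds) for zlib/DEFLATE at the given level."""
    start = time.process_time()
    compressed = zlib.compress(data, level)
    cpu_s = time.process_time() - start
    assert zlib.decompress(compressed) == data  # lossless round trip
    return len(data) / len(compressed), cpu_s


def sample_payloads() -> dict[str, bytes]:
    # Repetitive JSON events: text-like payload with highly redundant keys.
    events = [{"user_id": i, "event": "page_view", "path": "/home"} for i in range(5000)]
    text = json.dumps(events).encode()
    # Random bytes: stand-in for already-compressed or encrypted data.
    binary = os.urandom(len(text))
    return {"json": text, "random": binary}


if __name__ == "__main__":
    for name, data in sample_payloads().items():
        for level in (1, 6, 9):
            ratio, cpu_s = compression_profile(data, level)
            print(f"{name:6} level={level} ratio={ratio:5.2f} cpu={cpu_s * 1000:6.2f} ms")
```

Expect the JSON payload to shrink several-fold, with higher levels spending more CPU for a modestly better ratio, while the random payload comes out slightly larger than it went in: compressing incompressible data only burns CPU.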
