Real-Time Backend
NATS / NATS JetStream
Kafka's batching costs latency at the edge. NATS core typically delivers in under 1 ms, and Cloudflare relies on that.
- Cloudflare uses NATS to route events between points of presence (PoPs), where Kafka would introduce unacceptable batch delays on latency-critical paths
- Synadia (the company that maintains NATS) runs systems with more than 50 million connected IoT devices through a single NATS cluster with subject-based routing and no topic configuration
- Startups move from RabbitMQ to NATS JetStream: a single 20 MB binary replaces RabbitMQ plus the Erlang runtime plus plugins, and the operational overhead drops accordingly
- Fintech systems use NATS for request-reply between microservices. The built-in pattern enables synchronous calls on top of an async broker without extra libraries
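The request-reply pattern works by attaching a one-off reply subject to the request, which the responder publishes its answer to. The sketch below models that idea in memory, assuming exact-match subjects and synchronous delivery. `TinyBroker`, `price_service`, and the `pricing.quote` subject are hypothetical names for illustration and are not the NATS client API. The `_INBOX.` prefix mirrors the default inbox prefix used by NATS clients.

```python
import itertools
from collections import defaultdict
from typing import Callable, Optional

# Handler signature: (subject, payload, reply_subject)
Handler = Callable[[str, bytes, Optional[str]], None]


class TinyBroker:
    """In-memory stand-in for a broker (illustration only, exact-match subjects)."""

    def __init__(self) -> None:
        self._subs = defaultdict(list)          # subject -> list of handlers
        self._inbox_ids = itertools.count(1)    # source of unique inbox names

    def subscribe(self, subject: str, handler: Handler) -> None:
        self._subs[subject].append(handler)

    def unsubscribe(self, subject: str) -> None:
        self._subs.pop(subject, None)

    def publish(self, subject: str, payload: bytes, reply: Optional[str] = None) -> None:
        # Copy the list so handlers may subscribe or unsubscribe while we iterate.
        for handler in list(self._subs.get(subject, [])):
            handler(subject, payload, reply)

    def request(self, subject: str, payload: bytes) -> bytes:
        """Publish with a one-off reply subject and return the first response."""
        inbox = f"_INBOX.{next(self._inbox_ids)}"
        responses = []
        self.subscribe(inbox, lambda _s, data, _r: responses.append(data))
        try:
            self.publish(subject, payload, reply=inbox)
        finally:
            self.unsubscribe(inbox)
        if not responses:
            # A real client would wait up to a timeout; this model is synchronous.
            raise TimeoutError(f"no responder on {subject!r}")
        return responses[0]


broker = TinyBroker()


def price_service(_subject: str, payload: bytes, reply: Optional[str]) -> None:
    # A responder answers by publishing to the reply subject it was given.
    if reply is not None:
        broker.publish(reply, b"price:" + payload)


broker.subscribe("pricing.quote", price_service)
quote = broker.request("pricing.quote", b"BTC")
```

The caller sees an ordinary function call, while the broker only ever handles two asynchronous publishes: one on the request subject and one on the inbox.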
NATS: minimalist broker
NATS is an open-source message broker created by Derek Collison and maintained today primarily by Synadia. Its core principle is minimalism: the server is a single binary of around 20 MB that starts in milliseconds and consumes minimal RAM. Message delivery latency stays under 1 ms even at thousands of publications per second. This is not a stripped-down Kafka. It is a fundamentally different class of tool: Kafka is optimized for maximum throughput and persistence; NATS is optimized for minimum latency and operational simplicity.
Cloudflare uses NATS for edge routing of events between points of presence (PoPs). When a request hits a PoP in Warsaw and several internal services need to be notified, NATS delivers within sub-millisecond intervals where Kafka would introduce unacceptable delays due to its batching logic.
- Protocol over TCP: text-based, human-readable, easy to debug via telnet
- Pub/Sub with no persistence by default: fire-and-forget semantics
- A single server can handle on the order of 10-15 million small messages/sec on commodity hardware (core NATS, no persistence)
- Clustering is built in: NATS forms a full-mesh cluster without an external coordinator (unlike Kafka with ZooKeeper)
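To make the text-based protocol concrete, here is a minimal sketch that builds `PUB` and `SUB` frames and parses a `MSG` frame as they appear on the wire. The helper names (`pub_frame`, `sub_frame`, `parse_msg`) are made up for this example. A real session also begins with the server's `INFO` line and the client's `CONNECT`, which are left out here.

```python
from typing import Optional, Tuple

CRLF = b"\r\n"


def pub_frame(subject: str, payload: bytes, reply: Optional[str] = None) -> bytes:
    # Wire format: PUB <subject> [reply-to] <#bytes> CRLF <payload> CRLF
    parts = ["PUB", subject]
    if reply:
        parts.append(reply)
    parts.append(str(len(payload)))
    return " ".join(parts).encode() + CRLF + payload + CRLF


def sub_frame(subject: str, sid: str, queue: Optional[str] = None) -> bytes:
    # Wire format: SUB <subject> [queue group] <sid> CRLF
    parts = ["SUB", subject]
    if queue:
        parts.append(queue)
    parts.append(sid)
    return " ".join(parts).encode() + CRLF


def parse_msg(frame: bytes) -> Tuple[str, str, Optional[str], bytes]:
    # Wire format: MSG <subject> <sid> [reply-to] <#bytes> CRLF <payload> CRLF
    header, _, rest = frame.partition(CRLF)
    fields = header.decode().split()
    if len(fields) not in (4, 5) or fields[0] != "MSG":
        raise ValueError(f"not a MSG frame: {header!r}")
    subject, sid = fields[1], fields[2]
    reply = fields[3] if len(fields) == 5 else None
    size = int(fields[-1])
    if len(rest) < size:
        raise ValueError("payload shorter than declared size")
    return subject, sid, reply, rest[:size]


frame = pub_frame("orders.europe.paid", b"hello")
```

Because every frame is a line of ASCII followed by a length-prefixed payload, the same bytes can be typed by hand into a telnet session.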
Cloudflare picked NATS over Kafka for edge routing. Which NATS property was the deciding factor?
Subject-based routing
NATS has no concept of a "topic" the way Kafka does. Instead, it uses subjects: hierarchical strings like `orders.europe.paid`. A subscriber can listen on an exact address, on a single level via `*`, or on an entire subtree via `>`. This produces routing without pre-declared topics. Publishing to the right subject is enough.
Queue groups in NATS are analogous to Kafka consumer groups, but without the need to configure partitions. When scaling a service horizontally, just start a new instance with the same queue group name and NATS automatically includes it in the load balancing.
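The queue group behaviour can be modelled in a few lines: each group receives every message once, and one member of the group is picked to handle it. This is a simplified in-memory model. `QueueGroupRouter` is a hypothetical class, subject matching is omitted, and the random pick stands in for the server's member selection.

```python
import random
from collections import defaultdict
from typing import Callable


class QueueGroupRouter:
    """Toy model of queue-group delivery (illustration only)."""

    def __init__(self, seed: int = 0) -> None:
        self._groups = defaultdict(list)   # group name -> member handlers
        self._rng = random.Random(seed)

    def join(self, group: str, handler: Callable[[str], None]) -> None:
        # Scaling out is just another member joining under the same group name.
        self._groups[group].append(handler)

    def publish(self, payload: str) -> None:
        # Every group sees the message; exactly one member per group handles it.
        for members in self._groups.values():
            self._rng.choice(members)(payload)


payments_a, payments_b, audit = [], [], []
router = QueueGroupRouter()
router.join("payments", payments_a.append)
router.join("audit", audit.append)

for i in range(50):
    router.publish(f"msg-{i}")

# A second payments instance starts later; no partitions to rebalance.
router.join("payments", payments_b.append)
for i in range(50, 100):
    router.publish(f"msg-{i}")
```

After the second instance joins, the remaining messages are split between `payments_a` and `payments_b`, while the `audit` group still receives every message.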
- `*` matches one hierarchy level: `orders.*.paid` matches `orders.europe.paid` but not `orders.eu.west.paid`
- `>` matches one or more levels: `orders.>` matches everything starting with `orders.`
- Subjects without wildcards mean exact match, the most efficient option
- Routing happens in broker memory, with no disk I/O for routing
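The wildcard rules above fit in one small function. The sketch below is an illustrative re-implementation of token-by-token matching, not the server's actual algorithm, and `subject_matches` is a name chosen for this example.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if `subject` matches `pattern` under the `*` and `>` rules."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, token in enumerate(p_tokens):
        if token == ">":
            # '>' is only valid as the last token and needs at least one level to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False  # pattern is longer than the subject
        if token != "*" and token != s_tokens[i]:
            return False  # literal token mismatch
    # No '>' seen, so the number of levels must be identical.
    return len(p_tokens) == len(s_tokens)


checks = {
    ("orders.*.paid", "orders.europe.paid"): subject_matches("orders.*.paid", "orders.europe.paid"),
    ("orders.*.paid", "orders.eu.west.paid"): subject_matches("orders.*.paid", "orders.eu.west.paid"),
    ("orders.>", "orders.eu.west.paid"): subject_matches("orders.>", "orders.eu.west.paid"),
    ("orders.>", "orders"): subject_matches("orders.>", "orders"),
}
```

Only the first and third entries in `checks` are true: `*` stops at one level, and `>` requires at least one level after `orders.`.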
A service handles payments across several regions. It needs to listen to every `payment.confirmed` event regardless of region: `payment.us.confirmed`, `payment.eu.confirmed`. Which subject pattern works?
JetStream: persistence on top of NATS
Plain NATS is pure pub/sub with at-most-once delivery: if no subscriber is present at publish time, the message is lost. JetStream is a built-in persistence layer that adds at-least-once delivery, replay from any position, and consumer cursors. All of this comes without a separate process, as an extension of the same NATS server.
JetStream vs Kafka: Kafka stores every event in an append-only log and is itself a storage-first system. JetStream is messaging-first with optional persistence. Kafka wins on sustained persisted throughput (millions of events per second across partitions) and on long-term retention (days or weeks). JetStream wins on latency and operational simplicity: there is no ZooKeeper ensemble or KRaft controller quorum to run, because JetStream is built into the same NATS process.
- Create a stream: define a name plus the subject patterns that flow into it
- Publish via `js.publish()` instead of `nc.publish()` to get a sequence number and a write acknowledgement
- Create a consumer with `AckPolicy.Explicit` for reliable processing
- Call `msg.ack()` after successful processing or `msg.nak()` to retry
- Configure `max_deliver` and backoff intervals to control retry logic
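The steps above can be sketched as an in-memory model of a stream with an explicit-ack consumer. This is not the NATS client API: `MiniStream`, `MiniConsumer`, and the `dead` list are hypothetical, backoff intervals are left out, and the model only shows how sequence numbers, a consumer cursor, `ack`, `nak`, and a `max_deliver` cap interact.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class StoredMsg:
    seq: int
    subject: str
    data: bytes


class MiniStream:
    """Append-only log that assigns sequence numbers (toy stand-in for a stream)."""

    def __init__(self) -> None:
        self.log = []

    def publish(self, subject: str, data: bytes) -> int:
        seq = len(self.log) + 1
        self.log.append(StoredMsg(seq, subject, data))
        return seq  # stands in for the write acknowledgement


class MiniConsumer:
    """Explicit-ack consumer with its own cursor and a redelivery cap."""

    def __init__(self, stream: MiniStream, max_deliver: int = 3, start_seq: int = 1) -> None:
        self.stream = stream
        self.max_deliver = max_deliver
        self.next_seq = start_seq   # cursor: next never-delivered sequence
        self.redeliver = deque()    # sequences waiting for another attempt
        self.deliveries = {}        # seq -> delivery attempts so far
        self.dead = []              # sequences abandoned after max_deliver

    def fetch(self):
        if self.redeliver:
            seq = self.redeliver.popleft()
        elif self.next_seq <= len(self.stream.log):
            seq = self.next_seq
            self.next_seq += 1
        else:
            return None
        self.deliveries[seq] = self.deliveries.get(seq, 0) + 1
        return self.stream.log[seq - 1]

    def ack(self, msg: StoredMsg) -> None:
        self.deliveries.pop(msg.seq, None)  # done, never redelivered

    def nak(self, msg: StoredMsg) -> None:
        if self.deliveries.get(msg.seq, 0) >= self.max_deliver:
            self.deliveries.pop(msg.seq, None)
            self.dead.append(msg.seq)       # retry budget exhausted
        else:
            self.redeliver.append(msg.seq)  # schedule another attempt


stream = MiniStream()
for body in (b"ok-1", b"bad", b"ok-2"):
    stream.publish("orders.eu.paid", body)

consumer = MiniConsumer(stream, max_deliver=3)
processed, attempts_on_bad = [], 0
while (msg := consumer.fetch()) is not None:
    if msg.data == b"bad":
        attempts_on_bad += 1
        consumer.nak(msg)       # processing failed, ask for a retry
    else:
        processed.append(msg.seq)
        consumer.ack(msg)       # processing succeeded

# A second consumer with its own cursor replays the stream from the start.
replay = MiniConsumer(stream, start_seq=1)
replayed = []
while (msg := replay.fetch()) is not None:
    replayed.append(msg.seq)
    replay.ack(msg)
```

The failing message is attempted three times and then set aside, the other two are processed once each, and the replay consumer still sees all three because the log itself is untouched by acknowledgements.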
A notification service publishes events to JetStream. When the consumer service restarts, no message can be lost. Which JetStream mechanism guarantees this?
NATS vs Redis Pub/Sub vs Kafka
These three tools often get compared as interchangeable, but each has its own niche. Redis Pub/Sub is a primitive broadcast with no persistence and no guarantees (good for cache invalidation). NATS is fast routing with optional persistence via JetStream (good for microservices, IoT, edge). Kafka is a storage-first platform for event streaming, audit logs, and data pipelines with retention measured in weeks and throughput in millions per second.
Synadia (the company behind NATS) positions NATS as a "connectivity fabric": a single network for an entire infrastructure. One NATS cluster can serve IoT devices (edge), microservices (core), and data pipelines (JetStream) through isolated accounts. This reduces the number of brokers to operate.
- Pick Redis Pub/Sub for: cache invalidation, real-time presence, simple notifications without guarantees
- Pick NATS core for: microservice communication, request-reply pattern, IoT telemetry
- Pick JetStream for: work queues, event sourcing in small systems, RabbitMQ replacement
- Pick Kafka for: audit log, event streaming with long retention, data pipelines, throughput above 1M msg/sec
Myth: JetStream is a separate service that must run alongside NATS
Reality: JetStream is built into the same nats-server process and is enabled with the `-js` flag or via configuration
Unlike Kafka, which needs a ZooKeeper ensemble or a KRaft controller quorum for metadata, JetStream is part of nats-server. A single binary with a single config file runs both the messaging and the persistence layer.
A team is building a telemetry system for 50,000 IoT devices. They need sub-2ms delivery, horizontal scaling, and minimal operational overhead. Kafka is already in the infrastructure for analytics. What should they pick for telemetry ingestion?
Key takeaways
- NATS core is fire-and-forget pub/sub with latency under 1 ms, subject-based routing via `*` and `>` wildcards, and queue groups for load balancing
- JetStream adds at-least-once delivery and durable consumers on top of NATS without a separate process. It is enabled by a flag in the same nats-server
- NATS beats Kafka on latency and operational simplicity. Kafka wins on throughput and long-term storage. They are often used together
Related topics
NATS fits into the event-driven ecosystem alongside other brokers and patterns:
- Apache Kafka — An alternative for high throughput and long-term event retention
- Redis Pub/Sub — A simpler counterpart without persistence, for cache invalidation and presence
- Message Queue patterns — JetStream implements work queue and pub/sub patterns from messaging theory
Questions for reflection
- If you had to choose between NATS JetStream and Kafka for a new payment events system, what questions would you ask the team to make the right call?
- How does subject-based routing in NATS (`orders.eu.paid`) change the way you design APIs between microservices compared to Kafka topics?
- JetStream is built into the NATS server. Is that simplification or hidden risk? In which scenario could a monolithic broker design become a problem?