Backend Transport
RabbitMQ: AMQP, Exchanges, Queues
In 2013, Instagram was processing 1,000 photos per second at peak. Their stack was Celery + RabbitMQ: every uploaded photo went into a queue, and each step was processed as a separate task: resizing, filters, follower notifications. RabbitMQ was chosen because Kafka could not offer AMQP's routing flexibility at the time.
- **Zalando** uses RabbitMQ to route orders between warehouses across 17 European countries: a topic exchange with routing keys by country and product type directs each order to the nearest fulfillment center
- **GitLab CI/CD** was historically built on Sidekiq + RabbitMQ: every pipeline job entered a queue and runners competed for tasks through competing consumers
- **Trivago** routes search queries to 400+ hotel suppliers through a fanout exchange: one user query is multiplied into N parallel requests to different hotel inventory systems
The AMQP Model
RabbitMQ is built on AMQP 0-9-1 - and its model is fundamentally different from what most developers intuitively expect. A producer never writes directly to a queue. A message lands in an exchange (router), which applies routing rules to decide which queues to place it in. Queues store messages. Consumers read from queues. This separation - exchange as router, queue as buffer - gives RabbitMQ extraordinary routing flexibility.
Key AMQP entities:
- **Exchange** - the message router; receives from producers, distributes to queues according to rules
- **Queue** - message storage; consumers subscribe to a queue
- **Binding** - an exchange-to-queue link with an optional routing key or arguments
- **Routing key** - a string label the producer attaches to a message; the exchange uses it to select queues
- **Virtual Host (vhost)** - isolation within a single server, like a schema in PostgreSQL
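The entities map closely onto client calls. The following is a minimal sketch, assuming a pika-style channel object (anything exposing `exchange_declare`, `queue_declare`, `queue_bind`, and `basic_publish` with these keyword arguments). The exchange, queue, and key names are hypothetical.

```python
def declare_and_publish(channel, body: bytes) -> None:
    """Wire producer -> exchange -> binding -> queue on a pika-style channel."""
    # Exchange: the router that producers publish to.
    channel.exchange_declare(exchange="payments", exchange_type="direct", durable=True)
    # Queue: the buffer that consumers read from.
    channel.queue_declare(queue="payments.success", durable=True)
    # Binding: links the exchange to the queue under a binding key.
    channel.queue_bind(queue="payments.success", exchange="payments",
                       routing_key="payment.success")
    # The producer names an exchange and a routing key, never a queue.
    channel.basic_publish(exchange="payments", routing_key="payment.success", body=body)
```

With pika, such a channel would typically come from `pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel()`; the vhost is chosen in the connection parameters rather than per call.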
A producer publishes to a direct exchange with routing key 'payment.success'. Two queues are bound: one with binding key 'payment.success', one with 'payment.*'. Where does the message go?
Exchange Types
The four exchange types in RabbitMQ represent four different routing strategies. Direct - exact key match (one recipient or competing consumers). Fanout - broadcast to all bound queues with no key matching. Topic - wildcard routing by key using '.' segments and '*'/'#' wildcards. Headers - matching on message headers instead of routing key (rare, expensive). The choice of exchange type defines the system topology.
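To make the difference between the strategies concrete, here is a small in-memory model of direct, fanout, and headers routing. It is a simplified illustration, not broker code: the `route` function and the binding tuples are hypothetical, topic matching is left out, and the headers branch only models the `x-match` argument with `all` as the default.

```python
def route(exchange_type, bindings, routing_key="", headers=None):
    """Return the queues a message reaches in an in-memory model of an exchange.

    bindings: list of (queue_name, binding_key, binding_args) tuples.
    """
    headers = headers or {}
    matched = []
    for queue, binding_key, args in bindings:
        if exchange_type == "fanout":
            hit = True  # every bound queue gets a copy; keys are ignored
        elif exchange_type == "direct":
            hit = binding_key == routing_key  # exact string comparison
        elif exchange_type == "headers":
            wanted = {k: v for k, v in args.items() if k != "x-match"}
            checks = [headers.get(k) == v for k, v in wanted.items()]
            # x-match=any needs one matching pair; otherwise all pairs must match
            hit = any(checks) if args.get("x-match") == "any" else all(checks)
        else:
            raise ValueError(f"unsupported exchange type: {exchange_type}")
        if hit and queue not in matched:
            matched.append(queue)
    return matched


bindings = [
    ("email", "user.registered", {}),
    ("crm", "user.registered", {}),
    ("audit", "user.deleted", {}),
]
print(route("direct", bindings, "user.registered"))  # ['email', 'crm']
print(route("fanout", bindings, "anything"))         # ['email', 'crm', 'audit']
```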
Wildcard rules for **topic exchange**: an asterisk `*` replaces exactly one word (segment), `#` replaces zero or more words. For example, routing key `order.europe.paid` matches binding `order.*.paid` (one word in place of `europe`) and `order.#` (any suffix). Conversely, routing key `order.paid` does not match binding `order.*.paid`, because `*` requires exactly one word in the middle; it still matches `order.#`. Topic exchange is the most popular type in production because it expresses meaningful event hierarchies naturally.
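The wildcard rules can be checked with a short matcher. This is a sketch of the matching semantics only, written for readability; `topic_matches` is a hypothetical helper, and the broker uses its own, more efficient implementation.

```python
def topic_matches(binding_key: str, routing_key: str) -> bool:
    """True if a topic-exchange binding key matches a routing key."""
    pattern = binding_key.split(".")
    words = routing_key.split(".")

    def match(p: int, w: int) -> bool:
        if p == len(pattern):
            return w == len(words)          # pattern used up: words must be too
        if pattern[p] == "#":
            # '#' absorbs zero or more words: try every possible split point
            return any(match(p + 1, k) for k in range(w, len(words) + 1))
        if w == len(words):
            return False                    # words used up but pattern expects more
        if pattern[p] == "*" or pattern[p] == words[w]:
            return match(p + 1, w + 1)      # '*' or a literal consumes one word
        return False

    return match(0, 0)


print(topic_matches("order.*.paid", "order.europe.paid"))  # True
print(topic_matches("order.#", "order.europe.paid"))       # True
print(topic_matches("order.*.paid", "order.paid"))         # False
```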
A 'user.registered' event must reach three services: email, CRM, and analytics. All three must always receive a copy. Which exchange type fits?
Ack, Nack, and Prefetch
With manual acknowledgments, RabbitMQ does not delete a message from the queue when it is delivered to a consumer - it stays in 'unacknowledged' state until an explicit confirmation arrives. This is critical: if a consumer crashes while processing, its channel closes and the broker redelivers the message to another worker. Three processing outcomes: **ack** - success, delete it; **nack** - failure, redeliver (requeue=true) or discard, which sends the message to the DLQ when one is configured (requeue=false); **reject** - the original AMQP method, equivalent to nack for a single message (nack is a RabbitMQ extension that can also cover several messages at once).
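The three outcomes can be sketched as a consumer callback. This assumes the pika callback signature `(channel, method, properties, body)`; `TransientError` and `make_callback` are hypothetical names introduced for illustration, and how an application tells transient failures from permanent ones is up to the application.

```python
class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, dropped connections)."""


def make_callback(process):
    """Build a pika-style on_message callback around process(body)."""
    def on_message(channel, method, properties, body):
        try:
            process(body)
        except TransientError:
            # Temporary failure: put the message back so a worker can retry it.
            channel.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
        except Exception:
            # Permanent failure: do not requeue; the broker drops the message
            # or dead-letters it if the queue has a dead letter exchange.
            channel.basic_nack(delivery_tag=method.delivery_tag, requeue=False)
        else:
            # Success: the broker may now delete the message.
            channel.basic_ack(delivery_tag=method.delivery_tag)
    return on_message
```

With pika, the callback would be registered with `channel.basic_consume(queue=..., on_message_callback=make_callback(handler), auto_ack=False)`. Note that an immediate requeue retries with no delay, so a persistent "transient" fault can turn into a tight redelivery loop.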
**Prefetch (basic.qos)** is the key backpressure parameter: it limits how many unacknowledged messages the broker keeps in flight, per consumer by default in RabbitMQ or across the whole channel when the global flag is set. `prefetch=1` is safest (fair dispatch between workers) but gives the lowest throughput. `prefetch=50-100` is a good balance for most cases. `prefetch=0` means no limit (maximum throughput, risk of OOM if processing is slow). Rule of thumb: `prefetch * avg_processing_time < desired_latency`.
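The rule of thumb is simple arithmetic, sketched below with durations in whole milliseconds to avoid floating-point edge cases. `max_prefetch` and `apply_prefetch` are hypothetical helpers; the only client call assumed is pika's `basic_qos(prefetch_count=...)`.

```python
def max_prefetch(desired_latency_ms: int, avg_processing_ms: int) -> int:
    """Largest prefetch with prefetch * avg_processing_ms < desired_latency_ms."""
    if desired_latency_ms <= 0 or avg_processing_ms <= 0:
        raise ValueError("both durations must be positive")
    # Strict inequality on integers: n * avg < latency  <=>  n <= (latency - 1) // avg
    n = (desired_latency_ms - 1) // avg_processing_ms
    return max(n, 1)  # never return 0, which RabbitMQ treats as "no limit"


def apply_prefetch(channel, desired_latency_ms: int, avg_processing_ms: int) -> int:
    """Set the limit on a pika-style channel and return the value used."""
    n = max_prefetch(desired_latency_ms, avg_processing_ms)
    channel.basic_qos(prefetch_count=n)
    return n


print(max_prefetch(2000, 50))  # 39
```

With a 2-second latency budget and 50 ms per message, 39 buffered messages drain in 1.95 s, while 40 would hit the budget exactly and fail the strict inequality.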
A consumer receives a message and encounters a transient network error during processing. What is the correct action?
Dead Letter Queues
A Dead Letter Queue (DLQ) is a quarantine for messages that failed processing after all attempts. Without a DLQ, a failed message either loops forever (requeue=true with no retry cap) or is silently dropped (requeue=false). A DLQ enables failure analysis, manual replay after a bug fix, and prevents blocking the main queue. Netflix maintains a DLQ for every event queue - engineers review DLQ depth daily as a system health signal.
A message is routed to a Dead Letter Exchange (DLX) under three conditions: **nack/reject** with requeue=false, **TTL expired** (x-message-ttl), **queue overflow** (x-max-length). Configure it on the queue with `x-dead-letter-exchange` and optionally `x-dead-letter-routing-key`; a queue with no DLX configured simply discards such messages. Retry-with-delay pattern: main queue -> nack -> DLX -> retry queue with TTL -> back to main exchange. This delays each retry by a fixed interval without extra application code; exponential backoff needs several retry queues with increasing TTLs, and the loop still needs a retry cap.
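The retry-with-delay loop can be sketched as two exchanges and two queues. This assumes a pika-style channel; all exchange, queue, and key names (`work`, `work.dlx`, `work.main`, `work.retry`) are hypothetical. The `death_count` helper reads the `x-death` header that RabbitMQ adds to dead-lettered messages, and is one possible way for a consumer to enforce a retry cap.

```python
def declare_retry_topology(channel, retry_delay_ms: int = 5000) -> None:
    """Declare main queue -> DLX -> retry queue (TTL) -> main exchange."""
    channel.exchange_declare(exchange="work", exchange_type="direct", durable=True)
    channel.exchange_declare(exchange="work.dlx", exchange_type="direct", durable=True)

    # Main queue: rejected (requeue=False) messages go to the DLX under key "retry".
    channel.queue_declare(queue="work.main", durable=True, arguments={
        "x-dead-letter-exchange": "work.dlx",
        "x-dead-letter-routing-key": "retry",
    })
    channel.queue_bind(queue="work.main", exchange="work", routing_key="work")

    # Retry queue: no consumers. Messages wait out the TTL, expire, and are
    # dead-lettered back to the main exchange.
    channel.queue_declare(queue="work.retry", durable=True, arguments={
        "x-message-ttl": retry_delay_ms,
        "x-dead-letter-exchange": "work",
        "x-dead-letter-routing-key": "work",
    })
    channel.queue_bind(queue="work.retry", exchange="work.dlx", routing_key="retry")


def death_count(headers, queue: str = "work.main") -> int:
    """How many times a message was dead-lettered from `queue`, per its x-death header."""
    for entry in (headers or {}).get("x-death", []):
        if entry.get("queue") == queue:
            return int(entry.get("count", 0))
    return 0
```

A consumer could compare `death_count(properties.headers)` against a limit and, once it is exceeded, publish the message to a separate parking queue instead of rejecting it again.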
In which situation will a message NOT be routed to the Dead Letter Exchange?
Clustering and High Availability
A single RabbitMQ node is a single point of failure. Production requires a cluster: multiple nodes share metadata (exchanges, bindings, users), but by default a queue exists only on its owner node. If that node goes down, the queue is unavailable, and if the node never comes back, all unread messages are lost. Quorum Queues (RabbitMQ 3.8+) solve this through Raft replication: a majority of nodes must confirm a message write before it is acknowledged, and the queue survives losing a minority of nodes.
Three queue modes: **Classic Queue** - fast, single owner, no HA without extra config. **Mirrored Queue (deprecated)** - replicated to all nodes, heavy network load, obsolete. **Quorum Queue** - Raft consensus, N/2+1 nodes required for writes, recommended for production. Minimum 3 nodes for Raft to maintain quorum when one fails. Formula: `replicas = 2 * tolerance + 1`, where tolerance is simultaneous failures to survive.
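The sizing formula and the majority rule are easy to work through numerically. A small sketch with hypothetical helper names:

```python
def replicas_needed(tolerance: int) -> int:
    """Replicas required to survive `tolerance` simultaneous node failures."""
    return 2 * tolerance + 1


def quorum_size(replicas: int) -> int:
    """Nodes that must confirm a write: floor(N / 2) + 1."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Nodes that can fail while a majority is still reachable."""
    return replicas - quorum_size(replicas)


for n in (3, 4, 7):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 7 4 3
```

The 4-node row shows why odd sizes are preferred: the extra node raises the quorum without raising the number of failures the queue can survive.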
Common misconception: a RabbitMQ cluster automatically ensures high availability of all queues.
Reality: without explicitly specifying `x-queue-type: quorum`, queues are NOT replicated - when the owner node fails, the queue becomes unavailable, and if the node does not recover, the queue and all its messages are lost.
By default, Classic Queues live on a single node. The cluster shares metadata but not queue data. Only explicitly creating a Quorum Queue enables Raft replication.
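Opting in to replication is a single queue argument at declaration time. A minimal sketch, assuming a pika-style channel; `declare_quorum_queue` is a hypothetical wrapper:

```python
def declare_quorum_queue(channel, name: str) -> None:
    """Declare a replicated queue on a pika-style channel."""
    channel.queue_declare(
        queue=name,
        durable=True,  # quorum queues must be durable
        arguments={"x-queue-type": "quorum"},
    )
```

The queue type is fixed when the queue is first declared, so an existing classic queue is not converted by redeclaring it with this argument; the broker rejects the mismatched declaration.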
A 5-node RabbitMQ cluster uses Quorum Queues. How many nodes can fail simultaneously without data loss?
Key Ideas
- **AMQP model**: producer -> exchange -> binding -> queue -> consumer; the exchange as router decides where messages go - there is no direct write to a queue
- **Four exchange types** cover all routing scenarios: direct (exact match), fanout (to everyone), topic (wildcard routing), headers (header-based matching); topic is most widely used in production
- **Ack + DLQ + Quorum** - the reliability triad: manual acknowledgment protects against loss on consumer crash; DLQ preserves failed messages for analysis; Quorum Queues protect against node loss
Related Topics
RabbitMQ as a concrete implementation of message broker concepts:
- Message Brokers: Why and When - RabbitMQ implements all three decoupling types and supports at-most-once and at-least-once delivery; exactly-once processing requires idempotent consumers
- gRPC and Protocol Buffers - RabbitMQ messages are often serialized as Protobuf instead of JSON for better efficiency
Questions for Reflection
- A fanout exchange is convenient for broadcasting, but all subscribers receive identical copies. How would you design routing where email gets everything but analytics gets only premium-user events - without adding a separate exchange per subscriber?
- Quorum Queues require majority acknowledgment before confirming a write, increasing latency. For which business logic cases is this latency unacceptable, and what would you use instead?
- A DLQ accumulates failed messages - how would you build a pipeline to analyze and replay them after a bug fix, without losing the original context: metadata, arrival time, retry count?