Backend Transport
Message Brokers: Why and When
During Amazon Prime Day 2023, a reported 84,000 orders per minute flowed through Amazon's systems at peak load. No chain of direct HTTP requests from frontend to warehouse would survive that load: there are too many speed mismatches and too many failure points. Message brokers are what allow systems running at vastly different speeds to function as one coherent whole.
- **Uber** processes 1+ million geolocation events per second through Kafka - driver GPS coordinates are published to topics, and dozens of downstream services subscribe to position updates
- **WhatsApp** delivers 100 billion messages daily through a system built on queues with at-least-once semantics and idempotency on the receiving end
- **Shopify** on Black Friday switches the payment pipeline to async via broker: the shopper sees 'processing' instantly; the actual charge flows through a queue with retry logic for bank failures
Why Brokers
In 2012, Knight Capital lost USD 440 million in 45 minutes when a faulty software deployment sent a flood of erroneous orders to the market - a stark reminder of how fast a single failure spreads through tightly coupled systems. A message broker exists precisely to break such cascades: a producer places a message in the broker and does not wait for a consumer to process it. If the consumer goes down, the message does not vanish. If the producer generates ten times more than the consumer can handle, the broker absorbs the spikes.
A broker solves three fundamental problems: **Temporal decoupling** - producer and consumer do not need to run simultaneously. **Rate decoupling** - different generation and processing speeds are fine. **Space decoupling** - the producer does not need to know who or where processes the messages. Without a broker, all three problems are solved manually in every service - and poorly.
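The decoupling described above can be illustrated with a minimal in-process sketch. It assumes Python's standard `queue.Queue` as a stand-in for a real broker; the function name `run_demo` and the `order-N` messages are made up for illustration. The producer finishes and goes away before the consumer even starts, yet no message is lost.

```python
import queue
import threading


def run_demo(num_messages=5):
    """Producer finishes before the consumer starts; nothing is lost."""
    broker = queue.Queue()  # in-memory stand-in for a broker (assumption)
    processed = []

    def producer():
        for i in range(num_messages):
            # Publish and move on: the producer never waits for a consumer.
            broker.put(f"order-{i}")

    def consumer():
        while True:
            try:
                message = broker.get(timeout=0.1)
            except queue.Empty:
                return  # queue drained, end of the demo
            processed.append(message)

    # Temporal decoupling: the producer runs to completion first...
    p = threading.Thread(target=producer)
    p.start()
    p.join()

    # ...and the consumer only comes online afterwards.
    c = threading.Thread(target=consumer)
    c.start()
    c.join()
    return processed


print(run_demo())
```

A real broker adds persistence and network delivery on top of this idea, but the contract is the same: the producer hands the message over and is free to continue.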
Which problems does a message broker NOT solve?
Queues vs Topics
A queue and a topic both carry messages, but they solve fundamentally different problems. A queue is about work: exactly one of N workers processes each message, after which it is removed. A topic is about events: each of N subscribers receives its own copy of every message. Concrete example: an order goes into a queue (one warehouse processes it, not every warehouse), while a 'user registered' event goes to a topic (email service, CRM, and analytics each get their own copy in parallel).
**Queue (Point-to-Point)**: one or more producers; each message is delivered to exactly one consumer and deleted after successful processing; multiple workers compete for messages (competing consumers). **Topic (Pub-Sub)**: one or more producers; each subscriber receives a full copy; how long a message is retained depends on the broker, typically until every subscriber has read it or a TTL or retention limit expires. RabbitMQ implements both through different exchange types. Kafka is always topic-based with configurable retention; consumers in the same consumer group split a topic's partitions between them, which gives queue-like work distribution.
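The difference between the two models can be sketched in a few lines. This is a toy in-memory model, not any broker's API: the class names `WorkQueue` and `Topic` are hypothetical, and round-robin dispatch stands in for real competing consumers.

```python
from itertools import cycle


class WorkQueue:
    """Point-to-point: each message goes to exactly one worker."""

    def __init__(self, workers):
        # Round-robin is a simplification of competing consumers (assumption).
        self._workers = cycle(workers)

    def publish(self, message):
        next(self._workers)(message)


class Topic:
    """Pub-sub: every subscriber receives its own copy of each message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, message):
        for handler in self._subscribers:
            handler(message)


# Orders are work: each one lands in exactly one warehouse.
warehouse_a, warehouse_b = [], []
orders = WorkQueue([warehouse_a.append, warehouse_b.append])
for order_id in range(4):
    orders.publish(order_id)

# 'user registered' is an event: every service sees it.
email, crm, analytics = [], [], []
events = Topic()
for inbox in (email, crm, analytics):
    events.subscribe(inbox.append)
events.publish("user registered")

print(warehouse_a, warehouse_b)
print(email, crm, analytics)
```

Adding a third warehouse to the queue reduces the load on each worker; adding a fourth subscriber to the topic leaves the existing three untouched and simply creates one more copy.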
A payment processing service must have exactly one instance handle each payment. Which model fits?
Delivery Guarantees
Distributed systems 101: messages can be lost, duplicated, or arrive out of order. This is a fact of distributed life, not a bug in any specific broker. The three guarantee levels trade reliability against cost: at-most-once is fast but can lose messages, at-least-once does not lose messages but can duplicate them, and exactly-once is slow and expensive. Stripe's webhook documentation, for example, warns that an endpoint may receive the same event more than once and recommends guarding against duplicates on the receiver side - at-least-once delivery with idempotent handling, precisely because exactly-once at the broker level is prohibitively expensive.
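**At-most-once**: producer fires and forgets; the message counts as delivered once it is sent, with no acknowledgement after processing; messages may be lost but never duplicated. Acceptable for metrics and logs. **At-least-once**: producer retries on timeout; consumer acknowledges after processing; messages definitely arrive, but duplicates are possible. Requires idempotent processing. **Exactly-once**: requires coordination such as 2-phase commit or broker transactions; maximum guarantees, maximum complexity and latency. A transactional outbox comes close, but on its own it only guarantees at-least-once publishing and still needs a deduplicating consumer. Kafka transactions provide exactly-once processing for read-process-write flows that stay within a Kafka cluster.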
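At-least-once delivery combined with an idempotent consumer can be sketched as follows. Everything here is illustrative: `ChargeService`, `deliver_at_least_once`, and the `idempotency_key` field are hypothetical names, and an in-memory set stands in for the durable store a real service would need.

```python
class ChargeService:
    """Idempotent consumer: a repeated message does not charge twice."""

    def __init__(self):
        # In production this would be a durable store with a unique
        # constraint; a set is enough for the sketch (assumption).
        self.processed_keys = set()
        self.total_charged = 0

    def handle(self, message):
        """Return True if the charge was applied, False for a duplicate."""
        key = message["idempotency_key"]
        if key in self.processed_keys:
            return False
        self.total_charged += message["amount"]
        self.processed_keys.add(key)
        return True


def deliver_at_least_once(handler, message, lost_acks=1):
    """Redeliver until an ack gets through.

    The first `lost_acks` acknowledgements are treated as lost, so the
    broker side redelivers, exactly the case that produces duplicates.
    """
    attempts = 0
    while True:
        attempts += 1
        handler(message)
        if attempts > lost_acks:
            return attempts


service = ChargeService()
payment = {"idempotency_key": "pay-42", "amount": 100}
attempts = deliver_at_least_once(service.handle, payment)
print(attempts, service.total_charged)  # 2 deliveries, charged once: 2 100
```

Without the `processed_keys` check, the same two deliveries would charge the card twice; the deduplication lives in the consumer, not in the broker.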
A card charge service accidentally processed the same message twice. Which delivery guarantee was in use, and what was missing?
Backpressure
Backpressure is a mechanism by which a slow consumer signals a fast producer to slow down. Without backpressure, the speed difference accumulates in broker memory: in 2019, a major European bank lost 4 hours of productivity when a transaction queue overflowed and the operating system's OOM killer began terminating processes on the broker host. Proper backpressure is not a failure mode - it is the system's normal self-regulation mechanism under load.
Backpressure mechanisms: **Consumer-driven** - consumer explicitly sets a prefetch count (RabbitMQ basic.qos): 'don't send me more than N messages until I acknowledge them'. **Queue depth monitoring** - when queue depth crosses a threshold, scale up consumers (autoscaling) or slow down the producer. **Producer-side rate limiting** - producer checks queue depth before publishing. In Kafka, backpressure works differently: consumers pull messages at their own pace, the log keeps unread messages on disk up to the retention limit, and consumer group lag is the metric to monitor and scale on.
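The consumer-driven mechanism can be modelled with a small sketch. This is a toy model of a prefetch window, not RabbitMQ's client API: `PrefetchChannel` and its methods are hypothetical names chosen for illustration.

```python
from collections import deque


class PrefetchChannel:
    """Delivers at most `prefetch` unacknowledged messages to one consumer."""

    def __init__(self, prefetch):
        self.prefetch = prefetch
        self.ready = deque()   # messages waiting in the broker
        self.unacked = set()   # delivered but not yet acknowledged

    def publish(self, message_id):
        self.ready.append(message_id)

    def deliver(self):
        """Hand over messages until the prefetch window is full."""
        delivered = []
        while self.ready and len(self.unacked) < self.prefetch:
            message_id = self.ready.popleft()
            self.unacked.add(message_id)
            delivered.append(message_id)
        return delivered

    def ack(self, message_id):
        # Acknowledging frees one slot in the window.
        self.unacked.remove(message_id)


channel = PrefetchChannel(prefetch=2)
for message_id in range(5):
    channel.publish(message_id)

print(channel.deliver())  # window fills up: [0, 1]
print(channel.deliver())  # window full, nothing more is sent: []
channel.ack(0)
print(channel.deliver())  # one slot freed: [2]
```

The window size is the trade-off: a small value keeps work evenly spread across consumers but makes each consumer wait for a round trip between messages, while a large value keeps consumers busy at the cost of messages piling up at a slow one.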
**Misconception**: backpressure is a problem to eliminate, and a well-designed system should never experience it.
**Reality**: backpressure is the system's normal self-regulation signal. Good design does not eliminate backpressure but responds to it: it scales consumers, slows the producer, or rejects with an explicit error.
Attempting to 'eliminate' backpressure through unlimited buffering leads to OOM and uncontrolled degradation. Explicit backpressure with bounded buffers is controlled degradation, not a failure.
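A bounded buffer that rejects with an explicit error might look like the sketch below. It assumes Python's standard `queue.Queue` with a `maxsize`; the `BoundedBuffer` class and the `Overloaded` exception are hypothetical names for illustration.

```python
import queue


class Overloaded(Exception):
    """Explicit signal to the producer: slow down or retry later."""


class BoundedBuffer:
    """Fixed-capacity buffer that rejects instead of growing without limit."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)

    def publish(self, message):
        try:
            self._queue.put_nowait(message)
        except queue.Full:
            # Controlled degradation: the caller learns about the overload
            # immediately instead of the process running out of memory later.
            raise Overloaded(f"buffer full, rejected {message!r}") from None

    def consume(self):
        return self._queue.get_nowait()

    def depth(self):
        return self._queue.qsize()


buffer = BoundedBuffer(capacity=2)
buffer.publish("tx-1")
buffer.publish("tx-2")
try:
    buffer.publish("tx-3")
except Overloaded as error:
    print("rejected:", error)

buffer.consume()          # a consumer catches up and frees a slot
buffer.publish("tx-3")    # the retry now succeeds
print(buffer.depth())
```

The producer decides what to do with `Overloaded`: retry with a delay, shed the request, or surface an error to the caller. Any of these is preferable to an unbounded buffer that fails unpredictably.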
A RabbitMQ consumer sets prefetch=1. What does this mean for throughput?
Key Ideas
- **A broker breaks the synchronous cascade**: producer publishes without waiting for the consumer - temporal, rate, and space decoupling all at once; one component failing does not cascade into others
- **Queue vs Topic**: queue for work (one worker picks the task), topic for events (every subscriber gets a copy); the choice is driven by the business requirement, not technology preference
- **At-least-once + idempotency** is the industry gold standard: exactly-once is too expensive, at-most-once too unreliable; most production systems use at-least-once with idempotency keys
Related Topics
Message brokers are the foundation of event-driven architecture:
- RabbitMQ: AMQP, Exchanges, Queues — Concrete broker implementation with queue and topic support through exchange routing
- WebSocket and Server-Sent Events — A broker often sits behind a WebSocket gateway: broker -> gateway -> client for real-time delivery
Questions for Reflection
- If at-least-once with idempotency is cheaper than exactly-once, which cases genuinely require exactly-once - and how do you determine whether a current system's guarantees are sufficient?
- Backpressure slows the producer when consumers are overloaded. But what if the producer cannot slow down (stock market feed, GPS tracking)? What are the alternatives?
- A company switches from synchronous REST to async broker. How does UX change: what does a user see while a task is in the queue, and how do you communicate errors that occur asynchronously?