Real-Time Backend
MQTT: Protocol for IoT and M2M
Tens of billions of IoT devices are forecast to be online by 2030. A smart outlet, a temperature sensor, an electricity meter: all of them need to send data to the cloud. HTTP is too heavy for this: hundreds of bytes of headers per request, no built-in pub/sub, no retained state. MQTT covers all of this with a fixed header as small as 2 bytes and runs on a microcontroller with 8KB RAM.
- **AWS IoT Core**: managed MQTT broker. 100M+ devices, billions of messages per day. Integration with Lambda, DynamoDB, Kinesis for IoT pipelines
- **Tesla**: MQTT for vehicle telemetry. Every Tesla is an MQTT publisher. Battery, temperature, error data -> MQTT broker -> ML processing -> OTA updates
- **Facebook Messenger**: historically used MQTT for mobile push notifications. Traffic savings are critical for mobile devices - MQTT is 5-10x more efficient than HTTP polling
MQTT: Pub/Sub for Machines with a 2-Byte Header
1999. Andy Stanford-Clark (IBM) and Arlen Nipper design a protocol for monitoring an oil pipeline through SCADA systems in the desert. Requirements: minimal traffic (satellite communication costs $$$), reliability over unreliable links, running on microcontrollers with 8KB RAM. The result: MQTT (MQ Telemetry Transport). The fixed header can be as small as 2 bytes: one byte for the packet type and flags, plus one to four bytes for the remaining length.
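To see where the 2 bytes come from, here is a minimal sketch that encodes a QoS 0 PUBLISH packet by hand. It assumes the MQTT 3.1.1 wire format (MQTT 5 adds a properties field after the topic name), and the function names are illustrative rather than taken from any library.

```python
def encode_remaining_length(n: int) -> bytes:
    """Encode the 'remaining length' field: 7 bits per byte, high bit = more bytes follow."""
    if not 0 <= n <= 268_435_455:  # largest value that fits in 4 encoded bytes
        raise ValueError("remaining length out of range")
    out = bytearray()
    while True:
        n, digit = divmod(n, 128)
        if n > 0:
            digit |= 0x80  # continuation bit
        out.append(digit)
        if n == 0:
            return bytes(out)


def encode_publish_qos0(topic: str, payload: bytes, retain: bool = False) -> bytes:
    """Build a QoS 0 PUBLISH packet (no packet identifier is sent at QoS 0)."""
    topic_bytes = topic.encode("utf-8")
    # Variable header: topic name with a 2-byte big-endian length prefix.
    variable_header = len(topic_bytes).to_bytes(2, "big") + topic_bytes
    body = variable_header + payload
    # First byte: packet type 3 (PUBLISH) in the high nibble, RETAIN in bit 0.
    first_byte = (3 << 4) | (1 if retain else 0)
    return bytes([first_byte]) + encode_remaining_length(len(body)) + body


topic = "sensors/room42/temperature"
packet = encode_publish_qos0(topic, b"21.5")
fixed_header_size = len(packet) - (2 + len(topic) + len(b"21.5"))
print(fixed_header_size, len(packet))
```

As long as the topic plus payload stay under 128 bytes, the remaining length fits in a single byte and the fixed header is exactly 2 bytes. Larger packets grow the header to at most 5 bytes.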
Architecture: publish-subscribe through a broker. A client knows nothing about other clients - only about topics. A device (publisher) sends temperature data to topic `sensors/room42/temperature`. A dashboard (subscriber) subscribes to `sensors/+/temperature` (wildcard `+` = one level). The broker (Mosquitto, EMQX, HiveMQ) routes messages.
ML parallel: MQTT topics are like named streams in a streaming pipeline, and a subscription such as `sensors/+/temperature` acts as a filter transform on the input. The multi-level wildcard `#` matches every topic below its position, so a bare `#` subscribes to all events - analogous to a catch-all consumer. EMQX has published cluster benchmarks at the scale of 100 million concurrent connections - this is not a 'toy IoT protocol', it is production-grade messaging that can feed real-time ML inference pipelines.
How does MQTT's pub/sub architecture fundamentally differ from a point-to-point WebSocket connection?
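The wildcard rules can be sketched as a small matching function. This is a simplified illustration, not a broker implementation: the name `topic_matches` is made up for this example, and it skips the spec's special handling of topics that start with `$` as well as validation of malformed filters.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if a concrete topic matches a subscription filter."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # '#' matches the parent and everything below it,
            # but it is only valid as the last level of the filter.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # '+' matches exactly one level, anything else must be equal
    return len(filter_levels) == len(topic_levels)


print(topic_matches("sensors/+/temperature", "sensors/room42/temperature"))
print(topic_matches("sensors/+/temperature", "sensors/room42/floor1/temperature"))
print(topic_matches("sensors/#", "sensors/room42/humidity"))
```

The first and third calls match. The second does not, because `+` stands for exactly one topic level.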
QoS Levels: at-most-once, at-least-once, exactly-once
MQTT defines three delivery guarantee levels. QoS 0 (at-most-once): fire-and-forget, no acknowledgment. QoS 1 (at-least-once): PUBLISH -> PUBACK; if PUBACK is not received, the sender retransmits the message, so duplicates are possible. QoS 2 (exactly-once): 4-packet handshake (PUBLISH -> PUBREC -> PUBREL -> PUBCOMP), which is two round trips. No duplicates are delivered, but it costs more traffic and latency.
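Why QoS 1 can produce duplicates is easiest to see in a toy simulation. The sketch below is not a network client: `Receiver` and `send_qos1` are hypothetical names, and the "lost" PUBACKs are simulated with a counter. The point is that the receiver has already processed the message when the acknowledgment goes missing.

```python
class Receiver:
    """Stands in for the broker: records every PUBLISH it sees and acknowledges it."""

    def __init__(self):
        self.received = []

    def on_publish(self, packet_id, payload):
        self.received.append((packet_id, payload))
        return ("PUBACK", packet_id)


def send_qos1(receiver, packet_id, payload, lost_acks, max_attempts=5):
    """Resend until a PUBACK arrives; the first `lost_acks` PUBACKs are dropped in transit."""
    for attempt in range(1, max_attempts + 1):
        ack = receiver.on_publish(packet_id, payload)
        ack_arrived = attempt > lost_acks  # simulated unreliable return path
        if ack_arrived and ack == ("PUBACK", packet_id):
            return attempt
    raise TimeoutError(f"no PUBACK after {max_attempts} attempts")


receiver = Receiver()
attempts = send_qos1(receiver, packet_id=1, payload=b"relay=on", lost_acks=1)
print(attempts, receiver.received)
```

With one lost PUBACK the sender needs two attempts, and the receiver ends up with the same message twice. That is the at-least-once guarantee.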
Practice: QoS 0 - telemetry (losing one temperature reading is not critical). QoS 1 - notifications, control commands (delivery is required, and the receiver handles duplicates idempotently). QoS 2 - financial transactions, critical commands (power cutoff). Rule: QoS applies per hop. The QoS between publisher and broker and the QoS between broker and subscriber are separate, and the broker downgrades a message to the QoS granted for the subscription. Effective QoS = min(publisher_qos, subscriber_qos).
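Both rules fit in a few lines. This is a sketch under stated assumptions: `effective_qos` and `IdempotentHandler` are illustrative names, and the `command_id` is assumed to be an application-level identifier carried in the payload, since MQTT itself does not provide one that survives across hops.

```python
def effective_qos(publish_qos: int, subscription_qos: int) -> int:
    """QoS the subscriber actually gets: the lower of the two hops."""
    for qos in (publish_qos, subscription_qos):
        if qos not in (0, 1, 2):
            raise ValueError(f"invalid QoS: {qos}")
    return min(publish_qos, subscription_qos)


class IdempotentHandler:
    """Applies each command once, even if QoS 1 delivers it several times."""

    def __init__(self):
        self.seen_ids = set()
        self.applied = []

    def handle(self, command_id: str, action: str) -> bool:
        if command_id in self.seen_ids:
            return False  # duplicate delivery, ignore
        self.seen_ids.add(command_id)
        self.applied.append(action)
        return True


print(effective_qos(0, 2))

handler = IdempotentHandler()
handler.handle("cmd-17", "relay_on")
handler.handle("cmd-17", "relay_on")  # redelivered duplicate
print(handler.applied)
```

In a long-running service the set of seen IDs would need an expiry policy. It is left unbounded here to keep the example short.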
A device publishes with QoS 2, subscriber subscribes with QoS 1. What is the effective QoS the subscriber receives?
Retained Messages: The Last Known State
The problem: a device publishes temperature every 5 minutes. A new subscriber (dashboard restarted) connects and waits for the next message - up to 5 minutes without data. Solution: retained message. The broker stores the last message with `retain: true` for each topic. A new subscriber receives it instantly upon subscribing.
Retained messages are not history - only the last value. A time-series store is needed for history. Typical pattern: MQTT retained = 'current device state', TimescaleDB/InfluxDB = history. Home Assistant, OpenHAB, and AWS IoT Core use this pattern: retained message as single source of truth about the current state of each device in the system.
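The broker-side behaviour can be sketched with a tiny in-memory model. `TinyBroker` is a hypothetical class written for this illustration. It supports exact topic names only (no wildcards) and follows the MQTT rule that a retained publish with an empty payload clears the stored value.

```python
class TinyBroker:
    """In-memory model of retained-message handling (exact topic match only)."""

    def __init__(self):
        self.retained = {}     # topic -> last retained payload
        self.subscribers = {}  # topic -> list of callbacks

    def publish(self, topic, payload, retain=False):
        if retain:
            if payload:
                self.retained[topic] = payload  # overwrite: only the last value is kept
            else:
                self.retained.pop(topic, None)  # empty retained payload clears the topic
        for callback in self.subscribers.get(topic, []):
            callback(topic, payload)

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)
        if topic in self.retained:
            # A new subscriber gets the last known state immediately.
            callback(topic, self.retained[topic])


broker = TinyBroker()
broker.publish("sensors/room42/temperature", b"21.0", retain=True)
broker.publish("sensors/room42/temperature", b"21.5", retain=True)

dashboard = []
broker.subscribe("sensors/room42/temperature", lambda t, p: dashboard.append(p))
print(dashboard)
```

The dashboard subscribes after both readings were published and still receives the latest one right away. The earlier reading is gone, which is why history belongs in a separate time-series store.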
What problem does a retained message solve in an IoT system?
Last Will and Testament: Detecting Disconnections
TCP can detect a broken connection - but with a delay (keepalive timeout). The IoT problem: a device lost power or Wi-Fi - how do the rest find out? MQTT Last Will and Testament (LWT): when connecting, a client specifies a 'will' - a topic and payload that the broker will publish upon unexpected disconnection.
LWT + retained = 'birth/death certificate' pattern for IoT. On connect: publish online=true with retain. LWT: offline=true with retain. At any point any new subscriber learns the current status of every device. Home Assistant uses this pattern for the availability topic: `homeassistant/sensor/temperature/availability` with values 'online'/'offline'.
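The birth/death pattern can be modelled in the same in-memory style. This is a sketch, not a real broker or client API: `TinyBroker`, `connection_lost`, and the topic `devices/plug1/availability` are hypothetical names chosen for the example.

```python
class TinyBroker:
    """In-memory model of Last Will handling combined with retained status."""

    def __init__(self):
        self.retained = {}  # topic -> last retained payload
        self.wills = {}     # client_id -> (will_topic, will_payload)

    def publish(self, topic, payload, retain=False):
        if retain:
            self.retained[topic] = payload

    def connect(self, client_id, will_topic, will_payload):
        # The will is registered at connect time, before anything goes wrong.
        self.wills[client_id] = (will_topic, will_payload)

    def disconnect(self, client_id):
        # Clean DISCONNECT: the broker discards the will without publishing it.
        self.wills.pop(client_id, None)

    def connection_lost(self, client_id):
        # Keep-alive timeout or socket error: the broker publishes the will.
        will = self.wills.pop(client_id, None)
        if will is not None:
            self.publish(will[0], will[1], retain=True)


STATUS = "devices/plug1/availability"
broker = TinyBroker()
broker.connect("plug1", will_topic=STATUS, will_payload=b"offline")
broker.publish(STATUS, b"online", retain=True)  # birth message
print(broker.retained[STATUS])

broker.connection_lost("plug1")                 # power cut, no DISCONNECT sent
print(broker.retained[STATUS])
```

Because a clean DISCONNECT discards the will, a device that shuts down on purpose should publish its own retained `offline` status before disconnecting.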
Misconception: MQTT is WebSocket with publish/subscribe layered on top of it.
Reality: MQTT is a standalone protocol over TCP (port 1883, or 8883 with TLS). MQTT over WebSocket also exists for browsers, but there WebSocket is only the transport - the protocol itself is unchanged.
Why: MQTT was designed in 1999, more than a decade before WebSocket was standardized in 2011. Both run over TCP, but MQTT has minimal overhead (2-byte fixed header, binary), M2M-specific features (QoS, retained, LWT), and is built for unreliable links.
Under what condition does the broker publish the Last Will message?
Related Topics
MQTT is a specialized protocol for IoT and M2M communication:
- Pub/Sub Pattern — MQTT is a concrete protocol-level implementation of the pub/sub architecture
- WebTransport — WebTransport is the next generation: QUIC-based, for browsers with UDP semantics
- Approach Comparison — MQTT vs WebSocket vs SSE: choosing a protocol for a specific task
Key Ideas
- **MQTT = pub/sub over TCP**: publisher -> broker -> subscriber. Full decoupling: publisher knows nothing about subscribers.
- **QoS levels**: 0 (fire-and-forget), 1 (at-least-once, duplicates possible), 2 (exactly-once, 4-packet handshake = 2 round trips). Effective QoS = min(publisher, subscriber).
- **Retained messages**: broker stores the last message per topic. New subscriber receives current state immediately.
- **Last Will**: broker publishes the 'will' on abnormal disconnect (without DISCONNECT). Birth/death certificate pattern for devices.
- **Minimal overhead**: 2-byte fixed header, binary protocol. Runs on 8KB RAM microcontrollers.
Questions for Reflection
- MQTT retained stores only the last value. How to build a system that needs both current state (retained) and the last 24 hours of history?
- QoS 2 guarantees exactly-once delivery but requires a 4-packet handshake (2 round trips). How does this affect latency with a 100ms round trip over a satellite link?
- The '#' wildcard subscribes to all topics. What problems does this create in production with thousands of devices and how to address them?
Related Lessons
- rt-14 — gRPC Streaming is another approach to binary protocols for machine-to-machine communication
- rt-16 — WebTransport is the next step: QUIC-based protocol for browsers
- rt-17 — MQTT implements the pub/sub pattern at the protocol level
- net-03-physical — MQTT is optimized for unreliable networks - understanding the network stack is helpful
- rt-06 — Comparison with WebSocket: different trade-offs for IoT vs browsers
- net-55-message-queues