Real-Time Backend
What Is Real-Time
In 2013, Facebook faced a problem: the mobile app queried the server every 5 seconds asking 'any new messages?'. With 500 million users, that was 6 billion pointless requests per minute. Batteries drained, servers burned. The solution - MQTT push protocol - cut traffic by 40× and made Messenger truly instant.
- Telegram, WhatsApp - instant message delivery via WebSocket and push
- Google Docs, Figma - simultaneous document editing by dozens of people
- Uber, Yandex Taxi - live driver position updated every 2 seconds
- Valorant, CS2 - the server recomputes world state at a fixed tick rate, up to 128 times per second on Valorant's 128-tick servers
- Bloomberg Terminal - stock quotes updated in real-time worldwide
From Comet to WebSocket
Before 2011, real-time on the web was a hack. The Comet technique (2006) used a hidden iframe with an infinite server response - the browser executed incoming <script> tags as they arrived. Another hack - Flash Socket - required a plugin. In 2011, RFC 6455 standardized WebSocket, giving the web a proper bidirectional channel. This changed backend architecture forever.
WebSocket transformed the browser from a 'requesting client' into a full participant in real-time communication
What 'real-time' means for the backend
A message is being typed in Telegram. The contact sees 'typing...' - instantly. The message is sent and read half a second later. That is **real-time**: the server delivers data to the client the moment it appears, rather than waiting for the client to ask.
In classical HTTP the client is always the initiator: send a request - get a response. The server **cannot** reach out to the client on its own. It is like a mailbox: a letter is discovered only when someone goes and checks. Real-time flips this model - the server **pushes** data to the client.
- Request-Response (classical HTTP) — Client asks → server answers. Client doesn't ask - no data. Latency = interval between requests.
- Real-Time (push model) — Server sends data the moment it becomes available. Client is subscribed and waits. Latency = network + processing.
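The push model can be sketched in a few lines. This is an in-process illustration only, with no real network: `PushHub`, `subscribe`, and `publish` are hypothetical names, and each `asyncio.Queue` stands in for one client's open connection.

```python
import asyncio


class PushHub:
    """In-process stand-in for a push server (hypothetical, no networking)."""

    def __init__(self) -> None:
        self._subscribers: list = []

    def subscribe(self) -> asyncio.Queue:
        # Each queue plays the role of one client's open connection.
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers.append(queue)
        return queue

    def publish(self, event: str) -> None:
        # The server initiates delivery: every subscriber receives the event
        # as soon as it exists, without having asked for it.
        for queue in self._subscribers:
            queue.put_nowait(event)


async def push_demo() -> list:
    hub = PushHub()
    alice, bob = hub.subscribe(), hub.subscribe()
    hub.publish("new message")
    # Both clients were simply waiting; neither sent a request.
    return [await alice.get(), await bob.get()]


if __name__ == "__main__":
    print(asyncio.run(push_demo()))
```

In a real backend the queue would be replaced by a WebSocket or SSE connection, but the control flow is the same: the client subscribes once, and the server decides when data is sent.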
Users **don't think** about protocols. They think in terms of expectations: I type text - my contact sees it immediately. I liked a post - the counter updates. I moved on the map - the courier sees my position. When latency exceeds expectations, the product feels 'broken'.
| User action | Expectation | If slower |
|---|---|---|
| Typing a message | < 100 ms indicator | Indicator flickers, annoying |
| Message sent | < 300 ms delivery | Feels like it froze |
| Liked a post | < 500 ms update | Taps again |
| Moving on map | < 1 s position | Courier 'jumps' |
| Received notification | < 3 s after event | Missed something important |
**Real-time does not mean instant.** It means 'fast enough for the specific use case'. Chat tolerates 200 ms; stock trading does not.
**Myth:** Real-time means zero latency.
**Reality:** Real-time means latency below the perception threshold for a specific use case.
The physics of networking does not allow zero latency. 'Real-time' is when the user does not notice the delay. For chat that is 200 ms, for games 50 ms, for trading microseconds.
What is the key difference between real-time and classical HTTP?
Polling vs Push: two data delivery models
Consider waiting for a parcel. **Polling** - every 5 minutes someone walks to the door to check. **Push** - the courier rings the bell directly. The efficiency difference is obvious.
**Long Polling** is a compromise. The client sends a request, but the server **does not reply immediately** - it holds the connection open until data appears (or a timeout expires).
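The hold-until-data-or-timeout behaviour described above can be sketched with `asyncio`. This is a simulation rather than an HTTP server: `long_poll` and `long_poll_demo` are hypothetical names, and the queue stands in for whatever source produces new data for the client.

```python
import asyncio
from typing import Optional


async def long_poll(inbox: asyncio.Queue, timeout_s: float) -> Optional[str]:
    """Hold the 'request' open until data arrives or the timeout expires."""
    try:
        return await asyncio.wait_for(inbox.get(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None  # empty response; a real client would reconnect at once


async def long_poll_demo() -> tuple:
    inbox: asyncio.Queue = asyncio.Queue()

    async def producer() -> None:
        await asyncio.sleep(0.05)  # data appears 50 ms after the request
        inbox.put_nowait("new message")

    task = asyncio.create_task(producer())
    first = await long_poll(inbox, timeout_s=1.0)    # answered when data appears
    await task
    second = await long_poll(inbox, timeout_s=0.05)  # nothing arrives: timeout
    return first, second


if __name__ == "__main__":
    print(asyncio.run(long_poll_demo()))
```

The first call returns as soon as the message is produced, not at the end of a polling interval. The second call shows the cost: the connection is held open for the whole timeout even though nothing is delivered.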
| Approach | Latency | Server load | When to use |
|---|---|---|---|
| Short Polling | 0..interval (avg = interval/2) | High (empty requests) | Infrequent updates, simple API |
| Long Polling | ~network latency | Medium (open connections) | When WebSocket is unavailable |
| WebSocket | ~network latency | Low (one connection) | Chat, games, live data |
| SSE (Server-Sent Events) | ~network latency | Low (unidirectional) | Notifications, feeds, dashboards |
**Rule of thumb:** if data updates more than once every 10 seconds - polling will not cut it. Use push.
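The cost of short polling is simple arithmetic. The sketch below uses hypothetical helper names and plugs in the figures from the Facebook example at the start of the lesson. It ignores network round-trip time and assumes events arrive at uniformly random moments within the polling interval.

```python
def requests_per_minute(clients: int, interval_s: float) -> float:
    """Requests hitting the server per minute under short polling."""
    return clients * 60 / interval_s


def average_delay_s(interval_s: float) -> float:
    """Mean wait between an event and the next poll that picks it up."""
    return interval_s / 2


print(requests_per_minute(500_000_000, 5))  # 6000000000.0
print(average_delay_s(5))                   # 2.5
```

Shortening the interval reduces the delay but raises the request rate in direct proportion, which is why polling stops being practical for frequently updated data.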
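**Myth:** Long polling solves all the problems of polling.
**Reality:** Long polling is a compromise that introduces its own problems: every client holds an open HTTP connection.
With 50,000 clients, long polling creates 50,000 hanging HTTP connections. This consumes memory and file descriptors on the server, and each delivered message costs a full HTTP response plus a new request. WebSocket also keeps one connection per client, but after the handshake each frame carries only a few bytes of overhead instead of full HTTP headers, and no reconnect is needed after every message.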
A chat app with 10,000 users uses short polling every 2 seconds. How many HTTP requests per minute does the server receive?
Latency Budget: allocating allowed delay
When a user sends a chat message, it travels a path: client → network → server (processing) → network → recipient. Each hop adds latency. A **latency budget** breaks the allowed end-to-end delay down by component.
| Use Case | Allowed latency | Server budget | Network budget |
|---|---|---|---|
| Typing indicator | < 100 ms | 10 ms (relay only) | 40 ms (2 hops) |
| Chat message | < 200 ms | 40 ms (validate + store) | 40 ms |
| Notification | < 3 s | 500 ms (generate + route) | 200 ms |
| Live dashboard | < 1 s | 200 ms (aggregate) | 100 ms |
| Multiplayer game | < 50 ms | 10 ms (game loop tick) | 20 ms |
| HFT trading | < 1 ms | 0.1 ms | 0.5 ms (colocation) |
A latency budget is a design tool. It helps identify **where to optimize**. If the network consumes 80% of the budget, optimizing server code is pointless - the architecture itself must be reconsidered (CDN, edge computing, colocation).
- **Define the allowed latency** from product requirements and UX research
- **Break it down by component:** client → network → server → network → client
- **Measure actual values** for each component (do not guess!)
- **Find the bottleneck** - the component that consumes the most of the budget
- **Optimize the bottleneck** or reconsider the architecture
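The steps above can be sketched as a small calculation. The function name and the measured values are hypothetical. In practice the numbers would come from real measurements, not from estimates.

```python
def analyze_budget(allowed_ms: float, measured_ms: dict) -> dict:
    """Compare measured per-component latency against the allowed total."""
    total = sum(measured_ms.values())
    bottleneck = max(measured_ms, key=measured_ms.get)
    return {
        "total_ms": total,
        "headroom_ms": allowed_ms - total,  # negative means over budget
        "bottleneck": bottleneck,
        "bottleneck_share": measured_ms[bottleneck] / total,
    }


# Hypothetical measurements for a chat message with a 200 ms allowance.
report = analyze_budget(200, {"client": 20, "network": 120, "server": 40})
print(report)
```

With these assumed numbers the network accounts for two thirds of the total. Shaving a few milliseconds off server code would barely change the result, while moving servers closer to users would.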
**P99, not average!** An average latency of 50 ms sounds great. But if 1% of requests take 5 seconds, every hundredth user suffers. With 1 million users that is 10,000 people. Always design the budget around P99 (99th percentile).
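A small calculation shows how the average hides the tail. The distribution below is an assumption chosen for illustration: 2% of requests are slow, so the tail sits clearly above the 99th percentile. The `percentile` helper is a hypothetical nearest-rank implementation, not a library function.

```python
import math
import statistics


def percentile(samples: list, pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(len(ordered) * pct / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Assumed distribution: 980 requests at 50 ms, 20 requests (2%) at 5 s.
latencies_ms = [50] * 980 + [5000] * 20

print(statistics.mean(latencies_ms))    # mean: 149 ms, looks acceptable
print(statistics.median(latencies_ms))  # median: 50 ms, looks great
print(percentile(latencies_ms, 99))     # P99: 5000 ms, the real problem
```

A dashboard that shows only the mean or the median would report a healthy service here, while one request in fifty takes five seconds.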
Why Discord switched from Go to Rust
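Discord's Read States service, which tracks which channels and messages each user has read, was written in Go and showed latency spikes roughly every two minutes. Discord traced them to the Go garbage collector, which is forced to run at least that often and had to scan a large in-memory cache each time. The average latency was fine, but the periodic tail spikes were not. Discord rewrote the service in Rust (no GC) and the spikes disappeared. The tail of the latency distribution, not the average, drove the language choice.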
The allowed latency for chat is 200 ms. Network (RTT) takes 80 ms, client rendering takes 20 ms. How much is left for the server?
Map of real-time use cases
Real-time is not a single technology. It is a spectrum of tasks with different requirements for latency, reliability, and scale. Understanding the map of use cases is what guides technology selection for each task.
| Use Case | Latency | Direction | Technology | Examples |
|---|---|---|---|---|
| Messaging / Chat | < 200 ms | Bidirectional | WebSocket | Telegram, Slack, WhatsApp |
| Typing indicators | < 100 ms | Bidirectional | WebSocket | 'Typing...' in messengers |
| Notifications | < 3 s | Server → Client | SSE / Push API | Likes, comments, alerts |
| Live dashboard | < 1 s | Server → Client | SSE / WebSocket | Grafana, trading terminals |
| Collaborative editing | < 100 ms | Bidirectional | WebSocket + CRDT/OT | Google Docs, Figma |
| Multiplayer games | < 50 ms | Bidirectional | WebSocket / UDP | Fortnite, CS2 |
| Live location | < 2 s | Bidirectional | WebSocket | Uber, Yandex Taxi |
| Stock trading | < 1 ms | Bidirectional | Custom TCP / FPGA | NASDAQ, exchanges |
| Live streaming | < 5 s | Server → Client | WebRTC / HLS | Twitch, YouTube Live |
Notice the **Direction** column. Not all use cases require a bidirectional channel. Notifications and dashboards are **server → client** only. For these, SSE is simpler and sufficient. WebSocket is needed only when the client actively sends data too.
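Part of what makes SSE simple is its wire format: plain text over an ordinary HTTP response served as `text/event-stream`. The sketch below serializes one event. The function name `format_sse` is hypothetical, while the `id`, `event`, and `data` field names come from the SSE format itself.

```python
from typing import Optional


def format_sse(
    data: str,
    event: Optional[str] = None,
    event_id: Optional[str] = None,
) -> str:
    """Serialize one event in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A multi-line payload becomes several "data:" lines.
    lines.extend(f"data: {part}" for part in data.split("\n"))
    return "\n".join(lines) + "\n\n"  # a blank line terminates the event


print(format_sse("42 likes", event="like", event_id="7"), end="")
```

On the browser side, the built-in `EventSource` API parses this stream and reconnects automatically, which is why no extra client library is needed for server-to-client updates.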
Each use case imposes different requirements not only on latency but also on **delivery guarantees**:
- At-most-once — Typing indicators, cursor positions. Losing one event is imperceptible - the next update will correct it.
- At-least-once — Notifications, feed events. Better to show a duplicate than to lose an important notification.
- Exactly-once — Payments, chat messages. Duplicates are a problem (double charge). Requires idempotency and acknowledgements.
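In practice, exactly-once behaviour is usually built from at-least-once delivery plus idempotent processing: the sender retries until it gets an acknowledgement, and the receiver ignores anything it has already applied. A minimal sketch, assuming each message carries a unique ID assigned by the sender (the class and method names are hypothetical):

```python
class IdempotentConsumer:
    """Applies each message at most once, keyed by a sender-assigned ID."""

    def __init__(self) -> None:
        self._seen: set = set()
        self.applied: list = []

    def handle(self, message_id: str, payload: str) -> bool:
        """Process one delivery; the return value is the acknowledgement."""
        if message_id not in self._seen:
            self._seen.add(message_id)
            self.applied.append(payload)  # the side effect happens once
        # Acknowledge duplicates too, so the sender stops retrying.
        return True


consumer = IdempotentConsumer()
# The sender retries "m1" because its first acknowledgement was lost.
deliveries = [("m1", "charge $10"), ("m1", "charge $10"), ("m2", "hi")]
for message_id, payload in deliveries:
    consumer.handle(message_id, payload)

print(consumer.applied)  # ['charge $10', 'hi']
```

The set of seen IDs grows without bound in this sketch. A production system would persist it and expire old entries after the sender's retry window has passed.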
**Start with the simplest solution.** As a rough guide, SSE covers about 80% of tasks (notifications, feeds, dashboards) and WebSocket covers most of the rest (chat, games, collaborative editing). Custom UDP or TCP protocols are for isolated cases (HFT, FPS shooters).
For a live dashboard that updates charts once per second, the best fit is:
Key Lesson Ideas
- Real-time - the server pushes data to the client without waiting for a request
- Short polling - simple but wasteful: when updates are rare, most requests return nothing
- Long polling - a compromise, but every client holds a connection open
- WebSocket - full-duplex channel, the standard for bidirectional real-time communication
- SSE - a simple unidirectional stream, ideal for notifications and dashboards
- Latency budget - breaking the allowed delay into components (network, server, client)
- Always design around P99, not the average - the tail of the distribution kills UX
What's next
This lesson covered why real-time is needed and what problems it solves. The next step is to explore the specific protocols and how they work internally.
- WebSocket protocol — The main bidirectional real-time protocol
- Server-Sent Events — Unidirectional server push
- Pub/Sub pattern — Scaling real-time across multiple servers
Questions for reflection
- What real-time features do popular apps (messengers, live dashboards, collaborative editors) have? What technology is each likely using?
- When adding real-time notifications to an existing REST API, which approach fits best and why?
- How does the latency budget change when users are distributed across multiple continents?
Related lessons
- rt-02-http-limits — HTTP limitations discussed next explain why real-time architectures were invented
- bt-01-overview — Real-time protocols are a specialization of the transport overview covered in backend-transport
- st-01-feedback-loops — WebSocket creates a closed feedback loop; polling is an open loop with high latency
- alg-01-big-o — Push O(1) vs polling O(n) is a direct application of complexity analysis to protocol choice
- sd-01-intro — Real-time requirements appear in System Design estimation: QPS, connection counts, fan-out
- net-21-http-basics
- net-63-realtime-compare