Real-Time Backend

What Is Real-Time

In 2013, Facebook faced a problem: the mobile app queried the server every 5 seconds asking 'any new messages?'. With 500 million users, that was 6 billion pointless requests per minute. Batteries drained, servers burned. The solution - MQTT push protocol - cut traffic by 40× and made Messenger truly instant.

Telegram, WhatsApp - instant message delivery via WebSocket and push
Google Docs, Figma - simultaneous document editing by dozens of people
Uber, Yandex Taxi - live driver position updated every 2 seconds
Fortnite, CS2 - tick-based servers, world state updated tens of times per second (competitive shooters run at up to 128 ticks per second)
Bloomberg Terminal - stock quotes updated in real-time worldwide

From Comet to WebSocket

Before 2011, real-time on the web was a hack. The Comet technique (2006) used a hidden iframe with an endless server response: the browser executed incoming <script> tags as they arrived. Another hack, Flash Socket, required a plugin. In 2011, RFC 6455 standardized WebSocket, giving the web a proper bidirectional channel. This changed backend architecture forever.

WebSocket transformed the browser from a 'requesting client' into a full participant in real-time communication

What 'real-time' means for the backend

A message is being typed in Telegram. The contact sees 'typing...' - instantly. The message is sent and read half a second later. That is **real-time**: the server delivers data to the client the moment it appears, rather than waiting for the client to ask.

In classical HTTP the client is always the initiator: send a request - get a response. The server **cannot** reach out to the client on its own. It is like a mailbox: you find a letter only when you go and check. Real-time flips this model - the server **pushes** data to the client.

Request-Response (classical HTTP) — Client asks → server answers. Client doesn't ask - no data. Latency = interval between requests.
Real-Time (push model) — Server sends data the moment it becomes available. Client is subscribed and waits. Latency = network + processing.
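
The contrast between the two models can be sketched in a few lines. This is a minimal in-process illustration, not a network server: `Mailbox` and `Channel` are hypothetical names, with `Mailbox.check` standing in for a client request and `Channel.publish` for a server push.

```python
from typing import Callable


class Mailbox:
    """Request-response: data sits on the server until the client asks."""

    def __init__(self) -> None:
        self._letters: list[str] = []

    def deliver(self, letter: str) -> None:
        self._letters.append(letter)

    def check(self) -> list[str]:
        # The client initiates; it receives whatever accumulated since last time.
        letters, self._letters = self._letters, []
        return letters


class Channel:
    """Push: subscribers register once and are called when data appears."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, message: str) -> None:
        # The server initiates; every subscriber is notified immediately.
        for callback in self._subscribers:
            callback(message)


mailbox = Mailbox()
mailbox.deliver("hello")   # nothing reaches the client yet
polled = mailbox.check()   # the client has to ask

received: list[str] = []
channel = Channel()
channel.subscribe(received.append)
channel.publish("hello")   # delivered without any request
```

In the mailbox case the delay is decided by how often the client calls `check`; in the channel case it is decided only by how long `publish` takes to reach the subscriber.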

Users **don't think** about protocols. They think in terms of expectations: I type text - my contact sees it immediately. I liked a post - the counter updates. I moved on the map - the courier sees my position. When latency exceeds expectations, the product feels 'broken'.

User action	Expectation	If slower
Typing a message	< 100 ms indicator	Indicator flickers, annoying
Message sent	< 300 ms delivery	Feels like it froze
Liked a post	< 500 ms update	Taps again
Moving on map	< 1 s position	Courier 'jumps'
Received notification	< 3 s after event	Missed something important

**Real-time does not mean instant.** It means 'fast enough for the specific use case'. Chat tolerates 200 ms; stock trading does not.

Myth: real-time means zero latency.

Reality: real-time means latency below the perception threshold for a specific use case.

The physics of networking does not allow zero latency. 'Real-time' is when the user does not notice the delay. For chat that is 200 ms, for games 50 ms, for trading microseconds.

What is the key difference between real-time and classical HTTP?

Polling vs Push: two data delivery models

Consider waiting for a parcel. **Polling** - every 5 minutes someone walks to the door to check. **Push** - the courier rings the bell directly. The efficiency difference is obvious.

**Long Polling** is a compromise. The client sends a request, but the server **does not reply immediately** - it holds the connection open until data appears (or a timeout expires).
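
The hold-until-data-or-timeout behaviour can be sketched with `asyncio`. This is a simplified model under stated assumptions: an `asyncio.Queue` stands in for the per-client message source, and the hypothetical `long_poll` coroutine plays the role of the HTTP handler. No real HTTP server is involved.

```python
import asyncio


async def long_poll(inbox: asyncio.Queue, timeout_s: float) -> list[str]:
    """Hold the 'request' open until a message arrives or the timeout expires."""
    try:
        first = await asyncio.wait_for(inbox.get(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return []  # empty response; the client is expected to re-issue the request
    messages = [first]
    # Drain anything else already queued so one response carries it all.
    while not inbox.empty():
        messages.append(inbox.get_nowait())
    return messages


async def demo() -> tuple[list[str], list[str]]:
    inbox: asyncio.Queue = asyncio.Queue()

    async def producer() -> None:
        await asyncio.sleep(0.05)  # data appears 50 ms after the request starts
        inbox.put_nowait("new message")

    task = asyncio.create_task(producer())
    with_data = await long_poll(inbox, timeout_s=1.0)   # returns once data arrives
    await task
    timed_out = await long_poll(inbox, timeout_s=0.05)  # nothing arrives: empty reply
    return with_data, timed_out


with_data, timed_out = asyncio.run(demo())
```

The first call returns as soon as the producer enqueues a message, well before its one-second timeout; the second returns an empty list, which is the signal for the client to reconnect.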

Approach	Latency	Server load	When to use
Short Polling	0..interval (avg = interval/2)	High (empty requests)	Infrequent updates, simple API
Long Polling	~network latency	Medium (open connections)	When WebSocket is unavailable
WebSocket	~network latency	Low (one connection)	Chat, games, live data
SSE (Server-Sent Events)	~network latency	Low (unidirectional)	Notifications, feeds, dashboards

**Rule of thumb:** if data updates more than once every 10 seconds - polling will not cut it. Use push.
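
The cost of short polling is simple arithmetic. The sketch below uses two hypothetical helper functions and plugs in the Messenger figures from the opening of this lesson (500 million clients, one request every 5 seconds).

```python
def polling_requests_per_minute(clients: int, interval_s: float) -> float:
    """Requests per minute when each of `clients` polls every `interval_s` seconds."""
    return clients * 60 / interval_s


def polling_average_delay_s(interval_s: float) -> float:
    """Mean wait before a poll notices an event that arrives at a random moment."""
    return interval_s / 2


messenger_rpm = polling_requests_per_minute(500_000_000, 5)  # 6 billion per minute
average_delay = polling_average_delay_s(5)                   # 2.5 seconds
```

Shortening the interval lowers the average delay but raises the request count in the same proportion, which is why polling stops scaling once updates become frequent.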

Myth: long polling solves all the problems of polling.

Reality: long polling is a compromise that introduces its own problems, because every client holds an open HTTP connection.

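With 50,000 clients, long polling creates 50,000 hanging HTTP connections, and every delivered message forces the client to issue a fresh request with full HTTP headers. This consumes memory and file descriptors on the server. WebSocket also keeps one connection per client, but after the handshake it exchanges compact frames instead of repeated HTTP requests.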
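
The WebSocket handshake itself is a single HTTP Upgrade exchange. As part of it, RFC 6455 requires the server to hash the client's `Sec-WebSocket-Key` header together with a fixed GUID and return the result in `Sec-WebSocket-Accept`. A minimal sketch of that one step follows; the function name is an assumption, while the GUID and the sample key come from the RFC.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from RFC 6455, section 1.3.
accept = websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")
```

After this exchange the connection stops speaking HTTP, and both sides send framed messages in either direction.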

A chat app with 10,000 users uses short polling every 2 seconds. How many HTTP requests per minute does the server receive?

Latency Budget: allocating allowed delay

When a user sends a chat message, it travels a path: client → network → server (processing) → network → recipient. Each hop adds latency. A **latency budget** breaks the allowed end-to-end delay down by component.

Use Case	Allowed latency	Server budget	Network budget
Typing indicator	< 100 ms	10 ms (relay only)	40 ms (2 hops)
Chat message	< 200 ms	40 ms (validate + store)	40 ms
Notification	< 3 s	500 ms (generate + route)	200 ms
Live dashboard	< 1 s	200 ms (aggregate)	100 ms
Multiplayer game	< 50 ms	10 ms (game loop tick)	20 ms
HFT trading	< 1 ms	0.1 ms	0.5 ms (colocation)

A latency budget is a design tool. It helps identify **where to optimize**. If the network consumes 80% of the budget, optimizing server code is pointless - the architecture itself must be reconsidered (CDN, edge computing, colocation).

**Define the allowed latency** from product requirements and UX research
**Break it down by component:** client → network → server → network → client
**Measure actual values** for each component (do not guess!)
**Find the bottleneck** - the component that consumes the most of the budget
**Optimize the bottleneck** or reconsider the architecture
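
The steps above can be sketched as a small calculation. The measurements are hypothetical numbers chosen for illustration, and the helper names are assumptions rather than part of any library.

```python
def remaining_budget_ms(allowed_ms: float, spent_ms: dict[str, float]) -> float:
    """What is left of the end-to-end budget after the measured components."""
    return allowed_ms - sum(spent_ms.values())


def bottleneck(spent_ms: dict[str, float]) -> str:
    """Name of the component that consumes the largest share of the budget."""
    return max(spent_ms, key=lambda name: spent_ms[name])


# Hypothetical measurements for a chat message with a 200 ms budget.
measured = {"network": 120.0, "server": 45.0, "client": 25.0}

headroom = remaining_budget_ms(200.0, measured)  # 10 ms of slack
worst = bottleneck(measured)                     # the network dominates
```

With the network taking 120 of 200 ms, shaving a few milliseconds off server code would barely move the total; this is the case where the architecture has to change.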

**P99, not average!** An average latency of 50 ms sounds great. But if 1% of requests take 5 seconds, every hundredth user suffers. With 1 million users that is 10,000 people. Always design the budget around P99 (99th percentile).
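
A short calculation shows how an average hides the tail. The sample below is synthetic (98 fast requests and 2 slow ones), and `percentile` is a hypothetical helper using the nearest-rank definition; other percentile definitions interpolate and may give slightly different values.

```python
import math
import statistics


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Synthetic latencies in ms: 98 fast requests and 2 slow ones.
latencies = [20.0] * 98 + [1500.0] * 2

mean_ms = statistics.mean(latencies)  # about 49.6 ms, looks healthy
p50_ms = percentile(latencies, 50)    # 20 ms
p99_ms = percentile(latencies, 99)    # 1500 ms, the tail users actually feel
```

The mean sits near 50 ms while P99 is 1.5 seconds, so a budget checked against the mean would pass a system that regularly stalls.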

Why Discord switched from Go to Rust

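In Discord's Read States service, which tracks which messages each user has read, the Go garbage collector caused latency spikes roughly every two minutes. The service sits on the hot path of almost every user action, so those spikes set a floor on tail latency. Discord rewrote the service in Rust, which has no garbage collector, and the periodic spikes disappeared. The latency budget drove the language choice.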

The allowed latency for chat is 200 ms. Network (RTT) takes 80 ms, client rendering takes 20 ms. How much is left for the server?

Map of real-time use cases

Real-time is not a single technology. It is a spectrum of tasks with different requirements for latency, reliability, and scale. Understanding the map of use cases is what guides technology selection for each task.

Use Case	Latency	Direction	Technology	Examples
Messaging / Chat	< 200 ms	Bidirectional	WebSocket	Telegram, Slack, WhatsApp
Typing indicators	< 100 ms	Bidirectional	WebSocket	'Typing...' in messengers
Notifications	< 3 s	Server → Client	SSE / Push API	Likes, comments, alerts
Live dashboard	< 1 s	Server → Client	SSE / WebSocket	Grafana, trading terminals
Collaborative editing	< 100 ms	Bidirectional	WebSocket + CRDT/OT	Google Docs, Figma
Multiplayer games	< 50 ms	Bidirectional	WebSocket / UDP	Fortnite, CS2
Live location	< 2 s	Bidirectional	WebSocket	Uber, Yandex Taxi
Stock trading	< 1 ms	Bidirectional	Custom TCP / FPGA	NASDAQ, exchanges
Live streaming	< 5 s	Server → Client	WebRTC / HLS	Twitch, YouTube Live

Notice the **Direction** column. Not all use cases require a bidirectional channel. Notifications and dashboards are **server → client** only. For these, SSE is simpler and sufficient. WebSocket is needed only when the client actively sends data too.
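
Part of what makes SSE simple is its wire format: plain text lines sent over a long-lived HTTP response with the `text/event-stream` content type. The sketch below serializes one event; `format_sse` is a hypothetical helper, while the `id`, `event`, and `data` field names come from the SSE specification.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one event in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads become one "data:" field per line.
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


frame = format_sse('{"likes": 42}', event="counter", event_id="7")
```

A server would write such frames to the open response as events occur; in the browser, the built-in `EventSource` API parses them and reconnects automatically if the stream drops.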

Each use case imposes different requirements not only on latency but also on **delivery guarantees**:

At-most-once — Typing indicators, cursor positions. Losing one event is imperceptible - the next update will correct it.
At-least-once — Notifications, feed events. Better to show a duplicate than to lose an important notification.
Exactly-once — Payments, chat messages. Duplicates are a problem (double charge). Requires idempotency and acknowledgements.
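
In practice, exactly-once behaviour is usually approximated by combining at-least-once delivery with an idempotent receiver. The following is a minimal sketch under that assumption: `IdempotentConsumer` is a hypothetical class, and the in-memory set of seen IDs would need to be persistent storage in a real system.

```python
class IdempotentConsumer:
    """Applies each message at most once, even if the transport redelivers it."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.applied: list[str] = []

    def handle(self, message_id: str, payload: str) -> bool:
        """Return True as the acknowledgement; duplicates are acknowledged but not re-applied."""
        if message_id not in self._seen:
            self._seen.add(message_id)
            self.applied.append(payload)
        return True


consumer = IdempotentConsumer()
# At-least-once delivery: the sender retries "m1" because the first ack was lost.
deliveries = [("m1", "charge $10"), ("m1", "charge $10"), ("m2", "charge $5")]
for message_id, payload in deliveries:
    consumer.handle(message_id, payload)
```

The retry of `m1` is acknowledged again so the sender stops resending, but the charge is applied only once.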

**Start with the simplest solution.** SSE covers roughly 80% of tasks (notifications, feeds, dashboards). WebSocket covers the remaining 20% (chat, games, collaborative editing). Custom protocols over UDP or raw TCP are for isolated cases (HFT, FPS shooters).

For a live dashboard that updates charts once per second, which technology is the best fit?

Key Lesson Ideas

Real-time - the server pushes data to the client without waiting for a request
Short polling - simple but wasteful: most requests return nothing
Long polling - a compromise, but every client holds a connection open
WebSocket - full-duplex channel, the standard for bidirectional real-time communication
SSE - a simple unidirectional stream, ideal for notifications and dashboards
Latency budget - breaking the allowed delay into components (network, server, client)
Always design around P99, not the average - the tail of the distribution kills UX

What's next

This lesson covered why real-time is needed and what problems it solves. The next step is to explore the specific protocols and how they work internally.

WebSocket protocol — The main bidirectional real-time protocol
Server-Sent Events — Unidirectional server push
Pub/Sub pattern — Scaling real-time across multiple servers

Questions for reflection

What real-time features do popular apps (messengers, live dashboards, collaborative editors) have? What technology is each likely using?
When adding real-time notifications to an existing REST API, which approach fits best and why?
How does the latency budget change when users are distributed across multiple continents?

Related lessons

rt-02-http-limits — HTTP limitations discussed next explain why real-time architectures were invented
bt-01-overview — Real-time protocols are a specialization of the transport overview covered in backend-transport
st-01-feedback-loops — WebSocket creates a closed feedback loop; polling is an open loop with high latency
alg-01-big-o — Push O(1) vs polling O(n) is a direct application of complexity analysis to protocol choice
sd-01-intro — Real-time requirements appear in System Design estimation: QPS, connection counts, fan-out
net-21-http-basics
net-63-realtime-compare