Real-Time Backend

Connection Lifecycle

Discord serves 19 million concurrent users. When a node restarts, millions of clients must reconnect within seconds, not hours. That is only possible if the connection lifecycle was designed correctly from the start.

Socket.IO uses exponential backoff with a default randomizationFactor of 0.5. That randomness is why reconnections after a failure are spread out over time instead of arriving as a single wave.
Discord Gateway clients send a heartbeat every 41,250 ms plus random jitter. Without the jitter, millions of clients would synchronize and pound the server at the same time.
Kubernetes rolling deploys send SIGTERM to a pod and, by default, wait 30 seconds before killing it. WebSocket servers use that grace period for the DRAIN pattern.
Slack stores missed events in Redis: a client that reconnects within 2 minutes receives a delta; otherwise it gets a full channel snapshot.
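
To make the backoff item in the list above concrete, here is a minimal sketch of exponential backoff with jitter. It is not Socket.IO's actual implementation. The names `BackoffOptions`, `defaults`, and `backoffDelay` are hypothetical, and the default values are chosen to mirror Socket.IO's documented reconnection defaults (1000 ms initial delay, 5000 ms cap, randomization factor 0.5).

```typescript
// Hypothetical option names, loosely modeled on Socket.IO's reconnection settings.
interface BackoffOptions {
  minDelayMs: number;          // delay before the first retry
  maxDelayMs: number;          // upper bound on any single delay
  randomizationFactor: number; // 0 = no jitter, 0.5 = up to +/-50%
}

const defaults: BackoffOptions = {
  minDelayMs: 1000,
  maxDelayMs: 5000,
  randomizationFactor: 0.5,
};

// Returns the delay in ms before retry number `attempt` (0-based).
// `rand` is injectable so the function can be tested deterministically.
function backoffDelay(
  attempt: number,
  opts: BackoffOptions = defaults,
  rand: () => number = Math.random
): number {
  // Exponential growth: min, 2*min, 4*min, ...
  const base = opts.minDelayMs * Math.pow(2, attempt);
  // Jitter: pick a point uniformly in [base - spread, base + spread).
  const spread = base * opts.randomizationFactor;
  const jittered = base - spread + rand() * 2 * spread;
  // Cap the result so late attempts do not wait indefinitely.
  return Math.min(opts.maxDelayMs, Math.round(jittered));
}
```

With the jitter in place, two clients that disconnect at the same moment will most likely pick different delays, which is what turns a reconnect wave into a spread of individual attempts.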

Reconnection

A WebSocket connection can live anywhere from seconds to days, and it can break at any point in that time: a mobile client loses Wi-Fi, the load balancer restarts, the server crashes. Without automatic reconnect, the user just sees a stuck UI and leaves. Reconnection is not a nice-to-have. It is a baseline contract for a realtime app.

The WebSocket handshake is a regular HTTP Upgrade: the browser sends `GET /ws` with `Upgrade: websocket` and `Sec-WebSocket-Key` headers, and the server replies with `101 Switching Protocols`. After that, the same TCP connection stops carrying HTTP and carries bidirectional WebSocket frames instead. A break is detected when the TCP stack stops receiving ACKs from the other side. Detection can take anywhere from a second (active RST) to several minutes (keepalive timeout).
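
As an illustration of the handshake described above, the sketch below shows how a server derives the `Sec-WebSocket-Accept` header that accompanies the `101` response. The text does not mention this header; it comes from RFC 6455, which defines it as the base64-encoded SHA-1 of the client's key concatenated with a fixed GUID. The function name `computeAcceptKey` is hypothetical, and the code assumes a Node.js runtime.

```typescript
import { createHash } from "crypto";

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Derives the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key.
function computeAcceptKey(secWebSocketKey: string): string {
  return createHash("sha1")
    .update(secWebSocketKey + WS_GUID)
    .digest("base64");
}

// Sample key taken from RFC 6455, section 1.3.
console.log(computeAcceptKey("dGhlIHNhbXBsZSBub25jZQ=="));
// prints "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```

In practice a WebSocket library performs this step for you. It is shown here only to make clear that the upgrade is ordinary HTTP plus one small hash check.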

The disconnect reason matters. In Socket.IO, `io server disconnect` means the server intentionally closed the connection (for example, because of an invalid token). The client does not reconnect on its own in that case, and retrying automatically would be pointless. Show an error to the user or refresh the credentials before connecting again.
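
One possible way to encode that decision is a small pure function that maps a disconnect reason to the next step. This is a sketch, not part of Socket.IO: the type `ReconnectAction` and the function `actionForDisconnect` are hypothetical names. The reason strings `io server disconnect` and `io client disconnect` are the ones Socket.IO reports when the server or the client closes the connection deliberately.

```typescript
// Hypothetical set of follow-up actions for the UI layer.
type ReconnectAction =
  | "refresh-credentials"      // server closed on purpose: fix auth, then connect manually
  | "stay-disconnected"        // the app itself called disconnect()
  | "wait-for-auto-reconnect"; // network-level failure: let backoff handle it

// Maps a Socket.IO disconnect reason to what the client should do next.
function actionForDisconnect(reason: string): ReconnectAction {
  switch (reason) {
    case "io server disconnect":
      return "refresh-credentials";
    case "io client disconnect":
      return "stay-disconnected";
    default:
      // e.g. "ping timeout", "transport close", "transport error"
      return "wait-for-auto-reconnect";
  }
}
```

With `socket.io-client`, a function like this would typically be called from the `disconnect` event handler, which receives the reason as its first argument. After refreshing credentials, the application calls `socket.connect()` itself, because the library will not retry for this reason.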

Backoff and jitter

State recovery

Graceful disconnect

Key takeaways

Related topics

Questions for reflection

Related lessons