Real-Time Backend

Load Balancing WebSocket

Slack hit a wall in 2015: every deploy of a new version dropped every WebSocket connection at once. Thousands of clients reconnected in the same second, knocking over the freshly started servers. The problem was not in the code, but in how load balancing handles long-lived connections.

Discord serves more than 7 million concurrent WebSocket connections. They use multiple balancing tiers: L4 at the anycast IP level and L7 with least_conn inside the datacenter
Twitch chat switched from round-robin to ip_hash after an incident: at peak viewership of a popular stream, reconnects produced a thundering herd on a single backend
AWS ALB added WebSocket support in 2016 specifically for services like Slack and Zendesk. Before that they had to use Classic LB or HAProxy directly
The Figma team described how a misconfigured proxy_read_timeout in nginx kept dropping collaboration connections every N minutes for users behind corporate proxies with long idle timeouts

L4 vs L7 for WebSocket

A regular HTTP request lives for milliseconds. The load balancer picks a backend for each request independently. A WebSocket connection stays open for hours: the client does an HTTP Upgrade once, and all subsequent traffic flows over the same TCP channel. That fundamentally changes load balancer requirements.

**An L4 load balancer** works at the TCP layer. It only sees IP addresses and ports and does not read HTTP headers. It hands the TCP connection to a backend and then stays out of the way. That is ideal for WebSocket: the upgrade is transparent and the long connection stays intact. The downside: no routing by URL, headers, or cookies.

**An L7 load balancer** (nginx, HAProxy, AWS ALB) reads HTTP. It can route by path (`/ws` vs `/api`), add headers, and terminate TLS. WebSocket requires explicit configuration. You must forward the `Upgrade` and `Connection` headers, otherwise the balancer will not let the connection transition into raw TCP mode.

Without `proxy_http_version 1.1`, nginx defaults to HTTP/1.0, which does not support Upgrade. The connection silently degrades to plain HTTP or breaks during handshake.
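
The requirements above can be sketched as a minimal nginx fragment. The upstream name `ws_backend`, the backend address, and the listen port are assumptions made for illustration; the `map` block follows the common pattern of sending `Connection: upgrade` only when the client actually asked for an upgrade.

```nginx
# Goes inside the http {} context.

# Derive the upstream Connection header from the client's Upgrade header:
# "upgrade" when the client requested an upgrade, "close" otherwise.
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

upstream ws_backend {                 # hypothetical name and address
    server 10.0.0.11:8080;
}

server {
    listen 80;

    location /ws {
        proxy_pass http://ws_backend;

        # nginx talks HTTP/1.0 to upstreams by default; Upgrade needs 1.1.
        proxy_http_version 1.1;

        # Upgrade and Connection are hop-by-hop headers, so nginx does not
        # pass them to the backend unless told to.
        proxy_set_header Upgrade    $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host       $host;
    }
}
```

If either `proxy_set_header` line is missing, the backend sees an ordinary GET request and never replies with `101 Switching Protocols`.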

An L7 load balancer receives a WebSocket Upgrade request, but the `Upgrade` and `Connection` headers are not forwarded. What happens?

Sticky sessions and balancing algorithms

Round-robin distributes connections in turn: client 1 to backend A, client 2 to backend B, client 3 to backend A. For stateless HTTP this is perfect. For WebSocket it is a problem: if the client reconnects, the new backend knows nothing about its state (rooms, subscriptions, message buffer).

**ip_hash** in nginx solves this simply: it hashes the client address (for IPv4, only the first three octets) and maps the result onto the list of backends, so the same client always lands on the same server. The catch: thousands of clients behind one NAT share a single IP and all flow to a single backend.

**least_conn** is often the better pick for WebSocket compared to ip_hash: each new connection goes to the backend with the fewest active connections. A client that reconnects 100 times is not guaranteed to land on the same server, but load is shared fairly. Paired with an external state store or message bus (Redis pub/sub), sticky sessions are not required.

**ip_hash**: simple stickiness, weak behind NAT/CDN
**least_conn**: the best choice when Redis is available for state sharing
**round-robin** (the nginx default, used when no balancing directive is given): only fits stateless backends
**HAProxy `balance source`**: equivalent to ip_hash, supports `hash-type consistent` (consistent hashing, less remapping when the pool changes)
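
In nginx terms, the options in this list might look like the sketch below. Upstream names and addresses are assumptions. The third block is an addition not mentioned above: nginx's generic `hash` directive with the `consistent` parameter, which plays roughly the same role as HAProxy's `hash-type consistent`.

```nginx
# Goes inside the http {} context. Pick one upstream per service.

# Option A: stickiness by client address (weak behind NAT or a CDN).
upstream ws_sticky {
    ip_hash;
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

# Option B: each new connection goes to the server with the fewest
# active connections; shared state lives outside the backends.
upstream ws_least_conn {
    least_conn;
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

# Option C: consistent hashing on a chosen key (here the full client
# address), so fewer clients are remapped when a server is added or removed.
upstream ws_consistent {
    hash $remote_addr consistent;
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}
```

With no directive at all, the upstream falls back to weighted round-robin.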

A chat service keeps the list of online room members in each Node.js process's memory. The load balancer uses round-robin. Alice reconnects and lands on a different backend. What happens?

Health checks and connection draining

The load balancer must know which backends are alive. For HTTP this is simple: ping `/health` every 5 seconds, and remove the backend from the pool if the response is not 200. WebSocket has a twist: the backend may respond to the HTTP health check yet stop accepting new WS connections due to memory pressure or file descriptor exhaustion.
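
Open-source nginx has no active `/health` polling (the `health_check` directive is part of the commercial version), but its passive checks do observe real proxied traffic, including failed WebSocket handshakes. A minimal sketch, with the thresholds and addresses chosen purely for illustration:

```nginx
# Goes inside the http {} context.

upstream ws_backend {                 # hypothetical name and addresses
    least_conn;

    # Passive health check: after 3 failed attempts within 10 seconds the
    # server is considered unavailable for the next 10 seconds.
    server 10.0.0.11:8080 max_fails=3 fail_timeout=10s;
    server 10.0.0.12:8080 max_fails=3 fail_timeout=10s;
}

server {
    listen 80;

    location /ws {
        proxy_pass http://ws_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade    $http_upgrade;
        proxy_set_header Connection "upgrade";

        # Fail fast when a backend cannot accept the connection
        # (for example, because it ran out of file descriptors).
        proxy_connect_timeout 2s;

        # What counts as a failed attempt, and when to try the next server.
        proxy_next_upstream error timeout;
    }
}
```

Because the failures counted here come from actual connection attempts, a backend that still answers `/health` but refuses new sockets is taken out of rotation anyway.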

**Connection draining** (or graceful shutdown) is critical during deploys. When a backend leaves the pool, you cannot just kill the process: it may hold thousands of active WebSocket connections. The right flow: the backend signals that it will not accept new connections, waits for current ones to close (or forcefully closes them after a timeout), and then exits.

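AWS ALB supports WebSocket natively and has a deregistration delay (the `deregistration_delay` setting, 300 seconds by default, 3600 at most). That is how long ALB keeps existing connections to a target open after the target starts deregistering from the target group; when the delay expires, the remaining connections are closed. For WS services it is worth raising this value toward the typical connection lifetime, within that one-hour cap.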

Load balancer receives a signal to remove the backend (deploy or failure)
Health check starts returning 503: no new connections are routed in
Existing WS connections continue to work
After the draining timeout the backend closes the remaining connections with close code 1001 (Going Away)
The process exits and clients reconnect to other backends
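
When nginx is the balancer, the first steps of this sequence can be approximated with the `down` server parameter and a reload. This is only a sketch: the addresses, the upstream name, and the 10-minute limit are assumptions, and sending close code 1001 remains the backend's job.

```nginx
# Upper bound on how long old worker processes live after `nginx -s reload`.
# Until it expires they keep serving already established connections.
# Note: it applies to every connection held by old workers, not only to
# those that go to the drained backend.
worker_shutdown_timeout 10m;

events {}

http {
    upstream ws_backend {
        least_conn;
        server 10.0.0.11:8080 down;   # being deployed: gets no new connections
        server 10.0.0.12:8080;
    }

    server {
        listen 80;

        location /ws {
            proxy_pass http://ws_backend;
            proxy_http_version 1.1;
            proxy_set_header Upgrade    $http_upgrade;
            proxy_set_header Connection "upgrade";
        }
    }
}
```

After the reload, new workers route fresh connections only to `10.0.0.12`, while sockets already open to `10.0.0.11` stay alive in the old workers until the backend closes them or the shutdown timeout fires.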

During a deploy, the old server process is killed with SIGKILL right away. 500 clients have their connections cut without a close code. What happens on the client side?

Production config: nginx + AWS ALB

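Real WebSocket deployments combine several layers. AWS ALB terminates TLS and routes `/ws` to a target group of EC2 instances or ECS containers. ALB has supported WebSocket natively since 2016: it handles the Upgrade handshake itself, so the listener needs no extra header configuration. What does need attention is the ALB idle timeout (60 seconds by default), which closes connections that carry no traffic, much like `proxy_read_timeout` in nginx.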

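For on-prem setups, or when you need nginx-level control, use an upstream with `least_conn` plus `keepalive` (which reuses TCP connections between nginx and the backends for ordinary HTTP requests). `proxy_read_timeout` (60 seconds by default) must exceed the client heartbeat interval, otherwise nginx will close an idle WS connection. The timer is reset by data arriving from the backend, such as the pong that answers each client ping.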

`proxy_buffering off` matters for WebSocket: nginx buffers backend responses in memory by default. For long-lived connections that is wasted RAM and added latency.
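
Put together, the nginx side of such a setup might look like the sketch below. It assumes a 30-second client heartbeat, TLS terminated upstream of nginx (for example at the ALB), and hypothetical names and addresses; the `/api` location is included only to show where `keepalive` pays off.

```nginx
# Goes inside the http {} context.

map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

upstream app_backend {                # hypothetical name and addresses
    least_conn;
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
    keepalive 32;                     # idle upstream connections cached per worker
}

server {
    listen 80;

    location /ws {
        proxy_pass http://app_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade    $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host       $host;

        # Several missed 30-second heartbeats are tolerated before nginx
        # gives up on the connection.
        proxy_read_timeout 120s;
        proxy_send_timeout 120s;

        proxy_buffering off;
    }

    location /api {
        proxy_pass http://app_backend;
        proxy_http_version 1.1;
        # An empty Connection header is what lets plain HTTP requests
        # reuse the cached upstream connections.
        proxy_set_header Connection "";
    }
}
```

If an ALB sits in front of nginx, its idle timeout should be at least as long as the values chosen here, otherwise the shorter of the two wins.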

Myth: WebSocket cannot be balanced through an L7 load balancer. You need an L4 (TCP-level) one

Reality: L7 load balancers (nginx, HAProxy, AWS ALB) handle WebSocket just fine with the right config: forward Upgrade/Connection headers and set an adequate proxy_read_timeout

WebSocket starts as an HTTP request (Upgrade handshake), so an L7 balancer can handle it. After the upgrade the connection becomes bidirectional TCP, which L7 simply proxies transparently. L4 is easier to configure but loses L7 capabilities: URL routing, TLS termination, headers.

nginx is proxying WebSocket. The client sends a ping every 30 seconds. `proxy_read_timeout` is 60s. What happens on a 35-second pause in traffic?

Key takeaways

L4 balances TCP connections transparently. L7 reads HTTP and requires explicit forwarding of `Upgrade` and `Connection` headers for WebSocket
ip_hash gives simple IP stickiness; least_conn is better when there is an external state store (Redis), since the client can land on any backend
Connection draining at deploy time is mandatory: the health check returns 503, the backend waits for connections to close, and only then exits
`proxy_read_timeout` in nginx must exceed the client heartbeat interval. `proxy_buffering off` saves RAM on long-lived connections

Questions for reflection

If a service uses Redis to store room state, are sticky sessions still required? What are the trade-offs?
How would you organize a zero-downtime deploy for a WebSocket service where a typical user session lasts 4 hours?
Why is the thundering herd dangerous when thousands of clients reconnect at the same time, and how can it be softened on the client side?