Real-Time Backend

Infrastructure

A WebSocket service is deployed to K8s. It runs for 15 minutes, then every connection drops. Metrics look fine. nginx Ingress with default settings kills idle connections after 60 seconds, and that is just the first of five configuration traps.

Twitch moved from sticky sessions to stateless WebSocket servers in 2019; deploys went from 30+ minutes to a few seconds by moving state into Redis
Discord migrated from Elixir to Go and cut memory use by 3x at the same 26 million concurrent connections; the runtime choice is critical at this scale
AWS ALB supports WebSocket natively but with a default idle timeout of 60s; without a heartbeat every 30s or a timeout raised to 3600s, connections drop on quiet users

Kubernetes and WebSocket: why the default config does not work

A standard K8s Service balances each new connection across pods with no affinity, and with WebSocket this fails quietly. The client sends an HTTP Upgrade request; one pod handles it and establishes the connection. The client's next HTTP request (a health check or a REST API call) may land on a different pod, while the WebSocket connection stays bound to the first one. The connection itself is unaffected, but this is a problem if the client must stay pinned to a specific pod.

A more serious problem is that HTTP/1.1 Upgrade is a special request, and not every Ingress controller proxies it correctly by default. nginx also needs `proxy_read_timeout` raised from its 60s default (to 3600s for chat-style traffic); otherwise it closes any connection that stays idle for 60 seconds.
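As a sketch, assuming the community ingress-nginx controller, the timeouts can be raised per Ingress through its `proxy-read-timeout` and `proxy-send-timeout` annotations. The host `chat.example.com`, the Service name `chat-ws`, and port 8080 are hypothetical placeholders. The manifest is built as a Python dict and printed as JSON, which `kubectl apply -f` accepts.

```python
import json

WS_TIMEOUT_S = 3600  # one hour, for chat-style traffic where users stay silent for long periods


def websocket_ingress(name: str, host: str, service: str, port: int,
                      timeout_s: int = WS_TIMEOUT_S) -> dict:
    """Build a networking.k8s.io/v1 Ingress for ingress-nginx with long proxy timeouts."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            "annotations": {
                # ingress-nginx annotation values are strings, in seconds.
                "nginx.ingress.kubernetes.io/proxy-read-timeout": str(timeout_s),
                "nginx.ingress.kubernetes.io/proxy-send-timeout": str(timeout_s),
            },
        },
        "spec": {
            "ingressClassName": "nginx",
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service, "port": {"number": port}}},
                }]},
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical names: replace with your own Ingress, host, Service and port.
    print(json.dumps(websocket_ingress("chat-ws", "chat.example.com", "chat-ws", 8080), indent=2))
```

ingress-nginx already forwards the `Upgrade` and `Connection` headers itself; a hand-written nginx config would additionally need `proxy_set_header Upgrade $http_upgrade;` and `proxy_set_header Connection "upgrade";`.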

AWS ALB and Google Cloud Load Balancer support WebSocket natively, with no special annotations required. You still have to deal with the idle timeout, though: ALB closes idle connections after 60 seconds by default. For chats where a user may stay silent for hours, either raise the timeout to 3600s or send a heartbeat every 30s.

Why does nginx Ingress close WebSocket connections after 60 seconds by default?

Sticky Sessions: when pod pinning is required

WebSocket itself does not require sticky sessions: the connection is established once and then stays open. Stickiness is needed only when the client makes REST requests in parallel with the WebSocket and those requests must reach the same pod (for example, because the session state lives in that pod's memory).

Good architecture avoids sticky sessions. Move state out of pod memory and into Redis, and any pod can serve any client request. Pods can then be added, removed, or redeployed without regard to which clients they happen to serve.
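A sketch of the stateless pattern, assuming a key-value store shared by all pods. `InMemoryStore` is a hypothetical stand-in for Redis with a similar get/set shape so the example runs without a server; `StatelessPod` and the `session:<id>:rooms` key layout are illustrative names.

```python
from typing import Dict, List, Optional


class InMemoryStore:
    """Stand-in for Redis in this sketch: same get/set idea, but lives in one process."""

    def __init__(self) -> None:
        self._data: Dict[str, List[str]] = {}

    def get(self, key: str) -> Optional[List[str]]:
        value = self._data.get(key)
        return list(value) if value is not None else None

    def set(self, key: str, value: List[str]) -> None:
        self._data[key] = list(value)


class StatelessPod:
    """Keeps no per-client state in memory; every request reads and writes the shared store."""

    def __init__(self, name: str, store: InMemoryStore) -> None:
        self.name = name
        self.store = store

    def join_room(self, client_id: str, room: str) -> None:
        key = f"session:{client_id}:rooms"
        rooms = self.store.get(key) or []
        if room not in rooms:
            rooms.append(room)
        self.store.set(key, rooms)

    def rooms_of(self, client_id: str) -> List[str]:
        return self.store.get(f"session:{client_id}:rooms") or []


if __name__ == "__main__":
    shared = InMemoryStore()
    pod_a, pod_b = StatelessPod("pod-a", shared), StatelessPod("pod-b", shared)
    pod_a.join_room("user-42", "general")  # the WebSocket is handled by pod-a
    print(pod_b.rooms_of("user-42"))       # a REST call lands on pod-b and sees the same state
```

With a real Redis, storing rooms in a set (`SADD` / `SMEMBERS`) would make the join atomic instead of this read-modify-write, which can race when two pods update the same client at once.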

Twitch moved from sticky sessions to stateless WebSocket servers in 2019. Before that, a deploy meant draining 100K connections per pod, which took 30+ minutes. Afterwards, any pod could be replaced instantly: the client reconnects to whichever new pod it reaches and pulls its state from Redis.
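When a pod is replaced, all of its clients reconnect at roughly the same moment. A common way to keep that from becoming a reconnect storm (an assumption here, not something the Twitch account specifies) is exponential backoff with "full jitter" on the client. A sketch of the delay schedule; `base_s` and `cap_s` are illustrative values:

```python
import random
from typing import List, Optional


def reconnect_delays(attempts: int, base_s: float = 0.5, cap_s: float = 30.0,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Full-jitter backoff: attempt n waits a uniform random time in [0, min(cap, base * 2**n)]."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap_s, base_s * 2 ** n)) for n in range(attempts)]


if __name__ == "__main__":
    # Seeded only to make the demo repeatable; real clients should use an unseeded generator.
    for n, delay in enumerate(reconnect_delays(6, rng=random.Random(7))):
        print(f"attempt {n}: wait {delay:.2f}s")
```

The random spread matters more than the exact curve: it turns thousands of simultaneous reconnects into a stream the new pods can absorb.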

When are sticky sessions necessary for a WebSocket service?

Service Mesh: Istio and Linkerd for WebSocket

A service mesh adds a sidecar proxy to every pod. All traffic goes through the proxy, which provides mTLS between services, circuit breaker, retry, rate limiting, and tracing without code changes. For WebSocket this works, but with caveats.

Istio and Linkerd recognize HTTP/1.1 Upgrade by default and handle WebSocket correctly. But retries and circuit breaking operate at the connection level, not the message level. If establishing the connection fails, the sidecar can retry it; once the connection is up, the sidecar sees only a byte stream, so if a specific message is lost it has no way to know.
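Because the mesh only sees the connection, detecting lost messages has to happen in the application. A minimal sketch, assuming the server stamps each message for a client with an increasing sequence number (the `seq` field is a hypothetical protocol detail): the client tracks the highest number it has seen and reports any gap so it can ask for a resend.

```python
from typing import List


class SequenceTracker:
    """Client-side gap detection for messages carrying an increasing sequence number."""

    def __init__(self) -> None:
        self.last_seq = 0  # highest sequence number seen so far

    def receive(self, seq: int) -> List[int]:
        """Record `seq` and return the sequence numbers skipped before it."""
        if seq <= self.last_seq:
            return []  # duplicate or late message: no new gap
        missing = list(range(self.last_seq + 1, seq))
        self.last_seq = seq
        return missing


if __name__ == "__main__":
    tracker = SequenceTracker()
    for seq in (1, 2, 5):
        print(seq, tracker.receive(seq))  # 5 reports [3, 4] as missing
```

After a reconnect, the client can send `last_seq` to whichever pod it lands on, and that pod replays newer messages from shared storage.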

Linkerd 2.x has a simpler WebSocket configuration than Istio: no explicit VirtualService for timeout. By default Linkerd does not apply a timeout to connections with HTTP Upgrade. That makes Linkerd appealing for WebSocket-heavy systems.

How does a service mesh (Istio/Linkerd) handle retry for WebSocket?

Scaling: HPA, resource limits, affinity

The Horizontal Pod Autoscaler works differently for WebSocket than for HTTP. CPU is a poor fit as the metric: an idle WebSocket connection uses almost no CPU, yet it still holds memory and a slot on the pod. HPA should scale on connection count or memory instead.
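The HPA controller's core rule, as documented by Kubernetes, is `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with no change while the ratio stays within a tolerance (0.1 by default). A sketch applying it to a connections-per-pod metric; the 5000-connection target is an illustrative assumption, and exposing such a metric to HPA requires a custom-metrics adapter (for example Prometheus Adapter).

```python
import math


def desired_replicas(current_replicas: int, avg_connections_per_pod: float,
                     target_connections_per_pod: float, tolerance: float = 0.1) -> int:
    """HPA scaling rule applied to a connections-per-pod metric."""
    ratio = avg_connections_per_pod / target_connections_per_pod
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # within tolerance: HPA leaves the replica count alone
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    print(desired_replicas(4, 7500, 5000))   # 6: scale up
    print(desired_replicas(4, 5200, 5000))   # 4: within tolerance
    print(desired_replicas(10, 2500, 5000))  # 5: scale down
```

This omits the controller's scale-down stabilization window (`behavior.scaleDown.stabilizationWindowSeconds`, 300s by default), which is the setting that keeps scale-down slow enough not to drop connections in bulk.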

Resource limits for WebSocket: each connection consumes ~10-50KB of memory (buffers, state). With 10K connections per pod, that is 100-500MB just for connections. Add business logic on top. Typical request: 512MB-2GB memory, 0.5-2 CPU cores.
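The arithmetic above, as a sketch: the per-connection cost uses the 10-50KB range, and `max_connections` inverts it to estimate how many connections fit under a memory limit. `base_mb` (memory for the runtime and business logic) is a hypothetical figure you would measure for your own service; units are decimal (1 MB = 1000 KB).

```python
def connections_memory_mb(connections: int, per_connection_kb: int) -> float:
    """Memory held by connection buffers and state alone."""
    return connections * per_connection_kb / 1000


def max_connections(limit_mb: int, base_mb: int, per_connection_kb: int) -> int:
    """Connections that fit under a memory limit after reserving `base_mb` for the app."""
    return (limit_mb - base_mb) * 1000 // per_connection_kb


if __name__ == "__main__":
    print(connections_memory_mb(10_000, 10), connections_memory_mb(10_000, 50))  # 100.0 500.0
    print(max_connections(2000, 256, 50))  # 2 GB limit, 256 MB assumed base: 34880
```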

At peak load Discord holds 26 million concurrent WebSocket connections on Go-based services. The migration from Elixir to Go in 2020 cut memory usage by 3x: the Go runtime is more efficient for huge numbers of goroutines than Erlang processes at this scale. Details on the Engineering Blog.

A common misconception is that a service mesh solves every WebSocket reliability problem automatically, with no extra configuration.

In practice, a service mesh such as Istio requires an explicit timeout setting for WebSocket routes (0s instead of the default 15-30s); otherwise it tears down idle connections.

Default timeouts in meshes such as Istio target short-lived HTTP requests, while a WebSocket connection runs for hours. Without an explicit `timeout: 0s` on WebSocket routes, the sidecar closes idle connections when the timeout fires.
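A sketch of such a route, assuming Istio's `networking.istio.io/v1beta1` VirtualService API; the host `chat-ws`, the `/ws` prefix, and port 8080 are hypothetical. The manifest is built as a Python dict and printed as JSON, which `kubectl apply -f` accepts.

```python
import json


def websocket_virtual_service(host: str, prefix: str, port: int) -> dict:
    """Istio VirtualService whose WebSocket route has the request timeout disabled."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": f"{host}-ws"},
        "spec": {
            "hosts": [host],
            "http": [{
                "match": [{"uri": {"prefix": prefix}}],
                "route": [{"destination": {"host": host, "port": {"number": port}}}],
                "timeout": "0s",  # 0s disables the route timeout for long-lived connections
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(websocket_virtual_service("chat-ws", "/ws", 8080), indent=2))
```

Scoping the route to the WebSocket path keeps normal timeouts in place for the service's ordinary REST endpoints.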

Why is CPU not a fit as a primary HPA metric for a WebSocket service?

Summary

nginx Ingress requires `proxy_read_timeout` raised to 3600s; a hand-written nginx config also needs explicit Upgrade/Connection headers for WebSocket
Sticky sessions are a symptom of poor architecture with in-memory state; the right fix is state in Redis
Istio requires `timeout: 0s` on WebSocket routes because its default timeouts tear down idle connections; Linkerd applies no timeout to Upgrade connections by default
HPA scales on connection count (or memory), not CPU; scale-down should be slow so that connections are not torn down en masse

Questions for reflection

How do you do zero-downtime HPA scale down when each pod holds thousands of WebSocket connections?
Which to choose for WebSocket in K8s: nginx Ingress, HAProxy Ingress, or AWS ALB Ingress Controller?
How do you set resource limits when connection count per pod swings unpredictably from 100 to 10K?