Real-Time Backend
Graceful Deployment
A scheduled deploy at 02:00 UTC drops 340,000 WebSocket connections: users see a red indicator and lose messages. An HTTP service would have ridden out the same deploy unnoticed. WebSocket needs a different approach.
- GitHub, during a botched deploy in 2023, cut 340K WebSocket connections at once. Without graceful shutdown, the reconnect storm took the chat service down for 3 minutes
- Discord deploys chat services with maxUnavailable: 1 and a 10-minute drain. The deploy takes 2-3 hours, but no connection is closed forcibly
- Netflix added `sleep 15` to the preStop hook and eliminated the 2-5 seconds of 503 errors that accompanied every deploy. The Netflix Zuul gateway team documented this on the Engineering Blog
Zero-downtime: why WebSocket is harder than HTTP
**GitHub, 2023.** A scheduled deploy at 02:00 UTC. Engineers pushed a new version to production, old pods started dying. 340,000 WebSocket connections dropped instantly. For HTTP that would be invisible: the client just makes a new request. For WebSocket the client sees a red indicator and lost real-time events.
HTTP is stateless: each request is independent, and a server restart goes unnoticed. WebSocket is stateful: a connection runs for hours and carries accumulated state (subscriptions, cursors, in-flight messages). Killing a pod kills every connection on it.
Zero-downtime for WebSocket requires three things working together: the client knows how to reconnect with state recovery, the server knows how to do a graceful shutdown (let connections wind down), and the load balancer knows how to drain traffic without breaking existing connections.
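On the client side, the usual way to keep reconnects from arriving all at once is exponential backoff with jitter. The sketch below is a minimal illustration, not a prescribed implementation: the function name `reconnect_delay` and the defaults (0.5 s base, 30 s cap) are assumptions chosen for the example, and state recovery (resubscribing, resuming from a cursor) is left out.

```python
import random
from typing import Optional


def reconnect_delay(attempt: int, base_s: float = 0.5, cap_s: float = 30.0,
                    rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before reconnect attempt number `attempt` (0-based)."""
    rng = rng or random.Random()
    # Exponential growth, capped so a long outage does not push delays to minutes.
    # The exponent is clamped to keep the arithmetic finite for huge attempt counts.
    ceiling = min(cap_s, base_s * 2 ** min(attempt, 32))
    # Full jitter: pick uniformly in [0, ceiling] so clients that were dropped
    # at the same instant do not all come back at the same instant.
    return rng.uniform(0.0, ceiling)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    for attempt in range(6):
        print(attempt, round(reconnect_delay(attempt, rng=rng), 2))
```

The jitter matters more than the exponent here: without it, every client dropped by the same pod retries on the same schedule and the storm simply repeats.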
Kubernetes gives a pod 30 seconds by default for graceful shutdown (terminationGracePeriodSeconds). For WebSocket services with long-lived connections this should be raised to 60-300 seconds depending on typical session length.
Why is zero-downtime deployment harder for WebSocket than for HTTP?
Connection Draining: let connections finish
Connection draining is the period between 'pod marked for deletion' and 'pod destroyed'. During this window the pod stops accepting new connections, but existing ones keep working. The load balancer marks the pod as draining and stops sending it new traffic, without cutting off the old.
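The problem: when a pod is deleted, Kubernetes starts terminating its containers and, in parallel, starts removing the pod from the Service endpoints. The two steps are not atomic, so the load balancer can keep sending new traffic for several seconds after the container receives SIGTERM. A preStop hook with a sleep closes that gap: Kubernetes runs the hook before it sends SIGTERM, which gives the load balancer time to sync its state before the pod starts rejecting connections. The hook's run time counts against `terminationGracePeriodSeconds`, so the grace period has to cover the sleep plus the drain.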
Netflix discovered in 2019 that without a preStop sleep, deploys produced 503 errors for 2-5 seconds. Adding `sleep 15` to preStop eliminated the problem entirely. This is now common practice for HTTP and WebSocket services on Kubernetes.
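As a rough sketch of how these settings fit together, the snippet below builds the relevant part of a pod spec as a Python dict and prints it as JSON, which `kubectl` accepts in place of YAML. The field names (`terminationGracePeriodSeconds`, `lifecycle.preStop.exec.command`) are the Kubernetes ones; the function name, the container name `ws-server`, the image URL, and the 120-second drain default are assumptions made for illustration.

```python
import json


def websocket_pod_template(image: str, pre_stop_sleep_s: int = 15,
                           drain_s: int = 120) -> dict:
    """Pod spec fragment for a WebSocket server (names are illustrative)."""
    return {
        # The grace period clock covers the preStop hook as well as the
        # application's own drain, so budget for both.
        "terminationGracePeriodSeconds": pre_stop_sleep_s + drain_s,
        "containers": [
            {
                "name": "ws-server",  # hypothetical container name
                "image": image,
                "lifecycle": {
                    # Runs before SIGTERM is delivered; requires a `sleep`
                    # binary inside the container image.
                    "preStop": {
                        "exec": {"command": ["sleep", str(pre_stop_sleep_s)]}
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder image reference, not a real registry path.
    print(json.dumps(websocket_pod_template("example.com/ws-server:1.2.3"), indent=2))
```

With the defaults this yields a grace period of 135 seconds: 15 for the load balancer to catch up and 120 for connections to wind down.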
Why is a preStop hook with sleep needed in Kubernetes for WebSocket services?
Rolling Updates for WebSocket: maxUnavailable and surge
A rolling update replaces pods gradually: 1 old pod is removed, 1 new one starts, passes the healthcheck, then the next old one goes. For stateless HTTP this is ideal. For WebSocket there are nuances.
When a pod is removed, all its connections reconnect to the remaining pods. If 50% of the pods are replaced at once, load on the rest doubles during the reconnect storm: with 100K connections on a cluster and maxUnavailable at 50%, 50K connections drop and reconnect simultaneously, and the surviving half of the pods ends up carrying all 100K.
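The size of the burst is simple arithmetic. The helper below is a back-of-the-envelope sketch under two stated assumptions: connections are spread evenly across pods, and all reconnects land on the surviving old pods (no surge capacity is ready yet). The name `reconnect_burst` is made up for this example.

```python
from typing import Tuple


def reconnect_burst(total_connections: int, pods: int,
                    unavailable_pods: int) -> Tuple[int, float]:
    """Return (connections that must reconnect, load multiplier on survivors)."""
    if not 0 < unavailable_pods < pods:
        raise ValueError("unavailable_pods must be between 1 and pods - 1")
    per_pod = total_connections / pods
    # Every connection on a removed pod has to reconnect somewhere else.
    reconnecting = round(per_pod * unavailable_pods)
    # The survivors absorb the whole cluster's connections between them.
    survivors = pods - unavailable_pods
    multiplier = total_connections / (survivors * per_pod)
    return reconnecting, multiplier


if __name__ == "__main__":
    for unavailable in (50, 25, 1):
        burst, load = reconnect_burst(100_000, pods=100, unavailable_pods=unavailable)
        print(f"{unavailable:>2} pods down: {burst} reconnects, load x{load:.2f}")
```

For 100 pods holding 100K connections, taking 50 pods down at once means 50,000 reconnects and double the load per survivor, while taking one pod down means 1,000 reconnects and roughly 1% extra load.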
Discord deploys chat services with maxUnavailable: 1 and a 10-minute drain timeout. With 100+ pods the deploy takes 2-3 hours, but no connection is closed forcibly. For mission-critical real-time services a slow deploy is a feature, not a bug.
Why is maxUnavailable: 1 recommended for WebSocket services instead of the default 25%?
Blue-Green Deployment for WebSocket
Blue-green runs a brand new version (green) alongside the old one (blue). When green is ready, traffic switches in one go. For HTTP this is instant. For WebSocket it is not: existing connections on blue must either be allowed to drain or be migrated.
Two options: **hard cutover** (switch everything, old connections break, clients reconnect to green) and **soft migration** (new connections go to green, old ones on blue keep working, blue shuts down after drain).
Blue-green downside: double the resources during deploy. With 100 pods you need 200. For small clusters this may be impossible. The alternative is canary: route 5% of new connections to green, verify correctness, then ramp up.
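One way to picture canary routing for new connections is a stable hash of a client identifier mapped onto 100 buckets. This is a sketch only: real setups usually do the weighting in the load balancer or service mesh, and the function `pick_pool` and the `client-N` identifiers are hypothetical.

```python
import hashlib


def pick_pool(client_id: str, green_percent: int) -> str:
    """Route a NEW connection to 'green' or 'blue'; existing ones are untouched."""
    if not 0 <= green_percent <= 100:
        raise ValueError("green_percent must be between 0 and 100")
    # Hash the client ID so the same client lands in the same pool on every
    # reconnect instead of flapping between versions.
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "green" if bucket < green_percent else "blue"


if __name__ == "__main__":
    ids = [f"client-{i}" for i in range(10_000)]
    for percent in (5, 25, 100):
        share = sum(pick_pool(c, percent) == "green" for c in ids) / len(ids)
        print(f"green_percent={percent}: {share:.1%} of new connections go to green")
```

Because the bucket for a given client never changes, raising `green_percent` only ever moves clients from blue to green, so a ramp from 5% to 100% does not bounce anyone back to the old version.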
Twitch uses a combination: blue-green at the regional level (switch a whole region) and rolling updates within a region. This enables fast rollback (switch the region's DNS back) without a reconnect storm across the entire cluster.
Myth: blue-green deployment instantly switches every WebSocket connection without loss.
Reality: blue-green switches only the routing of new connections; existing connections must either wait for drain or be forced to reconnect.
Unlike HTTP, where each request is independent, a WebSocket connection is a long-lived stateful session. Switching DNS or the load balancer does not move existing TCP connections from one pod to another.
What is the difference between soft migration and hard cutover in blue-green for WebSocket?
Summary
- Graceful shutdown: first `server.close()` for new connections, then a close frame to existing ones (code 1001), then wait, then force-close
- preStop sleep 15s in K8s compensates for the race between SIGTERM and load balancer updates
- maxUnavailable: 1 instead of the default 25% prevents a reconnect storm when removing a group of pods
- Blue-green soft migration: new connections go to green, blue keeps running until natural drain
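The shutdown order from the first bullet can be sketched as a small asyncio simulation. Everything here is a stand-in: `FakeConnection` and `DrainingServer` are hypothetical classes written for this example, not the API of any WebSocket library, and the simulated clients simply "hang up" after a configurable delay once they receive the close frame.

```python
import asyncio
from typing import List, Optional

GOING_AWAY = 1001  # RFC 6455 close code: the endpoint is going away


class FakeConnection:
    """Hypothetical stand-in for a WebSocket connection."""

    def __init__(self, client_close_delay_s: float) -> None:
        self.client_close_delay_s = client_close_delay_s
        self.closed = asyncio.Event()
        self.close_code: Optional[int] = None
        self.forced = False

    def send_close(self, code: int) -> None:
        # Record the close frame; the simulated client hangs up after its delay.
        self.close_code = code
        loop = asyncio.get_running_loop()
        loop.call_later(self.client_close_delay_s, self.closed.set)

    def abort(self) -> None:
        # Drop the connection without waiting for the client.
        self.forced = True
        self.closed.set()


class DrainingServer:
    def __init__(self) -> None:
        self.accepting = True
        self.connections: List[FakeConnection] = []

    def accept(self, conn: FakeConnection) -> bool:
        if not self.accepting:
            return False
        self.connections.append(conn)
        return True

    async def shutdown(self, drain_timeout_s: float) -> None:
        self.accepting = False                   # 1. refuse new connections
        for conn in self.connections:
            conn.send_close(GOING_AWAY)          # 2. ask clients to leave
        waiters = [asyncio.ensure_future(c.closed.wait()) for c in self.connections]
        if waiters:                              # 3. wait, but only so long
            _, pending = await asyncio.wait(waiters, timeout=drain_timeout_s)
            for task in pending:
                task.cancel()
        for conn in self.connections:
            if not conn.closed.is_set():
                conn.abort()                     # 4. force-close stragglers
        self.connections.clear()


async def demo() -> List[FakeConnection]:
    server = DrainingServer()
    # One client that hangs up quickly and one that never does in time.
    conns = [FakeConnection(0.01), FakeConnection(60.0)]
    for conn in conns:
        server.accept(conn)
    await server.shutdown(drain_timeout_s=0.2)
    return conns


if __name__ == "__main__":
    for conn in asyncio.run(demo()):
        print(conn.close_code, "forced" if conn.forced else "closed by client")
```

In a real service the drain timeout would be chosen to fit inside `terminationGracePeriodSeconds`, so that the force-close in step 4 happens before Kubernetes sends SIGKILL.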
Related topics
Graceful deployment sits at the intersection of several DevOps areas
- Kubernetes Deployment strategies — Rolling update, maxUnavailable, terminationGracePeriodSeconds are all K8s primitives
- Load Balancer connection draining — AWS ALB, GCP Cloud Load Balancing, nginx upstream - each has its own drain timeout
- WebSocket reconnection logic — Graceful shutdown only works if the client knows how to reconnect with exponential backoff
- Feature Flags — Canary deployment via feature flags is an alternative to blue-green without doubling resources
Questions for reflection
- How does the deployment strategy change if there are in-flight messages in a queue that must not be lost?
- With a 120s drain and 1000 active users per pod, how long does a 20-pod deploy take with maxUnavailable: 1?
- How do you set up blue-green when blue and green use different, incompatible DB schemas?