Real-Time Backend
Horizontal Scaling Patterns
Twitch streamed the League of Legends World Championship to 8 million concurrent viewers. How do you scale WebSocket infrastructure to those numbers without rewriting the code every time load doubles?
- Twitch scaled to 8 million concurrent viewers via stateless edge servers. Each node is independent and one failure does not affect the rest
- Discord runs a stateless gateway fleet (Elixir) plus Cassandra for state, which lets them add gateway servers without coordination between them
- Slack uses Kubernetes HPA to scale pods automatically through load peaks (Monday morning, big company announcements)
- The Socket.io cluster adapter on Redis Pub/Sub lets you run an arbitrary number of pods. An emit to a room reaches every client regardless of which pod they sit on
Shared Nothing
In a **Shared Nothing** architecture each cluster node serves only the clients it accepted and knows nothing about its neighbors. No shared memory, no shared disk. This lets you add nodes horizontally without coordination and without latency growth as you scale.
Discord uses exactly this model for its stateless gateway servers. Each gateway holds WebSocket connections and reads state from Cassandra/Redis, but no gateway talks directly to another. A single node failing does not affect the rest.
- Each node is independent: no shared memory, no shared disk
- Client state lives in an external layer (Redis, Cassandra, PostgreSQL)
- Any request can be routed to any node (load balancer without sticky sessions)
- Scaling means adding a node and registering it with the load balancer
Twitch scaled to 8 million concurrent viewers using exactly this shared-nothing gateway pattern: each edge server accepts an RTMP stream or HLS session autonomously, with no coordination between edge nodes.
Why does Shared Nothing allow you to add nodes without performance degradation?
Shared State
When different nodes need to know each other's state (for example, which user is connected to which server), you need **Shared State**: external storage available to every node at the same time.
Socket.io solves this with a **cluster adapter**: every pod subscribes to a single Redis Pub/Sub channel. When a client on pod A sends an event to a room, pod A publishes it to Redis, Redis fans it out to every pod, and each pod delivers the message to its own clients.
- Redis Pub/Sub: the simplest option, events fan out to every subscriber
- Redis Hash: stores the `userId -> nodeId` mapping for targeted delivery
- Cassandra: for persistent state (message history, profiles)
- Shared state creates a single point of failure: the store itself must be clustered
Slack scales pods horizontally on Kubernetes with a shared Redis for presence state (who is online and in which channel). Kubernetes HPA adds pods automatically as CPU/memory grow, with no code changes.
Socket.io cluster adapter uses Redis to...
Gossip Protocol
**Gossip Protocol** is a decentralized way to spread cluster state. Each node periodically picks random neighbors and exchanges its view of the cluster with them. After a few rounds every node converges to a single state, with no central coordinator.
Cassandra uses gossip to spread topology information: who is alive, who is down, which node owns which key range. Every second each node gossips with 1-3 neighbors. Information spreads in O(log N) rounds.
- No single point of failure: any node can die without breaking the system
- Eventual consistency: information converges in O(log N) rounds, not instantly
- Linear scaling: adding a node does not raise load on existing ones
- Used in: Cassandra, DynamoDB, Riak, Consul, Kubernetes
Gossip is not just for failure detection. Cassandra also uses gossip to spread keyspace schema, token ownership, and load info. Without gossip, every node would need manual configuration on every cluster change.
How many rounds does a gossip protocol need to spread information across N nodes?
Scaling patterns in real systems
In practice, horizontal scaling of realtime systems combines three patterns: stateless gateway (shared nothing), a shared broker for event coordination, and consistent hashing for even load distribution between nodes.
- **Stateless gateway**: WebSocket servers hold no state, so a client can reconnect to any node
- **Shared broker** (Redis Pub/Sub, Kafka, RabbitMQ): events route through a central bus between every gateway node
- **Consistent hashing ring**: new connections and tasks distribute evenly; adding a node reassigns only 1/N of the load
- **Kubernetes HPA**: pods are added or removed automatically based on CPU/memory metrics. Slack uses this approach to scale through activity peaks
Discord handles more than 4 million concurrent voice connections by combining a stateless Voice Gateway (Elixir), Cassandra for state, and consistent hashing for routing voice channels across regional servers.
Horizontal scaling solves any performance problem. Just add servers
Horizontal scaling only works for stateless or properly designed stateful components. The bottleneck may be the shared broker, the database, or the network. Adding gateway nodes then changes nothing.
Amdahl's Law: if 20% of the system does not scale horizontally, the maximum speedup is bounded at 5x no matter how many nodes you add. Before scaling, find the actual bottleneck via profiling and metrics.
When you add a new node to a consistent hashing ring with N existing nodes, how much traffic reshuffles?
Key takeaways
- **Shared Nothing**: independent nodes, state in external storage, any load balancer with no sticky sessions
- **Shared State** (Redis Pub/Sub, Cassandra): event coordination between nodes via an external bus, not direct node-to-node chatter
- **Gossip Protocol**: decentralized topology spread in O(log N) rounds, with no single point of failure
- **Consistent Hashing**: adding a node reassigns only 1/(N+1) of the load, not all of it
Related topics
Horizontal scaling rests on a few core distributed systems patterns:
- Consistent Hashing — The algorithm for spreading load across ring nodes, the foundation of stateless routing
- WebSocket clustering — The practical implementation of shared-nothing gateways for realtime connections
- CAP Theorem — The theoretical basis for consistency vs availability trade-offs during horizontal scaling
Вопросы для размышления
- Your WebSocket server keeps room membership in process memory. What do you need to change to run 10 instances without losing functionality?
- Gossip provides eventual consistency: cluster state converges over a few seconds. Which realtime scenarios tolerate that, and which find it unacceptable?
- Adding a node to a consistent hashing ring forces some connections to migrate. How do you implement the migration smoothly without dropping client WebSocket connections?