Real-Time Backend
Design: Live Sports
At the moment of the winning goal in a World Cup final, the infrastructure must deliver the event to 500M screens within half a second. Not in a minute. Not in 10 seconds. In 500 milliseconds.
- ESPN+ held 22M concurrent connections during the 2023 NBA Finals, more than the population of Australia
- Akamai absorbed peak load of 60 Tbps during the World Cup, comparable to all of US internet traffic in 2005
- Twitter recorded 7,196 tweets per second at the moment of the 2014 World Cup final goal. Sports peaks are sharper than any other event
- Super Bowl 2023: 120M viewers, peak load on betting servers grew 40 times in 3 seconds at the moment of a touchdown
Live broadcast architecture
The 2022 World Cup final reached a peak audience of 1.5 billion people. The ESPN+ infrastructure held 22 million concurrent connections during the 2023 NBA Finals. Loads like that need a multi-layer architecture. A single monolith with a WebSocket server would collapse in the first seconds.
Base layers of the system
**Ingest Service** receives events from official Stats APIs (Stats Perform, Sportradar) through HTTP polling every 200 to 500 ms or via their push webhooks, then publishes normalized events into Kafka. Each sport is its own topic (`events.football`, `events.basketball`), which lets the processing scale independently.
**Event Processor** is a Kafka consumer that enriches the raw event: it adds team metadata, updates the score, computes statistics. Processed events are sent to the pub/sub bus and at the same time written to Redis for caching the current match state.
Two transports for clients: **SSE** (Server-Sent Events) for browsers is easier to operate and has auto-reconnect built in. **WebSocket** is for mobile apps and cases that need a two-way channel (chat, betting). ESPN runs both in parallel.
The `seq` field is critical: on reconnect the client sends `Last-Event-ID: <seq>` and only receives the missed events, not a full match replay. This cuts load during mass reconnects (for example, after a commercial break).
Why does each sport get its own Kafka topic instead of a single shared one?
Fan-out to millions of subscribers
When Cristiano Ronaldo scores at minute 89 of the final, 50 million connected clients need to be notified within 500 ms. A simple loop over WebSocket connections in one process is O(N) on a single thread. For 50M users that takes minutes, while the delay must stay under 1 second.
Hierarchical fan-out
The answer is **multi-level fan-out**. A single source publishes the event to a pub/sub bus (Redis Pub/Sub or a custom Kafka-based implementation). Thousands of edge servers around the world subscribe to the bus. Each edge server then pushes the event to its own 20k to 50k connected clients.
| Layer | Node count | Clients per node | Latency |
|---|---|---|---|
| Event Processor | 10 to 20 | - | 0 ms (source) |
| Regional Fan-out | 100 to 200 | - | 10 to 50 ms |
| Edge WebSocket | 2000 to 5000 | 10k to 50k | 50 to 200 ms |
| Client | 50M+ | - | endpoint |
Each edge node holds a Map `matchId -> Set<WebSocket>` in memory. On connect, the client registers in the right Map by the matchId in the URL. On a goal the event travels through Kafka -> Event Processor -> Redis Pub/Sub -> every edge node -> every client of each node.
Twitter recorded 7,196 tweets per second at the moment of a goal in the 2014 World Cup final. Real-time platforms see even sharper peaks: a single moment generates a wave of requests, not smeared traffic. That is why fan-out is built on push, not on client polling.
Horizontal scaling of edge nodes
Edge nodes are stateful by nature: a WebSocket connection is bound to a specific process. The load balancer routes new connections by current load (`least-connections` algorithm), not round-robin. When nodes are added to a match with 5M viewers, new clients automatically go to the new nodes.
- **Sticky sessions are not needed**: each client holds one long-lived connection to one node
- **Graceful shutdown**: when a node is taken out of rotation, clients receive a Close frame with code 1001 and reconnect to other nodes
- **Health checks** every 10 seconds: check not only HTTP 200 but also the number of active connections
Why use Redis Pub/Sub to broadcast events from Event Processor to edge nodes instead of direct HTTP calls to every edge node?
Handling traffic bursts
Super Bowl 2023 drew 120M viewers. At the moment of a fourth-quarter touchdown the request rate at ESPN servers grew 40 times in 3 seconds. Standard horizontal scaling cannot keep up. A new EC2 instance comes up in 2 to 4 minutes, but the peak is over in 30 seconds.
The nature of sports burst patterns
Sports events have a predictable traffic lifecycle: **pre-game buildup** (rising growth 30 to 60 minutes before kickoff), **game spikes** (sharp peaks on goals/touchdowns, 10 to 60 seconds long), **halftime plateau** (moderate steady traffic), **end-game rush** (the biggest peak, often 2 to 3 times higher than game spikes).
- **Pre-warming**: Akamai and Cloudfront start warming edge caches 2 hours before the match, on schedule
- **Reserved capacity**: AWS/GCP Reserved Instances are pre-reserved for known major events
- **Load shedding**: under overload, non-critical requests (last season's stats) are dropped with HTTP 503
- **Backpressure**: Kafka queues buffer events and prevent a cascading failure when downstream is overloaded
Akamai and the CDN as the first line of defense
Akamai serves peak loads above 60 Tbps, which is the combined traffic of several large matches at once. The CDN absorbs the hit: static assets (the match page, team icons, CSS) are served from edge nodes without hitting the origin. Only dynamic events (live score, comments) reach the backend.
**Graceful degradation** under overload: if the WebSocket server is overloaded, the client automatically falls back to SSE (less overhead). If SSE is also overloaded, it falls back to Long Polling at a 5 second interval. The user sees the score with a 5 second delay instead of 0.5 seconds, but does not see an error.
- Vertical scaling — Increase RAM/CPU on the instance. Fast, but with a hard ceiling. Does not save you from a 40x burst in 3 seconds.
- Horizontal scaling — Add nodes. Works for predictable loads. Auto-scaling reacts in 2 to 4 minutes, too slow for sports peaks.
- Pre-warming + CDN — Infrastructure is ready BEFORE the peak, on schedule. The CDN absorbs static. The only approach that works for live sports.
Why are Auto Scaling Groups (AWS) not enough to handle a burst when a World Cup goal is scored?
Live data caching strategy
Live sports create a caching paradox: the data changes every few seconds, but the cache is what saves the system from a crash. The naive approach (TTL 0, no cache) means 50M clients hammer the database directly on every update. The right approach is a multi-layer cache with different TTLs for different data types.
Classifying data by volatility
| Data type | Update frequency | Cache TTL | Storage |
|---|---|---|---|
| Current score | Every goal (~5 to 10 min) | 5 sec | Redis |
| Team roster | Before the match | 3600 sec (1 hour) | Redis + CDN |
| Live event feed | Every 30 to 60 sec | 2 to 3 sec | Redis |
| Historical matches | Never | 86400 sec (1 day) | CDN edge |
| Player stats | After the match | 600 sec (10 min) | Redis + CDN |
**Write-through cache**: Event Processor writes the event to Redis and Kafka at the same time. Redis holds the current match state as a JSON object under the key `match:{matchId}:state`. A 5 second TTL guarantees that a client on reconnect gets a fresh score without hitting PostgreSQL.
Cache stampede and how to prevent it
**Cache stampede** is the situation where a cache TTL expires at the same moment for 50M clients and they all hit the DB at once. For a match with 50M viewers that is 50M queries in a fraction of a second, a guaranteed database failure.
- **Probabilistic early expiration** (XFetch algorithm): each client refreshes the cache early with probability p while others still read fresh data
- **Mutex on refresh**: only one process gets the lock to refresh the cache, the others wait or serve stale data
- **Jitter in TTL**: instead of `setex(key, 30, ...)` use `setex(key, 28 + random(4), ...)`. A 4 second spread smears cache expirations over time
ESPN uses a **stale-while-revalidate** strategy: the client gets data from the cache instantly (even if the TTL expired), and a background worker refreshes the cache. The score can be 1 to 2 seconds stale, but the user does not see a loading delay. It is a trade-off between consistency and availability.
The lower the cache TTL, the more 'live' the broadcast. True real-time needs TTL 0.
TTL 0 (no cache) kills the system at peak load. The real-time feel comes from push notifications via WebSocket/SSE, not from polling frequency. The cache exists to absorb stampedes and to restore state on reconnect.
WebSocket with a 5 second Redis TTL delivers the event to the user 100 to 500 ms after a goal. No cache with 50M concurrent users takes the database down in seconds. The user sees nothing.
Why is the current match score cached in Redis with a 5 second TTL instead of 0 (no cache)?
Takeaways
- **Multi-layer pipeline**: Sportradar -> Kafka -> Event Processor -> Redis Pub/Sub -> Edge nodes -> clients. Each layer scales horizontally on its own
- **Hierarchical fan-out**: the event is published once to Pub/Sub, each of thousands of edge nodes pushes to its own clients. O(1) on the publisher side
- **Pre-warming instead of reactive scaling**: Auto Scaling reacts in 2 to 4 minutes, a sports peak lasts 10 to 60 seconds. Capacity is reserved ahead of time by match schedule
- **Write-through cache with TTL jitter**: Redis holds the current match state, TTL spread prevents a DB stampede when 50M clients have caches expiring at once
Related topics
Live sports brings together several distributed-systems patterns:
- WebSocket and SSE — Transport layer for push notifications to clients
- Kafka and event streaming — Backbone for reliable event fan-out between services
- Redis and caching — Current match state and DB protection from stampedes
- CDN and edge computing — Akamai/Cloudfront absorb static and the first wave of burst traffic
Вопросы для размышления
- Which component of the architecture is a single point of failure (SPOF) and how do you remove it?
- How does the fan-out architecture change if you have to support personalized events (for example, notifying only fans of the team that scored)?
- On graceful degradation the system falls back from WebSocket to Long Polling at 5 second intervals. How does that affect server load and how do you compensate for the growth in requests?