Real-Time Backend

Design: Live Sports

At the moment of the winning goal in a World Cup final, the infrastructure must deliver the event to 500M screens within half a second. Not in a minute. Not in 10 seconds. In 500 milliseconds.

ESPN+ held 22M concurrent connections during the 2023 NBA Finals, more than the population of Australia
Akamai absorbed peak load of 60 Tbps during the World Cup, comparable to all of US internet traffic in 2005
Twitter recorded 7,196 tweets per second at the moment of the 2014 World Cup final goal. Sports peaks are sharper than any other event
Super Bowl 2023: 120M viewers, peak load on betting servers grew 40 times in 3 seconds at the moment of a touchdown

Live broadcast architecture

The 2022 World Cup final reached a peak audience of 1.5 billion people. The ESPN+ infrastructure held 22 million concurrent connections during the 2023 NBA Finals. Loads like that need a multi-layer architecture. A single monolith with a WebSocket server would collapse in the first seconds.

Base layers of the system

**Ingest Service** receives events from official Stats APIs (Stats Perform, Sportradar) through HTTP polling every 200 to 500 ms or via their push webhooks, then publishes normalized events into Kafka. Each sport is its own topic (`events.football`, `events.basketball`), which lets the processing scale independently.

**Event Processor** is a Kafka consumer that enriches the raw event: it adds team metadata, updates the score, computes statistics. Processed events are sent to the pub/sub bus and at the same time written to Redis for caching the current match state.

Two transports for clients: **SSE** (Server-Sent Events) for browsers is easier to operate and has auto-reconnect built in. **WebSocket** is for mobile apps and cases that need a two-way channel (chat, betting). ESPN runs both in parallel.

The `seq` field is critical: on reconnect the client sends `Last-Event-ID: <seq>` and only receives the missed events, not a full match replay. This cuts load during mass reconnects (for example, after a commercial break).

Why does each sport get its own Kafka topic instead of a single shared one?

Fan-out to millions of subscribers

When Cristiano Ronaldo scores at minute 89 of the final, 50 million connected clients need to be notified within 500 ms. A simple loop over WebSocket connections in one process is O(N) on a single thread. For 50M users that takes minutes, while the delay must stay under 1 second.

Hierarchical fan-out

The answer is **multi-level fan-out**. A single source publishes the event to a pub/sub bus (Redis Pub/Sub or a custom Kafka-based implementation). Thousands of edge servers around the world subscribe to the bus. Each edge server then pushes the event to its own 20k to 50k connected clients.

Layer	Node count	Clients per node	Latency
Event Processor	10 to 20	-	0 ms (source)
Regional Fan-out	100 to 200	-	10 to 50 ms
Edge WebSocket	2000 to 5000	10k to 50k	50 to 200 ms
Client	50M+	-	endpoint

Each edge node holds a Map `matchId -> Set<WebSocket>` in memory. On connect, the client registers in the right Map by the matchId in the URL. On a goal the event travels through Kafka -> Event Processor -> Redis Pub/Sub -> every edge node -> every client of each node.

Twitter recorded 7,196 tweets per second at the moment of a goal in the 2014 World Cup final. Real-time platforms see even sharper peaks: a single moment generates a wave of requests, not smeared traffic. That is why fan-out is built on push, not on client polling.

Horizontal scaling of edge nodes

Edge nodes are stateful by nature: a WebSocket connection is bound to a specific process. The load balancer routes new connections by current load (`least-connections` algorithm), not round-robin. When nodes are added to a match with 5M viewers, new clients automatically go to the new nodes.

**Sticky sessions are not needed**: each client holds one long-lived connection to one node
**Graceful shutdown**: when a node is taken out of rotation, clients receive a Close frame with code 1001 and reconnect to other nodes
**Health checks** every 10 seconds: check not only HTTP 200 but also the number of active connections

Why use Redis Pub/Sub to broadcast events from Event Processor to edge nodes instead of direct HTTP calls to every edge node?

Handling traffic bursts

Super Bowl 2023 drew 120M viewers. At the moment of a fourth-quarter touchdown the request rate at ESPN servers grew 40 times in 3 seconds. Standard horizontal scaling cannot keep up. A new EC2 instance comes up in 2 to 4 minutes, but the peak is over in 30 seconds.

The nature of sports burst patterns

Sports events have a predictable traffic lifecycle: **pre-game buildup** (rising growth 30 to 60 minutes before kickoff), **game spikes** (sharp peaks on goals/touchdowns, 10 to 60 seconds long), **halftime plateau** (moderate steady traffic), **end-game rush** (the biggest peak, often 2 to 3 times higher than game spikes).

**Pre-warming**: Akamai and Cloudfront start warming edge caches 2 hours before the match, on schedule
**Reserved capacity**: AWS/GCP Reserved Instances are pre-reserved for known major events
**Load shedding**: under overload, non-critical requests (last season's stats) are dropped with HTTP 503
**Backpressure**: Kafka queues buffer events and prevent a cascading failure when downstream is overloaded

Akamai and the CDN as the first line of defense

Akamai serves peak loads above 60 Tbps, which is the combined traffic of several large matches at once. The CDN absorbs the hit: static assets (the match page, team icons, CSS) are served from edge nodes without hitting the origin. Only dynamic events (live score, comments) reach the backend.

**Graceful degradation** under overload: if the WebSocket server is overloaded, the client automatically falls back to SSE (less overhead). If SSE is also overloaded, it falls back to Long Polling at a 5 second interval. The user sees the score with a 5 second delay instead of 0.5 seconds, but does not see an error.

Vertical scaling — Increase RAM/CPU on the instance. Fast, but with a hard ceiling. Does not save you from a 40x burst in 3 seconds.
Horizontal scaling — Add nodes. Works for predictable loads. Auto-scaling reacts in 2 to 4 minutes, too slow for sports peaks.
Pre-warming + CDN — Infrastructure is ready BEFORE the peak, on schedule. The CDN absorbs static. The only approach that works for live sports.

Why are Auto Scaling Groups (AWS) not enough to handle a burst when a World Cup goal is scored?

Live data caching strategy

Live sports create a caching paradox: the data changes every few seconds, but the cache is what saves the system from a crash. The naive approach (TTL 0, no cache) means 50M clients hammer the database directly on every update. The right approach is a multi-layer cache with different TTLs for different data types.

Classifying data by volatility

Data type	Update frequency	Cache TTL	Storage
Current score	Every goal (~5 to 10 min)	5 sec	Redis
Team roster	Before the match	3600 sec (1 hour)	Redis + CDN
Live event feed	Every 30 to 60 sec	2 to 3 sec	Redis
Historical matches	Never	86400 sec (1 day)	CDN edge
Player stats	After the match	600 sec (10 min)	Redis + CDN

**Write-through cache**: Event Processor writes the event to Redis and Kafka at the same time. Redis holds the current match state as a JSON object under the key `match:{matchId}:state`. A 5 second TTL guarantees that a client on reconnect gets a fresh score without hitting PostgreSQL.

Cache stampede and how to prevent it

**Cache stampede** is the situation where a cache TTL expires at the same moment for 50M clients and they all hit the DB at once. For a match with 50M viewers that is 50M queries in a fraction of a second, a guaranteed database failure.

**Probabilistic early expiration** (XFetch algorithm): each client refreshes the cache early with probability p while others still read fresh data
**Mutex on refresh**: only one process gets the lock to refresh the cache, the others wait or serve stale data
**Jitter in TTL**: instead of `setex(key, 30, ...)` use `setex(key, 28 + random(4), ...)`. A 4 second spread smears cache expirations over time

ESPN uses a **stale-while-revalidate** strategy: the client gets data from the cache instantly (even if the TTL expired), and a background worker refreshes the cache. The score can be 1 to 2 seconds stale, but the user does not see a loading delay. It is a trade-off between consistency and availability.

The lower the cache TTL, the more 'live' the broadcast. True real-time needs TTL 0.

TTL 0 (no cache) kills the system at peak load. The real-time feel comes from push notifications via WebSocket/SSE, not from polling frequency. The cache exists to absorb stampedes and to restore state on reconnect.

WebSocket with a 5 second Redis TTL delivers the event to the user 100 to 500 ms after a goal. No cache with 50M concurrent users takes the database down in seconds. The user sees nothing.

Why is the current match score cached in Redis with a 5 second TTL instead of 0 (no cache)?

Takeaways

**Multi-layer pipeline**: Sportradar -> Kafka -> Event Processor -> Redis Pub/Sub -> Edge nodes -> clients. Each layer scales horizontally on its own
**Hierarchical fan-out**: the event is published once to Pub/Sub, each of thousands of edge nodes pushes to its own clients. O(1) on the publisher side
**Pre-warming instead of reactive scaling**: Auto Scaling reacts in 2 to 4 minutes, a sports peak lasts 10 to 60 seconds. Capacity is reserved ahead of time by match schedule
**Write-through cache with TTL jitter**: Redis holds the current match state, TTL spread prevents a DB stampede when 50M clients have caches expiring at once

Вопросы для размышления

Which component of the architecture is a single point of failure (SPOF) and how do you remove it?
How does the fan-out architecture change if you have to support personalized events (for example, notifying only fans of the team that scored)?
On graceful degradation the system falls back from WebSocket to Long Polling at 5 second intervals. How does that affect server load and how do you compensate for the growth in requests?

Связанные уроки

sd-01-intro

Real-Time Backend

Design: Live Sports

At the moment of the winning goal in a World Cup final, the infrastructure must deliver the event to 500M screens within half a second. Not in a minute. Not in 10 seconds. In 500 milliseconds.

ESPN+ held 22M concurrent connections during the 2023 NBA Finals, more than the population of Australia
Akamai absorbed peak load of 60 Tbps during the World Cup, comparable to all of US internet traffic in 2005
Twitter recorded 7,196 tweets per second at the moment of the 2014 World Cup final goal. Sports peaks are sharper than any other event
Super Bowl 2023: 120M viewers, peak load on betting servers grew 40 times in 3 seconds at the moment of a touchdown

Live broadcast architecture

Base layers of the system

Why does each sport get its own Kafka topic instead of a single shared one?

Fan-out to millions of subscribers

Hierarchical fan-out

Layer	Node count	Clients per node	Latency
Event Processor	10 to 20	-	0 ms (source)
Regional Fan-out	100 to 200	-	10 to 50 ms
Edge WebSocket	2000 to 5000	10k to 50k	50 to 200 ms
Client	50M+	-	endpoint

Horizontal scaling of edge nodes

**Sticky sessions are not needed**: each client holds one long-lived connection to one node
**Graceful shutdown**: when a node is taken out of rotation, clients receive a Close frame with code 1001 and reconnect to other nodes
**Health checks** every 10 seconds: check not only HTTP 200 but also the number of active connections

Why use Redis Pub/Sub to broadcast events from Event Processor to edge nodes instead of direct HTTP calls to every edge node?

Handling traffic bursts

The nature of sports burst patterns

**Pre-warming**: Akamai and Cloudfront start warming edge caches 2 hours before the match, on schedule
**Reserved capacity**: AWS/GCP Reserved Instances are pre-reserved for known major events
**Load shedding**: under overload, non-critical requests (last season's stats) are dropped with HTTP 503
**Backpressure**: Kafka queues buffer events and prevent a cascading failure when downstream is overloaded

Akamai and the CDN as the first line of defense

Vertical scaling — Increase RAM/CPU on the instance. Fast, but with a hard ceiling. Does not save you from a 40x burst in 3 seconds.
Horizontal scaling — Add nodes. Works for predictable loads. Auto-scaling reacts in 2 to 4 minutes, too slow for sports peaks.
Pre-warming + CDN — Infrastructure is ready BEFORE the peak, on schedule. The CDN absorbs static. The only approach that works for live sports.

Why are Auto Scaling Groups (AWS) not enough to handle a burst when a World Cup goal is scored?

Live data caching strategy

Classifying data by volatility

Data type	Update frequency	Cache TTL	Storage
Current score	Every goal (~5 to 10 min)	5 sec	Redis
Team roster	Before the match	3600 sec (1 hour)	Redis + CDN
Live event feed	Every 30 to 60 sec	2 to 3 sec	Redis
Historical matches	Never	86400 sec (1 day)	CDN edge
Player stats	After the match	600 sec (10 min)	Redis + CDN

Cache stampede and how to prevent it

**Probabilistic early expiration** (XFetch algorithm): each client refreshes the cache early with probability p while others still read fresh data
**Mutex on refresh**: only one process gets the lock to refresh the cache, the others wait or serve stale data
**Jitter in TTL**: instead of `setex(key, 30, ...)` use `setex(key, 28 + random(4), ...)`. A 4 second spread smears cache expirations over time

The lower the cache TTL, the more 'live' the broadcast. True real-time needs TTL 0.

WebSocket with a 5 second Redis TTL delivers the event to the user 100 to 500 ms after a goal. No cache with 50M concurrent users takes the database down in seconds. The user sees nothing.

Why is the current match score cached in Redis with a 5 second TTL instead of 0 (no cache)?

Takeaways

**Multi-layer pipeline**: Sportradar -> Kafka -> Event Processor -> Redis Pub/Sub -> Edge nodes -> clients. Each layer scales horizontally on its own
**Hierarchical fan-out**: the event is published once to Pub/Sub, each of thousands of edge nodes pushes to its own clients. O(1) on the publisher side
**Pre-warming instead of reactive scaling**: Auto Scaling reacts in 2 to 4 minutes, a sports peak lasts 10 to 60 seconds. Capacity is reserved ahead of time by match schedule
**Write-through cache with TTL jitter**: Redis holds the current match state, TTL spread prevents a DB stampede when 50M clients have caches expiring at once

Вопросы для размышления

Which component of the architecture is a single point of failure (SPOF) and how do you remove it?
How does the fan-out architecture change if you have to support personalized events (for example, notifying only fans of the team that scored)?
On graceful degradation the system falls back from WebSocket to Long Polling at 5 second intervals. How does that affect server load and how do you compensate for the growth in requests?

Связанные уроки

sd-01-intro

Design: Live Sports

Live broadcast architecture

Base layers of the system

Fan-out to millions of subscribers

Hierarchical fan-out

Horizontal scaling of edge nodes

Handling traffic bursts

The nature of sports burst patterns

Akamai and the CDN as the first line of defense

Live data caching strategy

Classifying data by volatility

Cache stampede and how to prevent it

Takeaways

Related topics

Вопросы для размышления

Связанные уроки

Design: Live Sports

Live broadcast architecture

Base layers of the system

Fan-out to millions of subscribers

Hierarchical fan-out

Horizontal scaling of edge nodes

Handling traffic bursts

The nature of sports burst patterns

Akamai and the CDN as the first line of defense

Live data caching strategy

Classifying data by volatility

Cache stampede and how to prevent it

Takeaways

Related topics

Вопросы для размышления

Связанные уроки