Real-Time Backend
Low-Latency Live
Viewers abandon a stream once it lags real time by more than about 30 seconds. They drift to chats where other people are already posting spoilers. The gap between 2 seconds and 30 seconds of delay is the difference between interactive live video and a recording.
- **Twitch** moved from legacy HLS (20-30 s of latency) to LL-HLS in 2022, and latency dropped to 2-5 s. For sports, a WebRTC path is enabled so that viewers see the action ahead of the delayed TV broadcast.
- **TikTok LIVE** serves 100M+ concurrent viewers through WebRTC SFUs deployed in every region. Sub-200 ms latency is critical for the "gifts" feature: viewers need to see the streamer's reaction before sending the next gift.
- **YouTube Live** uses LL-HLS with a 2-second ultra-low-latency mode for live events. Standard mode (about 30 s of latency) allows aggressive CDN caching and cuts distribution cost by about 3x.
- **Cloudflare Stream** implemented WHIP/WHEP for WebRTC ingest and egress, delivering <500 ms latency from 200+ PoPs without operators having to run their own SFU.
LL-HLS: partial segments
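Classic HLS packages video into multi-second segments, and players buffer several of them before playback starts, so the stream trails real time by 6-30 seconds. That is unacceptable for live events. Low-Latency HLS (LL-HLS), introduced by Apple in 2019, slices each segment into **partial segments** of 200-500 ms and makes them available to the client before the full segment finishes. The original 2019 draft delivered parts via HTTP/2 Push; later revisions of the specification dropped push in favor of preload hints, used together with Blocking Playlist Reload.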
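To make the arithmetic behind these numbers concrete, here is a minimal Python sketch. It assumes the hold-back guidance from the HLS specification: a classic player stays at least three target durations behind the live edge, and an LL-HLS player is advised to stay at least three part target durations behind it (two is the hard minimum). It deliberately ignores encoding, CDN, and network delay. The names `latency_floor` and `LatencyFloor` are illustrative and not part of any library.

```python
import math
from dataclasses import dataclass

# Assumed multipliers, based on the HLS spec's hold-back guidance:
# classic players stay >= 3 target durations behind the live edge,
# LL-HLS players are advised to stay >= 3 part target durations behind.
CLASSIC_HOLD_BACK_FACTOR = 3
PART_HOLD_BACK_FACTOR = 3


@dataclass(frozen=True)
class LatencyFloor:
    parts_per_segment: int
    classic_s: float      # lower bound on player delay with full segments
    low_latency_s: float  # lower bound on player delay with partial segments


def latency_floor(segment_s: float, part_s: float) -> LatencyFloor:
    """Estimate the playback delay contributed by packaging alone."""
    if part_s <= 0 or segment_s < part_s:
        raise ValueError("expected 0 < part_s <= segment_s")
    return LatencyFloor(
        parts_per_segment=math.ceil(segment_s / part_s),
        classic_s=CLASSIC_HOLD_BACK_FACTOR * segment_s,
        low_latency_s=PART_HOLD_BACK_FACTOR * part_s,
    )


if __name__ == "__main__":
    print(latency_floor(segment_s=6.0, part_s=0.5))
```

With 6-second segments and 500 ms parts, this gives a floor of 18 s for classic HLS and 1.5 s for LL-HLS. Real deployments land above these floors once encoding and delivery time are added.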
Twitch moved to LL-HLS in 2022: latency dropped from 20-30 s to 2-5 s while delivery stayed CDN-compatible. YouTube Live uses a similar approach and claims 2 seconds for events with ultra-low-latency mode enabled.
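Blocking Playlist Reload is the key LL-HLS mechanism: the client requests the playlist and specifies the media sequence number (MSN) and part number it expects next, passed as the `_HLS_msn` and `_HLS_part` query parameters. The server holds the response until the playlist contains that part. This removes repeated polling, so the client learns about each new part as soon as it is published rather than on its next poll.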
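The request/hold/answer cycle can be sketched in Python as follows. This is a simplified model under stated assumptions, not a real origin server. `blocking_reload_url` builds the client-side request using the `_HLS_msn` and `_HLS_part` query parameters defined by the LL-HLS specification. `LivePlaylist` is a hypothetical in-memory stand-in for the origin's playlist state, and the 5-second timeout is an arbitrary choice for the example.

```python
import asyncio
from typing import Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def blocking_reload_url(playlist_url: str, msn: int, part: int) -> str:
    """Append the _HLS_msn / _HLS_part directives to a playlist URL."""
    scheme, netloc, path, query, fragment = urlsplit(playlist_url)
    params = parse_qsl(query, keep_blank_values=True)
    params += [("_HLS_msn", str(msn)), ("_HLS_part", str(part))]
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))


class LivePlaylist:
    """Hypothetical origin state: the newest published (msn, part) pair."""

    def __init__(self) -> None:
        self.latest: Tuple[int, int] = (0, -1)  # nothing published yet
        self._changed = asyncio.Condition()

    async def publish(self, msn: int, part: int) -> None:
        """Called by the packager when a new partial segment is ready."""
        async with self._changed:
            self.latest = (msn, part)
            self._changed.notify_all()

    async def wait_for(self, msn: int, part: int,
                       timeout: float = 5.0) -> Tuple[int, int]:
        """Park the request until (msn, part) exists, then answer."""
        async with self._changed:
            await asyncio.wait_for(
                # Tuples compare by msn first, then by part number.
                self._changed.wait_for(lambda: self.latest >= (msn, part)),
                timeout,
            )
            return self.latest


async def demo() -> Tuple[int, int]:
    playlist = LivePlaylist()
    request = asyncio.create_task(playlist.wait_for(msn=7, part=2))
    await asyncio.sleep(0)        # let the request start waiting
    await playlist.publish(7, 1)  # earlier part: request stays parked
    assert not request.done()
    await playlist.publish(7, 2)  # requested part is now ready
    return await request


if __name__ == "__main__":
    print(blocking_reload_url("https://example.com/live/stream.m3u8", 7, 2))
    print(asyncio.run(demo()))
```

The client issues one request per expected part and gets a response when that part is published, so there is no polling interval to tune. A production server would also need to handle requests for positions too far in the future and return the rendered playlist rather than a tuple.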