Real-Time Backend

Low-Latency Live

Viewers leave a stream once the delay versus real time crosses 30 seconds. They move to people posting spoilers in chat. 2 seconds versus 30 seconds is the difference between interactive live and a recording.

  • **Twitch** moved from legacy HLS (20-30s) to LL-HLS in 2022 and latency dropped to 2-5s. For sports a WebRTC path is enabled, so viewers see the action faster than the TV broadcast delay
  • **TikTok LIVE** serves 100M+ concurrent viewers through WebRTC SFUs deployed in every region. Sub-200ms latency is critical for the 'gifts' feature: viewers need to see the streamer's reaction before the next gift
  • **YouTube Live** uses LL-HLS with a 2-second ultra-low-latency mode for live events. Standard mode (30s) turns on aggressive CDN caching and cuts distribution cost by about 3x
  • **Cloudflare Stream** implemented WHIP/WHEP for WebRTC ingest and egress, delivering <500 ms latency from 200+ PoPs without operators managing their own SFU

LL-HLS: partial segments

Classic HLS buffers 6-30 seconds of video into segments, which makes the latency unacceptable for live events. Low-Latency HLS (LL-HLS), standardized by Apple in 2019, slices those segments into **partial segments** of 200-500 ms and ships them to the client before the full segment finishes, via HTTP/2 Push or Blocking Playlist Reload.

Twitch moved to LL-HLS in 2022 - latency dropped from 20-30s to 2-5s while staying CDN-compatible. YouTube Live uses a similar approach and claims 2 seconds for events with ultra-low-latency mode enabled.

Blocking Playlist Reload is the key LL-HLS mechanism: the client sends a request that specifies the expected MSN (media sequence number) and part number. The server answers only when that part is ready. That removes polling and shrinks time-to-first-byte.

0

1

Sign In

What lets LL-HLS reduce latency compared to classic HLS?

WebRTC for streaming: WHIP/WHEP

WebRTC delivers sub-200ms latency over DTLS/SRTP/UDP. It was originally built for p2p calls, but scaling to thousands of viewers required an SFU (Selective Forwarding Unit), a server that takes one stream from the broadcaster and forwards the right layers to each viewer without transcoding.

In 2022-2023 the industry standardized WHIP (WebRTC-HTTP Ingestion Protocol) for publishing and WHEP (WebRTC-HTTP Egress Protocol) for viewing. Cloudflare Stream, Dolby.io, and Mux implemented both, which simplified integration. TikTok LIVE runs a WebRTC stack with its own SFUs per region, totaling 100M+ concurrent viewers.

SFU vs MCU: SFU forwards packets without processing (low CPU, scales to 100k viewers); MCU mixes streams on the server (high CPU, needed only when viewers see a composited feed). For streaming it is always SFU.

Why is SFU preferred over MCU for WebRTC streaming with a large audience?

Sub-second delivery: stack and trade-offs

Sub-second latency (< 1s) is reachable through several paths, each with its own trade-offs. WebRTC gives 50-200 ms, but scaling is expensive and complex. LL-HLS gives 2-5s with CDN compatibility and cheap delivery. RTMP (legacy) gives 3-10s and is still used for ingest (OBS -> server).

  • **WebRTC/WHIP**: 50-200 ms, needs an SFU, hard to cache on a CDN, expensive to scale beyond 100k
  • **LL-HLS**: 2-5s, CDN-native, ABR out of the box, broad browser support
  • **LHLS (Low-Latency HLS, Twitch-spec)**: 2-4s, never standardized, obsolete
  • **SRT (Secure Reliable Transport)**: 0.5-1s latency, ideal for ingest over flaky networks
  • **CMAF Chunked Transfer**: 2-4s, DASH/HLS-compatible, used in Netflix live events

Why is RTMP still used for ingest despite its high latency?

Latency optimization: buffers, GOP, and ABR

The main latency sources in a video pipeline: encoder buffer (0.5-2s), packager buffer (1-3s), CDN propagation (50-200 ms), player buffer (1-10s). For LL streaming each one has to be tuned.

**GOP (Group of Pictures)** is the key parameter: a longer GOP compresses better but makes the client wait longer for a keyframe to start decoding. Twitch uses GOP=2s for live (vs 10+s for VOD). A shorter GOP raises the bitrate by about 15-20%.

ABR (Adaptive Bitrate) in LL streaming leans on CTE (Content-Type Encoding) hints and CMCD (Common Media Client Data): the client reports its bandwidth, and the server can prepare the right quality in advance. Cloudflare Stream and Mux implement this to cut rebuffering by 40%.

For minimum latency you should shrink the player buffer to 0

The player buffer must be balanced: too small means rebuffering on network jitter, too large means more latency. The LL-HLS sweet spot is 0.5-1.5s

Networks have jitter, the variability in packet delay. Without a buffer even a small jitter spike causes freezes. The LL-HLS spec recommends PART-HOLD-BACK = 3x PART-TARGET (0.75s for a 250 ms part)

Which encoder parameter has the biggest impact on time-to-first-frame for a new viewer joining a live stream?

Takeaways

  • **LL-HLS** drops latency to 2-5s through partial segments (250 ms) plus Blocking Playlist Reload, a CDN-compatible compromise between sub-second and scalability
  • **WebRTC SFU** delivers <200 ms over DTLS/SRTP/UDP, but needs SFU infrastructure and does not cache on a CDN, justified by interactivity (chat reactions, gifts, e-sports)
  • **GOP length** is the dominant encoder-side latency knob: LL needs GOP <= 2s, and `-tune zerolatency` in FFmpeg removes B-frames and shrinks the encoder buffer

Related topics

Low-latency live builds on several adjacent areas of the real-time backend:

  • WebRTC internals — The underlying transport stack for sub-second streaming: ICE, DTLS, SRTP, SCTP
  • CDN architecture — LL-HLS scales through a CDN, so understanding edge caching is critical to optimization
  • Multiplayer game networking — The same latency-budget principles and trade-offs between delay and quality

Вопросы для размышления

  • In what scenario is LL-HLS the better pick than WebRTC, despite higher latency (2-5s vs 200 ms)?
  • How would the streaming architecture change if CDNs learned to cache WebRTC streams as efficiently as HLS segments?
  • Why is the encoder buffer set to 0.5x the bitrate rather than 2x like normal VOD streaming?

Связанные уроки

  • net-01-intro
Low-Latency Live