Real-Time Backend
WebRTC basics
Google Meet can run a 2-person call without its servers ever touching the video stream: the pixels travel straight from browser to browser. That's WebRTC P2P, the foundation of real-time video on the web.
- **Whereby** (video conferencing) builds its entire product on WebRTC, using P2P for 1-1 calls and switching to an SFU only when a call has more than 2 participants. P2P cuts their media server costs to near zero for most calls (only TURN-relayed calls still consume server bandwidth)
- **Twilio Video** handles millions of WebRTC sessions per day. Their stats: 85% of connections via STUN (P2P through NAT), 5% require TURN relay, 10% direct local network
- **Figma** uses WebRTC DataChannel for cursor sharing in multiplayer: P2P data, not media. On a direct connection this is faster than routing through a server
- **Daily.co** publishes open WebRTC stats: average ICE negotiation time is 300-800ms depending on network type and TURN availability
WebRTC concept
WebRTC (Web Real-Time Communication) is a set of APIs and protocols for direct peer-to-peer browser communication without a server relaying the media stream. Google open-sourced WebRTC in 2011. Today it underpins Google Meet, Whereby, Daily.co, and thousands of other video services.
The key difference from ordinary WebSocket or HTTP: media flows **directly between browsers**, bypassing the server. In a 1-1 call the server relays no media at all. That advantage doesn't scale to big calls, though: in a full mesh each of N participants must upload N-1 streams, so large conferences move to a server-side SFU. Establishing a direct connection through NAT and firewalls is also non-trivial, and that's what ICE solves.
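Counting streams shows why P2P suits small calls while larger ones go through a server. A minimal sketch, assuming every participant sends exactly one video stream and ignoring simulcast; `meshLoad` and `sfuLoad` are hypothetical helper names:

```typescript
// Stream counts for an n-person call, assuming one outgoing stream per participant.
interface TopologyLoad {
  uploadsPerPeer: number;   // streams each browser must send
  downloadsPerPeer: number; // streams each browser receives
  serverIn: number;         // streams the server receives (0 for a P2P mesh)
  serverOut: number;        // streams the server sends (0 for a P2P mesh)
}

// Full mesh: every peer sends its stream directly to every other peer.
function meshLoad(n: number): TopologyLoad {
  return { uploadsPerPeer: n - 1, downloadsPerPeer: n - 1, serverIn: 0, serverOut: 0 };
}

// SFU: every peer uploads once; the server forwards each stream
// (without decoding it) to the other n - 1 participants.
function sfuLoad(n: number): TopologyLoad {
  return { uploadsPerPeer: 1, downloadsPerPeer: n - 1, serverIn: n, serverOut: n * (n - 1) };
}

for (const n of [2, 4, 10]) {
  console.log(n, meshLoad(n), sfuLoad(n));
}
```

At n = 2 the mesh costs each browser a single upload and the server nothing; at n = 10 each browser would upload 9 copies of its stream, while with an SFU it uploads one and the server handles the fan-out. The model ignores SFU optimizations such as forwarding only the active speakers.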
**Numbers:** Google Meet on a 2-person P2P call runs video directly between browsers - the server doesn't see the pixels. With >2 participants, Meet switches to SFU (Selective Forwarding Unit): the server forwards streams but doesn't decode them. Bandwidth savings vs. a full relay: 40-80% depending on network.
In a WebRTC P2P connection the application server doesn't carry media. So what is the signaling server for?
ICE framework
ICE (Interactive Connectivity Establishment) is the protocol for finding the best path between two peers through NAT and firewalls. The browser collects a list of **ICE candidates** (possible connection addresses), pairs them with the remote peer's candidates, and runs connectivity checks on those pairs in priority order.
Three ICE candidate types in descending priority: **host** - local IP (works on the same network), **srflx** (server reflexive) - public IP learned via STUN (works through most NATs), **relay** - address of a TURN server (works even when direct paths are blocked, at the cost of relaying all traffic through the server).
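The "descending priority" is a concrete number. RFC 8445 computes it as 2^24 × type preference + 2^8 × local preference + (256 − component ID) and recommends type preferences of 126 for host, 110 for peer reflexive, 100 for srflx, and 0 for relay. A minimal sketch, assuming a single network interface (local preference 65535) and component 1 (RTP); `icePriority` and `byPriority` are hypothetical helper names and the addresses are documentation examples:

```typescript
// Candidate types and the RFC 8445 recommended type preferences.
type IceKind = "host" | "prflx" | "srflx" | "relay";

const TYPE_PREFERENCE: Record<IceKind, number> = {
  host: 126,  // local interface address
  prflx: 110, // peer reflexive: discovered during connectivity checks
  srflx: 100, // server reflexive: public address learned via STUN
  relay: 0,   // TURN relay address: last resort
};

// priority = 2^24 * typePref + 2^8 * localPref + (256 - componentId)
function icePriority(kind: IceKind, localPref: number = 65535, componentId: number = 1): number {
  return 2 ** 24 * TYPE_PREFERENCE[kind] + 2 ** 8 * localPref + (256 - componentId);
}

interface Candidate {
  kind: IceKind;
  address: string;
}

// Order candidates from most to least preferred, as an ICE agent would.
function byPriority(cands: Candidate[]): Candidate[] {
  return cands.slice().sort((a, b) => icePriority(b.kind) - icePriority(a.kind));
}

const gathered: Candidate[] = [
  { kind: "relay", address: "198.51.100.20" },
  { kind: "host", address: "192.168.1.10" },
  { kind: "srflx", address: "203.0.113.7" },
];
// Expected order: host (2130706431), srflx (1694498815), relay (16777215)
console.log(byPriority(gathered).map((c) => c.kind + " " + icePriority(c.kind)));
```

Peer reflexive (prflx) candidates are not gathered up front; they appear during connectivity checks and rank between host and srflx. Browsers choose their own local preferences, so the priorities seen in real SDP differ from these textbook values.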
**Production ICE stats:** Twilio measures ~85% of WebRTC connections established via srflx (STUN), ~10% via host (local network), and only ~5% require TURN relay. But that 5% is corporate users behind strict firewalls who couldn't call at all without TURN. That's why TURN is mandatory in production.
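To measure a split like this for your own users, classify the type of the local candidate each connection ended up using. A minimal sketch, assuming you have already collected those candidates as `candidate:` strings in the SDP attribute format (RFC 8839); `candidateType` and `typeShare` are hypothetical helper names. In a browser the same information is exposed as `candidateType` on `local-candidate` entries returned by `RTCPeerConnection.getStats()`.

```typescript
// Extract the type from a candidate line such as
// "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.10 rport 46154"
function candidateType(line: string): string | null {
  const tokens = line.trim().split(/\s+/);
  const i = tokens.indexOf("typ");
  return i >= 0 && i + 1 < tokens.length ? tokens[i + 1] : null;
}

// Percentage of connections per candidate type (host / srflx / relay / ...).
function typeShare(selected: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of selected) {
    const t = candidateType(line) ?? "unknown";
    counts[t] = (counts[t] || 0) + 1;
  }
  const share: Record<string, number> = {};
  for (const t in counts) {
    share[t] = Math.round((100 * counts[t]) / selected.length);
  }
  return share;
}

console.log(typeShare([
  "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.10 rport 46154",
  "candidate:1 1 udp 16777215 198.51.100.20 3478 typ relay raddr 203.0.113.7 rport 46154",
]));
```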
A company has a strict corporate firewall that allows only HTTP/HTTPS traffic. Which ICE candidate type makes a WebRTC connection possible?
SDP
SDP (Session Description Protocol) is a text format for describing a media session: supported codecs, ports, encryption parameters. SDP isn't a WebRTC invention - it has existed since 1998 (RFC 2327) and is used in VoIP, SIP, and RTSP.
In WebRTC, SDP is a peer's 'résumé': 'I support H264 and VP8, Opus for audio, DTLS-SRTP for encryption, here are my ICE credentials'. Negotiation means comparing two résumés and picking the best common parameters.
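Codec negotiation can be modeled as an ordered intersection of the two lists. A deliberately simplified sketch: real SDP matches payload types plus `a=fmtp` parameters (for example H264 profiles), not bare codec names, and `negotiateCodecs` is a hypothetical helper. It keeps the offerer's order, which RFC 3264 recommends as the answerer's default.

```typescript
// Simplified model: each side lists codec names in order of preference.
// The answer keeps the offerer's order and drops codecs the answerer lacks.
function negotiateCodecs(offered: string[], supportedByAnswerer: string[]): string[] {
  const local = supportedByAnswerer.map((c) => c.toLowerCase());
  return offered.filter((c) => local.indexOf(c.toLowerCase()) !== -1);
}

console.log(negotiateCodecs(["VP8", "H264", "VP9"], ["H264", "VP8"])); // [ 'VP8', 'H264' ]
```

An empty result means the peers share no codec for that media type, which is exactly the kind of failure that shows up when reading the SDP.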
**Why learn to read SDP?** Most developers never parse SDP manually - the browser builds it via `createOffer()`. But for debugging WebRTC connections, reading SDP is critical: a failure caused by a codec mismatch, for example, is visible right in the SDP. Tool: `webrtc-internals` in Chrome (chrome://webrtc-internals).
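A first debugging step is listing which codecs each media section advertises. A minimal sketch that collects the `a=rtpmap` entries per `m=` section; `codecsBySection` is a hypothetical helper, and the sample is a trimmed, hand-written excerpt (a real offer from `pc.localDescription.sdp` has many more lines, and the browser assigns the payload type numbers).

```typescript
// Map each media kind (from m= lines) to the codecs in its a=rtpmap lines.
// a=rtpmap format: a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
function codecsBySection(sdp: string): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  let current: string[] | null = null; // codec list of the m= section being read
  for (const raw of sdp.split(/\r?\n/)) {
    const line = raw.trim();
    const media = /^m=(\w+) /.exec(line);
    if (media) {
      const kind = media[1];
      if (!result[kind]) result[kind] = [];
      current = result[kind];
      continue;
    }
    const rtpmap = /^a=rtpmap:(\d+) ([^\/\s]+)\//.exec(line);
    if (rtpmap && current) {
      current.push(rtpmap[2] + " (pt " + rtpmap[1] + ")");
    }
  }
  return result;
}

const sampleSdp = [
  "v=0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96 102",
  "a=rtpmap:96 VP8/90000",
  "a=rtpmap:102 H264/90000",
].join("\r\n");
console.log(codecsBySection(sampleSdp));
```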
Browser A supports only VP8. Browser B supports VP8 and H264. Which codec is used after SDP negotiation?
Offer/Answer
Offer/Answer is the two-step protocol for setting up a WebRTC connection. The initiator creates an **offer** (SDP with its capabilities) and sends it through signaling. The receiver creates an **answer** (SDP with the chosen common parameters) and sends it back. Each side applies its own SDP with `setLocalDescription` and the peer's with `setRemoteDescription`; once both are in place, ICE connectivity checks can run.
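The legal order of these calls forms a small state machine. In the sketch below, the state names mirror `RTCPeerConnection.signalingState` values, but `OfferAnswerModel` is a hypothetical, simplified model (no `pranswer` states, no rollback, no real SDP) meant only to show the sequence on each side.

```typescript
// Simplified model of the offer/answer signaling states (pranswer states omitted).
type SigState = "stable" | "have-local-offer" | "have-remote-offer";
type DescType = "offer" | "answer";

class OfferAnswerModel {
  state: SigState = "stable";

  setLocalDescription(type: DescType): void {
    if (type === "offer" && this.state === "stable") this.state = "have-local-offer";
    else if (type === "answer" && this.state === "have-remote-offer") this.state = "stable";
    else throw new Error("cannot set local " + type + " in state " + this.state);
  }

  setRemoteDescription(type: DescType): void {
    if (type === "offer" && this.state === "stable") this.state = "have-remote-offer";
    else if (type === "answer" && this.state === "have-local-offer") this.state = "stable";
    else throw new Error("cannot set remote " + type + " in state " + this.state);
  }
}

// Caller (A) and callee (B); the signaling transport itself is not modeled.
const caller = new OfferAnswerModel();
const callee = new OfferAnswerModel();
caller.setLocalDescription("offer");   // A: createOffer() + setLocalDescription(offer)
callee.setRemoteDescription("offer");  // B receives the offer via signaling
callee.setLocalDescription("answer");  // B: createAnswer() + setLocalDescription(answer)
caller.setRemoteDescription("answer"); // A receives the answer via signaling
console.log(caller.state, callee.state); // stable stable
```

In this model a peer that already holds a local offer rejects an incoming offer. That collision (glare) is what the perfect negotiation pattern from the next lesson resolves.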
**Trickle ICE** is an optimization: instead of waiting until all ICE candidates are gathered, send the offer immediately and then send each candidate as it is found (`onicecandidate`). This speeds up connection setup by 500-2000ms. All modern WebRTC implementations use trickle ICE by default.
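With trickle ICE, candidates and the SDP travel as separate signaling messages, so a candidate can arrive before the description it belongs to has been applied, and `addIceCandidate()` rejects while there is no remote description. A common remedy is to buffer early candidates. A minimal sketch: `RemoteCandidateQueue` is a hypothetical helper, and the browser's `addIceCandidate` is replaced by an injected callback so the logic runs outside a browser.

```typescript
// Stand-in for pc.addIceCandidate(); candidates are plain strings here.
type AddCandidate = (candidate: string) => void;

class RemoteCandidateQueue {
  private pending: string[] = [];
  private remoteDescriptionSet = false;
  private readonly addIceCandidate: AddCandidate;

  constructor(addIceCandidate: AddCandidate) {
    this.addIceCandidate = addIceCandidate;
  }

  // Called for every candidate received from the signaling channel.
  onSignaledCandidate(candidate: string): void {
    if (this.remoteDescriptionSet) this.addIceCandidate(candidate);
    else this.pending.push(candidate); // too early: no remote description yet
  }

  // Called right after setRemoteDescription() resolves.
  onRemoteDescriptionApplied(): void {
    this.remoteDescriptionSet = true;
    for (const c of this.pending) this.addIceCandidate(c);
    this.pending = [];
  }
}

const applied: string[] = [];
const queue = new RemoteCandidateQueue((c) => applied.push(c));
queue.onSignaledCandidate("cand-1"); // arrives before the answer: buffered
queue.onRemoteDescriptionApplied();  // answer applied: cand-1 is flushed
queue.onSignaledCandidate("cand-2"); // applied immediately
console.log(applied); // [ 'cand-1', 'cand-2' ]
```

On the sending side, each candidate comes from the `onicecandidate` event; an event whose `candidate` is `null` marks the end of gathering.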
**Myth:** WebRTC is only for video calls; for data transfer you need WebSocket.
**Reality:** WebRTC DataChannel carries arbitrary P2P data (files, game events, collaborative document operations), usually with lower latency than going through a server.
RTCDataChannel runs SCTP over DTLS and ships data directly between browsers. Notion experimented with WebRTC DataChannel for collaborative sync. It's a strong fit for P2P file sharing: large files go directly between peers without loading the server.
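Sending a file over a DataChannel means splitting it into messages and reassembling them on the other side. A minimal sketch, assuming the default ordered, reliable channel (so chunks arrive in order) and a 16 KiB chunk size chosen conservatively for cross-browser compatibility; `toChunks` and `fromChunks` are hypothetical helpers, and the actual `channel.send(chunk)` calls are left out.

```typescript
// Assumed conservative message size for RTCDataChannel.send().
const CHUNK_SIZE = 16 * 1024;

// Sender side: split the file bytes into fixed-size views (no copying).
function toChunks(data: Uint8Array, size: number = CHUNK_SIZE): Uint8Array[] {
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += size) {
    chunks.push(data.subarray(offset, Math.min(offset + size, data.length)));
  }
  return chunks;
}

// Receiver side: concatenate chunks in arrival order.
function fromChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

const file = new Uint8Array(40000).map((_, i) => i % 251);
const parts = toChunks(file);
console.log(parts.map((p) => p.length)); // [ 16384, 16384, 7232 ]
```

For large files a sender would also watch `bufferedAmount` (with `bufferedAmountLowThreshold` and the `bufferedamountlow` event) instead of queueing every chunk at once.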
In the Offer/Answer protocol `setLocalDescription` is called before the SDP goes over the network. Why does the order matter?
Key takeaways
- WebRTC = media P2P directly between browsers; the signaling server is only for the initial SDP and ICE exchange and never touches media
- ICE gathers three candidate types (host/srflx/relay) and picks the best path; TURN is mandatory for corporate networks
- Offer/Answer is the two-step SDP exchange that negotiates codecs and parameters; trickle ICE speeds up connection setup by 500-2000ms
Related topics
WebRTC is a stack of protocols, each solving a separate problem:
- Signaling server — Next lesson: implementing signaling over WebSocket and the perfect negotiation pattern for race conditions
- WebSocket — The signaling channel for SDP/ICE exchange is usually WebSocket - the transport that bootstraps a WebRTC connection
- CRDT DataChannel — WebRTC DataChannel can serve as a P2P transport for CRDT operations - an alternative to server relay
Questions for reflection
- If WebRTC P2P is so efficient, why do Zoom and Google Meet switch to an SFU server architecture for large conferences?
- A TURN server sees all the media traffic during relay. How does WebRTC ensure privacy in that case?
- Trickle ICE sends candidates as they're discovered, without waiting for all of them. What race condition can occur if a peer receives ICE candidates before the offer or answer they belong to?
Related lessons
- rt-56 — Signaling server implementation is the next step after understanding the WebRTC handshake
- rt-57 — STUN and TURN are the infrastructure that ICE relies on to traverse NAT
- net-14-udp
- net-23-https-tls