Real-Time Backend
WebSocket Protocol: RFC 6455
December 2011. The IETF publishes RFC 6455, "The WebSocket Protocol". Before it, real-time web meant workarounds: polling every second, holding long-polling requests open, or Flash sockets. WebSocket's design choice was deliberately conservative: upgrade an existing HTTP connection rather than open a new port. That single decision is why WebSocket works behind most corporate firewalls and through reverse proxies that already pass HTTPS, provided they forward the Upgrade headers. Trading platforms and games run on the same protocol that ships chat messages.
- **Trading platforms**: market data streams as binary WebSocket frames at thousands of updates per second per symbol
- **Multiplayer games**: position and input frames flow both ways with sub-50ms latency budgets
- **Collaborative editors** (Figma, Google Docs): edit operations such as operational transforms travel as WebSocket frames between each client and the server
- **Live dashboards**: server pushes metric deltas to browsers without polling overhead
Historical context
Before WebSocket, real-time web applications relied on workarounds: polling, long-polling, and Adobe Flash sockets. In 2008, Michael Carter and Ian Hickson began drafting what would become WebSocket. The IETF standardized it as RFC 6455 in December 2011, while the W3C developed the browser API specification in parallel. The key design decision - upgrade over HTTP rather than open a new TCP port - made WebSocket compatible with existing HTTP infrastructure and firewalls. Socket.IO, Pusher, and many realtime gaming backends use it as their default transport.
WebSocket Handshake
WebSocket begins with an **HTTP Upgrade request** - a standard HTTP/1.1 GET carrying the headers `Upgrade: websocket`, `Connection: Upgrade`, `Sec-WebSocket-Version: 13`, and a random base64-encoded `Sec-WebSocket-Key`. The server responds with `101 Switching Protocols`, and from that point the TCP connection carries WebSocket frames instead of HTTP. This design means WebSocket works on port 80/443, passes through HTTP proxies that forward the Upgrade, and requires no new firewall rules.
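**The GUID magic constant** `258EAFA5-E914-47DA-95CA-C5AB0DC85B11` prevents a plain HTTP server from accidentally responding to a WebSocket handshake. The server appends the GUID to the client's `Sec-WebSocket-Key`, hashes the result with SHA-1, and returns the base64-encoded digest in `Sec-WebSocket-Accept`. A server that does not implement RFC 6455 will not produce this value by chance, so a matching response tells the client that the other side actually speaks WebSocket.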
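The key+GUID computation can be sketched in a few lines of Python. This is a minimal illustration, not a full handshake: the function name `compute_accept` is hypothetical, and the sample key is the one used in the RFC 6455 example handshake.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key.

    Hypothetical helper: concatenates key and GUID, hashes with SHA-1,
    and base64-encodes the 20-byte digest.
    """
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from the RFC 6455 example handshake.
print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client can run the same computation on the key it sent and compare the result with the header it received; a mismatch means the handshake should be rejected.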
Why does WebSocket use HTTP Upgrade instead of opening a separate TCP connection on a different port?
WebSocket Framing
After the handshake, data travels as **WebSocket frames**. Each frame has a header (2-14 bytes) containing: FIN bit (last fragment flag), three reserved bits, opcode (4 bits), MASK bit, and payload length (7 bits, extended to 16 or 64 bits for larger payloads), followed by a 4-byte masking key when the MASK bit is set. Client-to-server frames are always masked with a 4-byte key. Server-to-client frames are never masked. This masking requirement prevents cache poisoning attacks through transparent proxies.
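**Why masking prevents proxy poisoning**: a transparent HTTP proxy that does not understand WebSocket may keep parsing the byte stream as HTTP and cache responses by URL. If an attacker's script could choose the exact bytes sent from the browser, it could make them look like an HTTP request, have a cooperating server answer with bytes that look like an HTTP response, and trick the proxy into caching that response and serving it to other clients. XORing every client payload with a fresh random key means the script cannot predict the bytes on the wire, which defeats this attack.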
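The masking step itself is a plain XOR with a repeating 4-byte key. The sketch below assumes a hypothetical helper name, `apply_mask`; because XOR is its own inverse, the same function both masks and unmasks.

```python
import os


def apply_mask(payload: bytes, key: bytes) -> bytes:
    """XOR each payload byte with key[i % 4] (hypothetical helper).

    Calling it twice with the same key returns the original payload.
    """
    if len(key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


key = os.urandom(4)  # a fresh, unpredictable key for every frame
plain = b"GET /evil HTTP/1.1\r\n"  # bytes an attacker would like to put on the wire
masked = apply_mask(plain, key)

# Unmasking with the same key restores the payload on the server side.
assert apply_mask(masked, key) == plain
```

Since the key changes per frame and is chosen by the browser, not by page scripts, the bytes a proxy sees cannot be made to spell out a chosen HTTP request.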
A 200-byte WebSocket message is sent from client to server. What is the total overhead in bytes added by the frame header?
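One way to work through this question is to compute the header size from the length-encoding rules: 2 base bytes, plus 2 extra bytes for payloads of 126 to 65535 bytes or 8 extra bytes beyond that, plus 4 bytes of masking key on client frames. The function name `header_size` is an assumption made for this sketch.

```python
def header_size(payload_len: int, masked: bool) -> int:
    """Return the frame header size in bytes (hypothetical helper)."""
    if payload_len < 0:
        raise ValueError("payload length must be non-negative")
    size = 2  # FIN/RSV/opcode byte + MASK/7-bit length byte
    if payload_len > 65535:
        size += 8  # 64-bit extended payload length
    elif payload_len > 125:
        size += 2  # 16-bit extended payload length
    if masked:
        size += 4  # masking key, present on client-to-server frames
    return size


print(header_size(200, masked=True))   # 8
print(header_size(200, masked=False))  # 4
```

For the 200-byte client message, the payload exceeds 125 bytes, so the header is 2 + 2 + 4 = 8 bytes; the same message sent by the server would need only 4.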
Opcodes and frame types
The 4-bit opcode field defines the frame type: `0x0` (continuation), `0x1` (text/UTF-8), `0x2` (binary), `0x8` (close), `0x9` (ping), `0xA` (pong). Text frames require UTF-8 validation on both ends. Binary frames carry raw bytes with no encoding constraint - ideal for Float32Arrays, protobuf, or MessagePack payloads.
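As an illustration of where the opcode sits, the sketch below decodes the first two header bytes of a frame. The names `FrameStart` and `parse_first_two_bytes` are hypothetical, and the sketch deliberately stops before the extended length and masking key.

```python
from typing import NamedTuple

# Opcode values listed above.
OPCODES = {
    0x0: "continuation",
    0x1: "text",
    0x2: "binary",
    0x8: "close",
    0x9: "ping",
    0xA: "pong",
}


class FrameStart(NamedTuple):
    fin: bool
    opcode: int
    masked: bool
    length_field: int  # 0-125 = actual length; 126/127 = extended length follows


def parse_first_two_bytes(data: bytes) -> FrameStart:
    """Decode FIN, opcode, MASK and the 7-bit length field (hypothetical helper)."""
    if len(data) < 2:
        raise ValueError("need at least 2 bytes of header")
    b0, b1 = data[0], data[1]
    return FrameStart(
        fin=bool(b0 & 0x80),      # top bit of byte 0
        opcode=b0 & 0x0F,         # low 4 bits of byte 0
        masked=bool(b1 & 0x80),   # top bit of byte 1
        length_field=b1 & 0x7F,   # low 7 bits of byte 1
    )


start = parse_first_two_bytes(bytes([0x81, 0x85]))
print(start, OPCODES[start.opcode])
```

The bytes `0x81 0x85` describe a final, masked text frame with a 5-byte payload.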
**Message fragmentation**: large messages can be split into multiple frames using the FIN bit. The first frame carries the real opcode (`0x1` or `0x2`) with FIN=0, the following fragments are continuation frames (opcode `0x0`), and FIN=1 marks the last fragment. This allows streaming large files over WebSocket without buffering the entire payload before sending.
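The FIN and opcode pattern across fragments can be sketched as a generator that yields `(fin, opcode, chunk)` tuples. The function name `fragment` and the tuple shape are assumptions for illustration; a real sender would also encode each tuple into a frame header and would typically read chunks lazily from a stream rather than slice an in-memory buffer.

```python
from typing import Iterator, Tuple

OP_CONT, OP_TEXT, OP_BINARY = 0x0, 0x1, 0x2


def fragment(payload: bytes, opcode: int, chunk_size: int) -> Iterator[Tuple[bool, int, bytes]]:
    """Yield (fin, opcode, chunk) for each fragment of one message (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # An empty message still produces one (empty) final frame.
    chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)] or [b""]
    last = len(chunks) - 1
    for i, chunk in enumerate(chunks):
        fin = i == last                               # FIN=1 only on the last fragment
        frame_opcode = opcode if i == 0 else OP_CONT  # real opcode only on the first
        yield fin, frame_opcode, chunk


for frame in fragment(b"abcdefgh", OP_BINARY, 3):
    print(frame)
```

A message that fits in one chunk comes out as a single frame with FIN=1 and the original opcode, which is the unfragmented case.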
A multiplayer game sends player position 30 times per second: `{x: 125.4, y: 89.2, angle: 1.57}`. Text or binary frames?
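To make the trade-off concrete, the sketch below encodes the same position once as JSON for a text frame and once as three little-endian 32-bit floats for a binary frame. The field order and the choice of float32 are assumptions; any real game would fix its own wire layout.

```python
import json
import struct

position = {"x": 125.4, "y": 89.2, "angle": 1.57}

# Text frame payload: UTF-8 encoded JSON.
text_payload = json.dumps(position).encode("utf-8")

# Binary frame payload: three little-endian float32 values (x, y, angle).
binary_payload = struct.pack("<3f", position["x"], position["y"], position["angle"])

print(len(text_payload), len(binary_payload))

# Decoding the binary payload on the receiving side.
x, y, angle = struct.unpack("<3f", binary_payload)
```

The binary payload is always 12 bytes, while the JSON size grows with key names and digit counts. The cost is that float32 rounds each value slightly and both sides must agree on the layout in advance.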
Close Handshake and status codes
WebSocket has a **graceful close handshake**: the initiator sends a Close frame (opcode `0x8`) with an optional 2-byte status code and reason string. The recipient echoes a Close frame back, then both sides close the TCP connection. This bidirectional acknowledgement ensures both sides have processed all pending frames before disconnecting.
**Close codes reference**: 1000 = normal closure, 1001 = endpoint going away (server restarting or page navigating away), 1002 = protocol error, 1003 = unsupported data type, 1006 = abnormal closure (no Close frame, for example a TCP reset), 1008 = policy violation, 1011 = server internal error. Code 1006 is special: it must never be sent in a Close frame. The client library synthesizes it when the TCP connection drops without a proper close handshake.
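The Close frame payload layout (a 2-byte big-endian status code followed by an optional UTF-8 reason) can be sketched as below. The helper names are hypothetical. The sketch also rejects codes 1005 and 1015, which RFC 6455 reserves alongside 1006 as values that must not appear on the wire, and enforces the 125-byte limit that applies to all control frames.

```python
import struct
from typing import Optional, Tuple

# Reserved codes that are only reported locally and never sent in a Close frame.
NOT_SENDABLE = {1005, 1006, 1015}


def build_close_payload(code: int, reason: str = "") -> bytes:
    """Build a Close frame payload: 2-byte code + UTF-8 reason (hypothetical helper)."""
    if code in NOT_SENDABLE:
        raise ValueError(f"close code {code} must not be sent in a Close frame")
    body = struct.pack("!H", code) + reason.encode("utf-8")
    if len(body) > 125:
        raise ValueError("control frame payload must not exceed 125 bytes")
    return body


def parse_close_payload(payload: bytes) -> Tuple[Optional[int], str]:
    """Return (code, reason); code is None when the Close frame has no payload."""
    if not payload:
        return None, ""
    if len(payload) == 1:
        raise ValueError("close payload must be empty or at least 2 bytes")
    (code,) = struct.unpack("!H", payload[:2])
    return code, payload[2:].decode("utf-8")


payload = build_close_payload(1001, "server restarting")
print(parse_close_payload(payload))  # (1001, 'server restarting')
```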
A WebSocket connection drops due to network outage. The client receives a close event with code 1006. What happened?
Key Takeaways
- WebSocket starts with an HTTP Upgrade and a 101 Switching Protocols response - same port as HTTPS, no special firewall rules
- Frames carry FIN bit, 4-bit opcode, MASK bit, and payload length in a 2-14 byte header
- Client-to-server frames are always masked with a 4-byte key; server-to-client frames are never masked
- Opcodes: 0x1 text (UTF-8 validated), 0x2 binary (raw bytes), 0x8 close, 0x9 ping, 0xA pong
- Close handshake is bidirectional - both sides exchange Close frames before TCP shuts down. Code 1006 means abnormal drop with no handshake
Related Topics
Topics that build on or extend WebSocket Protocol:
- bt-04-dns-tls — WebSocket over WSS requires TLS - the same TLS handshake analyzed in bt-04-dns-tls
- web-04 — Browser WebSocket API is how frontend code uses the protocol covered here
- bt-03-serialization — Binary WebSocket frames carry serialized payloads - MessagePack or protobuf over binary opcodes
Questions for Reflection
- Why is the masking requirement on client-to-server frames a security feature rather than a performance penalty?
- When sending high-frequency game state, when does JSON over text frames become a measurable bottleneck compared to a Float32Array binary frame?
- Code 1006 means the TCP socket dropped with no Close handshake. How would reconnect logic differ between code 1006 and code 1000?
Related Lessons
- devops-04 — Containerized WebSocket servers need sticky sessions or shared state (Redis pub/sub) for horizontal scaling
- rt-05 — Server-Sent Events and QUIC extend the realtime transport options beyond WebSocket
- rt-03-sse
- net-06-ip-intro
- rt-06
- net-03-physical
- net-15-tcp-basics