Backend Transport

HTTP/1.1: Request, Response, Headers

1991: Tim Berners-Lee's first HTTP request is a single line of text. 1997: RFC 2068, co-authored by Roy Fielding, defines HTTP/1.1 with persistent connections, chunked transfer, and a full caching model. 2025: the same text-based protocol carries LLM API calls, streaming inference, and webhook callbacks. More than thirty years on, the request/response model is unchanged.

LLM APIs (OpenAI, Anthropic) - HTTP POST with JSON body and Bearer token in Authorization
Streaming inference - chunked transfer encoding, SSE via text/event-stream
Vector DB HTTP APIs - Qdrant/Pinecone accept JSON over REST; responses can be cached by ETag or content hash where the API supports it
LLM provider webhook callbacks on batch inference completion - POST to client endpoint
CDN layer (Cloudflare Workers) - Cache-Control and ETag in every API response

HTTP Message Anatomy

In 1991 the first HTTP request was a single line of ASCII over a TCP socket: no headers, no status codes, just `GET /page`. The same principle now serves billions of devices, and a typical browser request carries dozens of headers. The mechanism has not changed.

An HTTP/1.1 message has three parts: a **start line** (method and URL for a request, status for a response), **headers** (`Name: value` pairs, one per line), and an optional **body** (separated from the headers by a blank line). Lines end with CRLF. The start line and headers are ASCII text; the body is raw bytes. That is the entire framing - text over TCP, nothing more.
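The three-part layout can be sketched as a small parser. This is a simplified illustration, not a full HTTP parser: the function name and the `api.example.com` request are made up for the example, and repeated header names simply overwrite each other.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP/1.1 message into (start_line, headers, body)."""
    # The first blank line (CRLF CRLF) separates the header block from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them to lowercase.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Hypothetical request, written out byte for byte as it would travel over TCP.
raw_request = (
    b"POST /v1/messages HTTP/1.1\r\n"
    b"Host: api.example.com\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: 16\r\n"
    b"\r\n"
    b'{"prompt": "hi"}'
)

start_line, headers, body = parse_http_message(raw_request)
print(start_line)
print(headers)
print(body)
```

Everything before the blank line is decoded as text; the body is left as bytes because its interpretation depends on `Content-Type`.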

**Persistent connections (Keep-Alive):** HTTP/1.0 closed the TCP connection after every response unless both sides opted in with `Connection: keep-alive`. HTTP/1.1 keeps the connection open by default; a peer opts out with `Connection: close`. For a page loading 30 resources that can be the difference between 30 TCP handshakes and one.
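Connection reuse can be observed locally with the standard library alone. The sketch below starts a throwaway server on `127.0.0.1` and sends several requests through one `http.client.HTTPConnection`. The handler and function names are illustrative. If the client's local port stays the same across requests, they all travelled over a single TCP connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class OkHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        # Content-Length tells the client where this response ends,
        # which is what makes reusing the connection possible.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass


def local_ports_used(paths):
    """Fetch each path over one HTTPConnection; return the client port per request."""
    server = HTTPServer(("127.0.0.1", 0), OkHandler)  # port 0 = pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    ports = []
    try:
        for path in paths:
            conn.request("GET", path)
            conn.getresponse().read()  # drain the body before reusing the socket
            ports.append(conn.sock.getsockname()[1])
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return ports


ports = local_ports_used(["/a", "/b", "/c"])
print(len(ports), "requests over", len(set(ports)), "TCP connection(s)")
```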

There is a problem Keep-Alive does not solve: **head-of-line blocking**. On one HTTP/1.1 connection, responses must come back in the order the requests were sent. If the first request stalls, everything behind it waits. Browsers work around this by opening several parallel TCP connections per host, typically six. HTTP/2 removes the issue at the HTTP layer with multiplexing - that is the next lesson.

**Chunked Transfer Encoding:** when the body size is unknown upfront, the server sends `Transfer-Encoding: chunked` instead of `Content-Length`. Each chunk is prefixed with its size in hex, and a zero-length chunk marks the end of the body. Over HTTP/1.1, LLM streaming inference is typically delivered this way - tokens arrive in separate chunks as the model generates them.
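The wire format is small enough to sketch in full. The two helpers below are illustrative only: they ignore trailers, and the decoder assumes the complete body is already in memory.

```python
def encode_chunked(pieces):
    """Encode an iterable of byte strings as an HTTP/1.1 chunked body."""
    out = b""
    for piece in pieces:
        if piece:  # a zero-length chunk would end the body early, so skip empties
            out += f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    return out + b"0\r\n\r\n"  # terminating zero-length chunk


def decode_chunked(data: bytes) -> bytes:
    """Reassemble the original body from a complete chunked payload."""
    body, pos = b"", 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line is hexadecimal; anything after ';' is a chunk extension.
        size = int(data[pos:line_end].split(b";")[0], 16)
        if size == 0:
            return body
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that closes each chunk


tokens = [b"Hello", b", ", b"world"]  # stand-ins for streamed model tokens
wire = encode_chunked(tokens)
print(wire)
print(decode_chunked(wire))
```

Each piece can be written to the socket as soon as it exists, which is why the server never needs to know the total length in advance.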

What is mandatory in an HTTP/1.1 request that was absent in HTTP/0.9?

Methods and Status Codes

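RFC 7231 (since superseded by RFC 9110) draws a distinction that surprises many engineers: **PUT creates or replaces the resource at the URL the client names. POST hands data to an existing resource and lets the server decide what to do with it.** Most REST APIs use POST for creation because the URL of the new resource is unknown to the client beforehand; the server chooses it and returns it in the `Location` header with `201 Created`. This is not a deviation from the spec - it is the case the spec describes. PUT fits creation only when the client picks the identifier, as in `PUT /users/42`. Knowing what the specification actually says, as opposed to what folklore claims, is a mark of experience.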

**Safe and idempotent are different properties.** A safe method does not modify server state (GET, HEAD, OPTIONS). An idempotent method has the same effect on server state whether it is sent once or many times with the same parameters (GET, HEAD, OPTIONS, PUT, DELETE). Every safe method is idempotent, but not the reverse: DELETE is idempotent but not safe. POST is neither.
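A toy in-memory store makes the distinction concrete. `UserStore` and its status codes are assumptions chosen for illustration, not a prescribed API. The thing to watch is the state of `users` after repeated calls, not the return values.

```python
class UserStore:
    """Minimal in-memory resource collection with HTTP-like semantics."""

    def __init__(self):
        self.users = {}
        self.next_id = 1

    def get(self, uid):
        # Safe: reads state, never changes it.
        return self.users.get(uid)

    def post(self, data):
        # Not idempotent: every call creates another resource under a new id.
        uid = self.next_id
        self.next_id += 1
        self.users[uid] = dict(data)
        return 201, uid

    def put(self, uid, data):
        # Idempotent: after one call or ten, users[uid] holds the same data.
        status = 201 if uid not in self.users else 200
        self.users[uid] = dict(data)
        return status

    def delete(self, uid):
        # Idempotent but not safe: state changes once, repeats change nothing more.
        return 204 if self.users.pop(uid, None) is not None else 404


store = UserStore()
store.put(42, {"name": "Ada"})
store.put(42, {"name": "Ada"})
print(len(store.users))  # PUT twice: still one user

store.post({"name": "Bob"})
store.post({"name": "Bob"})
print(len(store.users))  # POST twice: two more users
```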

**404 vs 410:** `404 Not Found` says the server could not find the resource but does not explain why. `410 Gone` says it existed and was permanently removed. Search engines treat 410 as the stronger signal: a URL returning 410 tends to drop out of the index sooner, while a 404 may be re-crawled for some time first.

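For LLM API clients, three codes matter most: `429 Too Many Requests` with a `Retry-After` header (rate limit), `402 Payment Required` (credits exhausted), and `413 Payload Too Large` (the request body exceeds the provider's size limit and is rejected before it reaches the model). Exact codes vary by provider - some report exhausted quota as 429 and an over-long context as 400 - so read the error body as well as the status. Handling these correctly is the difference between a production client and a prototype.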
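One way to encode that policy is a small function that decides whether a failed call is worth retrying. This is a sketch under stated assumptions: `retry_delay` and its defaults are invented for the example, `headers` is a plain dict, and only the delta-seconds form of `Retry-After` is parsed (the header may also carry an HTTP date).

```python
import random


def retry_delay(status, headers, attempt, base=1.0, cap=60.0):
    """Return seconds to wait before retrying, or None if a retry cannot help."""
    if status != 429:
        # 402 (no credits) and 413 (body too large) fail identically on retry:
        # the caller has to top up or shrink the request instead.
        return None
    retry_after = headers.get("Retry-After", "").strip()
    if retry_after.isdigit():
        # The server said how long to wait; honour it, within our own cap.
        return min(float(retry_after), cap)
    # No usable hint: exponential backoff with jitter so clients do not retry in lockstep.
    backoff = min(cap, base * 2 ** attempt)
    return backoff * random.uniform(0.5, 1.0)


print(retry_delay(429, {"Retry-After": "7"}, attempt=0))
print(retry_delay(402, {}, attempt=0))
print(retry_delay(413, {}, attempt=0))
```

A caller would sleep for the returned delay and try again, and surface the error to the user when the result is `None`.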

DELETE is called idempotent. What does that mean in practice?

Headers and Cookies

HTTP is stateless. Every request is independent - the server recalls nothing about the previous one. Applications, however, need state: sessions, shopping carts, permissions. Cookies are the compromise: the browser stores small strings and attaches them to every request. The server reads the cookie and "remembers" the user. This does not violate HTTP's stateless nature - it adds a layer on top of it.

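**CORS preflight:** before a cross-origin request that is not "simple" - a PUT or DELETE, a POST with `Content-Type: application/json`, or any request with custom headers such as `Authorization` - browsers send an `OPTIONS` request. The server responds with `Access-Control-Allow-Origin`, `Access-Control-Allow-Methods` and `Access-Control-Allow-Headers`. If the response permits the request, the browser sends the actual one. A backend that does not handle OPTIONS breaks these requests for every frontend running on a different origin.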
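The server side of a preflight reduces to a lookup against an allow-list. The sketch below is framework-independent; the origin, the allowed methods and headers, and the function name are all hypothetical.

```python
# Hypothetical allow-lists; a real service would load these from configuration.
ALLOWED_ORIGINS = {"https://app.example.com"}
ALLOWED_METHODS = {"GET", "POST", "PUT", "DELETE"}
ALLOWED_HEADERS = "Authorization, Content-Type"


def preflight_response(request_headers):
    """Answer an OPTIONS preflight with (status, response_headers)."""
    origin = request_headers.get("Origin")
    method = request_headers.get("Access-Control-Request-Method")
    if origin not in ALLOWED_ORIGINS or method not in ALLOWED_METHODS:
        # Without the CORS headers the browser blocks the actual request.
        return 403, {}
    return 204, {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": ", ".join(sorted(ALLOWED_METHODS)),
        "Access-Control-Allow-Headers": ALLOWED_HEADERS,
        "Access-Control-Max-Age": "600",  # browser may cache this answer for 10 minutes
        "Vary": "Origin",  # the response differs per origin, so caches must key on it
    }


status, headers = preflight_response({
    "Origin": "https://app.example.com",
    "Access-Control-Request-Method": "PUT",
    "Access-Control-Request-Headers": "authorization, content-type",
})
print(status, headers)
```

Echoing the specific origin rather than `*` is required when the actual request carries credentials such as cookies.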

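**SameSite=None without the Secure flag** is a common misconfiguration for cross-site cookies. Since 2020 browsers reject such cookies silently. The usual symptom is authentication that works in one environment and breaks in another, because a `Secure` cookie is only sent over HTTPS: a setup that works in HTTPS production can fail in plain-HTTP development, or the reverse.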
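The rule is easy to enforce where the cookie is built. The helper below uses `http.cookies` from the standard library (the `samesite` attribute needs Python 3.8 or later). The function name and the `widget_pref` cookie are illustrative.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name, value, samesite="Lax", secure=True):
    """Return a Set-Cookie header value, refusing SameSite=None without Secure."""
    if samesite == "None" and not secure:
        # Browsers drop this combination silently, so fail loudly on the server.
        raise ValueError("SameSite=None requires the Secure attribute")
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    cookie[name]["samesite"] = samesite
    if secure:
        cookie[name]["secure"] = True
    return cookie[name].OutputString()


print(build_set_cookie("widget_pref", "dark", samesite="None"))

try:
    build_set_cookie("widget_pref", "dark", samesite="None", secure=False)
except ValueError as exc:
    print("rejected:", exc)
```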

Which cookie flag protects a session token from being stolen by an XSS attack via JavaScript?

Caching and ETag

A cache hit at a CDN edge such as Cloudflare can complete in a few milliseconds, most of it network propagation. For cacheable content the origin server never sees the majority of requests at all. This is not an infrastructure trick - it is HTTP caching, specified in HTTP/1.1 (RFC 2068, 1997), which Roy Fielding co-authored. His 2000 dissertation later named cacheability a core constraint of the REST architectural style.

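**ETag for embedding cache:** the same text sent to the same model yields the same embedding, yet every repeated request costs the same as the first. Caching via ETag or content hash avoids paying for repeated computations. The same validator pattern applies to vector database HTTP APIs such as Qdrant and Pinecone, where a collection version can serve as the `ETag` value; whether a given API exposes one should be checked against its documentation.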
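A content-hash ETag can be sketched in a few lines. The example assumes a hypothetical endpoint where the client resends the ETag it received in `If-None-Match`. The function names, the `demo-model` label, and the fake `compute` callback are all inventions for illustration, and only a single exact-match validator is handled.

```python
import hashlib


def make_etag(text, model):
    """Derive a strong ETag from everything that determines the vector."""
    digest = hashlib.sha256(f"{model}\n{text}".encode("utf-8")).hexdigest()
    return f'"{digest[:32]}"'  # ETag values are quoted strings


def handle_embedding_request(text, model, if_none_match, compute):
    """Return (status, headers, vector); skip the computation on a validator match."""
    etag = make_etag(text, model)
    if if_none_match == etag:
        # The client already holds this exact result: no body, no model call.
        return 304, {"ETag": etag}, None
    return 200, {"ETag": etag}, compute(text)


calls = []

def fake_embed(text):  # stand-in for a paid embedding call
    calls.append(text)
    return [float(len(text)), 0.0]

status1, headers1, vector1 = handle_embedding_request("hello", "demo-model", None, fake_embed)
status2, headers2, vector2 = handle_embedding_request("hello", "demo-model", headers1["ETag"], fake_embed)
print(status1, status2, len(calls))
```

Including the model name in the hash matters: the same text embedded by a different model is a different resource.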

**no-store vs no-cache** is the most common source of confusion in HTTP caching. `no-cache` does NOT mean "do not cache" - it means "store the cache but always validate before use". `no-store` means "do not persist at all". For sensitive data (tokens, medical records, financial information) only `no-store` provides the required guarantee.
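The two directives can be contrasted from the cache's point of view. The function below is a deliberately simplified decision table: it looks only at these two directives and ignores `max-age`, `private`, and the rest of the freshness rules. Its name and return strings are illustrative.

```python
def cache_decision(cache_control):
    """Describe what a cache may do with a response carrying this Cache-Control value."""
    # Directives are comma-separated and case-insensitive; drop any '=value' part.
    directives = {
        part.strip().lower().split("=")[0]
        for part in cache_control.split(",")
        if part.strip()
    }
    if "no-store" in directives:
        return "do not store"
    if "no-cache" in directives:
        return "store, revalidate before every reuse"
    return "store, reuse while fresh"


for value in ("no-store", "no-cache", "max-age=3600", "no-cache, no-store"):
    print(f"{value!r}: {cache_decision(value)}")
```

When both directives appear, `no-store` wins, because nothing is kept that could later be revalidated.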

**Last-Modified as an ETag alternative:** the server returns `Last-Modified: Tue, 21 Oct 2025 07:28:00 GMT`, the client sends `If-Modified-Since: Tue, 21 Oct 2025 07:28:00 GMT`, and the server answers `304 Not Modified` if nothing has changed since. Less precise - resolution is one second, so two changes within the same second are indistinguishable. ETag is preferred, but Last-Modified is simpler to implement for static file servers.
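The comparison a static file server performs can be sketched with the standard library's HTTP date helpers. `conditional_get` is an illustrative name, and the modification time is passed in as a timezone-aware UTC `datetime` rather than read from disk.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(last_modified, if_modified_since):
    """Return (status, headers) for a GET that may carry If-Modified-Since."""
    # HTTP dates have one-second resolution, so drop sub-second precision first.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # an unparseable date is ignored, as if the header were absent
        if since is not None and last_modified <= since:
            return 304, headers  # client copy is current: send no body
    return 200, headers


mtime = datetime(2025, 10, 21, 7, 28, 0, tzinfo=timezone.utc)
status, headers = conditional_get(mtime, None)
print(status, headers)
print(conditional_get(mtime, headers["Last-Modified"])[0])
```

Truncating to whole seconds before comparing is what makes the round trip work: the client can only echo back what the header format was able to express.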

Misconception: HTTP being stateless means sessions and user state cannot be maintained.

Reality: stateless means each request is self-contained. State lives outside the protocol - in cookies, JWTs, or URL parameters.

Statelessness is a property of the protocol, not of the applications built on it. The server keeps no protocol-level context between requests; instead the client includes a context identifier, such as a session cookie, in each one, and that is what makes every request self-contained.

A server responds with `Cache-Control: no-cache`. What happens on the next request for the same resource?

HTTP/1.1: what backend engineers need to know

HTTP message = start line + headers + blank line + body; all plain text over TCP
Keep-Alive holds the TCP connection; chunked encoding handles bodies of unknown size (LLM streaming)
Safe = no state change (GET); idempotent = repeating the call has no further effect (GET, PUT, DELETE); POST is neither
404 - not found; 410 - gone permanently; 429 - rate limited; 304 - cache valid
HttpOnly guards cookies from XSS; SameSite=None requires Secure; no-cache is not no-store

Questions for reflection

A client sends PUT /users/42 twice with identical data. What should the result be and what status codes are expected on the first and second call?
For an embeddings API that returns identical vectors for identical text, what caching strategy fits best and why?
How does browser behavior differ between SameSite=Lax and SameSite=Strict - in what real scenario does the difference matter?

Related lessons

bt-04-dns-tls - a TLS handshake precedes the first request on every HTTPS connection
bt-06-rest - REST semantics build on top of HTTP methods and status codes
bt-07-http2-http3 - HTTP/2 solves the head-of-line blocking from HTTP/1.1
aie-05-api-integration - LLM API is HTTP POST with Bearer token and JSON body
bt-02-osi-tcp - HTTP runs over TCP; every connection begins with a TCP handshake
net-21-http-basics