Backend Transport

How Services Communicate

Netflix processes 2 billion requests per day between 700+ microservices. One wrong transport choice, and the user sees an infinite spinner instead of their favorite show. gRPC on Protocol Buffers delivers messages 3-5x smaller than JSON and 3-10x faster at high load. HTTP/2 multiplexing removes the request-level head-of-line blocking of HTTP/1.1. Envoy proxies in Kubernetes manage this traffic as the data plane of a service mesh. The right transport choice is not an implementation detail - it is an architectural decision.

**gRPC + Protocol Buffers** - used by Uber (4000+ services), Shopify, Netflix: binary serialization 3-5x smaller than JSON, strict types, codegen; HTTP/2 multiplexing without HTTP/1.1's request-level head-of-line blocking
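**Envoy proxy (Istio)** - the sidecar data plane of a service mesh in Kubernetes; handles gRPC routing, load balancing, mTLS, and observability between services with zero application code changes (Linkerd offers the same features with its own proxy instead of Envoy)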
**Event Streaming (Kafka)** - Shopify processed $12B in sales in one BFCM day on event-driven architecture; synchronous REST chains would not have survived the load spike

Why Transport Matters

Consider an online store where everything lives in one application: product catalog, cart, payments, delivery, notifications. The monolith works fine while the team is small. But once the service grows to a million users, any change to the payment module can break the catalog.

The solution is to break the monolith into separate services. The catalog lives on its own, payments on their own, delivery on its own. But now a question arises: **how will these services communicate?** Previously they called functions directly within one process. Now there's a network between them. Uber uses gRPC for internal communication between 4000+ microservices; Netflix uses REST for the public API and Kafka for internal events.

**Inter-Process Communication (IPC)** is a general term for all the ways processes exchange data. In the microservices world, IPC happens over the network, and the choice of transport directly affects the reliability, speed, and complexity of the entire system.

From Mainframes to Microservices

In the 1970s, programs ran on a single mainframe and communicated through shared memory. In the 1990s, CORBA and DCOM emerged as early attempts to standardize remote calls between objects. In the 2000s came SOAP, then REST. Today Netflix uses hundreds of microservices handling millions of requests per second - and behind each one is a deliberate transport choice.

Transport is not just a "way to send a request". It is an architectural decision that determines resilience to failures, scaling speed, and maintenance complexity. gRPC on Protocol Buffers delivers binary serialization with messages 3-5x smaller than JSON and 3-10x faster at high load. HTTP/2 multiplexing removes the request-level head-of-line blocking of HTTP/1.1. Envoy proxies in Kubernetes manage this traffic as the data plane of a service mesh. The wrong transport choice is one of the most expensive mistakes in system design.

Why does the transition from monolith to microservices create a transport problem?

Synchronous vs Asynchronous

When service A sends a request to service B, there are two fundamentally different scenarios. **Synchronous**: A sends the request and waits for B to respond. Like a phone call - a question is asked and the line stays open until an answer arrives.

**Asynchronous**: A sends a message and moves on, without waiting for a response. Like an SMS - it's sent, work continues, and the reply comes sometime later (or not at all, in the fire-and-forget model).

**Blocking** - the execution thread stops and waits for a response. **Non-blocking** - the thread continues working. A synchronous call is usually blocking, an asynchronous one - non-blocking. But there are exceptions: async/await in code looks synchronous, but under the hood it doesn't block the thread.
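The async/await exception can be illustrated with a minimal sketch. It assumes Python's standard `asyncio` library; the service names and delays are hypothetical, and `asyncio.sleep` stands in for a real network round trip.

```python
import asyncio
import time


async def call_service(name: str, delay_s: float) -> str:
    # asyncio.sleep stands in for a network round trip; while it is
    # pending, the event loop is free to run other coroutines.
    await asyncio.sleep(delay_s)
    return f"{name}: ok"


async def fetch_all() -> list[str]:
    # The code reads top to bottom like synchronous code, yet both
    # calls are in flight at once on a single thread.
    return await asyncio.gather(
        call_service("catalog", 0.05),
        call_service("payments", 0.05),
    )


def timed_run() -> tuple[list[str], float]:
    start = time.perf_counter()
    results = asyncio.run(fetch_all())
    return results, time.perf_counter() - start
```

Calling `timed_run()` should report an elapsed time close to one delay rather than the sum of both, because neither wait blocks the thread.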

Characteristic	Synchronous	Asynchronous
Coupling	High (A knows about B)	Low (A only knows about the queue)
Response time	Sum of all calls in the chain	Only the time to send the message
Error handling	Immediate error response	Needs a separate mechanism (DLQ, retry)
Debugging	Simple - single thread	Complex - distributed events
Example	HTTP GET /users/123	Message in RabbitMQ

In practice, most systems use **both approaches**. A request to get a user profile - synchronous (need the response right now). Sending an email after registration - asynchronous (the user doesn't need to wait for the email to be sent).
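The effect on response time can be shown with simple arithmetic. The sketch below uses made-up latencies for a registration request; the numbers and function names are illustrative assumptions, not measurements.

```python
def sync_chain_latency_ms(hop_latencies_ms: list[float]) -> float:
    # In a synchronous chain the caller waits for every hop in turn,
    # so the user-visible latency is the sum of all hops.
    return sum(hop_latencies_ms)


def async_handoff_latency_ms(sync_hops_ms: list[float], enqueue_ms: float) -> float:
    # Only the hops the user must wait for stay synchronous; the rest
    # is handed to a queue, costing just the time to enqueue a message.
    return sum(sync_hops_ms) + enqueue_ms


# Hypothetical numbers for a registration request.
all_sync = sync_chain_latency_ms([40.0, 300.0])     # create user, send email
with_queue = async_handoff_latency_ms([40.0], 2.0)  # create user, enqueue email
```

With these assumed numbers the user waits 340 ms when the email is sent inline and 42 ms when it is handed to a queue.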

A user clicks 'Buy'. Two things must happen: 1) charge payment, 2) send receipt by email. What is the optimal approach?

Communication Patterns Overview

Synchronous and asynchronous are ways of interacting. But there are dozens of specific **technologies** to implement these approaches. Let's cover the key patterns, each of which solves a certain class of problems.

**RPC (Remote Procedure Call)** - calling a remote function as if it were local. The client calls `getUser(123)`, and under the hood the request travels over the network. gRPC from Google is one of the most popular implementations: binary Protocol Buffers (3-5x smaller than JSON), HTTP/2 multiplexing (multiple requests over one connection without HTTP/1.1's request-level head-of-line blocking), strict types, code generation. In Kubernetes, Envoy proxies often form the data plane of a service mesh that routes gRPC traffic between services.
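The idea of a stub that hides the network can be sketched in a few lines. This is not gRPC: as a simplifying assumption it encodes calls as JSON and uses an in-process function as the "transport", whereas gRPC would use generated stubs, Protocol Buffers, and HTTP/2. All names here (`UserServiceStub`, `handle_request`, `HANDLERS`) are hypothetical.

```python
import json
from typing import Any, Callable


# --- "Server" side: the real implementation (hypothetical service). ---
def get_user(user_id: int) -> dict[str, Any]:
    return {"id": user_id, "name": "Alice"}


HANDLERS: dict[str, Callable[..., Any]] = {"get_user": get_user}


def handle_request(raw: bytes) -> bytes:
    # Decode the call, dispatch to the named procedure, encode the result.
    request = json.loads(raw)
    result = HANDLERS[request["method"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


# --- "Client" side: a stub that hides the transport. ---
class UserServiceStub:
    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        # In a real system the transport would send bytes over the network.
        self._transport = transport

    def get_user(self, user_id: int) -> dict[str, Any]:
        payload = json.dumps({"method": "get_user", "args": [user_id]})
        response = self._transport(payload.encode("utf-8"))
        return json.loads(response)["result"]


stub = UserServiceStub(transport=handle_request)
user = stub.get_user(123)  # reads like a local call
```

The caller only sees `stub.get_user(123)`; serialization and delivery live behind the stub, which is also where network failures and timeouts would surface in a real deployment.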

**REST (Representational State Transfer)** - an architectural style on top of HTTP. Resources (users, orders) have URLs, and operations are expressed through HTTP methods (GET, POST, PUT, DELETE). The most common approach for public APIs.
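A minimal sketch of how a resource URL plus an HTTP method maps to an operation. It is a plain function rather than a real HTTP server, and the `/users/<id>` resource, the in-memory store, and the `handle` function are assumptions made for illustration. POST is omitted because it typically targets the collection URL (`/users`) rather than a single item.

```python
from typing import Any, Optional

# In-memory "database" of a hypothetical users resource.
USERS: dict[int, dict[str, Any]] = {123: {"id": 123, "name": "Alice"}}


def handle(method: str, path: str, body: Optional[dict[str, Any]] = None) -> tuple[int, Any]:
    """Return (status_code, response_body) for a /users/<id> request."""
    parts = path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "users" or not parts[1].isdigit():
        return 404, {"error": "not found"}
    user_id = int(parts[1])

    if method == "GET":
        if user_id not in USERS:
            return 404, {"error": "not found"}
        return 200, USERS[user_id]
    if method == "PUT":
        # PUT replaces the resource at this URL (creating it if absent).
        USERS[user_id] = {"id": user_id, **(body or {})}
        return 200, USERS[user_id]
    if method == "DELETE":
        USERS.pop(user_id, None)
        return 204, None
    return 405, {"error": "method not allowed"}
```

The URL names the thing and the method names the action, which is why the same path can serve reads, replacements, and deletions.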

**GraphQL** - the client describes exactly what data it needs. One endpoint, flexible queries. Solves the over-fetching problem (got more than needed) and under-fetching (had to make 3 requests instead of one).
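The "client describes what it needs" idea can be approximated without a GraphQL library. The sketch below assumes a selection expressed as a nested dict instead of real GraphQL query syntax; the `select` helper and the sample record are hypothetical.

```python
from typing import Any

# Full record as a hypothetical backend would store it.
USER = {
    "id": 123,
    "name": "Alice",
    "email": "alice@example.com",
    "orders": [{"id": 1, "total": 40, "items": 3}, {"id": 2, "total": 15, "items": 1}],
}


def select(data: Any, selection: dict[str, Any]) -> Any:
    """Keep only the fields named in `selection`.

    A value of True means "return this field as is"; a nested dict
    means "apply this sub-selection to the nested object(s)".
    """
    if isinstance(data, list):
        return [select(item, selection) for item in data]
    result = {}
    for field, sub in selection.items():
        value = data[field]
        result[field] = value if sub is True else select(value, sub)
    return result


# Roughly what the query `{ name orders { total } }` asks for.
response = select(USER, {"name": True, "orders": {"total": True}})
```

One request returns the name and the order totals and nothing else: no unused `email` field (over-fetching) and no second round trip for the orders (under-fetching).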

**Message Queue** - a point-to-point channel: each message is delivered to one receiver and deleted after processing. Suitable for commands: 'send email', 'process payment'.
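A minimal in-process sketch of the queue pattern, assuming Python's standard `queue` and `threading` modules in place of a real broker such as RabbitMQ. The task names and the `None` sentinel are illustrative choices.

```python
import queue
import threading

tasks = queue.Queue()
processed: list[str] = []


def worker() -> None:
    while True:
        task = tasks.get()      # removes the message from the queue
        if task is None:        # sentinel: no more work
            tasks.task_done()
            break
        processed.append(f"done: {task}")
        tasks.task_done()       # acknowledge that processing finished


consumer = threading.Thread(target=worker)
consumer.start()

# The sender enqueues commands and does not wait for their results.
tasks.put("send email")
tasks.put("process payment")
tasks.put(None)

tasks.join()      # block until every message has been acknowledged
consumer.join()
```

Each command is taken by exactly one worker and is gone from the queue afterwards, which is the property that distinguishes a queue from a replayable event log.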

**Event Streaming** - an event stream: one sender, many receivers. Messages are stored in a log and can be replayed. Suitable for notifications: 'order created' - the warehouse, analytics, and notifications service all want to know.
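The log-plus-offsets model can be sketched in memory. This is a toy stand-in for a system like Kafka, not its API: the `EventLog` class and its methods are hypothetical, and there is no persistence or partitioning.

```python
class EventLog:
    """Append-only log; each consumer tracks its own read offset."""

    def __init__(self) -> None:
        self._events: list[str] = []
        self._offsets: dict[str, int] = {}

    def publish(self, event: str) -> None:
        self._events.append(event)  # events are never removed on read

    def poll(self, consumer: str) -> list[str]:
        # Return everything this consumer has not seen yet and advance its offset.
        start = self._offsets.get(consumer, 0)
        self._offsets[consumer] = len(self._events)
        return self._events[start:]

    def rewind(self, consumer: str, offset: int = 0) -> None:
        # Replay: move the consumer's offset back to re-read old events.
        self._offsets[consumer] = offset


log = EventLog()
log.publish("order created: 1")
log.publish("order created: 2")

warehouse_batch = log.poll("warehouse")
analytics_batch = log.poll("analytics")   # same events, independent offset
nothing_new = log.poll("warehouse")
log.rewind("warehouse")
replayed = log.poll("warehouse")
```

Both consumers receive both events, and rewinding the offset replays them, because reading never deletes anything from the log.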

The line between Message Queue and Event Streaming isn't always clear. Kafka can work as a queue, and RabbitMQ can work as pub/sub. But the philosophy is different: queue - 'process this task', streaming - 'an event occurred'.

An order is created. Three services need to know: warehouse (reserve item), analytics (update dashboard), email service (send confirmation). Which pattern fits best?

How to Choose Transport

Back to the online store from the beginning of the lesson: with the patterns in hand, specific decisions can now be made. There is no single 'best' transport - there's the **right one for a specific task**.

Criterion	REST	gRPC	GraphQL	Message Queue	Event Stream
Latency	Medium	Low	Medium	High*	Medium
Coupling	Medium	High	Low (client)	Low	Low
Typing	Weak (JSON)	Strict (Protobuf)	Strict (Schema)	Depends on format	Depends on format
Multiple receivers	No	No	No	Limited	Yes
Delivery guarantee	No (retry manually)	No (retry manually)	No	Yes (ack)	Yes (offset)
Debugging	Simple (curl)	Medium (grpcurl)	Medium (playground)	Complex	Complex

*High latency for Message Queue is relative: it means milliseconds to tens of milliseconds. For background tasks this is acceptable, but for a UI response every millisecond counts.
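The 'Yes (ack)' delivery guarantee in the table comes from redelivering a message until the consumer acknowledges it. A minimal sketch under stated assumptions: the `consume` function, the attempt limit, and the dead-letter list are hypothetical and simulate in memory what a real broker does.

```python
from collections import deque
from typing import Callable


def consume(
    messages: list[str],
    handler: Callable[[str], None],
    max_attempts: int = 3,
) -> tuple[list[str], list[str]]:
    """Deliver each message until the handler succeeds (ack) or attempts run out.

    Returns (acknowledged, dead_letters).
    """
    pending = deque((msg, 1) for msg in messages)
    acked: list[str] = []
    dead_letters: list[str] = []
    while pending:
        msg, attempt = pending.popleft()
        try:
            handler(msg)
            acked.append(msg)                       # ack: the message is done
        except Exception:
            if attempt < max_attempts:
                pending.append((msg, attempt + 1))  # no ack: redeliver later
            else:
                dead_letters.append(msg)            # give up: move to the DLQ
    return acked, dead_letters


calls: dict[str, int] = {}


def flaky_handler(msg: str) -> None:
    # Hypothetical handler: "email" fails once, "bad" always fails.
    calls[msg] = calls.get(msg, 0) + 1
    if msg == "bad" or (msg == "email" and calls[msg] == 1):
        raise RuntimeError("processing failed")


acked, dead = consume(["email", "bad", "receipt"], flaky_handler)
```

A transient failure is absorbed by redelivery, while a message that keeps failing ends up in the dead-letter list instead of blocking the rest. Note that redelivery can reorder messages and run a handler more than once, so handlers generally need to be idempotent.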

Practical rule: **when an immediate response is needed** - synchronous transport (REST, gRPC, GraphQL). **When the result can wait** - asynchronous (queue, stream). **When there are many receivers** - Event Streaming.
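The rule of thumb can be written down as a small decision function. This is only a direct encoding of the rule above; the function name and return strings are illustrative, and a real decision would weigh more factors.

```python
def choose_transport(needs_immediate_response: bool, many_receivers: bool) -> str:
    # Direct encoding of the rule of thumb; real decisions also weigh
    # latency budgets, team skills, and existing infrastructure.
    if needs_immediate_response:
        return "synchronous (REST, gRPC, GraphQL)"
    if many_receivers:
        return "event streaming"
    return "message queue"
```

For the online store, loading a product page would map to `choose_transport(True, False)`, sending a receipt email to `choose_transport(False, False)`, and announcing a new order to `choose_transport(False, True)`.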

Notice that one system ends up using **several different transports**. This is normal. A good architect doesn't pick one tool for every situation - they match the tool to the task.

In the following lessons we'll dig into each transport in detail: starting with the foundation (TCP/IP, serialization), moving through HTTP and REST, going deep into gRPC and GraphQL, and then transitioning to message queues and event streaming.

Misconception: REST is suitable for all tasks - it's a universal standard.

Reality: REST is great for public APIs and CRUD operations, but for high-load internal communication gRPC is faster, and event-driven architecture calls for queues and streams.

REST, typically served as JSON over HTTP/1.1, is convenient for debugging (curl, Postman) but creates overhead. gRPC on Protocol Buffers + HTTP/2 delivers messages 3-5x smaller than JSON and 3-10x faster at high load - that's why Uber, Shopify, and Netflix use gRPC for internal communication. HTTP/2 multiplexing removes HTTP/1.1's request-level head-of-line blocking: multiple requests share one TCP connection without one response holding up the others (packet loss at the TCP level can still stall all streams). And REST was never designed for the 'one event - many receivers' pattern.
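Why a schema-based binary format tends to be smaller than JSON can be seen with the standard library alone. The sketch below uses `struct` with a fixed layout as a stand-in; this is an assumption for illustration and is not the Protocol Buffers wire format. The record and its fields are hypothetical, and the size ratio depends heavily on the data.

```python
import json
import struct

user = {"id": 123, "age": 30, "balance_cents": 199_900}

# Text encoding: field names and digits are sent as characters.
json_bytes = json.dumps(user, separators=(",", ":")).encode("utf-8")

# Fixed binary layout agreed on by both sides (a stand-in for a schema):
# unsigned 32-bit id, unsigned 8-bit age, unsigned 32-bit balance,
# little-endian, no padding.
LAYOUT = "<IBI"
binary_bytes = struct.pack(LAYOUT, user["id"], user["age"], user["balance_cents"])

# The receiver needs the same layout to make sense of the bytes.
decoded = struct.unpack(LAYOUT, binary_bytes)
```

Comparing `len(json_bytes)` with `len(binary_bytes)` shows the saving: the binary form carries no field names because both sides already share the schema. The price is that the bytes are unreadable without it, which is the debugging trade-off mentioned above.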

Key Takeaways

**Microservices = transport problem**: function call (nanoseconds) → network call (milliseconds + failures + timeouts); Uber moved to gRPC for 4000+ services
**Synchronous** (REST, gRPC, GraphQL) - need an immediate response; **Asynchronous** (queues, streams) - result can wait
**gRPC vs REST**: Protocol Buffers 3-5x smaller than JSON, 3-10x faster at high load; HTTP/2 multiplexing solves HTTP/1.1 head-of-line blocking
**Envoy proxy** - the data plane of a service mesh in Kubernetes; manages gRPC/REST traffic without application code changes. Istio is built on it; Linkerd ships its own proxy
**Event Streaming** (Kafka): 'one event - many subscribers' without tight coupling; Shopify handled $12B BFCM peak on event-driven architecture

Questions for Reflection

Uber uses gRPC for 4000+ internal microservices but REST for the public API. Why gRPC internally and not REST - and what specifically does Protocol Buffers + HTTP/2 multiplexing give compared to JSON + HTTP/1.1?
Shopify handled $12B in sales in one BFCM day on event-driven architecture with Kafka. Why would synchronous REST chains (each service calling each other) have failed under that load spike?
Envoy proxy in Kubernetes manages gRPC traffic as a service mesh with zero application code changes. What does this mean in terms of separation of concerns between the transport layer and the application layer?

Related Lessons

bt-02-osi-tcp — OSI model and TCP deep dive follow directly from the transport overview
net-01-intro — Backend transport protocols build on top of networking layer fundamentals
st-01-feedback-loops — TCP acknowledgment mechanism is a textbook feedback loop
alg-01-big-o — Protocol selection is a complexity analysis: HTTP/2 multiplexing vs HTTP/1.1 head-of-line blocking
sd-01-intro — System design components communicate through the transport protocols surveyed here
net-15-tcp-basics