System Design

Microservice Architecture

Microservices aren't a silver bullet. They solve real scaling and team independence problems, but introduce distributed systems complexity. Understand the trade-offs before choosing.

**Netflix**: migrated from a monolith to 700+ microservices - enabled independent deployment and scaling per team
**Amazon**: Conway's Law in action - org structure mirrors service boundaries
**Uber**: started as a monolith and split into microservices after hitting scaling limits at 1M rides/day

Monolith vs Microservices

**Why does Netflix have hundreds of independent services instead of one large application?**

A monolith is a single deployable unit where all components are tightly coupled. Microservices split an application into independently deployable services, each responsible for a specific business function.

Monolith advantages:
- Simple development and deployment
- No network latency between components
- Easy debugging and testing

Monolith disadvantages:
- Scaling one component means scaling the entire application
- Slow deployment: any change requires redeploying the whole application
- Technology lock-in

Microservices advantages:
- Independent scaling of individual services
- Technology diversity (a different language per service if needed)
- Fault isolation
- Faster, independent deployments

Microservices disadvantages:
- Distributed systems complexity
- Network latency
- Data consistency challenges
- Operational overhead

**Real World:** Amazon was a monolith until ~2002, when Bezos mandated that all teams communicate only via APIs. That shift to internal services helped lay the groundwork for AWS, which now runs a large share of the internet's infrastructure.

Microservices are not always better than a monolith. For small teams and early-stage projects, a monolith is often the right choice. Netflix, Amazon, and Uber started as monoliths.

What is the main operational advantage of microservices over a monolith?

Service Decomposition

**How do you decide which parts to split into separate services?**

Service decomposition is the art of breaking a system into services along the right boundaries. Two main approaches:

**1. By Business Capability**
Each service = one business function:
- OrderService - order management
- PaymentService - payment processing
- InventoryService - stock management
- NotificationService - sending notifications

**2. By Subdomain (Domain-Driven Design)**
Use DDD bounded contexts: each bounded context is a candidate microservice.

**Decomposition Principles:**
- Single Responsibility - one service, one reason to change
- High cohesion - related functionality stays together
- Low coupling - minimal dependencies between services
- Services own their data (database per service)

**Red Flags for Wrong Decomposition:**
- Two services always change together
- One service calls another synchronously for every request
- Distributed monolith: microservices in name only, still changed and deployed in lockstep

**Real World:** Uber started with a monolith, then split into: TripService, DriverService, PaymentService, NotificationService, MapService, SurgeService. Each team owns their service fully.

What sign indicates an incorrect service decomposition ('distributed monolith')?

Service Communication Patterns

**How do microservices talk to each other - synchronously or asynchronously?**

There are two main communication patterns:

**1. Synchronous (request-response)**
- REST over HTTP/HTTPS
- gRPC (Protocol Buffers, HTTP/2)
- GraphQL

Pros: simple, immediate result
Cons: temporal coupling (the caller waits for the callee), cascading failures

**2. Asynchronous (event-driven)**
- Message queues (RabbitMQ)
- Event streaming (Kafka)
- Pub/Sub (Redis, Google Cloud Pub/Sub)

Pros: decoupling, resilience, scalability
Cons: eventual consistency, harder debugging

**When to use synchronous:**
- An immediate response is needed (user-facing API)
- Simple request-response interactions
- Queries (reading data)

**When to use asynchronous:**
- Background processing
- Notifying multiple services
- Long-running operations
- Cross-service data synchronization
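The asynchronous style can be illustrated with a toy, in-process event bus in Python. This is a sketch under stated assumptions: `InMemoryEventBus`, the `booking.confirmed` topic and the handlers are hypothetical, and a real broker (RabbitMQ, Kafka) would persist events and deliver them in the background. What it shows is the key property: the producer publishes once and returns, without knowing or waiting for the consumers.

```python
from collections import defaultdict, deque
from typing import Callable

Handler = Callable[[dict], None]


class InMemoryEventBus:
    """Toy, single-process stand-in for a broker such as RabbitMQ or Kafka."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)
        self._pending: deque[tuple[str, dict]] = deque()

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # The producer only enqueues the event and returns; it never waits
        # for consumers and does not know who (if anyone) is listening.
        self._pending.append((topic, event))

    def deliver_pending(self) -> None:
        # A real broker delivers in the background; here we do it explicitly.
        while self._pending:
            topic, event = self._pending.popleft()
            for handler in self._subscribers[topic]:
                handler(event)


# Hypothetical consumers: two independent notification handlers.
sent: list[str] = []
bus = InMemoryEventBus()
bus.subscribe("booking.confirmed", lambda e: sent.append(f"notify host {e['host']}"))
bus.subscribe("booking.confirmed", lambda e: sent.append(f"notify guest {e['guest']}"))

bus.publish("booking.confirmed", {"booking_id": 42, "host": "alice", "guest": "bob"})
assert sent == []  # publishing did not block on the consumers
bus.deliver_pending()
```

Adding a third consumer (say, an analytics service) only needs another `subscribe` call; the publisher's code does not change, which is the decoupling benefit listed above.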

**Real World:** In Airbnb's system: booking a place uses synchronous calls (PaymentService, AvailabilityService), while sending notifications to host and guest uses async Kafka events.

Which communication pattern is preferable for operations that don't need an immediate response?

Saga Pattern: Distributed Transactions

**How do you ensure consistency across multiple services when there's no distributed transaction?**

In microservices there are no ACID transactions spanning services. The Saga pattern solves this by breaking a distributed transaction into a sequence of local transactions; each one commits in its own service and publishes an event (or reply) that triggers the next step.

**Two Saga implementations:**

**1. Choreography-based Saga**
Services react to each other's events. There is no central coordinator.
```
OrderSvc → event: order.created
PaymentSvc hears → event: payment.processed
InventorySvc hears → event: inventory.reserved
ShippingSvc hears → starts shipping
```
Pros: no central point of failure, loose coupling
Cons: hard to track the overall flow, risk of cyclic event dependencies between services

**2. Orchestration-based Saga**
A central coordinator (orchestrator) tells each service what to do.
```
OrderOrchestrator:
1. → PaymentSvc: charge
2. ← success
3. → InventorySvc: reserve
4. ← success
5. → ShippingSvc: ship
```
Pros: clear flow, easy to debug
Cons: the orchestrator can become a bottleneck and concentrates coordination logic in one place

**Compensating Transactions**
If a step fails, execute compensating actions for the already-completed steps in reverse order:
```
Payment charged → Inventory failed → Compensating transaction: refund payment
```
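The orchestration variant with compensating transactions can be sketched in Python. This is a simplified, in-process illustration, not a production orchestrator: the step functions (`charge`, `refund`, `reserve_stock`, and so on) are hypothetical stand-ins for calls to PaymentSvc, InventorySvc and ShippingSvc. A real orchestrator would also persist saga state so it can resume after a crash, and retry compensations until they succeed.

```python
from dataclasses import dataclass, field
from typing import Callable

Step = Callable[[dict], None]


@dataclass
class SagaStep:
    name: str
    action: Step      # local transaction in one service (e.g. "charge card")
    compensate: Step  # semantic undo of that action (e.g. "refund")


@dataclass
class SagaOrchestrator:
    steps: list[SagaStep]
    log: list[str] = field(default_factory=list)

    def run(self, ctx: dict) -> bool:
        completed: list[SagaStep] = []
        for step in self.steps:
            try:
                step.action(ctx)
            except Exception as exc:
                self.log.append(f"{step.name} failed: {exc}")
                # Undo only the steps that already committed, newest first.
                for done in reversed(completed):
                    done.compensate(ctx)
                    self.log.append(f"compensated {done.name}")
                return False
            completed.append(step)
            self.log.append(f"{step.name} ok")
        return True


# Hypothetical service calls, simulated with in-memory state.
def charge(ctx: dict) -> None:
    ctx["charged"] = True

def refund(ctx: dict) -> None:
    ctx["charged"] = False

def reserve_stock(ctx: dict) -> None:
    raise RuntimeError("out of stock")  # simulate the inventory step failing

def release_stock(ctx: dict) -> None:
    ctx["reserved"] = False

def ship(ctx: dict) -> None:
    ctx["shipped"] = True

def cancel_shipment(ctx: dict) -> None:
    ctx["shipped"] = False


saga = SagaOrchestrator([
    SagaStep("payment", charge, refund),
    SagaStep("inventory", reserve_stock, release_stock),
    SagaStep("shipping", ship, cancel_shipment),
])
order_ctx: dict = {}
succeeded = saga.run(order_ctx)
```

Here the inventory step fails, so only the payment is compensated (refunded); inventory never committed and shipping never ran, so neither is undone.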

**Real World:** Uber Eats uses Orchestration Saga: OrderOrchestrator coordinates RestaurantService (accept order), PaymentService (charge), DriverService (assign delivery driver).

What is a compensating transaction in the Saga pattern?

Resilience Patterns

**How do you prevent one failing service from taking down the entire system?**

In distributed systems, failures are inevitable. Resilience patterns help prevent cascading failures.

**1. Circuit Breaker**
Like an electrical breaker: it trips (opens) when too many failures occur.
```
CLOSED (normal) → OPEN (tripped) → HALF-OPEN (testing)
```
- Closed: requests pass through normally
- Open: requests fail immediately without calling the service
- Half-Open: after a cool-down period, let a few requests through to test recovery; success closes the circuit, failure reopens it

**2. Retry with Exponential Backoff**
Retry failed requests with increasing delays:
```
Attempt 1: immediate
Attempt 2: after 1 second
Attempt 3: after 2 seconds
Attempt 4: after 4 seconds
```
Add random jitter so that many clients don't retry in lockstep (the thundering herd problem).

**3. Timeout**
Always set timeouts on calls to other services. Without a timeout, a slow dependency can hold threads indefinitely and exhaust the pool (thread starvation).

**4. Bulkhead**
Isolate failures by giving each dependency its own thread pool (or connection pool). Like bulkheads in a ship: if one compartment floods, the rest stay dry.

**5. Fallback**
Provide a degraded response when the service is unavailable:
- Cached last known value
- Default values
- A message telling the user the feature is temporarily unavailable
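A minimal circuit breaker with a fallback, sketched in Python. The class, its string states, the thresholds and the injectable `clock` are illustrative assumptions, not the API of Hystrix or any other library; this version is single-threaded, and production breakers also need thread safety and usually cap how many trial requests HALF-OPEN admits.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker is OPEN and no fallback was supplied."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay OPEN before a trial
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[[], T],
             fallback: Optional[Callable[[], T]] = None) -> T:
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"   # cool-down over: let a trial call through
            elif fallback is not None:
                return fallback()          # fail fast: downstream is not called
            else:
                raise CircuitOpenError("circuit is open")
        try:
            result = fn()
        except Exception:
            self._record_failure()
            if fallback is not None:
                return fallback()
            raise
        self._record_success()
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        # A failed trial in HALF_OPEN reopens immediately.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()

    def _record_success(self) -> None:
        self.failures = 0
        self.state = "CLOSED"


# Usage: a fake clock so the example runs instantly; a hypothetical downstream call.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])
downstream_calls: list[float] = []

def get_recommendations() -> list:
    downstream_calls.append(now[0])
    raise ConnectionError("recommendation service down")

def popular_titles() -> list:
    return ["popular title 1", "popular title 2"]

results = [breaker.call(get_recommendations, popular_titles) for _ in range(3)]
```

The first two calls reach the failing service and trip the breaker; the third returns the fallback without touching the service at all, which is what stops a failing dependency from tying up the caller.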

**Real World:** Netflix's Hystrix library popularized the Circuit Breaker pattern. When the recommendation service fails, Netflix falls back to a list of popular movies.
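Retry with exponential backoff and jitter, as described in the Resilience Patterns list, can be sketched as a small Python helper. The function name and parameters are hypothetical, and "full jitter" (waiting a random time between 0 and the backoff ceiling) is one common choice assumed here, not something the lesson prescribes: the 1s/2s/4s schedule becomes the upper bound of each wait. Retries are only safe for idempotent operations or clearly transient errors.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(fn: Callable[[], T], max_attempts: int = 4,
                       base_delay: float = 1.0, max_delay: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on exception with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Backoff ceiling doubles each time: 1s, 2s, 4s, ... capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time in [0, ceiling] so clients spread out.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")


# Usage with a fake sleep that records delays instead of waiting.
delays: list[float] = []
attempts = {"n": 0}

def flaky_call() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = retry_with_backoff(flaky_call, sleep=delays.append)
```

Injecting `sleep` keeps the helper testable; in a real service the retry loop would usually sit inside a circuit breaker so that retries stop once the dependency is known to be down.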

What does a Circuit Breaker do in OPEN state?

Database per Service

**Why should each microservice have its own database?**

The 'Database per Service' pattern is one of the core principles of microservices. Each service owns its data, and other services can access it only through that service's API.

**Why separate databases?**
- Loose coupling: schema changes in one service don't break others
- Technology diversity: each service can use the database best suited to it
- Independent scaling: scale the database together with its service
- Fault isolation: one database going down doesn't take down all services

**Data consistency challenges:**
With separate databases, cross-service queries and transactions become a problem: no single SQL join or ACID transaction spans services.

**Patterns for querying across services:**

**1. API Composition**
The API Gateway or an aggregation service calls multiple services and merges the results:
```
OrderDetails = OrderSvc.getOrder() + UserSvc.getUser() + ProductSvc.getProduct()
```

**2. CQRS + Event Sourcing**
Read models are materialized views built from events:
```
OrderCreated + PaymentProcessed → OrderSummaryReadModel
```

**3. Shared Database (anti-pattern)**
Multiple services using one database creates tight coupling. Avoid unless absolutely necessary.
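Both querying patterns can be sketched in a few lines of Python. The service clients (`get_order`, `get_user`, `get_product`) and the event shapes are hypothetical stand-ins that return canned data; a real read model would be updated incrementally from a message broker and stored in its own database rather than rebuilt from an in-memory list.

```python
import asyncio


# Hypothetical service clients: in a real system each would be an HTTP or gRPC
# call to the service that owns the data; here they return canned values.
async def get_order(order_id: str) -> dict:
    await asyncio.sleep(0)  # stand-in for network I/O
    return {"order_id": order_id, "user_id": "u1", "product_id": "p7", "qty": 2}

async def get_user(user_id: str) -> dict:
    await asyncio.sleep(0)
    return {"user_id": user_id, "name": "Alice"}

async def get_product(product_id: str) -> dict:
    await asyncio.sleep(0)
    return {"product_id": product_id, "title": "Keyboard", "price": 50}

async def get_order_details(order_id: str) -> dict:
    """API composition: fetch from each owning service and merge in memory."""
    order = await get_order(order_id)
    # The user and product lookups are independent, so run them concurrently.
    user, product = await asyncio.gather(
        get_user(order["user_id"]), get_product(order["product_id"]))
    return {**order, "user": user, "product": product}


def build_order_summaries(events: list[dict]) -> dict[str, dict]:
    """CQRS read side: fold domain events into a query-friendly view."""
    summaries: dict[str, dict] = {}
    for event in events:
        if event["type"] == "OrderCreated":
            summaries[event["order_id"]] = {"total": event["total"], "status": "CREATED"}
        elif event["type"] == "PaymentProcessed":
            summaries[event["order_id"]]["status"] = "PAID"
    return summaries


details = asyncio.run(get_order_details("o-1"))
summaries = build_order_summaries([
    {"type": "OrderCreated", "order_id": "o-1", "total": 100},
    {"type": "PaymentProcessed", "order_id": "o-1"},
])
```

API composition stays consistent with the source services but pays the latency of every call on each request; the precomputed read model answers queries from one place but is only eventually consistent with the events that feed it.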

**Real World:** Amazon uses different DB technologies: DynamoDB for user sessions, Aurora for orders, Elasticsearch for search, Redis for caching. Each team chooses what fits their service best.

What is the main benefit of each service having its own database?

Summary

**Monolith → Microservices**: extract by domain boundary, not by technical layer
**Decomposition**: use DDD bounded contexts - each service owns its data
**Communication**: sync (REST/gRPC) for queries, async (events) for state changes
**Saga pattern**: distributed transactions without two-phase commit - choreography or orchestration
**Circuit Breaker**: fail fast, recover gracefully - never cascade failures across services
**Start with a modular monolith** - extract services only when team and scaling needs are clear

What's Next?

API Gateway (next lesson) is the entry point to a microservice system. Service Mesh (lesson 12) solves service-to-service communication at the infrastructure level.

API Gateway - Entry point and traffic management for microservices
Service Mesh - Service-to-service communication, observability, and security at infra level

Questions for Reflection

Think of a system you've worked with - could it benefit from microservices? What would be the first service you'd split out, and why?

Related Lessons

net-01-intro