Distributed Systems
Distributed Transactions
Цели урока
- Understand the atomicity problem across independent services and the 4 partial failure scenarios
- Know the 2PC protocol, its guarantees, and key limitations (blocking, coordinator failure)
- Apply the Saga Pattern with compensating transactions for eventual consistency
- Choose between Orchestration and Choreography based on flow complexity and requirements
Предварительные знания
- ACID transactions in a single database (atomicity, isolation, durability)
- Basic understanding of microservices architecture
- WAL (Write-Ahead Log) concept and crash recovery
- Eventual consistency vs strong consistency
eBay in 2004 was losing USD 1.5B per year on partial service failures - payment went through, order was never created. This is not a code bug, it is a fundamental problem of distributed systems without the right protocol.
- **Uber Payments** - Saga with Orchestration for ride payment: 5 steps (auth, hold, trip, release, charge), compensate on any failure
- **Amazon Order Pipeline** - Choreography via EventBridge: 100+ steps, each service is autonomous, no single coordinator
- **Stripe Payments** - 2PC for internal DB + idempotency keys for API - clients can safely retry any request
- **Google Spanner** - distributed transactions via 2PC over Paxos: coordinator is the Paxos group leader
- **Airbnb Booking** - Saga + Outbox Pattern: event written to the same DB as business data - atomicity via one local transaction
2PC and the history of distributed transactions
Two-Phase Commit was invented by Jim Gray in the late 1970s at IBM Research. In 1979 he published its formal description in 'Notes on Database Operating Systems'. Gray also introduced ACID, WAL, and concurrency control - the foundation of all modern databases. He received the Turing Award in 1998. In 2007 Gray disappeared at sea on his yacht and was never found.
Atomicity Across Service Boundaries
**2004, eBay. The payment service charged PayPal - confirmed. But the order record in eBay's DB was never saved: the server crashed 80ms later. Customer paid, item not reserved. Over a year, such partial failures cost the company USD 1.5B in losses and refunds.** This is the distributed transaction problem: an operation spans multiple nodes, but ACID guarantees only apply within a single database. Related to the CAP theorem choice between C and A.
**Three partial failure scenarios:** 1. first node succeeded, second crashed - data lost 2. both succeeded but the acknowledgment was lost in the network - duplicate on retry 3. first succeeded, second failed on business logic - need to roll back the first. Each scenario requires a separate strategy.
| Scenario | Problem | Required strategy |
|---|---|---|
| Node B crashed after A succeeded | Money debited, not credited | 2PC rollback or compensation |
| A's response lost in network | Client retries - double charge | Idempotency key |
| B failed on business logic | A done, B not - need to undo A | Saga compensating transaction |
| Coordinator crashed mid-flow | Participants blocked in PREPARED | 2PC timeout + 3PC or Saga |
A transaction in microservices works just like BEGIN/COMMIT in PostgreSQL
Each service commits its local transaction independently - there is no shared ROLLBACK across service boundaries
PostgreSQL ACID is one process, one WAL, one lock manager. In a distributed system each service has its own DB, its own isolation, its own commit point. Coordination requires protocols (2PC) or patterns (Saga) - and neither provides full ACID.
Client called a money transfer: service A (debit) replied OK, service B (credit) crashed. What is the system state?
Two-Phase Commit (2PC)
**Two-Phase Commit** is the classic protocol for atomic distributed transactions. One node becomes the Coordinator, others are Participants. The protocol works by locking resources until a final decision is made: either everyone commits or everyone aborts.
| 2PC Problem | Description | Impact |
|---|---|---|
| Blocking | Resources locked from Phase 1 until Phase 2 | Under high RPS - 5-10x throughput degradation |
| Coordinator failure | Participants stuck in PREPARED indefinitely | Deadlock until coordinator recovers |
| Network partition | Coordinator cannot reach some participants | Timeout forces ABORT even if all were ready |
| Does not scale | Synchronous wait for all participants | Latency = max(latency of all nodes) |
**2PC is a blocking protocol.** If the Coordinator crashes after Phase 1 (all replied YES) but before Phase 2 - participants hold locks indefinitely. Three-Phase Commit (3PC) partially solves this, but has its own trade-offs. In production microservices, 2PC is rarely used. Covered in detail in the 2PC lesson.
The Coordinator in 2PC wrote COMMIT to its WAL, then crashed before sending COMMIT to participants. What happens?
Saga Pattern: Chain of Compensations
**Uber, 2016. When launching their microservices architecture they hit a wall: 2PC does not work across 50+ independent services (trips, payments, drivers, surge, promotions). Even a Paxos-replicated coordinator could not help - a different pattern was needed. Saga Pattern: a long transaction is split into a chain of local transactions. Each step has a compensating transaction. If step N fails - compensations for steps N-1, N-2, ... run in reverse order.**
**Key difference between compensation and rollback:** a normal ROLLBACK erases the operation as if it never happened. A compensating transaction is a new business operation (issue a refund, release a reservation). It appears in logs, it can partially fail, and it must be idempotent. This is eventual consistency, not ACID.
Saga guarantees ACID the same way as a local database transaction
Saga provides eventual consistency - between steps the system is in an intermediate state
After step 1 (debit) and before step 2 (credit), real time passes - tens of milliseconds. Other clients may see debited money and an unreserved item. This is called ACI without D, or BASE (Basic Availability, Soft state, Eventual consistency). For most business workflows this is acceptable.
In a Saga, steps 1 and 2 completed successfully, step 3 failed. What should happen?
Choreography vs Orchestration
**Saga is implemented in two ways.** Orchestration: one central service (Orchestrator) calls all steps in order and manages compensations. Choreography: each service listens to events and reacts independently - no central coordinator. Amazon Order Service, Uber Dispatch, Netflix Watch Party - all use a hybrid depending on flow complexity.
| Approach | How it works | Pros | Cons |
|---|---|---|---|
| Orchestration | Central service calls steps and compensations | Flow readable as code, easy debug, explicit rollback | Single point of failure, coupling through orchestrator |
| Choreography | Services react to events from a queue | Loose coupling, no SPOF, easy to add a step | Implicit flow, hard to debug, compensation needs planning |
**Selection rule:** if the flow is linear and has 3-5 steps - use Orchestration (easier to debug). For 10+ steps or maximum service autonomy - use Choreography. Uber uses Orchestration for critical financial flows (ride payment) and Choreography for analytics pipelines.
| System | Approach | Why |
|---|---|---|
| Uber Payments | Orchestration | Financial atomicity, explicit audit trail required |
| Amazon Order | Choreography | 100+ steps, independent teams, loose coupling |
| Netflix Billing | Orchestration | Compliance requires predictable rollback |
| Airbnb Booking | Saga + Outbox | Outbox pattern for reliable event publishing |
Which approach is better for a financial process with 4 steps where a clear audit log and predictable rollback are critical?
Вопросы для размышления
- A bank implements an account transfer via Saga (debit one account, credit another). Between steps 1 and 2 there is a 200ms gap. What issues can arise and how should they be addressed?
Связанные уроки
- ds-03-consensus — Atomic commit requires agreement across nodes
- dist-07-2pc — 2PC is the standard distributed transaction commit protocol
- dist-12-consistency — Transaction isolation level determines consistency guarantees
- ds-11-distributed-locks — Pessimistic transactions use distributed locks
- db-13-transactions