Distributed Transactions: 2PC and Alternatives
Consider Booking.com: a user books a hotel and a flight in one checkout. The payment service charges the card, but the hotel service is unavailable. The money is gone and the hotel is not booked. This is not a bug - it is the fundamental problem of distributed transactions. Solutions exist, from 2PC to Saga, and each comes at a cost.
- Uber uses Cadence (Temporal's predecessor) for Saga-orchestration of rides: charge, dispatch, trip, payout - each step with compensation. 100M+ workflow executions per day
- Google Spanner backs Google Ads (through the F1 SQL layer) and is offered externally as Cloud Spanner - globally consistent transactions through TrueTime + Paxos. Up to 99.999% availability in multi-region configurations
- Stripe uses idempotency keys + outbox pattern instead of distributed transactions - every operation is idempotent, at-least-once delivery through Kafka
2PC: Coordination Through Two Phases
Two-Phase Commit (2PC) - a protocol described by Jim Gray in 1978. Goal: atomic writes across multiple databases. Either all participants commit or all roll back. The XA standard (X/Open Distributed Transaction Processing) formalized the interface, and Java's JTA builds on top of XA.
The problem with 2PC: the coordinator is a single point of failure. If the coordinator crashes after sending PREPARE but before sending COMMIT, participants that voted yes are blocked. Having logged PREPARE, they can neither commit nor abort on their own, and the locks they hold stay frozen. This is called an in-doubt transaction. Production databases hold in-doubt transactions until the coordinator recovers or an operator resolves them.
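The blocking failure mode described above can be shown with a small in-memory simulation. This is a minimal sketch, not a real XA implementation: the `Participant` class, the `two_phase_commit` function, and the `crash_after_prepare` flag are hypothetical names introduced only for illustration.

```python
from enum import Enum


class State(Enum):
    INIT = "init"
    PREPARED = "prepared"    # voted yes; outcome unknown (in-doubt)
    COMMITTED = "committed"
    ABORTED = "aborted"


class Participant:
    """Hypothetical resource manager that takes part in 2PC."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = State.INIT

    def prepare(self):
        # Phase 1: vote. A yes vote is a promise, so locks stay held.
        if self.can_commit:
            self.state = State.PREPARED
            return True
        self.state = State.ABORTED
        return False

    def commit(self):
        self.state = State.COMMITTED

    def rollback(self):
        self.state = State.ABORTED


def two_phase_commit(participants, crash_after_prepare=False):
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    if crash_after_prepare:
        # Simulated coordinator crash before the decision is announced.
        return None
    # Phase 2: commit only on a unanimous yes, otherwise roll back.
    decision = all(votes)
    for p in participants:
        if decision:
            p.commit()
        elif p.state is State.PREPARED:
            p.rollback()
    return decision
```

Calling `two_phase_commit([hotel, payment], crash_after_prepare=True)` leaves both participants in `State.PREPARED`: neither may move to `COMMITTED` or `ABORTED` without hearing the coordinator's decision, which is exactly the in-doubt state.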
In-doubt transactions in PostgreSQL are visible in pg_prepared_xacts. They hold row locks - blocking normal transactions. If the coordinator fails permanently, a DBA must intervene manually: COMMIT PREPARED or ROLLBACK PREPARED. This is one of the main arguments against 2PC in microservices: an operational nightmare during failures.
What happens if the coordinator crashes between the PREPARE and COMMIT phases?
3PC: Attempting to Solve the Blocking Problem
Three-Phase Commit (3PC) adds an intermediate PRE-COMMIT phase between PREPARE and COMMIT. The goal: eliminate the blocking failure mode of 2PC. If the coordinator fails, participants can make an autonomous decision based on knowledge of the state of other participants.
The problem with 3PC: it is unsafe under network partitions. A network partition at the PreCommit moment: some participants received PreCommit, others did not. One group decides to commit, the other to abort. Atomicity violated. 3PC is only safe in a synchronous network without partitions - an unrealistic assumption for distributed systems.
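The divergence can be reproduced with a toy model of the timeout rules usually described for 3PC: a participant that has seen PRE-COMMIT commits on timeout, while one that has only voted aborts. This is a sketch under that assumption; `decide_on_timeout` and `simulate_partition` are hypothetical helper names, and the participant names are illustrative.

```python
def decide_on_timeout(received_precommit):
    # Timeout rule assumed in this sketch:
    # a participant that saw PRE-COMMIT assumes everyone voted yes and commits;
    # one that only voted (no PRE-COMMIT yet) assumes the worst and aborts.
    return "commit" if received_precommit else "abort"


def simulate_partition(participants, reachable):
    """Coordinator sends PRE-COMMIT, but only `reachable` participants get it.

    The coordinator then becomes unreachable, so every participant falls
    back to its timeout rule. Returns each participant's own decision.
    """
    return {name: decide_on_timeout(name in reachable) for name in participants}


# The partition lets PRE-COMMIT through to "hotel" only.
outcome = simulate_partition(["hotel", "flight", "payment"], reachable={"hotel"})

# More than one distinct decision means atomicity was violated.
diverged = len(set(outcome.values())) > 1
```

Here `outcome` maps "hotel" to "commit" and the other two participants to "abort", so `diverged` is true. Each side acted correctly by its own rules, which is why the flaw lies in the protocol's assumptions and not in any one node.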
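Paxos Commit (Gray and Lamport, 2004) - the formal solution: instead of relying on a single coordinator, the commit decision is recorded through Paxos, so it survives the failure of any one node. If the 'coordinator' fails, another node can start a Paxos round and finish the protocol. Google Spanner takes a related approach: it runs a 2PC-like protocol across Paxos-replicated groups, so a failed leader is replaced by election. The cost is more messages than plain 2PC, plus replication latency on every step.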
Why does 3PC not solve the 2PC problem under network partitions?
Saga: Eventual Consistency Instead of ACID
Saga Pattern (Garcia-Molina and Salem, 1987) - abandons distributed ACID in favor of eventual consistency with compensating transactions. Instead of one atomic transaction, a sequence of local transactions each with a compensating transaction for rollback.
Choreography vs Orchestration. Choreography: each service publishes an event, the next reacts (Kafka events). No central coordinator. Difficult to trace the flow. Orchestration: a central Saga Orchestrator (a separate service or Temporal Workflow) calls each step and handles errors. Easier to observe, easier to debug.
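An orchestrated Saga can be sketched in a few lines. This is a minimal in-process illustration, assuming each step is a pair of plain callables; `run_saga`, `SagaFailed`, and the step names are hypothetical and do not correspond to any particular framework's API.

```python
class SagaFailed(Exception):
    """Raised after a failed step has been compensated."""


def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    If a step fails, run the compensations of all completed steps
    in reverse order, then raise SagaFailed.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            for _done_name, undo in reversed(completed):
                undo()
            raise SagaFailed(f"step {name!r} failed") from exc
        completed.append((name, compensate))


log = []


def book_hotel():
    # Simulates the unavailable hotel service from the opening example.
    raise RuntimeError("hotel service unavailable")


steps = [
    ("charge_card", lambda: log.append("charged"), lambda: log.append("refunded")),
    ("book_flight", lambda: log.append("flight booked"), lambda: log.append("flight cancelled")),
    ("book_hotel", book_hotel, lambda: log.append("hotel cancelled")),
]

try:
    run_saga(steps)
except SagaFailed:
    pass
```

After the run, `log` contains "charged", "flight booked", "flight cancelled", "refunded", in that order: the failed hotel booking triggers compensation of the two steps that had already committed locally. A production orchestrator would also persist progress and retry compensations, which is why compensations need to be idempotent.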
Temporal.io is a durable-execution platform often used for Saga orchestration. Workflows are durable functions: after a crash they resume from the last recorded step by replaying their event history. Uber runs Cadence, Temporal's predecessor, for its trips, at 100M+ workflow executions per day.
What is the key difference between Saga and 2PC?
Google Spanner and Paxos Commit
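Google Spanner (2012): the first system with external consistency at global scale. The TrueTime API synchronizes clocks via GPS and atomic clocks and exposes the remaining uncertainty explicitly, with an error bound (epsilon) under 7ms. A transaction picks a commit timestamp and then waits roughly 2*epsilon (the commit wait) before its writes become visible, which guarantees that timestamp order matches real-time order.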
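The commit wait can be illustrated with a toy clock. This is a sketch of the idea only, not Spanner's API: `FakeTrueTime`, `TTInterval`, and `commit_with_wait` are hypothetical names, time is simulated in milliseconds, and epsilon = 4 ms is an assumed value within the bound quoted above.

```python
from dataclasses import dataclass


@dataclass
class TTInterval:
    earliest: float
    latest: float


class FakeTrueTime:
    """Toy stand-in for TrueTime: real time is known only within +/- epsilon."""

    def __init__(self, epsilon_ms):
        self.epsilon_ms = epsilon_ms
        self.true_ms = 0.0  # simulated real time, advanced manually

    def now(self):
        return TTInterval(self.true_ms - self.epsilon_ms,
                          self.true_ms + self.epsilon_ms)

    def advance(self, ms):
        self.true_ms += ms


def commit_with_wait(tt, tick_ms=1.0):
    # Pick a timestamp no earlier than any possible current time...
    s = tt.now().latest
    waited = 0.0
    # ...then wait until s is guaranteed to be in the past everywhere.
    while tt.now().earliest <= s:
        tt.advance(tick_ms)
        waited += tick_ms
    return s, waited


tt = FakeTrueTime(epsilon_ms=4.0)
commit_ts, waited_ms = commit_with_wait(tt)
```

With epsilon = 4 ms and 1 ms ticks, the loop waits 9 ms, just over 2*epsilon = 8 ms. The wait scales with clock uncertainty, which is why a tight error bound matters so much for write latency.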
Spanner uses Paxos groups instead of a 2PC coordinator. Each shard is a Paxos group of 5 replicas. A transaction spanning multiple shards: the leader of each shard participates in a 2PC-like protocol, but coordinator (leader) failure is transparently handled through Paxos leader election. Blocking period = Paxos election timeout (~100ms).
CockroachDB and YugabyteDB implement a similar approach without GPS, using Raft for replication. Instead of TrueTime they use HLC (Hybrid Logical Clocks). The guarantees are slightly weaker (serializable transactions, without Spanner's full external consistency in every case), but no special hardware is required. CockroachDB is used by DoorDash and Netflix for globally distributed transactions.
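A Hybrid Logical Clock combines the highest physical time seen so far with a logical counter. The sketch below follows the commonly published HLC update rules; it is not taken from CockroachDB or YugabyteDB, and the class and method names are illustrative assumptions.

```python
class HLC:
    """Minimal Hybrid Logical Clock; timestamps are (l, c) tuples."""

    def __init__(self, physical_clock):
        self.physical_clock = physical_clock  # callable returning integer time
        self.l = 0  # highest physical time seen so far
        self.c = 0  # logical counter ordering events that share the same l

    def now(self):
        """Timestamp a local or send event."""
        pt = self.physical_clock()
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1
        return (self.l, self.c)

    def update(self, remote):
        """Merge the timestamp carried by a received message."""
        rl, rc = remote
        pt = self.physical_clock()
        new_l = max(self.l, rl, pt)
        if new_l == self.l and new_l == rl:
            self.c = max(self.c, rc) + 1
        elif new_l == self.l:
            self.c += 1
        elif new_l == rl:
            self.c = rc + 1
        else:
            self.c = 0
        self.l = new_l
        return (self.l, self.c)


# Node B's physical clock lags node A's by 10 units.
node_a = HLC(lambda: 100)
node_b = HLC(lambda: 90)

sent = node_a.now()              # event on A
received = node_b.update(sent)   # B receives A's message
later = node_b.now()             # next local event on B
```

Even though B's physical clock is behind, `received` and `later` compare greater than `sent` as tuples, so causality is preserved without synchronized hardware clocks.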
Which 2PC problem does Paxos Commit solve?
Choosing a Protocol: A Pragmatic Guide
The choice between 2PC, Saga, and Paxos Commit is governed by three factors: consistency requirements, latency budget, and operational complexity. There is no universal answer - only trade-offs.
Practical rule for 2024: avoid 2PC in microservices. Use Saga (choreography, or orchestration with Temporal) for most inter-service transactions. 2PC is acceptable within a single, closely operated PostgreSQL deployment using PREPARE TRANSACTION (prepared transactions are listed in pg_prepared_xacts). Paxos/Raft-based systems (CockroachDB, Spanner) fit global ACID when the budget allows.
Outbox Pattern - a reliable way to integrate Saga with event-driven systems: INSERT into the orders table + INSERT into the outbox table in a single local transaction. A separate poller reads the outbox and publishes events. Guarantees at-least-once delivery without a distributed transaction. Debezium (Kafka Connect) implements this through CDC from the outbox.
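The outbox flow can be sketched with the standard-library `sqlite3` module standing in for the service's database. The table layout, the `OrderPlaced` event shape, and the `publish` callback are assumptions made for illustration; a real deployment would publish to a broker such as Kafka.

```python
import json
import sqlite3


def create_schema(conn):
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "published INTEGER NOT NULL DEFAULT 0)"
    )


def place_order(conn, order_id, item):
    # One local transaction: both rows commit or neither does.
    with conn:
        conn.execute("INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item))
        event = json.dumps({"type": "OrderPlaced", "order_id": order_id})
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (event,))


def poll_outbox(conn, publish):
    # Publish first, mark afterwards: a crash in between causes a re-send,
    # which is why delivery is at-least-once and consumers must deduplicate.
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

A caller would run `place_order(conn, 1, "hotel")` inside the request path and `poll_outbox(conn, publish)` from a background loop. Because the event row is written in the same transaction as the order, there is no window in which the order exists but its event is lost.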
Misconception: distributed transactions can be implemented reliably through 2PC with good monitoring.
Reality: 2PC has a fundamental blocking failure mode: a coordinator crash leaves participants frozen until recovery. Monitoring helps detect the problem but does not eliminate it.
FLP impossibility (Fischer, Lynch, Paterson, 1985) proves that in a fully asynchronous network where even one process may fail, no deterministic protocol can guarantee that consensus terminates. 2PC is no exception. Paxos and Raft work around this by always preserving safety through quorums and relying on timeouts for progress. Saga works around it by relaxing consistency.
When does 2PC remain a reasonable choice in 2024?
Related Topics
Distributed transactions cross the boundaries of databases, messaging, and consensus:
- Locks and Deadlocks — Local locks are the foundation for distributed locking
- CAP Theorem — CAP defines the space of possible trade-offs
- Saga Pattern in Messaging — Choreography through event-driven messaging
Key Ideas
- 2PC: coordinator = SPOF, blocking failure mode on crash. Works within a single cluster
- 3PC: non-blocking under fail-stop, but vulnerable to network partitions - rarely used in practice
- Saga: eventual consistency + compensating transactions. Choreography (events) or Orchestration (Temporal)
- Consensus-backed commit (Paxos Commit; Spanner over Paxos groups, CockroachDB over Raft): ACID without indefinite blocking, because a failed leader is replaced by election
Questions for Reflection
- How do you choose between Choreography Saga and Orchestration Saga for a specific business process?
- What is compensation idempotency and why is it critical for a correct Saga implementation?
- Where do you draw the line between eventual consistency being sufficient and strong consistency being mandatory?
Related Lessons
- db-15-locks — Locks are the foundation of distributed coordination
- db-03-acid — ACID is the goal of distributed transactions
- db-04-cap — CAP theorem defines the limits of distributed transactions
- bt-18-saga — Saga pattern is the main alternative to 2PC
- dist-07-transactions