Distributed Systems

Two-Phase Commit (2PC)

Цели урока

  • Understand the atomicity problem in distributed transactions
  • Know the two phases of 2PC and the roles of coordinator and participants
  • Explain the blocking problem and why the coordinator is a SPOF
  • Distinguish when to use 2PC vs Saga vs Outbox pattern

Предварительные знания

  • Understanding of ACID transactions and WAL (write-ahead log)
  • Familiarity with distributed systems failure modes (crash, omission, timing)
  • Basic understanding of the CAP theorem

Bank transfer between two banks: money debited, network drops, credit never happens. Without 2PC this is not a bug - it is an architectural guarantee of data loss.

  • **Google Spanner** - 2PC + Paxos: the only database with global strong consistency for cross-shard transactions
  • **Kafka Transactions (2017)** - exactly-once semantics via a variant of 2PC with a replicated TransactionCoordinator
  • **PostgreSQL XA** - PREPARE TRANSACTION since 2005, used in enterprise distributed transactions
  • **Amazon DynamoDB Transactions (2018)** - 2PC under the hood for multi-item ACID transactions
  • **Stripe** - idempotency keys as a defense against partial execution without full 2PC overhead

Jim Gray and the birth of distributed transactions

Jim Gray published the description of Two-Phase Commit in 1978 as part of the work on System R at IBM. In 1998 he received the Turing Award - the highest honor in computer science - for his work on transaction management and databases. In 2007 Gray disappeared at sea during a solo sailing trip. The Jim Gray eScience Award for data researchers was named in his honor.

The Atomicity Problem in Distributed Systems

**2007. TJX Companies - the largest payment data breach in history at the time. 94 million cards compromised. One of the technical roots: inconsistency between systems during transaction processing.** This is the atomicity problem: money debited from account A, but due to a network failure never credited to account B. The money vanished between systems.

**Atomicity** - a transaction property where the operation either completes fully or not at all. On a single machine, the DBMS guarantees this via WAL. But when data lives on two different servers - who guarantees "all or nothing"?

Classic example: a bank transfer between two banks, each with its own database. Three failure scenarios: 1. money debited, network dies before credit - sender loses funds 2. credit fails, but debit already committed - same result 3. credit succeeds but debit never runs - money created from nothing. All three are catastrophic.

ScenarioAccount AAccount BResult
SuccessDebitedCreditedAtomic - correct
Crash after debitDebitedNot creditedMoney lost
Crash before debitNot debitedCreditedMoney created
Both unavailableUnknownUnknownUncertainty

The problem extends beyond banking. Any distributed operation: writing to a DB and publishing an event to Kafka, updating multiple shards, synchronizing between microservices. All require a coordinator that either confirms all changes or rolls back all of them.

Using transactions in each individual database is enough to guarantee atomicity of the whole operation

Local transactions only guarantee atomicity within a single database. Atomicity across multiple systems requires a distributed protocol like 2PC

Each database only knows about its own part of the operation. Even if both local transactions are atomic, a failure can occur between their completions. A coordinator that tracks the state of all participants is required.

Bank transfer: money debited from account A, but bank B's server crashed before crediting. What happened from an atomicity perspective?

Two-Phase Commit: the two phases

**2PC was invented by Jim Gray in 1978 - the same person who received the Turing Award in 1998 for his work on transactions and databases.** The idea is elegant: split the commit into two steps - first ask everyone whether they are ready, and only if all say yes - commit. Any refusal at any step triggers a rollback for everyone. This solves the atomicity problem across nodes.

Phase 1: Prepare (voting)

The coordinator broadcasts PREPARE to all participants. Each participant: verifies it can execute the operation; writes its intention to WAL (write-ahead log); acquires the necessary locks. Then replies YES or NO. If even one replies NO - the coordinator immediately broadcasts ABORT to all, including those who said YES.

Phase 2: Commit or Abort

If all replied YES - the coordinator writes the COMMIT decision to its own WAL, then broadcasts COMMIT to all participants. Each participant commits its changes and releases locks. Key point: once the coordinator has written the decision to its log, the transaction is considered committed regardless of whether the messages reached participants.

**Participant state machine:** INIT -> WORKING (received PREPARE) -> PREPARED (replied YES, waiting for decision) -> COMMITTED or ABORTED. The PREPARED state is critical: the participant has already acquired locks and cannot make a decision on its own. It is blocked until it receives a response from the coordinator.

The coordinator sent PREPARE to three participants: A replied YES, B replied YES, C replied NO. What does the coordinator do?

2PC problems: blocking and SPOF

**2PC is a blocking protocol.** This is its primary vulnerability. Between the two phases, participants hold locks and cannot make a decision independently. If the coordinator crashes during this window, the result is a deadlock of indefinite duration.

ProblemWhat happensConsequence
Coordinator crashes after PREPAREParticipants in PREPARED state, holding locksDeadlock until coordinator recovers
Coordinator crashes after writing COMMIT to logSome participants got COMMIT, others did notIndeterminate state without coordinator
Network partition during phase 2COMMIT did not reach all participantsParticipants in inconsistent states
Slow participantAll wait for the slowest oneLatency = max(all participants)

The blocking scenario: t1 - coordinator sends PREPARE to all. t2 - all reply YES. t3 - coordinator writes COMMIT to WAL. t4 - coordinator crashes before broadcasting COMMIT. Result: participants are blocked in PREPARED indefinitely. They hold locks on data, do not know whether to commit or abort, and cannot make a decision without the coordinator.

**WAL for recovery:** coordinator and participants write every step to the log before executing. On recovery: if no PREPARED in log - can abort; if PREPARED exists without COMMITTED - ask coordinator; if COMMITTED exists - apply changes. WAL makes the protocol recoverable, but does not eliminate blocking.

**Coordinator as SPOF:** standard 2PC has a single point of failure - the coordinator. Solution: replicate the coordinator via Paxos or Raft. This is how Google Spanner works: 2PC for atomicity + Paxos for coordinator fault tolerance.

A timeout at participants solves the problem - they will abort and everything will be fine

A timeout does not solve the problem: the coordinator may have written COMMIT to WAL before crashing. If a participant aborts and the coordinator later tries to commit after recovery - data becomes inconsistent

The core difficulty of 2PC: a participant in PREPARED state does not know whether the coordinator made a decision before crashing. Any unilateral decision by a participant on timeout may violate atomicity.

A participant received PREPARE, replied YES, and is waiting for the coordinator's response. The coordinator crashes. What happens to the participant?

Alternatives and real-world use of 2PC

**2PC is used where strict atomicity is genuinely required and blocking is acceptable.** PostgreSQL has supported PREPARE TRANSACTION for the XA protocol since 2005. Google Spanner combines 2PC with Paxos: 2PC provides atomicity across shards, Paxos replicates the coordinator for fault tolerance. Kafka Transactions (since version 0.11) use a variant of 2PC for exactly-once semantics.

System2PC UsageExtension
PostgreSQLPREPARE TRANSACTION for XAPREPARE / COMMIT PREPARED / ROLLBACK PREPARED
MySQL + XADistributed transactionsXA START / XA END / XA PREPARE / XA COMMIT
Kafka TransactionsExactly-once semanticsTransactionCoordinator as replicated log
Google SpannerCross-shard transactions2PC + Paxos for fault-tolerant coordinator
Oracle RACCluster node coordinationDLM (Distributed Lock Manager) on top of 2PC

When 2PC is not the right choice

Microservices are the classic case where 2PC fits poorly. Problems: tight coupling (all services must support the protocol), blocking (one slow service blocks everyone), the coordinator becomes a bottleneck at scale. For microservices, the Saga pattern is a better fit - a chain of local transactions with compensating actions on failure. See the distributed transactions lesson.

  • 2PC — Strict atomicity, 2 round-trips, blocking on coordinator failure. Suited for: bank transactions across shards, XA distributed transactions, systems where partial state is unacceptable.
  • Saga — Eventually consistent, compensating transactions, non-blocking. Suited for: microservices, long-running transactions (order + delivery + payment), systems with high availability requirements.
  • 2PC + Paxos — Strict atomicity + fault-tolerant coordinator. Suited for: Google Spanner, enterprise distributed databases, when consistency without SPOF is required.

**Outbox pattern** - a practical alternative to 2PC for microservices: writing to DB and publishing a Kafka event atomically via a single outbox table in the same DB. A separate process reads the outbox and publishes events. Provides at-least-once delivery without a distributed transaction.

An architect chooses an approach for e-commerce: charge payment + decrement inventory + create order across three separate microservices. What is the best approach?

Вопросы для размышления

  • Google Spanner uses 2PC + Paxos: 2PC for atomicity across shards, Paxos for coordinator replication. What specific problem of 2PC does Paxos solve, and why can Paxos alone not replace 2PC?

Связанные уроки

  • dist-07-transactions — 2PC implements atomic commit for distributed transactions
  • ds-03-consensus — 2PC is a restricted form of consensus
  • dist-08-paxos — Paxos generalises 2PC to tolerate coordinator failure
  • dist-12-consistency — 2PC provides atomic durability underpinning strong consistency
  • db-03-acid
Two-Phase Commit (2PC)

0

1

Sign In