Distributed Systems

Distributed Locks

Цели урока

Understand when a distributed lock is needed and when to avoid it
Implement a safe Redis lock with a unique token and a Lua script
Know the Redlock algorithm and its limitations under GC pauses
Apply fencing tokens to protect against stale lock holders
Choose between Redis, Redlock, ZooKeeper, and etcd based on requirements

Предварительные знания

Redis basics: SET NX PX commands and Lua eval
Concepts of race condition and critical section
Basic understanding of quorum and consensus in distributed systems

One Redis lock (over Raft or a single node) without a unique token - and a 2-second GC pause turns a mutex into a sieve: two processes simultaneously in the critical section, double debits, data loss.

**Stripe, PayPal** - idempotency keys on every payment are effectively a distributed lock at the business-logic level
**Kubernetes** - etcd-based locks for leader election of controller-manager and scheduler
**Apache Kafka** - historically ZooKeeper for broker coordination and partition leadership
**Cron in a cluster** - without a distributed lock every job runs on all Pods simultaneously
**GitHub Actions self-hosted runners** - workflow-level locks to prevent parallel deployments

The Antirez vs Kleppmann Debate (2016)

In February 2016 Martin Kleppmann published 'How to do distributed locking', proving Redlock is unsafe under GC pauses and clock skew. Antirez (Redis author) responded with a detailed rebuttal. The debate ran for several weeks and drew in Kyle Kingsbury (aphyr) and others. The outcome: Redlock is safe enough for most workloads, but not for systems where correctness is critical - those require fencing tokens. The exchange became a textbook example of how to reason about safety in distributed systems.

Why Distributed Locks Exist

**2012. Stripe processes payments across multiple instances. Two Pods simultaneously receive a bank webhook for the same successful payment - and both start crediting bonuses. Double credit. 100,000 transactions per day. A local mutex does nothing: the Pods do not share memory (unlike leader election).** A mutex at the cluster level is required.

**Distributed lock** - a mechanism that guarantees only one process in the cluster executes a critical section at any given time. Unlike a local mutex, it lives in a shared store (Redis, ZooKeeper, etcd).

When a distributed lock is needed

Scenario	Problem without a lock	Solution
Payment processing	Double debit/credit on race condition	Lock on transaction ID
Cron jobs in a cluster	5 Pods run the same job simultaneously	Lock on job name before starting
External API with rate limit	Combined cluster rate exceeds API limit	Lock or distributed rate limiter
Leader election	Multiple services believe they are the leader	Whoever holds the lock is the leader

**Key rule:** a distributed lock is the last resort, not the first choice. Check first: is the operation idempotent? Can CRDT or optimistic locking be used? Need a single writer - consider leader election. A lock adds latency, a potential single point of failure, and complexity.

Distributed locks work like a regular mutex - acquire, do work, release, everything is safe

In a distributed system, a GC pause, network partition, or crash can occur between acquiring the lock and using it - another process may acquire the same lock in that window

A local mutex lives in process memory - there is no gap between checking and acquiring. Over a network every step is a separate request with potential delay. That is why additional mechanisms (fencing tokens, TTL, unique identifiers) are needed to detect a stale holder.

Two Pods simultaneously process the same payment webhook event. Which mechanism correctly prevents double processing?

Redis Lock: From Naive to Safe

**Redis is the first choice for distributed locks in 90% of cases.** The command `SET key value NX PX ttl` is atomic: it writes a value only if the key does not exist, with automatic expiry. One round-trip instead of a transaction. But the naive implementation contains a dangerous bug.

**GC pause attack scenario:** Process A acquires the lock. The JVM performs a GC pause for 2 seconds. TTL expires. Process B acquires the same lock. Process A exits the pause, believes it still holds the lock, and on `release()` deletes Process B's lock. Now both processes are in the critical section simultaneously.

Safe implementation: unique token

**Why a Lua script?** GET + DEL are two separate requests. Between them another process can acquire the lock and the TTL can expire at just the right moment. A Lua script executes atomically as a single Redis command.

Why use a Lua script for release() instead of sequential GET + DEL?

Redlock: Quorum Across 5 Nodes

**A single Redis node is a single point of failure.** If it goes down or performs an AOF fsync, the lock is unavailable. Antirez (Redis author) proposed Redlock in 2013 - a locking algorithm across N independent Redis nodes. Acquisition succeeds only if the lock is granted by a majority (N/2 + 1) of nodes within a time window shorter than the TTL.

Step	Action	Continue if
1	Record startTime = Date.now()	Always
2	SET NX sequentially on each node with a short per-node timeout	Collect all responses
3	Count successCount and elapsed = Date.now() - startTime	Always
4	Check: successCount >= N/2+1 AND ttl - elapsed > clockDrift	Lock acquired - proceed
5	If condition fails - DEL on all nodes	No lock - retry or return error

**Martin Kleppmann's critique (2016):** Redlock is unsafe under GC pauses and clock skew. Scenario: Process A acquires Redlock, enters a GC pause for 10 seconds, TTL expires, Process B acquires the same Redlock, Process A resumes - both are in the critical section. Antirez responded in detail but the debate remains open. For financial systems use fencing tokens.

Redlock on 5 nodes is fully safe - even if 2 nodes fail the majority holds the lock

Redlock is safe against node failures but unsafe against GC pauses and long delays - a GC pause can happen after a successful acquisition

The problem is not how many nodes hold the lock but that an arbitrary amount of time can pass between acquiring the lock and actually using the resource (GC, CPU scheduling). During that time TTL can expire on all nodes and another client acquires the same lock. Fencing tokens address this independently of the locking mechanism.

Redlock on 5 nodes: 2 out of 5 acquired. What happens?

Fencing Tokens and ZooKeeper

**Martin Kleppmann proposed fencing tokens in 2016 as the only way to guarantee correctness under GC pauses.** The idea: the lock server issues a monotonically increasing number on each acquisition. The resource rejects requests with a stale number. Process A acquires the lock with token=33, freezes, token=34 is issued to Process B, A resumes - the resource sees 33 < 34 and rejects A's request.

ZooKeeper: ephemeral sequential nodes

ZooKeeper provides locking primitives at the consensus level (ZAB protocol). A client creates an **ephemeral sequential node** - ZooKeeper automatically appends a monotonic suffix (lock-0000001, lock-0000002). The client with the smallest suffix holds the lock. When a session is lost the ephemeral node is deleted automatically.

Mechanism	Reliability	Latency	Fencing tokens	When to use
Redis single node	Low (SPOF)	< 1 ms	No (must implement)	Dev/staging, non-critical tasks
Redlock (5 nodes)	Medium (quorum)	3-10 ms	No	High availability, non-financial
ZooKeeper	High (ZAB)	5-20 ms	Yes (sequential nodes)	Cluster coordination, leader election
etcd (Raft)	High (Raft)	5-15 ms	Yes (revision)	Kubernetes environments, service mesh

What is the key advantage of ZooKeeper ephemeral sequential nodes over a Redis lock for implementing fencing tokens?

Вопросы для размышления

A financial transaction processing system runs on 10 Pods. Identify: 1. which operations are safe with idempotency alone and require no lock 2. which need a Redis lock 3. which need ZooKeeper or etcd with fencing tokens.

Связанные уроки

dist-07-transactions — Locks implement pessimistic transaction isolation
dist-09-raft — Lock services are built on consensus (etcd/ZooKeeper)
dist-12-consistency — Locks provide mutual exclusion underpinning strong consistency
db-13-transactions