Distributed Systems
Distributed Locks
Цели урока
- Understand when a distributed lock is needed and when to avoid it
- Implement a safe Redis lock with a unique token and a Lua script
- Know the Redlock algorithm and its limitations under GC pauses
- Apply fencing tokens to protect against stale lock holders
- Choose between Redis, Redlock, ZooKeeper, and etcd based on requirements
Предварительные знания
- Redis basics: SET NX PX commands and Lua eval
- Concepts of race condition and critical section
- Basic understanding of quorum and consensus in distributed systems
One Redis lock (over Raft or a single node) without a unique token - and a 2-second GC pause turns a mutex into a sieve: two processes simultaneously in the critical section, double debits, data loss.
- **Stripe, PayPal** - idempotency keys on every payment are effectively a distributed lock at the business-logic level
- **Kubernetes** - etcd-based locks for leader election of controller-manager and scheduler
- **Apache Kafka** - historically ZooKeeper for broker coordination and partition leadership
- **Cron in a cluster** - without a distributed lock every job runs on all Pods simultaneously
- **GitHub Actions self-hosted runners** - workflow-level locks to prevent parallel deployments
The Antirez vs Kleppmann Debate (2016)
In February 2016 Martin Kleppmann published 'How to do distributed locking', proving Redlock is unsafe under GC pauses and clock skew. Antirez (Redis author) responded with a detailed rebuttal. The debate ran for several weeks and drew in Kyle Kingsbury (aphyr) and others. The outcome: Redlock is safe enough for most workloads, but not for systems where correctness is critical - those require fencing tokens. The exchange became a textbook example of how to reason about safety in distributed systems.
Why Distributed Locks Exist
**2012. Stripe processes payments across multiple instances. Two Pods simultaneously receive a bank webhook for the same successful payment - and both start crediting bonuses. Double credit. 100,000 transactions per day. A local mutex does nothing: the Pods do not share memory (unlike leader election).** A mutex at the cluster level is required.
**Distributed lock** - a mechanism that guarantees only one process in the cluster executes a critical section at any given time. Unlike a local mutex, it lives in a shared store (Redis, ZooKeeper, etcd).
When a distributed lock is needed
| Scenario | Problem without a lock | Solution |
|---|---|---|
| Payment processing | Double debit/credit on race condition | Lock on transaction ID |
| Cron jobs in a cluster | 5 Pods run the same job simultaneously | Lock on job name before starting |
| External API with rate limit | Combined cluster rate exceeds API limit | Lock or distributed rate limiter |
| Leader election | Multiple services believe they are the leader | Whoever holds the lock is the leader |
**Key rule:** a distributed lock is the last resort, not the first choice. Check first: is the operation idempotent? Can CRDT or optimistic locking be used? Need a single writer - consider leader election. A lock adds latency, a potential single point of failure, and complexity.
Distributed locks work like a regular mutex - acquire, do work, release, everything is safe
In a distributed system, a GC pause, network partition, or crash can occur between acquiring the lock and using it - another process may acquire the same lock in that window
A local mutex lives in process memory - there is no gap between checking and acquiring. Over a network every step is a separate request with potential delay. That is why additional mechanisms (fencing tokens, TTL, unique identifiers) are needed to detect a stale holder.
Two Pods simultaneously process the same payment webhook event. Which mechanism correctly prevents double processing?
Redis Lock: From Naive to Safe
**Redis is the first choice for distributed locks in 90% of cases.** The command `SET key value NX PX ttl` is atomic: it writes a value only if the key does not exist, with automatic expiry. One round-trip instead of a transaction. But the naive implementation contains a dangerous bug.
**GC pause attack scenario:** Process A acquires the lock. The JVM performs a GC pause for 2 seconds. TTL expires. Process B acquires the same lock. Process A exits the pause, believes it still holds the lock, and on `release()` deletes Process B's lock. Now both processes are in the critical section simultaneously.
Safe implementation: unique token
**Why a Lua script?** GET + DEL are two separate requests. Between them another process can acquire the lock and the TTL can expire at just the right moment. A Lua script executes atomically as a single Redis command.
Why use a Lua script for release() instead of sequential GET + DEL?
Redlock: Quorum Across 5 Nodes
**A single Redis node is a single point of failure.** If it goes down or performs an AOF fsync, the lock is unavailable. Antirez (Redis author) proposed Redlock in 2013 - a locking algorithm across N independent Redis nodes. Acquisition succeeds only if the lock is granted by a majority (N/2 + 1) of nodes within a time window shorter than the TTL.
| Step | Action | Continue if |
|---|---|---|
| 1 | Record startTime = Date.now() | Always |
| 2 | SET NX sequentially on each node with a short per-node timeout | Collect all responses |
| 3 | Count successCount and elapsed = Date.now() - startTime | Always |
| 4 | Check: successCount >= N/2+1 AND ttl - elapsed > clockDrift | Lock acquired - proceed |
| 5 | If condition fails - DEL on all nodes | No lock - retry or return error |
**Martin Kleppmann's critique (2016):** Redlock is unsafe under GC pauses and clock skew. Scenario: Process A acquires Redlock, enters a GC pause for 10 seconds, TTL expires, Process B acquires the same Redlock, Process A resumes - both are in the critical section. Antirez responded in detail but the debate remains open. For financial systems use fencing tokens.
Redlock on 5 nodes is fully safe - even if 2 nodes fail the majority holds the lock
Redlock is safe against node failures but unsafe against GC pauses and long delays - a GC pause can happen after a successful acquisition
The problem is not how many nodes hold the lock but that an arbitrary amount of time can pass between acquiring the lock and actually using the resource (GC, CPU scheduling). During that time TTL can expire on all nodes and another client acquires the same lock. Fencing tokens address this independently of the locking mechanism.
Redlock on 5 nodes: 2 out of 5 acquired. What happens?
Fencing Tokens and ZooKeeper
**Martin Kleppmann proposed fencing tokens in 2016 as the only way to guarantee correctness under GC pauses.** The idea: the lock server issues a monotonically increasing number on each acquisition. The resource rejects requests with a stale number. Process A acquires the lock with token=33, freezes, token=34 is issued to Process B, A resumes - the resource sees 33 < 34 and rejects A's request.
ZooKeeper: ephemeral sequential nodes
ZooKeeper provides locking primitives at the consensus level (ZAB protocol). A client creates an **ephemeral sequential node** - ZooKeeper automatically appends a monotonic suffix (lock-0000001, lock-0000002). The client with the smallest suffix holds the lock. When a session is lost the ephemeral node is deleted automatically.
| Mechanism | Reliability | Latency | Fencing tokens | When to use |
|---|---|---|---|---|
| Redis single node | Low (SPOF) | < 1 ms | No (must implement) | Dev/staging, non-critical tasks |
| Redlock (5 nodes) | Medium (quorum) | 3-10 ms | No | High availability, non-financial |
| ZooKeeper | High (ZAB) | 5-20 ms | Yes (sequential nodes) | Cluster coordination, leader election |
| etcd (Raft) | High (Raft) | 5-15 ms | Yes (revision) | Kubernetes environments, service mesh |
What is the key advantage of ZooKeeper ephemeral sequential nodes over a Redis lock for implementing fencing tokens?
Вопросы для размышления
- A financial transaction processing system runs on 10 Pods. Identify: 1. which operations are safe with idempotency alone and require no lock 2. which need a Redis lock 3. which need ZooKeeper or etcd with fencing tokens.
Связанные уроки
- dist-07-transactions — Locks implement pessimistic transaction isolation
- dist-09-raft — Lock services are built on consensus (etcd/ZooKeeper)
- dist-12-consistency — Locks provide mutual exclusion underpinning strong consistency
- db-13-transactions