Parallel Computing
Synchronization
Цели урока
- Understand mutex and the causes of deadlock (Coffman's 4 conditions)
- Distinguish mutex from semaphore: ownership vs counting
- Apply condition variable for event-driven waiting without busy spin
- Build phase synchronization with barrier, understand load imbalance
Предварительные знания
1997. The Pathfinder rover reboots on Mars again and again. Root cause: priority inversion. A low-priority thread holds a mutex needed by a high-priority thread. A synchronization bug worth USD 265M, on another planet. NASA engineers patched it remotely. Understanding synchronization primitives literally saves missions.
- **Databases:** every transaction uses locks for ACID guarantees; deadlock detection is a standard DBMS feature
- **Operating systems:** all kernel resources (file system, network, memory) are protected by mutexes and semaphores
- **Game engines:** the render thread and physics thread synchronize via a barrier every frame
- **PyTorch DDP:** a barrier synchronizes gradients across GPUs after every backward pass
Dijkstra and Semaphores
Edsger Dijkstra invented semaphores in 1965 while building the THE operating system. The operations P (proberen = test, acquire) and V (verhogen = increment, release) are Dutch words. The same year Dijkstra formulated the Dining Philosophers problem - the classic illustration of deadlock and synchronization headaches. Both results landed in one paper, "Cooperating Sequential Processes".
Mutex: Mutual Exclusion
1997. The Pathfinder rover on Mars starts rebooting for no apparent reason. NASA engineers sit 300 million km away. Root cause: a low-priority thread holds a mutex that a high-priority thread needs. Priority inversion. A synchronization bug worth USD 265M, on another planet.
A **mutex** (mutual exclusion) is a lock that guarantees only one thread executes the critical section at a time. Picture a single-occupancy stall: door locked, wait; door open, enter and lock it behind.
**Deadlock** is the lethal failure mode of mutexes. Thread A holds lock1 and waits for lock2. Thread B holds lock2 and waits for lock1. Both wait forever. Fix: acquire mutexes in a consistent order (lock ordering), or use trylock with a timeout.
**Coffman's 4 conditions** for deadlock: 1. mutual exclusion 2. hold and wait 3. no preemption 4. circular wait. Break any one of them and deadlock cannot happen. Lock ordering breaks circular wait.
Two threads: A acquires lock1->lock2, B acquires lock2->lock1. What happens?
Semaphore: Generalized Counter
A mutex is binary: held or free. A **semaphore** generalizes the idea into a counter that admits N concurrent threads. Picture a parking lot with 5 spots: full, wait; someone leaves, drive in. Every production connection pool runs on this exact pattern.
| Type | Counter | Use case |
|---|---|---|
| Binary Semaphore | 0 or 1 | Mutex analog (but without ownership!) |
| Counting Semaphore | 0..N | Connection pool, thread pool, rate limiting |
| Mutex | 0 or 1 + owner | Only the owner can unlock |
**Key difference from mutex:** a semaphore has no owner. One thread can acquire it and another can release it. That is what makes the **producer-consumer** pattern work: the producer calls release ("produced an item"), the consumer calls acquire ("took an item").
A database connection pool has a maximum of 5 connections. 10 threads want to connect. Which primitive to use?
Condition Variable: Waiting for an Event
A mutex protects data. What about waiting for a specific data state? "Wait until the queue is non-empty". Spinning in a loop (busy waiting) torches CPU for nothing. Go runtime, Java thread pools, Rust Tokio all reach for the same primitive: a **condition variable**. Sleep, then wake when the condition holds.
**Spurious wakeups:** a thread can wake up without a notify. Always wrap wait() in `while (condition)`, never `if (condition)`. Not a bug - it is a property of the OS-level implementation (signals, scheduler).
**Pattern:** a condition variable always pairs with a mutex. The mutex protects shared state; the condition variable handles waiting for state changes. wait() **atomically** releases the mutex and puts the thread to sleep. On wakeup, the mutex is re-acquired.
Why does cond.wait() need while (condition), not if (condition)?
Barrier: Synchronization Point
Google MapReduce, Apache Spark, neural network training all run the same dance: chop work into independent chunks, compute in parallel, then **merge**. The moment when "everyone is done computing, now merge" is a barrier - a point where every thread waits for the rest before continuing.
| Pattern | Primitive | Example |
|---|---|---|
| Mutual exclusion | Mutex | Protecting a shared counter |
| Resource limiting | Semaphore | Connection pool (max 5) |
| Event waiting | Condition Variable | Producer-consumer queue |
| Phase synchronization | Barrier | MapReduce, parallel simulation |
A barrier can become the bottleneck. If one thread lags, every other thread sits idle. That is **load imbalance** - one of the headline problems in parallel systems. Fix: dynamic work distribution (work stealing). In GPU computing, CUDA's `__syncthreads()` synchronizes every thread in a block - without it, one warp can read data another warp has not yet written.
More mutexes = safer
More mutexes = higher deadlock risk and worse performance. The optimum is the minimum necessary number.
Each additional mutex increases deadlock risk (more acquisition-order combinations), contention (threads block each other more often), and overhead (every lock/unlock is an atomic operation). Rule: "Don't communicate by sharing memory; share memory by communicating" (Go proverb).
4 threads process a matrix. Threads 0-2 finish in 10ms, thread 3 takes 100ms. With a barrier for 4 threads, how long does the phase take?
Key Ideas
- **Mutex** - one thread in the critical section; deadlock is possible when lock ordering is violated
- **Semaphore** - counter for N concurrent accesses; no owner (enables producer/consumer pattern)
- **Condition Variable** - "sleep until condition is met"; always with while (spurious wakeups break if)
- **Barrier** - "wait for everyone, then continue"; phase time equals the slowest thread's time
Related Topics
Synchronization is the foundation for parallel algorithms and systems:
- Threads and Processes — Synchronization is needed because threads share memory
- Real-Time Systems — Priority inversion and priority inheritance are critical for RT systems
- Consensus in Distributed Systems — A mutex at cluster scale - same ideas, different scope
Вопросы для размышления
- Go promotes channels over mutexes: "Don't communicate by sharing memory; share memory by communicating". What is the advantage of this approach?
- Is it possible to completely avoid mutexes by using only lock-free data structures? What are the trade-offs?
- How did NASA engineers fix the priority inversion bug on Pathfinder while being on Earth with the rover on Mars?
Связанные уроки
- par-02 — Threads and shared memory - the foundation for why primitives exist
- par-04 — Lock-free structures are built on top of mutex understanding
- rts-02 — Priority inversion in RT systems is a direct consequence of mutex ownership
- ds-03 — Distributed consensus is a cluster-level mutex
- par-08 — Parallel algorithms require a sound synchronization strategy
- stream-03 — Message brokers as an alternative to shared memory via message passing
- os-05-sync
- os-03-threads