Parallel Computing

Synchronization

Цели урока

  • Understand mutex and the causes of deadlock (Coffman's 4 conditions)
  • Distinguish mutex from semaphore: ownership vs counting
  • Apply condition variable for event-driven waiting without busy spin
  • Build phase synchronization with barrier, understand load imbalance

Предварительные знания

  • Threads and Processes

1997. The Pathfinder rover reboots on Mars again and again. Root cause: priority inversion. A low-priority thread holds a mutex needed by a high-priority thread. A synchronization bug worth USD 265M, on another planet. NASA engineers patched it remotely. Understanding synchronization primitives literally saves missions.

  • **Databases:** every transaction uses locks for ACID guarantees; deadlock detection is a standard DBMS feature
  • **Operating systems:** all kernel resources (file system, network, memory) are protected by mutexes and semaphores
  • **Game engines:** the render thread and physics thread synchronize via a barrier every frame
  • **PyTorch DDP:** a barrier synchronizes gradients across GPUs after every backward pass

Dijkstra and Semaphores

Edsger Dijkstra invented semaphores in 1965 while building the THE operating system. The operations P (proberen = test, acquire) and V (verhogen = increment, release) are Dutch words. The same year Dijkstra formulated the Dining Philosophers problem - the classic illustration of deadlock and synchronization headaches. Both results landed in one paper, "Cooperating Sequential Processes".

Mutex: Mutual Exclusion

1997. The Pathfinder rover on Mars starts rebooting for no apparent reason. NASA engineers sit 300 million km away. Root cause: a low-priority thread holds a mutex that a high-priority thread needs. Priority inversion. A synchronization bug worth USD 265M, on another planet.

A **mutex** (mutual exclusion) is a lock that guarantees only one thread executes the critical section at a time. Picture a single-occupancy stall: door locked, wait; door open, enter and lock it behind.

**Deadlock** is the lethal failure mode of mutexes. Thread A holds lock1 and waits for lock2. Thread B holds lock2 and waits for lock1. Both wait forever. Fix: acquire mutexes in a consistent order (lock ordering), or use trylock with a timeout.

**Coffman's 4 conditions** for deadlock: 1. mutual exclusion 2. hold and wait 3. no preemption 4. circular wait. Break any one of them and deadlock cannot happen. Lock ordering breaks circular wait.

Two threads: A acquires lock1->lock2, B acquires lock2->lock1. What happens?

Semaphore: Generalized Counter

A mutex is binary: held or free. A **semaphore** generalizes the idea into a counter that admits N concurrent threads. Picture a parking lot with 5 spots: full, wait; someone leaves, drive in. Every production connection pool runs on this exact pattern.

TypeCounterUse case
Binary Semaphore0 or 1Mutex analog (but without ownership!)
Counting Semaphore0..NConnection pool, thread pool, rate limiting
Mutex0 or 1 + ownerOnly the owner can unlock

**Key difference from mutex:** a semaphore has no owner. One thread can acquire it and another can release it. That is what makes the **producer-consumer** pattern work: the producer calls release ("produced an item"), the consumer calls acquire ("took an item").

A database connection pool has a maximum of 5 connections. 10 threads want to connect. Which primitive to use?

Condition Variable: Waiting for an Event

A mutex protects data. What about waiting for a specific data state? "Wait until the queue is non-empty". Spinning in a loop (busy waiting) torches CPU for nothing. Go runtime, Java thread pools, Rust Tokio all reach for the same primitive: a **condition variable**. Sleep, then wake when the condition holds.

**Spurious wakeups:** a thread can wake up without a notify. Always wrap wait() in `while (condition)`, never `if (condition)`. Not a bug - it is a property of the OS-level implementation (signals, scheduler).

**Pattern:** a condition variable always pairs with a mutex. The mutex protects shared state; the condition variable handles waiting for state changes. wait() **atomically** releases the mutex and puts the thread to sleep. On wakeup, the mutex is re-acquired.

Why does cond.wait() need while (condition), not if (condition)?

Barrier: Synchronization Point

Google MapReduce, Apache Spark, neural network training all run the same dance: chop work into independent chunks, compute in parallel, then **merge**. The moment when "everyone is done computing, now merge" is a barrier - a point where every thread waits for the rest before continuing.

PatternPrimitiveExample
Mutual exclusionMutexProtecting a shared counter
Resource limitingSemaphoreConnection pool (max 5)
Event waitingCondition VariableProducer-consumer queue
Phase synchronizationBarrierMapReduce, parallel simulation

A barrier can become the bottleneck. If one thread lags, every other thread sits idle. That is **load imbalance** - one of the headline problems in parallel systems. Fix: dynamic work distribution (work stealing). In GPU computing, CUDA's `__syncthreads()` synchronizes every thread in a block - without it, one warp can read data another warp has not yet written.

More mutexes = safer

More mutexes = higher deadlock risk and worse performance. The optimum is the minimum necessary number.

Each additional mutex increases deadlock risk (more acquisition-order combinations), contention (threads block each other more often), and overhead (every lock/unlock is an atomic operation). Rule: "Don't communicate by sharing memory; share memory by communicating" (Go proverb).

4 threads process a matrix. Threads 0-2 finish in 10ms, thread 3 takes 100ms. With a barrier for 4 threads, how long does the phase take?

Key Ideas

  • **Mutex** - one thread in the critical section; deadlock is possible when lock ordering is violated
  • **Semaphore** - counter for N concurrent accesses; no owner (enables producer/consumer pattern)
  • **Condition Variable** - "sleep until condition is met"; always with while (spurious wakeups break if)
  • **Barrier** - "wait for everyone, then continue"; phase time equals the slowest thread's time

Related Topics

Synchronization is the foundation for parallel algorithms and systems:

  • Threads and Processes — Synchronization is needed because threads share memory
  • Real-Time Systems — Priority inversion and priority inheritance are critical for RT systems
  • Consensus in Distributed Systems — A mutex at cluster scale - same ideas, different scope

Вопросы для размышления

  • Go promotes channels over mutexes: "Don't communicate by sharing memory; share memory by communicating". What is the advantage of this approach?
  • Is it possible to completely avoid mutexes by using only lock-free data structures? What are the trade-offs?
  • How did NASA engineers fix the priority inversion bug on Pathfinder while being on Earth with the rover on Mars?

Связанные уроки

  • par-02 — Threads and shared memory - the foundation for why primitives exist
  • par-04 — Lock-free structures are built on top of mutex understanding
  • rts-02 — Priority inversion in RT systems is a direct consequence of mutex ownership
  • ds-03 — Distributed consensus is a cluster-level mutex
  • par-08 — Parallel algorithms require a sound synchronization strategy
  • stream-03 — Message brokers as an alternative to shared memory via message passing
  • os-05-sync
  • os-03-threads
Synchronization

0

1

Sign In