
Multicore Processors: Cache Coherence

You parallelized your code across 8 cores and expected an 8× speedup. Reality: 1.2×. The cause: two threads write to variables that happen to sit next to each other in memory. If you don't know how cache coherence works, it can turn parallelism into a nightmare.

  • False sharing is a common cause of multi-threaded server slowdowns
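  • MESI, or a close variant of it (Intel's MESIF, AMD's MOESI), runs in every modern multicore x86 and ARM processor
  • Java's ConcurrentHashMap annotates its size counters (CounterCell) with @Contended so that each one lands on its own cache line
  • Linux kernel: per-CPU variables give each CPU a private copy, and DEFINE_PER_CPU_SHARED_ALIGNED aligns the ones other CPUs also touch to cache-line boundaries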

The Cache Coherence Problem

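**Scenario:** Core 0 has cached variable x = 5. Core 1 then changes x to 10 in its own cache. Now Core 0 reads x and sees 5, even though the latest value is 10. This stale read is the **cache coherence problem**. Once every core has a private cache, something must make all cores agree on the current value of each memory location.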
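A toy model makes the scenario concrete. The sketch below is a hypothetical simulation: ToyCacheNoCoherence and stale_read_demo are made-up names. It assumes write-back private caches and no coherence protocol at all, so a write by one core never reaches the other core's copy.

```cpp
#include <array>

// Hypothetical toy model: two cores, one variable x, private write-back
// caches, and NO coherence protocol. A write stays in the writer's cache.
struct ToyCacheNoCoherence {
    int memory = 5;                   // x in main memory
    std::array<int, 2> cached{};      // each core's private copy of x
    std::array<bool, 2> valid{};      // does this core currently hold a copy?

    constexpr int read(int core) {
        if (!valid[core]) {           // miss: fetch from memory
            cached[core] = memory;
            valid[core] = true;
        }
        return cached[core];          // hit: return the cached copy, stale or not
    }

    constexpr void write(int core, int value) {
        cached[core] = value;         // only the writer's copy changes;
        valid[core] = true;           // memory and the other cache are untouched
    }
};

// Replays the scenario: Core 0 caches x = 5, Core 1 writes 10, Core 0 rereads.
constexpr int stale_read_demo() {
    ToyCacheNoCoherence c;
    c.read(0);            // Core 0 caches x = 5
    c.write(1, 10);       // Core 1 updates only its own copy to 10
    return c.read(0);     // Core 0 hits its stale copy
}
```

A coherence protocol exists to rule out that last read. Under MESI, Core 1's write would first invalidate Core 0's copy, so Core 0's next read would miss and fetch the new value.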

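**Memory Ordering:** Coherence alone is not enough. It guarantees that all cores agree on the order of writes to any single location. However, the compiler and the CPU (for example through store buffers) may still reorder accesses to different locations, so another core can observe them in a different order than the program wrote them. To control that ordering you need explicit memory barriers (memory fences) or atomic operations with an appropriate memory ordering, such as acquire/release.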
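A common way to get that ordering in C++ is release/acquire on an atomic flag. The sketch below is a minimal message-passing example; payload, ready, producer, consumer, run_message_passing and check_message_passing are illustrative names, not from any library.
- The release store forbids the write to payload from becoming visible after the flag.
- The acquire load forbids the read of payload from happening before the flag is seen.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                   // ordinary (non-atomic) data
std::atomic<bool> ready{false};    // publication flag

void producer() {
    payload = 42;                                   // 1. write the data
    ready.store(true, std::memory_order_release);   // 2. publish; step 1 cannot be reordered after this
}

int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        // spin until the flag is visible; the read below cannot move above this load
    }
    return payload;                                 // happens-after the producer's write
}

// Runs one producer/consumer round and returns the value the consumer saw.
int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);  // reset before any thread starts
    int seen = -1;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}

// Repeats the round; a broken ordering would trip the assert.
void check_message_passing(int rounds) {
    for (int i = 0; i < rounds; ++i)
        assert(run_message_passing() == 42);
}
```

Suppose both the store and the load used memory_order_relaxed instead. Nothing would then order the write to payload before the flag. In C++ that is a data race on payload (undefined behavior), and on weakly ordered hardware such as ARM the consumer could actually observe the flag before the data.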

Core A wrote x=10. Core B reads x and sees 5. What is the cause?

MESI: A Cache Coherence Protocol

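**MESI** is a cache coherence protocol. In each core's cache, every cache line (typically 64 bytes) is in one of four states:
  • **M**odified: this cache holds the only copy, and it is dirty (newer than memory)
  • **E**xclusive: this cache holds the only copy, and it is clean (matches memory), so it can be written without notifying other cores
  • **S**hared: clean, and other caches may hold copies too; writing requires invalidating them first
  • **I**nvalid: this cache holds no usable copy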
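The state transitions can be sketched as a small simulation. This is a simplified model with several assumptions: one cache line, snooping, no timing, and no MESIF/MOESI extensions. MesiLine and write_to_shared_line are hypothetical names. The demo function replays the case where one core writes a line that another core also holds in S.

```cpp
#include <array>

enum class Mesi { Modified, Exclusive, Shared, Invalid };

// Simplified MESI for ONE cache line shared by N cores, snooping style:
// every cache observes every request. Assumptions: no timing, no bus
// arbitration, write-backs are implicit, no vendor extensions.
template <int N>
struct MesiLine {
    std::array<Mesi, N> state{};

    constexpr MesiLine() {
        for (auto& s : state) s = Mesi::Invalid;   // nobody holds the line yet
    }

    constexpr bool others_hold_copy(int core) const {
        for (int i = 0; i < N; ++i)
            if (i != core && state[i] != Mesi::Invalid) return true;
        return false;
    }

    // Local read. A hit (M, E or S) changes nothing. A miss broadcasts a read
    // request: any other holder downgrades to S (an M holder writes back first).
    constexpr void read(int core) {
        if (state[core] != Mesi::Invalid) return;
        const bool shared = others_hold_copy(core);
        for (int i = 0; i < N; ++i)
            if (i != core && state[i] != Mesi::Invalid) state[i] = Mesi::Shared;
        state[core] = shared ? Mesi::Shared : Mesi::Exclusive;
    }

    // Local write. From S or I the core broadcasts an invalidation, so every
    // other copy becomes I. From E or M no other copies exist, so the loop is a no-op.
    constexpr void write(int core) {
        for (int i = 0; i < N; ++i)
            if (i != core) state[i] = Mesi::Invalid;
        state[core] = Mesi::Modified;
    }

    // Single-writer invariant: at most one core holds the line in M or E,
    // and if one does, no other core holds any valid copy.
    constexpr bool invariant_holds() const {
        int owners = 0, valid = 0;
        for (int i = 0; i < N; ++i) {
            if (state[i] == Mesi::Modified || state[i] == Mesi::Exclusive) ++owners;
            if (state[i] != Mesi::Invalid) ++valid;
        }
        return owners == 0 || (owners == 1 && valid == 1);
    }
};

// Core 0 reads, Core 1 reads, then Core 0 writes the Shared line.
constexpr MesiLine<2> write_to_shared_line() {
    MesiLine<2> line;
    line.read(0);    // Core 0: I -> E (no other copies)
    line.read(1);    // Core 1: I -> S, Core 0: E -> S
    line.write(0);   // Core 0: S -> M, Core 1: S -> I (invalidated)
    return line;
}
```

The single-writer invariant is the point of the protocol. A core may write only while it holds the line in E or M. Getting there from S or I costs an invalidation round trip, and that is exactly the traffic that false sharing multiplies.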

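**Snooping vs Directory:** With snooping, every cache watches (snoops) a shared bus or interconnect and reacts to other cores' requests. Because each request is broadcast to all caches, snooping works well only for small core counts (roughly 2–16 cores). Directory-based coherence is used in NUMA systems and large many-core chips with up to hundreds of cores. Each memory region has a directory that tracks which caches hold copies, so invalidations go only to those caches instead of being broadcast.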
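To contrast with broadcast snooping, here is a minimal sketch of the bookkeeping a directory might keep per line. It assumes at most 64 cores and a single home node per block. DirectoryEntry, on_read and on_write are hypothetical names. Real directories also have transient states and message queues, which are omitted here.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical directory entry for one memory block in a machine with up to
// 64 cores. The block's home node keeps this entry, so a write sends
// invalidations only to the caches listed here instead of broadcasting.
struct DirectoryEntry {
    std::bitset<64> sharers;   // bit i set: core i holds a clean (S) copy
    int owner = -1;            // core holding the line in M, or -1 if none

    void on_read(int core) {
        assert(core >= 0 && core < 64);
        if (owner != -1) {         // a dirty copy exists: the owner writes back
            sharers.set(owner);    // and keeps a Shared copy
            owner = -1;
        }
        sharers.set(core);
    }

    // Returns the number of point-to-point invalidation messages sent.
    std::size_t on_write(int core) {
        assert(core >= 0 && core < 64);
        std::bitset<64> targets = sharers;
        if (owner != -1) targets.set(owner);
        targets.reset(core);       // the writer does not invalidate itself
        sharers.reset();
        owner = core;              // the writer now holds the only (Modified) copy
        return targets.count();
    }
};
```

With cores 0, 3 and 7 sharing a line, a write by core 3 costs two targeted messages rather than a broadcast to 63 other caches. The price is the directory's storage (here a 64-bit sharer mask per line) and an extra lookup at the home node.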

Core 0 has a cache line in state S. Core 0 wants to write to it. What happens?

False Sharing: The Hidden Performance Killer

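**False sharing** occurs when two cores write to different variables that happen to reside in the same cache line (typically 64 bytes). MESI tracks state per line, not per variable. Every write therefore invalidates the whole line in the other core's cache, and the line bounces between the two caches even though the cores never touch each other's data.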
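False sharing is easy to reproduce. The sketch below assumes 64-byte cache lines and two otherwise idle cores; SameLine, SeparateLines and hammer_ms are hypothetical names. It times two threads that each increment only their own counter. It runs once with the two counters adjacent and once with each counter on its own line.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Two counters declared back-to-back: 8 bytes apart, so they usually share
// one 64-byte cache line (assumed line size).
struct SameLine {
    std::atomic<std::uint64_t> a{0};
    std::atomic<std::uint64_t> b{0};
};

// The same two counters, each forced onto its own 64-byte line.
struct SeparateLines {
    alignas(64) std::atomic<std::uint64_t> a{0};
    alignas(64) std::atomic<std::uint64_t> b{0};
};

// Thread 1 writes only c.a and thread 2 writes only c.b: no variable is shared,
// but with SameLine each increment invalidates the other core's copy of the line.
// Returns wall-clock milliseconds; compare the two layouts on your own machine.
template <class Counters>
double hammer_ms(Counters& c, std::uint64_t iters) {
    c.a.store(0);
    c.b.store(0);
    const auto start = std::chrono::steady_clock::now();
    std::thread t1([&c, iters] {
        for (std::uint64_t i = 0; i < iters; ++i) c.a.fetch_add(1, std::memory_order_relaxed);
    });
    std::thread t2([&c, iters] {
        for (std::uint64_t i = 0; i < iters; ++i) c.b.fetch_add(1, std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    assert(c.a.load() == iters && c.b.load() == iters);   // no increments lost
    return elapsed.count();
}
```

Don't hard-code an expected ratio. The gap depends on the CPU, on whether the two threads land on different physical cores, and on compiler flags.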

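**In practice:** perf stat and cachegrind report cache misses, while perf c2c and Intel VTune can pinpoint the contended cache lines behind false sharing. In Java, the @Contended annotation adds the padding automatically (outside the JDK it only takes effect with -XX:-RestrictContended). In Rust: #[repr(align(64))] on a wrapper struct. In C++: alignas(64).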
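In C++ the padding can also be derived from std::hardware_destructive_interference_size, declared in <new> (C++17). Not every standard library provides it, so the sketch falls back to an assumed 64 bytes. It gives each thread its own padded slot, a simplified take on the striped-counter idea behind Java's LongAdder. Slot and striped_count are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <thread>
#include <vector>

#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t kCacheLine = std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t kCacheLine = 64;   // assumed line size
#endif

// One counter per thread, each padded out to a full cache line by alignas.
struct alignas(kCacheLine) Slot {
    std::atomic<std::uint64_t> value{0};
};

// Each thread increments only its own slot; slots are summed after join().
std::uint64_t striped_count(unsigned threads, std::uint64_t per_thread) {
    std::vector<Slot> slots(threads);        // C++17 allocators honour over-alignment
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&slots, t, per_thread] {
            for (std::uint64_t i = 0; i < per_thread; ++i)
                slots[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    std::uint64_t total = 0;
    for (const auto& s : slots) total += s.value.load(std::memory_order_relaxed);
    assert(total == std::uint64_t{threads} * per_thread);   // no lost updates
    return total;
}
```

The trade-off is memory: each 8-byte counter now occupies a whole line. That is usually acceptable for a handful of hot per-thread counters, but wasteful for large arrays of rarely written data.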

**Misconception:** "If two threads write to different variables, they do not interfere. False sharing is about the same word or field, not about different data."

**Reality:** False sharing happens precisely because the variables are different but reside in the same cache line (typically 64 bytes). MESI invalidates the whole line on every write. Threads that never share data still force the line to ping-pong between their caches, and throughput can drop by 10-100x.

The "different variables means independent operations" model is correct at the language level, but caches work in lines, not bytes. Two counters declared back-to-back in a struct usually land in one line and slow each other down. For hot, independently written data, padding to 64 bytes (@Contended in Java, alignas(64) in C++) is not a micro-optimization but a precondition for code that scales across cores.
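Padding is easy to lose in a later refactor, so it can help to check the layout at compile time. A minimal sketch, assuming 64-byte lines (Packed, Padded, kLine and line_of are hypothetical names):

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLine = 64;   // assumed cache-line size

// Two per-thread counters declared back-to-back. The struct is line-aligned
// here only so the arithmetic below is exact; with natural 8-byte alignment
// the pair usually, but not always, lands in one line.
struct alignas(kLine) Packed {
    std::uint64_t hits_thread_a;
    std::uint64_t hits_thread_b;
};

// Same fields, each pushed onto its own line.
struct Padded {
    alignas(kLine) std::uint64_t hits_thread_a;
    alignas(kLine) std::uint64_t hits_thread_b;
};

// Line index of a field, given that the struct itself starts on a line boundary.
constexpr std::size_t line_of(std::size_t offset) { return offset / kLine; }

// Compile-time guards: the build fails if a refactor re-packs the fields.
static_assert(line_of(offsetof(Padded, hits_thread_a)) != line_of(offsetof(Padded, hits_thread_b)),
              "Padded fields must live on different cache lines");
static_assert(sizeof(Padded) == 2 * kLine, "Padded occupies exactly two lines");
```

The guards cost nothing at run time. If the target's line size differs (Apple M-series chips, for example, report 128-byte lines), adjust kLine or use std::hardware_destructive_interference_size where it is available.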

Two threads write to different variables but performance is terrible. What's the likely cause?

Key Takeaways

  • Cache coherence: all cores agree on a single order of writes to each memory location, so no core keeps reading a stale value
  • MESI: 4 cache line states - Modified, Exclusive, Shared, Invalid
  • Writing to a Shared line → invalidation of all other copies
  • False sharing: writing to different variables in the same cache line - hidden bottleneck
  • Fix: pad and align hot, independently written variables so each occupies its own 64-byte cache line

Related Topics

Cache coherence is the foundation of shared-memory parallel programming.

  • Cache — MESI governs the state of every cache line
  • ARM vs x86 — ARM has a weaker memory model than x86

Questions for Reflection

  • Why does NUMA architecture make the coherence problem harder than UMA?
  • How do atomic operations (CAS) use the MESI protocol?
  • Why is volatile in Java/C++ insufficient for correct multithreading?

Related Lessons

  • arch-09-cache — Cache is the key resource that must stay coherent across cores
  • arch-06-pipelining — Per-core superscalar execution provides context for understanding NUMA
  • arch-15-gpu-architecture — GPU is a different massive-parallelism model with different trade-offs
  • alg-01-big-o — Amdahl's Law is Big O for parallel speedup
  • os-05-sync