Computer Architecture

Memory Hierarchy: The Speed Pyramid

Цели урока

  • Understand the Memory Wall problem
  • Know the levels of the memory hierarchy and their characteristics
  • Understand temporal and spatial locality
  • Know the basics of how cache works

Предварительные знания

  • CPU structure
  • Concept of memory
  • CPU structure

A 4 GHz processor sits idle for 100+ cycles waiting for data from RAM. The memory hierarchy is the strategy to hide that latency.

  • Optimizing data structures for cache
  • Cache-oblivious algorithms
  • Profiling cache misses
  • Game development optimization

The Problem: Memory Wall

**Memory Wall:** CPU speed grows faster than memory speed. The gap widens every year.

ComponentLatencyRelative
Register~0.3 ns1×
L1 cache~1 ns3×
L2 cache~4 ns12×
L3 cache~12 ns40×
RAM (DDR5)~50-100 ns150-300×
SSD NVMe~20,000 ns60,000×
HDD~5,000,000 ns15,000,000×

**Cost of stalling:** If data is not in cache, the CPU stalls for 100+ cycles. At 4 GHz that's ~25 ns of wasted time!

The Memory Wall is:

Hierarchy Levels

**Solution:** A hierarchy - small fast memory + large slow memory. Use both wisely!

LevelSizeLatencyCost/GB
SRAM (cache)MB1-10 ns~$50
DRAM (RAM)GB50-100 ns~$5
Flash (SSD)TB10-100 µs~$0.10
HDDTB5-10 ms~$0.02

**Why not all SRAM?** 16 GB of SRAM would cost ~$800,000 (vs $50 for DRAM). Economics dictates the hierarchy!

Why is L1 cache smaller than L3?

Principles of Locality

The hierarchy works thanks to **locality** - patterns in how programs access data.

**Temporal Locality:** If data was accessed recently, it will likely be accessed again soon.

**Spatial Locality:** If data was accessed, nearby data will likely be accessed soon.

PatternTemporalSpatialCache-friendly?
Sequential traversalLowHighYes!
Loop on one variableHighLowYes!
Random accessLowLowNo!
Matrix traversal by rowsLowHighYes!
Matrix traversal by columnsLowLowNo!

**Cache thrashing:** Random access to a large array kills performance. Every access is a cache miss!

An example of spatial locality:

How Cache Works

**Cache** - a fast copy of frequently-used data from RAM.

**Cache Line:** The cache loads not a single byte but a block (typically 64 bytes). This exploits spatial locality!

**Hit Rate:** A typical L1 hit rate is 95%+. This means only 5% of accesses go to L2.

Cache simply speeds up all memory accesses

Cache only speeds up repeated accesses (temporal) and neighboring data (spatial).

With random access, every access is a cache miss. The cache doesn't help - it only adds the overhead of checking.

If cache line = 64 bytes and int = 4 bytes, how many ints are loaded on a single cache miss?

Key Ideas

  • Memory Wall: CPU is 100-300× faster than RAM
  • Hierarchy: registers → L1 → L2 → L3 → RAM → SSD
  • Smaller = faster, but more expensive
  • Temporal locality: reuse of the same data
  • Spatial locality: access to neighboring data
  • Cache line = 64 bytes, loaded as a whole

Related Topics

Memory hierarchy is the foundation for understanding performance.

  • Cache memory — Detailed internals of cache
  • Virtual memory — Abstraction over physical memory

Вопросы для размышления

  • How does data locality influence the choice of data structures in high-performance systems?
  • Why can a single cache miss cost more than hundreds of arithmetic operations?
  • What programming techniques maximize cache hits when traversing two-dimensional arrays?

Связанные уроки

  • os-01-intro
Memory Hierarchy: The Speed Pyramid

0

1

Sign In