Compilers

Register Allocation

x86-64 has 16 general-purpose registers. A typical function may have 100+ virtual registers. The allocation task: pack 100 into 16 without losing correctness. That is graph colouring, NP-complete in general. Yet compilers solve it in a fraction of a second. How?

**GCC** has used Chaitin-Briggs graph colouring since 1993. On x86-64 with 16 registers, the graph rarely needs spilling in well-written C code, but C++ with many local objects can easily overload the allocator.
**LLVM Greedy Register Allocator** (the default) blends linear scan with local spill/split heuristics. Compared with pure graph colouring on SPEC CPU2017: 2% worse runtime, but 3x faster compilation.
**V8 TurboFan** uses linear scan to JIT hot JavaScript functions. The decision is made in under 1ms per function, otherwise JIT overhead would outweigh the optimisation gain.

Graph Coloring

Register allocation by graph colouring is the classic approach by Chaitin (1982). The interference graph is built with virtual registers as nodes and edges between pairs that are live at the same time (interfering). The problem: colour the graph with K colours (K = the number of physical registers) so that adjacent nodes get different colours.

Graph colouring is NP-complete in general. In practice, however, interference graphs have a special structure (chordal graphs for some programs), which makes colouring polynomial. GCC and LLVM rely on heuristics: simplification ordering and a spill cost based on loop depth. The modern LLVM allocator (PBQP, Greedy) additionally uses a machine model to estimate spill cost.

What does 'interference' between two virtual registers mean?

Liveness Analysis

Liveness analysis is a backward dataflow analysis that determines, at every point of the program, which variables are 'live' (their value may be used in the future). A variable is live from its definition until its last use.

LLVM computes liveness through the `LiveVariables` and `LiveIntervals` passes. `LiveIntervals` builds intervals [def, last-use] for each virtual register, which is more compact than a bitset. GCC uses `df_analyze()`. Liveness information is used not only for register allocation but also for dead store elimination and copy propagation.

Why is liveness analysis backward (from end to start) rather than forward?

Linear Scan

Linear Scan Register Allocation is a linear-time allocator running in O(N log N). The flow: sort live intervals by start point, greedily assign registers, and on shortage spill the lowest-priority interval. It is used where compile speed matters more than code quality: JIT compilers.

HotSpot JVM used linear scan for JIT-compiling Java because compile speed is critical. LLVM ships `LinearScanRegisterAllocator` as an alternative to the Greedy allocator. V8 TurboFan uses linear scan. The modern variant, SSA-based linear scan (Wimmer & Franz, 2010), works directly on SSA form and reaches quality close to graph colouring while staying linear.

Why do JIT compilers prefer linear scan over graph colouring?

Spilling

Spilling evicts a value from a register into the stack (stack frame) when there are not enough physical registers. A spill = one store + a load on every subsequent use. It hurts performance: memory access is 10-100x more expensive than a register.

Rematerialisation is an alternative to spilling: instead of saving a value in memory, recompute it on demand. It fits cheap computations: `x = 0` (xor self) or `x = small_constant`. LLVM checks `isReMaterializable()` for each instruction class. On RISC-V with its 32 registers, spilling occurs less often than on x86-32 with its 8 registers.

Why is a spill inside a loop especially expensive?

Summary

**Graph colouring**: virtual registers = nodes, interference = edges, K colours = K physical registers. NP-complete in general but practically solvable in polynomial time.
**Liveness analysis**: backward dataflow that tells you where each variable is live. The basis for building the interference graph.
**Linear scan**: O(N log N) greedy algorithm. Fast, slightly worse quality. The pick of JIT compilers (V8, HotSpot).
**Spilling**: evict to the stack when out of registers. Especially expensive in loops. Rematerialisation is an alternative for cheap expressions.

Вопросы для размышления

x86-64 has 16 GPRs, but some are reserved (rsp, rbp). ARM64 has 31 GPRs. How does having more registers affect spill frequency, and why do x86 and ARM64 differ in spill rates for the same code?
Register coalescing merges two registers in `a = b` into one physical register. That conflicts with graph colouring (it adds a constraint). How is this trade-off resolved?
JVM managed code needs a GC roots map. The compiler must know exactly where in registers the object references sit. How does that complicate register allocation compared with native languages?

Связанные уроки

plt-17-memory-models