Code Generation
The same algorithm written the same way can run 10x faster or slower depending on how smart the compiler backend is. Register allocation, instruction selection, and scheduling turn abstract IR into fast native code.
- **Rust release builds**: LLVM -O3 with LTO produces code competitive with optimized C. Without it, Rust debug builds are 10-50x slower than release
- **Wasmtime**: Cranelift compiles WebAssembly in around 100 ms. That makes serverless cold start acceptable. LLVM would need 1-2 s
- **Apple M1 SoC**: the ARM backend of LLVM is tuned by Apple for the M1. The result: Clang on the M1 compiles LLVM faster than an Intel Xeon costing 2x more
Register Allocation
Register allocation is an NP-complete graph coloring problem. Nodes are virtual registers (SSA variables) and edges are conflicts (two variables alive at the same time). The goal is to color the graph in k colors (physical registers) with as few memory spills as possible.
Linear scan register allocation is a faster algorithm, O(n log n) vs Chaitin-Briggs O(n^2). JIT compilers (JVM, V8) use linear scan because compile time matters. AOT (LLVM) can afford the slower but higher-quality Chaitin-Briggs.
What happens when there are more variables than physical registers?
Instruction Selection
Instruction selection maps IR nodes to instructions of the target architecture. One IR instruction can map to different machine instructions with different costs. Tree pattern matching via dynamic programming yields an optimal covering.
Why is instruction selection an important optimization?
LLVM Backend
LLVM (Low Level Virtual Machine) is a modular compiler infrastructure. The frontend emits LLVM IR; LLVM optimizes it and generates native code for 30+ architectures. Rust, Swift, C++ (Clang), Julia, Kotlin Native: all built on LLVM.
LLVM IR is typed and in SSA form. The nsw/nuw flags on arithmetic are undefined-behavior hints that unlock further optimizations. Rust uses this: signed integer overflow is UB (as in C), so LLVM can optimize more aggressively.
What is the main advantage of the LLVM architecture for creating new languages?
Cranelift
Cranelift is a newer code generator written in Rust, used in Wasmtime and as a rustc backend. It is designed as an alternative to LLVM that focuses on compilation speed (for JIT) and correctness (for WebAssembly runtimes). It uses a verifiable register allocator (regalloc2).
LLVM vs Cranelift: LLVM has 60+ optimization passes, the best output code, and slow compilation. Cranelift has minimal optimizations, about 10x faster compilation, and worse output code. The choice: AOT -> LLVM, JIT/debug -> Cranelift.
Code generation is just a one-to-one translation from operations into processor instructions
Codegen includes register allocation (NP-complete), instruction selection (tree pattern matching), and instruction scheduling (latency hiding). Each stage has a significant impact on performance
A naive translation from LLVM IR to x86 with no register allocation would require memory for every variable. A good register allocator saves 3 to 5x in speed on number-crunching
Why is Cranelift a better fit for JIT compilation than LLVM?
Takeaways
- **Register allocation**: NP-complete coloring of the interference graph. Spill to memory when registers run out. Linear scan for JIT, Chaitin-Briggs for AOT
- **Instruction selection**: one IR operation, the best machine instructions. LEA instead of imul+add. Peephole optimizations
- **LLVM**: modular infrastructure. Frontend -> LLVM IR -> 30+ targets + 60 optimization passes. The basis of Rust, Swift, Clang, Julia
- **Cranelift**: about 10x faster than LLVM, with fewer optimizations. Ideal for JIT (Wasmtime) and debug builds
Related Topics
Codegen is the final stage of the compiler:
- IR and optimizations — Optimized IR is the input to codegen
- JIT compilation — JIT is codegen at runtime, balancing compile speed against code quality
Вопросы для размышления
- Register allocation is NP-complete, yet compilers solve it in seconds. Which heuristics and simplifications make this tractable in practice?
- LLVM IR has nsw/nuw flags (no signed/unsigned wrap), which let optimizations exploit undefined behavior. How does the balance between optimizability and correctness shape language design?
- Apple uses LLVM for all Apple Silicon code. How deeply can a company customize the backend for its own microarchitecture?