Global Optimisations
GCC with -O3 applies 50+ optimisation passes to the code. Local optimisations work inside one block, but global ones see the whole function graph and the inter-procedural graph. That is how LLVM can speed up NumPy by 2-5x just by changing compile flags, with no code changes.
- **LLVM with Profile-Guided Optimization (PGO)** uses an actual execution profile for LICM and inlining: hot functions get inlined aggressively, cold ones do not. Chrome is built with PGO and gains 5-10% over -O2.
- **Rust** uses alias analysis implicitly through the borrow checker. The compiler knows that a mutable reference is unique and applies optimisations that are off-limits for C with raw pointers. LLVM receives `noalias` annotations for every `&mut` parameter.
- **MLGO (Machine Learning Guided Optimisation)** in LLVM is a neural network that makes inlining decisions. Google published the results: +1.5% Chrome throughput, -0.5% binary size. The model is trained on a corpus of production code.
Dataflow Analysis
Dataflow analysis is a family of algorithms that gather information about variable values along the paths of a CFG. The information propagates from block to block through transfer functions until it reaches a fixed point.
Convergence of dataflow analysis is guaranteed if the lattice is finite and the transfer functions are monotone. LLVM implements dataflow infrastructure in `DataFlowSanitizer` and in individual passes. GCC has `df_analyze()`, a single framework for reaching definitions, live variables, and other analyses. Convergence time: O(N * D) iterations, where D is the loop nesting depth.
How does forward dataflow analysis differ from backward?
Loop Invariant Code Motion
Loop Invariant Code Motion (LICM) hoists computations out of a loop when their result does not change between iterations. If an expression inside the loop does not depend on any variables modified in the loop, it can be computed once before the loop.
LICM in LLVM (the `LICM` pass) uses alias analysis to determine whether the loop body modifies the value read by a potential invariant. If `sin(angle)` could alias modifiable memory, LICM will not hoist it. In practice this is rare. The JVM HotSpot JIT applies LICM particularly aggressively. Even `arr.length` gets hoisted out of a loop as an invariant once the reference is proven not to change.
Why can't accesses to `volatile` variables be hoisted out of a loop?
Function Inlining
Function inlining substitutes a function body at the call site. It removes call overhead (push args, call, pop), but more importantly it opens the door to other optimisations: constant propagation across function boundaries, LICM, and CSE.
Since 2020 LLVM has supported MLGO (Machine Learning Guided Optimisation) for inlining: a neural network makes the decision for each call based on 47 features (function size, number of users, profiling data). Google reports a 1.5% speedup for Chrome and a 0.5% reduction in binary size compared with heuristic inlining. Rust uses `#[inline]`, `#[inline(always)]`, `#[inline(never)]` for explicit control.
Why does aggressive inlining sometimes slow a program down instead of speeding it up?
Alias Analysis
Alias analysis determines whether two pointers can refer to the same memory location. This is critical for optimisations: you cannot reorder a load and a store if they might touch the same memory. Conservative analysis assumes aliasing everywhere and blocks optimisations.
The strict aliasing rule in C/C++: pointers of different types cannot alias (with the exception of char*). GCC at `-O2` uses this for TBAA optimisations. A famous bug: `*(int*)float_ptr` is undefined behaviour by strict aliasing, the compiler may optimise assuming no aliasing. The correct way: `memcpy` or `__attribute__((may_alias))`. Rust has no C-style aliasing problem. The borrow checker guarantees the absence of mutable aliasing statically.
Why does C99 `restrict` speed up compute-heavy code?
Key ideas
- **Dataflow analysis** propagates value information along the CFG until a fixed point. Forward (reaching defs) and backward (live vars) are the two main directions.
- **LICM** hoists loop-invariant computations out of the body. It needs to prove every operand is invariant. `volatile` is an absolute prohibition.
- **Inlining** substitutes a function body and unlocks cross-function optimisations. Aggressive inlining can cause code bloat and a performance drop.
- **Alias analysis** determines whether pointers can touch the same memory. Conservative analysis blocks optimisations. `restrict`, TBAA, and Rust's borrow checker are ways to give the compiler more information.
Related topics
Global optimisations work on IR and open opportunities for later phases:
- Local optimisations — Global optimisations often unlock another pass of local ones
- Loop optimisations — LICM is part of the loop optimisation family; dataflow analysis powers loop analysis
- Register allocation — Live variable analysis (dataflow) is the input for the register allocator
Вопросы для размышления
- Dataflow analysis converges to a fixed point. Could there be a program where it never converges? What guarantees termination?
- The Rust borrow checker statically guarantees no mutable aliasing. That lets LLVM apply `noalias` to every `&mut`. How much does that speed up code relative to equivalent C code without `restrict`?
- PGO (Profile-Guided Optimisation) requires running the program, collecting a profile, then rebuilding. What situations make PGO impractical or unreliable?