Compilers
JIT compilation: the basics
After warmup, JavaScript can run surprisingly close to C++, and on some benchmarks it even beats naively written C++. It sounds absurd, but the reason is profiling. V8 watches how code executes, collects argument types, branch targets, and call frequencies, and generates native x86/ARM code specialized for the patterns it observed. This is Just-In-Time compilation: the compiler knows something that a static compiler without profile data cannot know, namely how the program actually behaves.
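- **V8 in Node.js** compiles hot Express.js middleware through TurboFan. JSON-handling code in such middleware can speed up roughly 5-10x after a warmup of around 1000 requests, though the gain depends heavily on the workload (`JSON.parse` itself is a V8 built-in, so the speedup comes from the surrounding JavaScript)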
- **HotSpot JVM** in a high-performance Java server (a Kafka broker, say) typically reaches peak performance 30-60 seconds after startup. That is roughly how long the C2 compiler needs to optimize the hot paths
- **LuaJIT** powers nginx-lua (OpenResty) and is embedded in a number of game engines and tools. Its tracing JIT can deliver 10-50x speedups over the standard Lua interpreter for game logic with numeric loops
Tracing JIT
A tracing JIT records the program's execution path (a trace) as a linear sequence of instructions that crosses function boundaries and follows the branches actually taken. The interpreter counts how often each loop header is reached; once a loop passes the hotness threshold, the JIT records an iteration as a trace, turns every branch into a guard, and compiles the trace to native code. LuaJIT 2.x is the classic example: for a hot loop trace, it generates dense native code with the function calls inlined away. Mike Pall built LuaJIT around this approach and reached performance competitive with C on many numeric workloads.
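To make the recording step concrete, here is a toy tracing JIT in Python. It is a sketch under heavy simplifying assumptions, not how LuaJIT works internally: the bytecode format, the `HOT_THRESHOLD` value, and every function name are hypothetical. The interpreter counts loop back-edges; once a loop is hot it records one iteration as a linear trace, turns the conditional branch into a guard, and "compiles" the trace by generating straight-line Python source, which stands in for native code.

```python
import operator

HOT_THRESHOLD = 3  # hypothetical: back-edges seen before a loop is traced
OPS = {"add": operator.add, "lt": operator.lt}
SYMBOLS = {"add": "+", "lt": "<"}


def val(regs, x):
    # Operands are register names (str) or integer constants.
    return regs[x] if isinstance(x, str) else x


def src(x):
    # Python source text for an operand inside the generated trace.
    return f"regs[{x!r}]" if isinstance(x, str) else repr(x)


def compile_trace(trace):
    """Emit the linear trace as straight-line Python (a stand-in for native code)."""
    lines = []
    for op in trace:
        if op[0] in SYMBOLS:
            _, dst, a, b = op
            lines.append(f"regs[{dst!r}] = {src(a)} {SYMBOLS[op[0]]} {src(b)}")
        else:  # ("guard", cond, expected, exit_pc): leave the trace if the branch flips
            _, cond, expected, exit_pc = op
            lines.append(f"if bool(regs[{cond!r}]) != {expected}: return {exit_pc}")
    code = "def run(regs):\n    while True:\n" + "".join(f"        {line}\n" for line in lines)
    namespace = {}
    exec(code, namespace)
    return namespace["run"]


def interpret(program, regs):
    counts, traces = {}, {}  # back-edge counters and compiled traces, keyed by loop header
    recording, trace = None, []
    pc = 0
    while True:
        if pc in traces and recording is None:
            pc = traces[pc](regs)  # run compiled trace until a guard fails (side exit)
            continue
        op = program[pc]
        kind = op[0]
        if kind == "halt":
            return regs, traces
        if kind in OPS:
            _, dst, a, b = op
            regs[dst] = OPS[kind](val(regs, a), val(regs, b))
            if recording is not None:
                trace.append(op)
            pc += 1
        elif kind == "jmp_if_false":
            _, cond, target = op
            taken = not regs[cond]
            if recording is not None:
                # Guard: expect the same branch direction; on failure resume on the other path.
                trace.append(("guard", cond, bool(regs[cond]), pc + 1 if taken else target))
            pc = target if taken else pc + 1
        elif kind == "jmp":
            target = op[1]
            if target <= pc:  # backward jump = loop back-edge
                if recording == target:  # the trace closed the loop: compile it
                    traces[target] = compile_trace(trace)
                    recording, trace = None, []
                elif recording is None:
                    counts[target] = counts.get(target, 0) + 1
                    if counts[target] >= HOT_THRESHOLD:
                        recording = target
            pc = target


# Sum of 0..n-1: the loop header is pc 0, the back-edge is the jmp at pc 4.
sum_loop = [
    ("lt", "c", "i", "n"),     # 0: c = i < n
    ("jmp_if_false", "c", 5),  # 1: leave the loop when i >= n
    ("add", "s", "s", "i"),    # 2: s = s + i
    ("add", "i", "i", 1),      # 3: i = i + 1
    ("jmp", 0),                # 4: back-edge
    ("halt",),                 # 5
]
regs, traces = interpret(sum_loop, {"i": 0, "s": 0, "n": 10})
print(regs["s"], list(traces))  # 45 [0]
```

A failing guard returns the program counter where the interpreter should resume: that is a side exit.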
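PyPy uses a meta-tracing JIT generated by the RPython toolchain (it traces the interpreter while the interpreter runs the user's program): Python programs running tight numeric loops can speed up 5-50x compared to CPython. Firefox's SpiderMonkey shipped a tracing JIT (TraceMonkey) from 2009 until 2012, then moved to method JITs (JägerMonkey, later IonMonkey). Tracing delivers excellent peak performance but becomes unpredictable at polymorphic call sites and in branchy code: every new type or path fails a guard, which forces a side exit back to the interpreter and, if that exit gets hot, yet another trace.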
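Why polymorphism hurts can be modeled with a chain of type guards. The sketch below is a hypothetical simplification, not PyPy's or LuaJIT's actual data structures: a call site starts with no specialized code, and each type that causes enough side exits gets its own compiled trace appended to the chain (PyPy calls such extra traces bridges, LuaJIT calls them side traces). The class name and `exit_threshold` are invented for illustration.

```python
def interpreter_double(x):
    # Generic slow path: full dynamic dispatch on every call.
    return x + x


class TracedSite:
    """Hypothetical model of one call site inside a trace: a chain of type guards."""

    def __init__(self, exit_threshold=2):
        self.chain = []        # (expected_type, specialized_code) in guard order
        self.exit_counts = {}  # side exits observed per type
        self.exit_threshold = exit_threshold  # hypothetical: exits before a new trace is recorded
        self.guard_checks = 0
        self.side_exits = 0

    def call(self, x):
        for expected, code in self.chain:
            self.guard_checks += 1
            if type(x) is expected:  # type guard
                return code(x)
        # Every guard failed: side exit back to the interpreter.
        self.side_exits += 1
        t = type(x)
        self.exit_counts[t] = self.exit_counts.get(t, 0) + 1
        if self.exit_counts[t] >= self.exit_threshold:
            # The exit is hot: record a new trace specialized to this type.
            self.chain.append((t, self.specialize(t)))
        return interpreter_double(x)

    @staticmethod
    def specialize(t):
        # Stand-in for compiled code; a real JIT would emit unboxed machine ops.
        if t is int:
            return lambda x: x << 1
        return lambda x: x + x


mono, poly = TracedSite(), TracedSite()
for _ in range(100):
    mono.call(7)
for v in [7, 2.5, "ab", (1,), [1]] * 20:
    poly.call(v)
print(mono.side_exits, mono.guard_checks)                   # 2 98
print(poly.side_exits, len(poly.chain), poly.guard_checks)  # 10 5 280
```

With one type, every call after warmup costs a single guard check. With five types, the average call walks three guards and the site needs five compiled traces, which is the instability the paragraph above describes.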
Method JIT
A method JIT uses whole methods (functions) as the unit of compilation. This is the approach of HotSpot JVM, V8 TurboFan, and SpiderMonkey IonMonkey. The compiler builds an SSA form (Static Single Assignment) of the method, in which every value is assigned exactly once, so each use points to a single definition and data-flow optimizations become straightforward. It then applies optimizations (inlining, loop unrolling, escape analysis) and generates native code. The compilation boundary is the method boundary (plus whatever gets inlined into it), which makes reasoning about optimizations simpler.
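A minimal sketch of what SSA construction does to straight-line code, assuming a made-up three-address format of `(dst, op, args)` tuples: every assignment gets a fresh versioned name, and every use is rewritten to the latest version. Real method JITs also insert φ (phi) functions where control flow merges; this sketch only handles code without branches.

```python
def to_ssa(instructions):
    """Rename straight-line three-address code so every variable is assigned exactly once.

    Each instruction is (dst, op, args); args are variable names (str) or int constants.
    Names never assigned here (e.g. parameters) keep their original spelling.
    """
    version = {}  # variable -> number of definitions seen so far
    current = {}  # variable -> SSA name of its latest definition
    out = []
    for dst, op, args in instructions:
        # Rewrite uses first: they must see the definition that precedes this instruction.
        new_args = [current.get(a, a) if isinstance(a, str) else a for a in args]
        version[dst] = version.get(dst, 0) + 1
        current[dst] = f"{dst}{version[dst]}"
        out.append((current[dst], op, new_args))
    return out


code = [
    ("x", "add", ["a", 1]),    # x = a + 1
    ("x", "mul", ["x", 2]),    # x = x * 2  (redefines x)
    ("y", "add", ["x", "a"]),  # y = x + a
]
print(to_ssa(code))
# [('x1', 'add', ['a', 1]), ('x2', 'mul', ['x1', 2]), ('y1', 'add', ['x2', 'a'])]
```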
On top of the Ignition interpreter, V8 uses Sparkplug (a fast baseline JIT without optimizations), Maglev (a mid-tier optimizing compiler), and TurboFan (the top-tier optimizing JIT). TurboFan runs dozens of optimization passes, including inlining, dead code elimination, and range analysis. Compiling a single function with TurboFan can take roughly 1-10ms, so it cannot be applied to every function.
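Dead code elimination shows why SSA pays off. Assuming the same kind of hypothetical `(name, op, args)` instruction list and purely side-effect-free operations, one backward pass that marks the operands of live values is enough. TurboFan does this on its own graph IR, so treat the sketch as an illustration of the idea rather than its implementation.

```python
def eliminate_dead_code(ssa, results):
    """Remove SSA definitions whose values never reach `results`.

    ssa: list of (name, op, args) in SSA form; results: names the function returns.
    Assumes every op is pure; a real compiler must keep stores, calls, and other side effects.
    """
    live = set(results)
    kept = []
    # One backward pass suffices: in straight-line SSA every use follows its single definition.
    for name, op, args in reversed(ssa):
        if name in live:
            kept.append((name, op, args))
            live.update(a for a in args if isinstance(a, str))
    kept.reverse()
    return kept


ssa = [
    ("x1", "add", ["a", 1]),
    ("t1", "mul", ["x1", 100]),  # computed but never used
    ("x2", "mul", ["x1", 2]),
    ("y1", "add", ["x2", "a"]),
]
print(eliminate_dead_code(ssa, ["y1"]))
# [('x1', 'add', ['a', 1]), ('x2', 'mul', ['x1', 2]), ('y1', 'add', ['x2', 'a'])]
```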
JIT profiling
A JIT compiler cannot afford to compile everything with full optimization; the compile time and memory would outweigh the gains. Instead, the runtime collects a profile: how many times each function has been called, which types are actually passed as arguments, and which branches are taken more often. The JIT uses that data to decide on inlining, type specialization, and speculative optimizations.
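A rough Python sketch of such a profile, covering call counts and argument types (branch profiles are left out for brevity). The `profiled` decorator, the `HOT_CALLS` threshold, and the `jit_decision` policy are all hypothetical; they only show how raw counts could feed a specialization decision.

```python
from collections import Counter
from functools import wraps

HOT_CALLS = 1000  # hypothetical hotness threshold


def profiled(fn):
    """Record what a JIT would want to know about fn: call count and argument types."""
    profile = {"calls": 0, "arg_types": Counter()}

    @wraps(fn)
    def wrapper(*args):
        profile["calls"] += 1
        profile["arg_types"][tuple(type(a).__name__ for a in args)] += 1
        return fn(*args)

    wrapper.profile = profile
    return wrapper


def jit_decision(profile):
    # Hypothetical policy: only hot, monomorphic functions get a type-specialized compile.
    if profile["calls"] < HOT_CALLS:
        return "stay in interpreter"
    if len(profile["arg_types"]) == 1:
        (sig,) = profile["arg_types"]
        return f"specialize for {sig}"
    return "compile generic version"


@profiled
def area(w, h):
    return w * h


for i in range(1500):
    area(i, 3)
print(area.profile["calls"], jit_decision(area.profile))
# 1500 specialize for ('int', 'int')
```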
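HotSpot JVM uses counter-based profiling: the interpreter and profiled C1 code keep per-method invocation and loop back-edge counters and record type and branch profiles, and a method that crosses a threshold is queued for compilation. V8 charges a per-function budget as calls return and loops iterate, and records type feedback through inline caches (ICs) at property accesses and call sites. PyPy counts loop iterations and, once a loop is hot, records the concrete types it observes while tracing and turns them into guards. Profile accuracy directly affects the quality of JIT optimizations.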
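Inline caches are easiest to see with hidden classes, which V8 calls maps: objects created the same way share a shape that maps property names to slot offsets. The sketch below models one `obj.x` load site in Python. The class names and the `MAX_POLYMORPHIC = 4` limit are assumptions for illustration. The cache starts uninitialized, becomes monomorphic after the first shape, polymorphic with a few shapes, and gives up (megamorphic) beyond the limit; that state history is the type feedback an optimizing tier reads.

```python
class Shape:
    """Hidden class: maps property names to slot indices, shared by similar objects."""

    def __init__(self, names):
        self.offsets = {name: i for i, name in enumerate(names)}


class Obj:
    def __init__(self, shape, values):
        self.shape, self.slots = shape, list(values)


MAX_POLYMORPHIC = 4  # assumption: cached shapes before the site goes megamorphic


class LoadIC:
    """Inline cache for one `obj.<name>` load site; its state doubles as type feedback."""

    def __init__(self, name):
        self.name = name
        self.entries = {}  # shape -> slot offset
        self.megamorphic = False
        self.misses = 0

    @property
    def state(self):
        if self.megamorphic:
            return "megamorphic"
        return {0: "uninitialized", 1: "monomorphic"}.get(len(self.entries), "polymorphic")

    def load(self, obj):
        offset = self.entries.get(obj.shape)
        if offset is None:
            # Miss: do the slow generic lookup, then cache the result if there is room.
            self.misses += 1
            offset = obj.shape.offsets[self.name]
            if not self.megamorphic:
                if len(self.entries) < MAX_POLYMORPHIC:
                    self.entries[obj.shape] = offset
                else:
                    self.megamorphic = True  # too many shapes: stop caching at this site
                    self.entries.clear()
        return obj.slots[offset]


point = Shape(["x", "y"])
point3d = Shape(["z", "x", "y"])
ic = LoadIC("x")
print(ic.load(Obj(point, [1, 2])), ic.state)       # 1 monomorphic
print(ic.load(Obj(point3d, [0, 5, 6])), ic.state)  # 5 polymorphic
# Three more distinct shapes push the site past MAX_POLYMORPHIC.
for extra in (["a"], ["a", "b"], ["a", "b", "c"]):
    ic.load(Obj(Shape(extra + ["x"]), [0] * len(extra) + [9]))
print(ic.state, ic.misses)                         # megamorphic 5
```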
Tiered compilation
Tiered compilation resolves the JIT trade-off between fast startup and peak performance. A program begins execution in an interpreter or a fast baseline compiler, and hot code moves to the next tier with more aggressive optimizations. V8 has four tiers: Ignition (interpreter) -> Sparkplug (baseline JIT) -> Maglev -> TurboFan. HotSpot JVM defines five levels: level 0 is the interpreter, levels 1-3 run C1-compiled code with different amounts of profiling, and level 4 is C2.
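Tier promotion can be sketched as a per-function counter. The tier names below mirror V8's Ignition, Sparkplug, Maglev, and TurboFan only loosely, and the `PROMOTE_AFTER` thresholds are invented; real engines tune budgets per tier and weigh loop back-edges and code size. A hypothetical `deoptimize` method drops the function back to the interpreter when a speculation fails.

```python
from dataclasses import dataclass

TIERS = ["interpreter", "baseline", "mid-tier", "optimizing"]
# Hypothetical promotion thresholds per tier; the top tier has none.
PROMOTE_AFTER = {"interpreter": 10, "baseline": 100, "mid-tier": 1000}


@dataclass
class FunctionState:
    name: str
    tier: str = "interpreter"
    counter: int = 0  # calls/back-edges charged since the last tier change

    def tick(self, events=1):
        """Charge executions; promote to the next tier when the threshold is reached."""
        self.counter += events
        limit = PROMOTE_AFTER.get(self.tier)
        if limit is not None and self.counter >= limit:
            self.tier = TIERS[TIERS.index(self.tier) + 1]
            self.counter = 0
        return self.tier

    def deoptimize(self):
        # A failed speculation sends the function back to the interpreter to re-profile.
        self.tier, self.counter = "interpreter", 0
        return self.tier


hot, cold = FunctionState("handler"), FunctionState("init")
for _ in range(2000):
    hot.tick()
for _ in range(3):
    cold.tick()
print(hot.tier, cold.tier)  # optimizing interpreter
print(hot.deoptimize())     # interpreter
```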
Node.js 22+ enables Maglev by default, so it runs the full four-tier V8 pipeline. Warmup time is a critical parameter for serverless: on AWS Lambda, the first requests after a Node.js cold start run mostly in the interpreter and baseline tiers, before TurboFan has optimized anything. For JVM applications, GraalVM Native Image takes a different route: ahead-of-time (AOT) compilation of the whole application removes JIT warmup entirely.
Myth: a JIT compiler is always faster than an interpreter.
Reality: JIT is only worthwhile for hot code. For rarely executed functions the compilation overhead outweighs the speedup.
If a function is called three times during the whole lifecycle of an application, TurboFan could spend up to ~10ms compiling it to save a few microseconds. Tiered compilation avoids this by keeping rare code in the lower tiers.
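The trade-off is simple arithmetic: compilation pays off after compile_time / (interpreted_call_time - compiled_call_time) calls. The sketch plugs in a 10ms compile cost (the upper end of the TurboFan range given earlier) and hypothetical per-call times of 2 µs interpreted and 0.2 µs optimized, expressed in integer nanoseconds to avoid floating-point rounding.

```python
def break_even_calls(compile_ns, interpreted_call_ns, compiled_call_ns):
    """Calls needed before optimizing a function pays back its compile time.

    Returns None when compiled code is not faster, since compilation then never pays off.
    """
    saving = interpreted_call_ns - compiled_call_ns
    if saving <= 0:
        return None
    return -(-compile_ns // saving)  # ceiling division on integers


# Hypothetical figures: 10 ms compile, 2 us per interpreted call, 0.2 us once optimized.
n = break_even_calls(10_000_000, 2_000, 200)
print(n)       # 5556
print(3 >= n)  # False: three calls never repay the compile
```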
Key ideas
- Tracing JIT (LuaJIT, PyPy) records a concrete execution path and compiles it. Great for numeric loops, unstable under polymorphism
- Method JIT (V8 TurboFan, HotSpot C2) compiles whole functions through a full optimization pipeline based on SSA
- Tiered compilation (V8: Ignition -> Sparkplug -> Maglev -> TurboFan) balances cold start latency against peak throughput. The interpreter starts instantly, and the optimizing JIT kicks in as code warms up
Related topics
JIT compilation builds on compiler techniques and ties into runtime systems:
- Speculative optimizations: tiered JITs rely on speculative assumptions about types. When an assumption breaks, the runtime deoptimizes
- GraalVM: a JDK distribution whose Graal compiler is a JIT written in Java, plus AOT compilation through Native Image
- LLVM: LLVM ships its own JIT APIs (MCJIT and its successor ORC), and some JITs use LLVM as their code-generation backend
Questions for reflection
- Tracing JITs are great for numeric loops but struggle with polymorphic calls. Why? What happens when a trace encounters a new type?
- Node.js serverless functions suffer from JIT warmup: the first requests are slow. Beyond GraalVM Native Image, what strategies help?
- V8 added Maglev (2023) between Sparkplug and TurboFan. Why is a mid-tier compiler useful? What does it gain over jumping straight to TurboFan?