Compilers

Bytecode and Virtual Machines

The JVM runs the same .class file on Windows, Linux, macOS, x86-64, and ARM64 with no recompilation. A Python script runs anywhere an interpreter exists. Bytecode is an abstraction over hardware: compile once into portable bytecode, then run everywhere. That is exactly what made Java 'write once, run anywhere'.

  • **CPython 3.11** got 25% faster thanks to specialised bytecode instructions (Faster CPython). Instead of a single BINARY_ADD, there are separate BINARY_OP_ADD_INT, BINARY_OP_ADD_FLOAT. Adaptive specialisation based on observed types.
  • **WebAssembly** is the new universal bytecode for the web and beyond. Wasmtime and wasmer are standalone WASM runtimes. WASI (WebAssembly System Interface) lets WASM run outside the browser as a secure sandbox.
  • **Lua 5** picked a register-based VM and beat CPython by about 3x on number-crunching benchmarks. LuaJIT (Mike Pall) added JIT compilation and reaches performance comparable to C on numeric tasks.

Stack Machine

A stack machine is a virtual machine that uses a stack to hold intermediate values. Instructions work implicitly with the top of the stack: `IADD` pops the top two values, adds them, and pushes the result. Operands are not specified explicitly.

WebAssembly is a stack machine with an explicit stack and structured control flow (no goto). Compact binary format: `0x6a` = i32.add, `0x20 0x00` = local.get $0. Average WASM uses 10-15 bytes per instruction concept against 25+ for x86. Stack machines are easier to verify: the stack type is tracked during validation, and ill-formed programs are rejected before execution.

What is the order of operands when `ISUB` runs on a stack machine with stack `[5, 3]` (3 on top)?

Register-Based VM

A register-based VM uses explicit virtual registers instead of a stack. Instructions name their sources and destination explicitly: `ADD R1, R2, R3`. Fewer instructions per computation, but each instruction is wider (3 operands instead of 0).

CPython moved to specialised bytecode in 3.11 (Faster CPython project): instead of BINARY_ADD, there are now BINARY_OP_ADD_INT, BINARY_OP_ADD_FLOAT, type-specialised variants. This is adaptive interpretation: the first few executions identify the types and then replace the generic instruction with a specialised one. CPython 3.11 is 25% faster than 3.10 thanks to this.

Why did Lua and Dalvik choose a register-based VM over a stack-based one?

Bytecode Design

Bytecode design affects interpreter performance, file size, ease of verification, and JIT compilation. Key decisions: instruction width (fixed vs variable), opcode count, linear vs structured control flow.

WASM uses LEB128 (Little Endian Base 128) for variable-length integer encoding: small numbers (0-127) take 1 byte, larger numbers take 2-5 bytes. That is almost always more compact than a fixed 4-byte int. The same approach is used in Google Protocol Buffers and the DWARF debug format. LEB128 decoding is slightly slower, but the bandwidth/memory savings are worth it for network-delivered WASM.

Why does the JVM have separate opcodes `iadd`, `fadd`, `ladd` instead of one `add`?

Interpreter Dispatch

Dispatch is the mechanism that selects the handler for each opcode in the interpreter. It is the hot path: every instruction goes through dispatch. Three main approaches: switch dispatch, direct threading, and indirect threading.

Threaded code was first used in Forth (1970), effectively direct threading with an array of pointers. CPython used switch dispatch up to 3.6. From 3.6 it added support for computed goto on GCC/Clang (the `--with-computed-gotos` flag). The speedup is around 15-20%. SpiderMonkey (Firefox JS) and V8 use bytecode only as a fallback. Hot code is JIT-compiled into native and dispatch disappears entirely.

Why is direct threading faster than switch dispatch?

Summary

  • **Stack VM** (JVM, CPython, WASM): compact bytecode, easy verification, implicit operands. Simpler to implement.
  • **Register VM** (Lua, Dalvik): explicit registers, fewer instructions, faster thanks to fewer dispatches.
  • **Bytecode design** decides: instruction width, constant pool, opcode typing, control flow (goto vs structured). A trade-off between compactness, verification, and JIT-friendliness.
  • **Dispatch**: switch (simple), direct threading (faster through per-site branch prediction), superinstructions (fuse frequent pairs).

Related topics

Bytecode and VMs are the foundation of managed execution environments:

  • JVM: architecture and bytecode — JVM is the most complete implementation of a stack-based VM with verification and JIT
  • V8: JavaScript — V8 Ignition uses register-based bytecode; Turbofan JITs hot paths
  • Linking and loading — Loading .class and .wasm files is the managed-environment counterpart of dynamic linking

Вопросы для размышления

  • WASM forbids goto and requires structured control flow (block/loop/if). What exactly does that give the verifier, and how does a compiler emit WASM from a language with arbitrary goto (e.g. C via Emscripten)?
  • CPython uses adaptive specialisation from 3.11. What happens at deoptimisation, when an argument type changes after specialisation? How often does that occur in real Python code?
  • Lua VM is single-threaded, the JVM is multi-threaded. How does multi-threading complicate bytecode and interpreter dispatch design? What has to be added to a single-threaded VM?

Связанные уроки

  • arch-04-cpu
Bytecode and Virtual Machines

0

1

Sign In