Compilers
Symbol Tables and Scoping: Environments, Namespaces and Resolution
VS Code shows a type error 200ms after the last keystroke across 50 000 lines of TypeScript. Not because of a fast computer - because the TypeScript compiler maintains a symbol table that lets it answer 'what type is this name?' in one hash lookup plus a few parent-pointer hops. The symbol table is the difference between a text editor and an IDE.
- TypeScript Language Server: maintains an incremental symbol table per file - only re-resolves changed scopes on edit, not the whole program
- V8 JavaScript engine: hidden classes cache property name-to-offset mappings - symbol table at runtime for object shapes
- Rust borrow checker: tracks the entire symbol table including lifetime annotations to prove memory safety without GC
- Python CPython: `LOAD_FAST` bytecode for locals vs `LOAD_GLOBAL` for globals - scope chain encoded in bytecode offsets at compile time
Symbol Table: Hash Map of the Universe
TypeScript finds a type error in 50 000 lines in under a second. Not by re-parsing - by walking a pre-built **symbol table**: a hash map from every name in the program to its declaration, type, kind (variable, function, class, type alias), and source location. The symbol table is the compiler's working memory during semantic analysis.
Each entry stores more than a name. A symbol entry records: the name string, its **kind** (let/const/var, function, class, parameter, type), the **type** (inferred or annotated), the **declaring scope**, and flags like `is_exported`, `is_reassigned`, `is_captured_by_closure`. TypeScript uses a `Symbol` object (its own internal type, not `Symbol` the primitive) for each entry.
**Linear probing hash tables** are preferred over chaining for symbol tables because they maximize cache locality during the frequent sequential scans that happen in small scopes (function parameters, loop variables). V8's hidden classes take this further: objects with the same property order share a single descriptor array, making property lookup a direct offset computation instead of a hash lookup.
Why does a symbol table entry need an `isCaptured` flag?
Scope Chain: Lexical vs Dynamic Scoping
Every language makes one fundamental scoping choice. **Lexical scoping** (static scoping): a name resolves to the scope where the function was *defined*. **Dynamic scoping**: a name resolves to the most recently *called* scope on the call stack. JavaScript, Python, Rust, and Haskell are lexically scoped. Bash and Emacs Lisp are dynamically scoped. The difference is invisible until closures appear.
The scope chain is represented as a linked list of `SymbolTable` objects, each with a parent pointer. The compiler creates a new `SymbolTable` on entering a block, function, or module, and restores the previous one on exit. This stack of tables *is* the scope chain at compile time. At runtime it becomes the closure environment - a heap-allocated copy of the lexical chain, keeping captured variables alive.
In a lexically scoped language, when is the scope chain for a closure determined?
Name Resolution and Shadowing
Name resolution is a scope chain walk: start at the innermost scope, search the hash table for the name, move to the parent if not found, repeat until the global scope or an error. The algorithm is $O(d)$ where $d$ is the nesting depth - typically 3-6 levels in real code, making it effectively $O(1)$ in practice.
**Shadowing** is when an inner scope defines a name that already exists in an outer scope. The inner definition wins for all lookups within that scope - the outer symbol is unreachable by name. Rust requires an explicit keyword (`let x = x + 1;` is legal shadowing). TypeScript warns on shadowing in strict mode. Go disallows shadowing of package-level names inside blocks.
**Hoisting** in JavaScript is the compiler pre-registering `var` and `function` declarations before executing the scope. `var x = 1` inside a function means `x` exists (as `undefined`) from the first line of the function, not from the declaration line. `let` and `const` are hoisted to the top of their block but placed in the **Temporal Dead Zone** - accessing them before declaration throws a ReferenceError.
Shadowing is always a bug - the outer variable becomes inaccessible and the code is confusing
Shadowing is a deliberate language feature used in Rust for type refinement and in closures for clean variable rebinding - it is only a bug when unintentional
Rust's idiom `let x = x.parse::<i32>().unwrap()` shadows the string `x` with an integer `x` in one line - this is idiomatic, not a bug. The real risk is accidental shadowing in deeply nested code, which strict-mode warnings in TypeScript and Clippy lints in Rust are designed to catch.
A `let` variable in JavaScript is hoisted but in the Temporal Dead Zone (TDZ). What does this mean in practice?
Key ideas
- Symbol table: hash map from name to entry (kind, type, location, isCaptured, isReassigned) - one per scope
- Scope chain: linked list of symbol tables with parent pointers - lookup walks up until found or error
- Lexical scoping: scope chain fixed at definition time; dynamic scoping: resolved at call time
- Name resolution: $O(d)$ where $d$ is nesting depth - typically 3-6 hops in real code
- Captured variables must be heap-allocated; hoisted `let`/`const` are in TDZ until initialized
Related topics
Symbol tables connect parsing to all downstream semantic analysis phases.
- AST Traversal — The AST visitor pattern drives symbol table population during the declaration pass
- Type Checking — Type checker retrieves type annotations and inferred types from symbol table entries
Вопросы для размышления
- How would a compiler represent the symbol table for a module with re-exports - where a name is defined in one file but exported through another?
- Why do closures in languages with value semantics (like Rust) not always require heap allocation for captured variables?
- Design the symbol table changes needed to support `with` statement semantics (dynamic property injection into scope) - and explain why it breaks lexical analysis.
Связанные уроки
- comp-16-ast — Semantic analysis walks the AST to populate and query the symbol table
- comp-18-type-checking — Type checker looks up types from symbol table entries
- alg-12-bfs — Scope chain traversal is BFS on a parent-pointer environment graph
- ds-01-arrays — Hash table implementation underlies fast symbol lookup
- ds-06-hash-intro