Compilers

Symbol Tables and Scoping: Environments, Namespaces and Resolution

VS Code shows a type error 200ms after the last keystroke across 50 000 lines of TypeScript. Not because of a fast computer - because the TypeScript compiler maintains a symbol table that lets it answer 'what type is this name?' in one hash lookup plus a few parent-pointer hops. The symbol table is the difference between a text editor and an IDE.

TypeScript Language Server: maintains an incremental symbol table per file - only re-resolves changed scopes on edit, not the whole program
V8 JavaScript engine: hidden classes cache property name-to-offset mappings - symbol table at runtime for object shapes
Rust borrow checker: tracks the entire symbol table including lifetime annotations to prove memory safety without GC
Python CPython: `LOAD_FAST` bytecode for locals vs `LOAD_GLOBAL` for globals - scope chain encoded in bytecode offsets at compile time

Symbol Table: Hash Map of the Universe

TypeScript finds a type error in 50 000 lines in under a second. Not by re-parsing - by walking a pre-built **symbol table**: a hash map from every name in the program to its declaration, type, kind (variable, function, class, type alias), and source location. The symbol table is the compiler's working memory during semantic analysis.

Each entry stores more than a name. A symbol entry records: the name string, its **kind** (let/const/var, function, class, parameter, type), the **type** (inferred or annotated), the **declaring scope**, and flags like `is_exported`, `is_reassigned`, `is_captured_by_closure`. TypeScript uses a `Symbol` object (its own internal type, not `Symbol` the primitive) for each entry.

**Linear probing hash tables** are preferred over chaining for symbol tables because they maximize cache locality during the frequent sequential scans that happen in small scopes (function parameters, loop variables). V8's hidden classes take this further: objects with the same property order share a single descriptor array, making property lookup a direct offset computation instead of a hash lookup.

Why does a symbol table entry need an `isCaptured` flag?

Scope Chain: Lexical vs Dynamic Scoping

Every language makes one fundamental scoping choice. **Lexical scoping** (static scoping): a name resolves to the scope where the function was *defined*. **Dynamic scoping**: a name resolves to the most recently *called* scope on the call stack. JavaScript, Python, Rust, and Haskell are lexically scoped. Bash and Emacs Lisp are dynamically scoped. The difference is invisible until closures appear.

The scope chain is represented as a linked list of `SymbolTable` objects, each with a parent pointer. The compiler creates a new `SymbolTable` on entering a block, function, or module, and restores the previous one on exit. This stack of tables *is* the scope chain at compile time. At runtime it becomes the closure environment - a heap-allocated copy of the lexical chain, keeping captured variables alive.

In a lexically scoped language, when is the scope chain for a closure determined?

Name Resolution and Shadowing

Name resolution is a scope chain walk: start at the innermost scope, search the hash table for the name, move to the parent if not found, repeat until the global scope or an error. The algorithm is $O(d)$ where $d$ is the nesting depth - typically 3-6 levels in real code, making it effectively $O(1)$ in practice.

**Shadowing** is when an inner scope defines a name that already exists in an outer scope. The inner definition wins for all lookups within that scope - the outer symbol is unreachable by name. Rust requires an explicit keyword (`let x = x + 1;` is legal shadowing). TypeScript warns on shadowing in strict mode. Go disallows shadowing of package-level names inside blocks.

**Hoisting** in JavaScript is the compiler pre-registering `var` and `function` declarations before executing the scope. `var x = 1` inside a function means `x` exists (as `undefined`) from the first line of the function, not from the declaration line. `let` and `const` are hoisted to the top of their block but placed in the **Temporal Dead Zone** - accessing them before declaration throws a ReferenceError.

Shadowing is always a bug - the outer variable becomes inaccessible and the code is confusing

Shadowing is a deliberate language feature used in Rust for type refinement and in closures for clean variable rebinding - it is only a bug when unintentional

Rust's idiom `let x = x.parse::<i32>().unwrap()` shadows the string `x` with an integer `x` in one line - this is idiomatic, not a bug. The real risk is accidental shadowing in deeply nested code, which strict-mode warnings in TypeScript and Clippy lints in Rust are designed to catch.

A `let` variable in JavaScript is hoisted but in the Temporal Dead Zone (TDZ). What does this mean in practice?

Key ideas

Symbol table: hash map from name to entry (kind, type, location, isCaptured, isReassigned) - one per scope
Scope chain: linked list of symbol tables with parent pointers - lookup walks up until found or error
Lexical scoping: scope chain fixed at definition time; dynamic scoping: resolved at call time
Name resolution: $O(d)$ where $d$ is nesting depth - typically 3-6 hops in real code
Captured variables must be heap-allocated; hoisted `let`/`const` are in TDZ until initialized

Вопросы для размышления

How would a compiler represent the symbol table for a module with re-exports - where a name is defined in one file but exported through another?
Why do closures in languages with value semantics (like Rust) not always require heap allocation for captured variables?
Design the symbol table changes needed to support `with` statement semantics (dynamic property injection into scope) - and explain why it breaks lexical analysis.

Связанные уроки

comp-16-ast — Semantic analysis walks the AST to populate and query the symbol table
comp-18-type-checking — Type checker looks up types from symbol table entries
alg-12-bfs — Scope chain traversal is BFS on a parent-pointer environment graph
ds-01-arrays — Hash table implementation underlies fast symbol lookup
ds-06-hash-intro

Symbol Table: Hash Map of the Universe

Why does a symbol table entry need an `isCaptured` flag?

Scope Chain: Lexical vs Dynamic Scoping

In a lexically scoped language, when is the scope chain for a closure determined?

Name Resolution and Shadowing

Shadowing is always a bug - the outer variable becomes inaccessible and the code is confusing

Shadowing is a deliberate language feature used in Rust for type refinement and in closures for clean variable rebinding - it is only a bug when unintentional

A `let` variable in JavaScript is hoisted but in the Temporal Dead Zone (TDZ). What does this mean in practice?

Key ideas

Symbol table: hash map from name to entry (kind, type, location, isCaptured, isReassigned) - one per scope

Scope chain: linked list of symbol tables with parent pointers - lookup walks up until found or error

Lexical scoping: scope chain fixed at definition time; dynamic scoping: resolved at call time

Name resolution: $O(d)$ where $d$ is nesting depth - typically 3-6 hops in real code

Captured variables must be heap-allocated; hoisted `let`/`const` are in TDZ until initialized

Связанные уроки

comp-16-ast — Semantic analysis walks the AST to populate and query the symbol table

comp-18-type-checking — Type checker looks up types from symbol table entries

alg-12-bfs — Scope chain traversal is BFS on a parent-pointer environment graph

ds-01-arrays — Hash table implementation underlies fast symbol lookup

ds-06-hash-intro

Symbol Tables and Scoping: Environments, Namespaces and Resolution

Symbol Table: Hash Map of the Universe

Scope Chain: Lexical vs Dynamic Scoping

Name Resolution and Shadowing

Key ideas

Related topics

Вопросы для размышления

Связанные уроки

Symbol Tables and Scoping: Environments, Namespaces and Resolution

Symbol Table: Hash Map of the Universe

Scope Chain: Lexical vs Dynamic Scoping

Name Resolution and Shadowing

Key ideas

Related topics

Вопросы для размышления

Связанные уроки