Compilers

AST: Abstract Syntax Tree

Facebook needed to migrate jQuery to React across millions of lines of code. Manual rewrite: 2 years and a broken codebase. AST codemod: 3 days, zero regressions. The same tree that TypeScript uses to highlight errors in 100 ms is what made that migration possible.

TypeScript: type-checks 50 000 lines in under 1 second by caching the AST and incrementally re-traversing changed subtrees
Babel: transforms 3 million npm packages per week using visitor-based plugins - each plugin sees only the node types it declares
ESLint: 300+ built-in rules are all AST visitors; custom rules ship in minutes without touching the core
Prettier: reformats code by printing the AST ignoring original whitespace - structural printing eliminates style debates

AST vs CST: What the Parser Actually Builds

VS Code highlights types across 10 000 files in under 100 ms. TypeScript does not re-parse every token each time - it walks a cached tree of structured nodes. That tree is the AST.

A **Concrete Syntax Tree (CST)** keeps every token - parentheses, commas, keywords. A parser always produces a CST internally. An **Abstract Syntax Tree (AST)** drops syntactic noise: `(2 + 3)` and `2 + 3` become the same `BinaryExpression` node. Compilers, linters, and codemods work on ASTs - the shape is semantically complete without the clutter.

**ESTree** is the de-facto AST standard for JavaScript, originally from SpiderMonkey. Babel, ESLint, Acorn, and Prettier all use ESTree-compatible ASTs. TypeScript extends ESTree with type annotation nodes (`TSTypeAnnotation`, `TSInterfaceDeclaration`, etc.). Tools that consume ESTree can work across parsers with minimal adaptation.

What is the main difference between a CST and an AST?

Node Types: Discriminated Unions of the Grammar

Facebook migrated 20 000 files from `require()` to ES modules overnight without breaking a single test. Not by hand - by a codemod that walked the AST, found `CallExpression` nodes with callee name `require`, and replaced them with `ImportDeclaration`. Accurate transformation requires knowing the exact shape of every node type.

**Source locations.** Every ESTree node carries a `loc` field: `{ start: { line, column }, end: { line, column } }` plus a `range: [start, end]` byte offset pair. ESLint uses `loc` to print error positions. Language servers use `range` to apply text edits. Prettier uses both to preserve comments that fall between nodes.

Why do AST node types use a discriminated union with a `type` field?

Visitor Pattern: Walking the Tree Without Coupling

Babel transforms JSX into `React.createElement` calls for 3 million npm packages per week. The transform does not hardcode traversal logic - it registers a visitor for `JSXElement` nodes and Babel calls it during the walk. This decoupling is the visitor pattern.

**Enter vs Exit order matters.** `enter` fires top-down (parent before children). `exit` fires bottom-up (children before parent). Scope resolution needs `exit` on FunctionDeclaration to know all local variables before processing the body. Minifiers rename variables in `exit` order to avoid renaming before all uses are found.

When should a visitor use `exit` instead of `enter`?

AST Tools: Codemods, Linters, Language Servers

AirBnB migrated 3 000 React class components to hooks in one sprint using `jscodeshift`. Not rewriting manually - writing a 50-line codemod that walked the AST and reordered lifecycle methods. The same AST infrastructure powers every ESLint rule, every Prettier format, every LSP hover tooltip.

**Language Server Protocol (LSP)** decouples editor from language tooling. When hovering a symbol, VS Code sends `textDocument/hover` with cursor position. TypeScript Language Server finds the AST node at that position, resolves its type, and returns a markdown string. Go to definition, find references, rename - all are AST operations hidden behind LSP JSON-RPC.

ASTs are only needed when building a full compiler

Every modern dev tool - ESLint, Prettier, TypeScript, Babel, jscodeshift, LSP - is an AST transformation pipeline. Understanding ASTs means being able to write custom lint rules, codemods, or language plugins without waiting for maintainers.

The barrier to writing an ESLint rule is 30 lines of visitor code. The barrier to a jscodeshift codemod is similar. AST tooling is table stakes for senior JS/TS engineers.

What makes jscodeshift safer for large-scale refactoring than regex-based search-and-replace?

Key ideas

AST removes syntactic noise (parentheses, commas) and keeps semantic structure; CST keeps every token
ESTree is the JS/TS standard; every node has a `type` string enabling discriminated union narrowing
Visitor pattern decouples traversal from transformation: register per-node handlers, framework calls them
Enter fires top-down; exit fires bottom-up - use exit when child data is needed (type inference, scope)
jscodeshift, ESLint rules, Babel plugins, and LSP are all visitor-based AST pipelines

Вопросы для размышления

Why does Prettier reprint the AST from scratch instead of editing the original source text?
How would incremental compilation (like TypeScript's `--watch`) update only the changed AST subtree without full re-parse?
What information is lost when converting a CST to an AST, and when does that matter?

Связанные уроки

comp-15-pratt-parser — Pratt parser produces the AST nodes covered here
comp-14-parser-generators — ANTLR4 and PEG.js output ASTs in the same shape
comp-17-symbol-table — Semantic analysis walks the AST to resolve types and bindings
alg-09 — AST traversal is DFS on a typed tree - same algorithms, richer node types
comp-18-type-checking
comp-11-recursive-descent
plt-26-ast

Compilers

AST: Abstract Syntax Tree

TypeScript: type-checks 50 000 lines in under 1 second by caching the AST and incrementally re-traversing changed subtrees
Babel: transforms 3 million npm packages per week using visitor-based plugins - each plugin sees only the node types it declares
ESLint: 300+ built-in rules are all AST visitors; custom rules ship in minutes without touching the core
Prettier: reformats code by printing the AST ignoring original whitespace - structural printing eliminates style debates

AST vs CST: What the Parser Actually Builds

VS Code highlights types across 10 000 files in under 100 ms. TypeScript does not re-parse every token each time - it walks a cached tree of structured nodes. That tree is the AST.

What is the main difference between a CST and an AST?

Node Types: Discriminated Unions of the Grammar

Why do AST node types use a discriminated union with a `type` field?

Visitor Pattern: Walking the Tree Without Coupling

When should a visitor use `exit` instead of `enter`?

AST Tools: Codemods, Linters, Language Servers

ASTs are only needed when building a full compiler

The barrier to writing an ESLint rule is 30 lines of visitor code. The barrier to a jscodeshift codemod is similar. AST tooling is table stakes for senior JS/TS engineers.

What makes jscodeshift safer for large-scale refactoring than regex-based search-and-replace?

Key ideas

AST removes syntactic noise (parentheses, commas) and keeps semantic structure; CST keeps every token
ESTree is the JS/TS standard; every node has a `type` string enabling discriminated union narrowing
Visitor pattern decouples traversal from transformation: register per-node handlers, framework calls them
Enter fires top-down; exit fires bottom-up - use exit when child data is needed (type inference, scope)
jscodeshift, ESLint rules, Babel plugins, and LSP are all visitor-based AST pipelines

Вопросы для размышления

Why does Prettier reprint the AST from scratch instead of editing the original source text?
How would incremental compilation (like TypeScript's `--watch`) update only the changed AST subtree without full re-parse?
What information is lost when converting a CST to an AST, and when does that matter?

Связанные уроки

comp-15-pratt-parser — Pratt parser produces the AST nodes covered here
comp-14-parser-generators — ANTLR4 and PEG.js output ASTs in the same shape
comp-17-symbol-table — Semantic analysis walks the AST to resolve types and bindings
alg-09 — AST traversal is DFS on a typed tree - same algorithms, richer node types
comp-18-type-checking
comp-11-recursive-descent
plt-26-ast

AST: Abstract Syntax Tree

AST vs CST: What the Parser Actually Builds

Node Types: Discriminated Unions of the Grammar

Visitor Pattern: Walking the Tree Without Coupling

AST Tools: Codemods, Linters, Language Servers

Key ideas

Related topics

Вопросы для размышления

Связанные уроки

AST: Abstract Syntax Tree

AST vs CST: What the Parser Actually Builds

Node Types: Discriminated Unions of the Grammar

Visitor Pattern: Walking the Tree Without Coupling

AST Tools: Codemods, Linters, Language Servers

Key ideas

Related topics

Вопросы для размышления

Связанные уроки