Compilers
AST: Abstract Syntax Tree
Facebook needed to migrate jQuery to React across millions of lines of code. Manual rewrite: 2 years and a broken codebase. AST codemod: 3 days, zero regressions. The same tree that TypeScript uses to highlight errors in 100 ms is what made that migration possible.
- TypeScript: type-checks 50 000 lines in under 1 second by caching the AST and incrementally re-traversing changed subtrees
- Babel: transforms 3 million npm packages per week using visitor-based plugins - each plugin sees only the node types it declares
- ESLint: 300+ built-in rules are all AST visitors; custom rules ship in minutes without touching the core
- Prettier: reformats code by printing the AST ignoring original whitespace - structural printing eliminates style debates
AST vs CST: What the Parser Actually Builds
VS Code highlights types across 10 000 files in under 100 ms. TypeScript does not re-parse every token each time - it walks a cached tree of structured nodes. That tree is the AST.
A **Concrete Syntax Tree (CST)** keeps every token - parentheses, commas, keywords. A parser always produces a CST internally. An **Abstract Syntax Tree (AST)** drops syntactic noise: `(2 + 3)` and `2 + 3` become the same `BinaryExpression` node. Compilers, linters, and codemods work on ASTs - the shape is semantically complete without the clutter.
**ESTree** is the de-facto AST standard for JavaScript, originally from SpiderMonkey. Babel, ESLint, Acorn, and Prettier all use ESTree-compatible ASTs. TypeScript extends ESTree with type annotation nodes (`TSTypeAnnotation`, `TSInterfaceDeclaration`, etc.). Tools that consume ESTree can work across parsers with minimal adaptation.
What is the main difference between a CST and an AST?
Node Types: Discriminated Unions of the Grammar
Facebook migrated 20 000 files from `require()` to ES modules overnight without breaking a single test. Not by hand - by a codemod that walked the AST, found `CallExpression` nodes with callee name `require`, and replaced them with `ImportDeclaration`. Accurate transformation requires knowing the exact shape of every node type.
**Source locations.** Every ESTree node carries a `loc` field: `{ start: { line, column }, end: { line, column } }` plus a `range: [start, end]` byte offset pair. ESLint uses `loc` to print error positions. Language servers use `range` to apply text edits. Prettier uses both to preserve comments that fall between nodes.
Why do AST node types use a discriminated union with a `type` field?
Visitor Pattern: Walking the Tree Without Coupling
Babel transforms JSX into `React.createElement` calls for 3 million npm packages per week. The transform does not hardcode traversal logic - it registers a visitor for `JSXElement` nodes and Babel calls it during the walk. This decoupling is the visitor pattern.
**Enter vs Exit order matters.** `enter` fires top-down (parent before children). `exit` fires bottom-up (children before parent). Scope resolution needs `exit` on FunctionDeclaration to know all local variables before processing the body. Minifiers rename variables in `exit` order to avoid renaming before all uses are found.
When should a visitor use `exit` instead of `enter`?
AST Tools: Codemods, Linters, Language Servers
AirBnB migrated 3 000 React class components to hooks in one sprint using `jscodeshift`. Not rewriting manually - writing a 50-line codemod that walked the AST and reordered lifecycle methods. The same AST infrastructure powers every ESLint rule, every Prettier format, every LSP hover tooltip.
**Language Server Protocol (LSP)** decouples editor from language tooling. When hovering a symbol, VS Code sends `textDocument/hover` with cursor position. TypeScript Language Server finds the AST node at that position, resolves its type, and returns a markdown string. Go to definition, find references, rename - all are AST operations hidden behind LSP JSON-RPC.
ASTs are only needed when building a full compiler
Every modern dev tool - ESLint, Prettier, TypeScript, Babel, jscodeshift, LSP - is an AST transformation pipeline. Understanding ASTs means being able to write custom lint rules, codemods, or language plugins without waiting for maintainers.
The barrier to writing an ESLint rule is 30 lines of visitor code. The barrier to a jscodeshift codemod is similar. AST tooling is table stakes for senior JS/TS engineers.
What makes jscodeshift safer for large-scale refactoring than regex-based search-and-replace?
Key ideas
- AST removes syntactic noise (parentheses, commas) and keeps semantic structure; CST keeps every token
- ESTree is the JS/TS standard; every node has a `type` string enabling discriminated union narrowing
- Visitor pattern decouples traversal from transformation: register per-node handlers, framework calls them
- Enter fires top-down; exit fires bottom-up - use exit when child data is needed (type inference, scope)
- jscodeshift, ESLint rules, Babel plugins, and LSP are all visitor-based AST pipelines
Related topics
AST is the bridge between parsing and all downstream compiler and tooling stages.
- Pratt Parser — Produces the AST nodes that semantic analysis and codemods consume
- Semantic Analysis — Walks the AST to resolve names, infer types, and check constraints
- Tree Algorithms — DFS, BFS, and tree recursion are the core algorithms behind AST traversal
Вопросы для размышления
- Why does Prettier reprint the AST from scratch instead of editing the original source text?
- How would incremental compilation (like TypeScript's `--watch`) update only the changed AST subtree without full re-parse?
- What information is lost when converting a CST to an AST, and when does that matter?
Связанные уроки
- comp-15-pratt-parser — Pratt parser produces the AST nodes covered here
- comp-14-parser-generators — ANTLR4 and PEG.js output ASTs in the same shape
- comp-17-symbol-table — Semantic analysis walks the AST to resolve types and bindings
- alg-09 — AST traversal is DFS on a typed tree - same algorithms, richer node types
- comp-18-type-checking
- comp-11-recursive-descent
- plt-26-ast