Compilers
Pratt Parser: Elegant Expression Parsing
Vaughan Pratt published this algorithm in 1973. It was nearly forgotten. In 2007 Douglas Crockford rediscovered it and wrote 'Top Down Operator Precedence'. In 2012 TypeScript chose Pratt for expression parsing. In 2015 - Rust. In 2019 - Go 1.13. A good algorithm does not die.
- TypeScript compiler: Pratt parsing for 18 JavaScript expression precedence levels
- rustc: expression parsing via Pratt in parser/expr.rs - ~600 lines instead of thousands of grammar rules
- V8 (JavaScript engine): Pratt-like approach in the parser for fast expression handling
- Prettier (code formatter): Pratt-like approach for understanding precedence when formatting
Operator Precedence: Why It Is Hard
`1 + 2 * 3` must give 7, not 9. Everyone knows this from primary school. But try writing a parser that handles it correctly - and it becomes clear why Vaughan Pratt invented an entire algorithm for this 'obvious' problem in 1973.
The **classic problem**: recursive descent for expressions requires a separate recursion level for each operator precedence level. Five levels = five functions (expr, term, factor, unary, primary). Adding a new precedence level means refactoring the entire grammar. TypeScript supports 18 precedence levels. Eighteen functions?
**Left and right associativity** is a separate problem. `a + b + c` = `(a+b)+c` (left). `a = b = c` = `a=(b=c)` (right). In a classic parser, left associativity uses iteration, right uses a recursive call. In Pratt, this is expressed through different left and right binding power values.
Why does classic recursive descent scale poorly for languages with many precedence levels?
The Pratt Algorithm: Binding Power and nud/led
Pratt's idea is simple and brilliant: each token has a **binding power** - its 'attraction strength' to neighboring tokens. `*` attracts more strongly than `+`. The algorithm: while the next operator has greater binding power - continue building the right operand.
What happens in a Pratt Parser when the next token's binding power is less than or equal to the current min_bp?
Pratt Parser in Real Compilers
The TypeScript compiler, the Rust compiler (rustc), and the Go parser all use Pratt parsing for expressions. The reason: the algorithm is easy to extend with new operators without changing existing code. Adding the `??` (nullish coalescing) operator to TypeScript means adding one line to the binding power table.
How is right-associativity implemented in a Pratt Parser?
Extensions: Postfix, Ternary, Mixfix
The beauty of Pratt parsing is handling any syntactic form through nud/led without changing the core algorithm. Indexing `a[i]`, calling `f(x)`, ternary `a ? b : c`, optional chaining `a?.b` - all are led handlers with the appropriate binding powers.
Pratt Parser is harder than recursive descent and not worth learning
Pratt Parser is simpler than recursive descent for languages with many precedence levels - less code, easier to extend
Recursive descent for JavaScript would require ~18 separate functions for precedence levels. Pratt uses one function with a binding power table. TypeScript, Rust, and V8 chose Pratt precisely for this simplicity.
What makes the ternary operator `a ? b : c` special from a parsing perspective?
Key ideas
- Pratt Parser: each token has a binding power - its attraction strength to neighbors
- nud (null denotation): token handling without a left context (prefix, literals)
- led (left denotation): token handling with a left operand (infix, postfix)
- Right-associativity: pass bp-1 instead of bp during recursion
- Extensibility: new operator = new row in the table, no changes to the core algorithm
Related topics
Pratt Parser builds the AST and is the foundation of expression parsing in modern compilers.
- Recursive Descent — Pratt parsing is an evolution of recursive descent for expression parsing
- AST — Pratt Parser builds AST nodes - the next stage of compilation
Вопросы для размышления
- How can string template literals (`${expr}`) be implemented in a Pratt Parser? What token type and handler are needed?
- What is the difference between Pratt Parser and Dijkstra's Shunting Yard algorithm? When should each be chosen?
- How can the pipe operator `|>` (stage 2 proposal) be added to an existing Pratt Parser for JavaScript?
Связанные уроки
- comp-11-recursive-descent — Recursive descent is the base for understanding Pratt parsing
- comp-10-cfg — Expression grammars provide context for operator precedence
- comp-16-ast — Pratt parser builds the AST - the next lesson
- comp-12-lr-parsing — LR parsing is an alternative approach to the same problem
- alg-21-dp