Compilers

Type Checking and Inference: Hindley-Milner and Bidirectional Typing

Haskell infers the type of every expression in a 50 000-line codebase without a single type annotation in the function bodies. Not by being clever - by applying Robinson's 1965 unification algorithm to a constraint system generated from formal type rules. The same math that proves theorem correctness is what makes TypeScript catch null pointer bugs before runtime.

  • TypeScript: bidirectional checking at 3M+ npm packages per day - parameter annotations make each file independently checkable in O(n)
  • Rust: HM inference within function bodies + borrow checker constraints - the compiler infers lifetimes as well as types using the same unification machinery
  • Kotlin: smart casts use check mode - after `if (x is String)`, the compiler knows `x` has type String in that branch without a cast
  • Scala 3: full Hindley-Milner with higher-kinded types - type inference across implicit conversions and type class resolution

Type Rules: Inference Rules and Attribute Grammars

TypeScript checks 3 million npm packages every day. Each check applies a fixed set of type rules - formal logical assertions about when an expression is well-typed. These rules are not heuristics or pattern matching. They are inference rules in the style of natural deduction, the same formalism used in proof theory since Gentzen's 1935 sequent calculus.

A **type judgment** has the form $\Gamma \vdash e : \tau$ - 'in environment $\Gamma$, expression $e$ has type $\tau$'. The environment $\Gamma$ is the symbol table. Each syntactic form has one rule. For a function call $f(x)$:

These rules are implemented as an **attribute grammar**: each AST node has synthesized attributes (type flowing up from children) and inherited attributes (context flowing down from parent). The type checker computes the `type` attribute for every node bottom-up, applying the corresponding rule. A type error is a rule whose premise cannot be satisfied.

**Structural vs nominal typing**: TypeScript uses structural typing - a type is compatible if it has at least the required properties, regardless of name. Java and C# use nominal typing - compatibility requires explicit inheritance. Structural typing is more flexible for duck-typed patterns but harder to reason about for large codebases. TypeScript's `interface` tag types (`brand: unique symbol`) are a workaround to add nominality.

In the T-App rule, why must the argument type `tau1` match exactly between the function's parameter type and the actual argument?

Hindley-Milner Inference: Unification and Let-Polymorphism

Haskell and OCaml infer the type of every expression without a single type annotation. Rust infers most types within function bodies. The algorithm behind this is **Hindley-Milner (HM)**, independently discovered by J. Roger Hindley in 1969 and Robin Milner in 1978. HM is complete: if a type exists, HM finds it. If it finds one, it is the most general type.

HM works in two phases. **Constraint generation**: traverse the AST and emit type constraints. Every expression introduces a fresh type variable $\alpha_i$. A function application $f(x)$ emits: $\text{type}(f) = \text{type}(x) \to \alpha_{new}$. **Unification**: solve the constraint system using Robinson's unification algorithm. Two types unify by substituting type variables to make them structurally identical.

**Let-polymorphism** is the key insight that makes HM practical. Without it, a function `id` used as both `id(42)` and `id('hello')` would fail - the same type variable cannot be both `number` and `string`. With let-polymorphism, `id` gets a **type scheme** $\forall a. a \to a$: each call site instantiates a fresh copy of $a$. This is the foundation of Haskell's type classes and TypeScript's generics.

**Union-Find data structure** makes unification efficient. Each type variable is a node. Unifying two types merges their sets (union). Looking up the representative (find) collapses the path - the amortized cost per operation is $O(\alpha(n))$ where $\alpha$ is the inverse Ackermann function, effectively constant. HM is $O(n \cdot \alpha(n))$ total - nearly linear in expression size.

Why does HM type inference need let-polymorphism to handle `let id = x => x in [id(42), id('hello')]`?

Bidirectional Type Checking: Check vs Infer Modes

Full HM inference is undecidable for System F (the type system underlying Haskell's polymorphism). Most industrial type checkers - TypeScript, Kotlin, Scala - use **bidirectional type checking**: a pragmatic hybrid that infers where possible and requires annotations where inference would be exponential or ambiguous.

Bidirectional checking has two modes. **Infer mode** ($e \Rightarrow \tau$): synthesize the type of expression $e$ from its subterms - used for variables, literals, and function calls with known argument types. **Check mode** ($e \Leftarrow \tau$): verify that expression $e$ has the *expected* type $\tau$ - used when a type annotation is present or a function argument is passed to a known parameter type.

TypeScript requires annotations at function parameter boundaries because full global inference across call sites would require analyzing all callers before type-checking a function - an $O(n^2)$ problem. Bidirectional checking keeps each function independently checkable in $O(n)$ time. This is why TypeScript's `noImplicitAny` flag exists: it enforces annotations exactly where bidirectional checking cannot infer types on its own.

Type inference means the compiler is guessing - annotated code is more reliable

Type inference is mathematically proven: HM finds the most general type that is consistent with all constraints. The inferred type is as sound as an explicit annotation - often more precise because it reflects the actual use

A developer who writes `function id(x: any): any` has weakened the type. HM would infer `forall a. a -> a` - strictly more informative. The compiler never 'guesses': it either finds a provably correct type or reports an error. Annotations add clarity for humans but do not add correctness.

Why does TypeScript require type annotations on function parameters but not on local variables?

Key ideas

  • Type rules are inference rules: $\Gamma \vdash e : \tau$ - in environment Gamma, expression e has type tau
  • HM inference: generate fresh type variable per expression, emit constraints, unify with union-find in O(n alpha(n))
  • Let-polymorphism: generalize type variables at let-bindings to type schemes (forall a...) - each call site gets a fresh copy
  • Bidirectional checking: infer mode synthesizes types bottom-up; check mode pushes expected types top-down
  • TypeScript requires parameter annotations because full global inference would break per-file modular checking

Related topics

Type checking sits between semantic analysis and code generation in the compiler pipeline.

  • Symbol Tables — Type checker reads declared types and writes inferred types into symbol table entries
  • AST Traversal — Type rules are attribute grammar rules applied during AST traversal

Вопросы для размышления

  • Why does TypeScript's structural typing make generic constraint inference harder than Haskell's nominal type class approach?
  • When would full HM inference produce a type error that bidirectional checking with annotations would accept - give a concrete example.
  • How does the Rust borrow checker extend the HM type system with lifetime variables, and what constraint does `'a: 'b` add to unification?

Связанные уроки

  • comp-17-symbol-table — Type checker looks up declared types from symbol table entries
  • comp-16-ast — Type rules are formulated as attribute grammar over AST nodes
  • alg-12-bfs — Unification graph traversal shares structure with BFS on constraint graphs
  • alg-01-big-o — HM inference is O(n alpha(n)) with union-find - nearly linear in expression size
  • plt-02-type-systems
  • plt-04-type-inference
  • plt-12-subtyping
Type Checking and Inference: Hindley-Milner and Bidirectional Typing

0

1

Sign In