Programming Language Theory

Functional Programming in Depth: Purity, Laziness and Effect Systems

JAX rewrote the rules for ML frameworks in 2018. No explicit computation graphs. No session.run(). Just Python functions - but pure ones. jit, grad, vmap, pmap all work by transforming pure functions: differentiate them, vectorize them, distribute them across TPUs. The entire power of the system rests on one property: referential transparency. Remove it and the whole edifice collapses.

  • **JAX** (Google, powers Gemini training): requires pure functions for jit compilation and autograd. Any impurity breaks XLA tracing and produces silent incorrect gradients.
  • **Apache Spark**: lazy transformation graph (map, filter, join) is a free monad over distributed dataset operations. Actions (collect, save) are the interpreters. The same transformation plan runs locally, on YARN, or on Kubernetes with different handlers.
  • **Scala ZIO** (used at Discord, Tapad, and many fintech firms): encodes all effects - async, error, state, resources - in the type ZIO[R, E, A]. The fiber-based scheduler runs millions of concurrent fibers without OS threads, composable through algebraic laws.

Purity and Referential Transparency: The Contract That Makes Code Replaceable

A pure function: given the same input, always returns the same output, and produces no observable side effects. This is not a style preference. It is a mathematical property - the function is a total map from inputs to outputs. John Hughes (1989) called this the key to compositionality: pure functions can be combined without understanding their internals.

Referential transparency (RT) is the consequence: any expression can be replaced by its value without changing the program's meaning. If `f(x)` is pure, then `f(x) + f(x)` and `let v = f(x) in v + v` are identical. This is memoization as a correctness proof, not an optimization trick. If RT holds, caching is always safe.

JAX (Google's ML framework) is built on purity as a core primitive. jax.jit requires pure functions - it traces the computation graph assuming the same input always produces the same output. Any impurity (random state, Python globals) breaks tracing and produces silent wrong results. Purity is not a limitation - it is what makes XLA compilation and automatic differentiation through arbitrary code possible.

Which property directly justifies that memoization of a function is always correct?

Lazy Evaluation: Computing Only What Is Needed

Call-by-need (lazy evaluation): an expression is not evaluated until its value is required, and when it is evaluated, the result is cached (shared). This is the default in Haskell. The dual: call-by-value (strict evaluation, the default in ML, OCaml, F#, most languages) evaluates arguments before passing them to functions.

The killer application: infinite data structures. A lazy list (stream) can be defined recursively because each element is a thunk - a suspended computation. The natural numbers, the Fibonacci sequence, the prime sieve - all representable as values that only compute what is asked for. This is not a performance trick. It is a structural shift in what can be a first-class value.

Space leaks are the dark side. A lazy accumulator can build a tower of thunks proportional to the list length before computing anything. foldl on a large list does this. The fix is explicit forcing (seq, bang patterns, strict fields). This tension - between the elegance of lazy semantics and the cost of uncontrolled thunks - is why GHC has three evaluation strategies in production code: lazy, strict, and unboxed.

Python generators (yield) implement demand-driven evaluation for sequences. pandas uses lazy evaluation in query plans - filter and select do not execute until an action triggers materialization. Apache Spark's transformations (map, filter, join) are lazy; only actions (collect, count, write) trigger computation. The execution model is call-by-need applied to distributed data frames.

What distinguishes call-by-need from call-by-name?

Effect Systems: Tracking What Functions Actually Do

Monads (Moggi 1989, popularized by Haskell) encode effects in the type system. IO, State, Error, Reader - each monad represents a different effect. The bind operator (>>=) sequences effectful computations. The price: monads do not compose well. Stacking IO on top of State on top of Error requires monad transformers, and monad transformer stacks are famously painful to work with.

Algebraic effects (Plotkin and Power 2003) offer a different contract. An effect is a set of operations (raise, get, put, print). A handler intercepts those operations and decides what to do with them. The same effectful code can be run with different handlers: a production handler that performs real IO, a test handler that records calls, a replay handler for deterministic testing. The computation and its interpretation are fully decoupled.

Scala ZIO and Haskell's tagless final encoding achieve similar separation without native effect support in the runtime. ZIO encodes effects as a type parameter: ZIO[R, E, A] - requires environment R, may fail with E, produces A. The environment R is the set of effects the program needs. At the edge of the program, the runtime provides concrete implementations. ZIO's fiber-based scheduler runs millions of concurrent lightweight threads using algebraic structure to avoid callback hell.

Free monads are the algebraic-effects pattern encoded in any language with monads. A Free monad separates the 'what to do' (the program tree) from the 'how to do it' (the interpreter). Twitter's original Finagle RPC library used Free monads to represent service calls - the same program could be interpreted as a real network call, a mock, or a tracing recorder. OCaml 5 effects (2022) are the first mainstream language runtime to support algebraic effects natively with continuations, making the handler pattern zero-allocation.

Monads are just a design pattern for error handling (like Optional/Maybe)

Monads are a general abstraction for sequencing computations with context. Optional/Maybe is one instance. IO, State, Parser, Continuation, Writer, Reader - each is a monad. The abstraction is the sequencing law (associativity of bind, identity of return), not any specific effect.

The Maybe monad conflates the abstract structure with one concrete use case. The power of monads is that map and flatMap generalize across completely different effects. A parser combinator library and an async effect system share the same compositional laws - making code written against the monad interface work with both.

What is the key advantage of algebraic effects over monads for composing multiple effects?

Related Topics

Functional programming theory connects type theory, semantics, and systems design:

  • Lambda Calculus — Formal foundation of pure functional computation
  • Algebraic Data Types — The data model that pure functional programs compose
  • Linear Types — Complement to effect systems - track resource lifetimes through the type system
  • Practical Functional Programming — Applied FP patterns (map, filter, reduce, immutability) in Python and JavaScript

Итоги

  • **Purity and RT**: pure functions are total maps with no side effects. Referential transparency - replacing any expression with its value preserves meaning - makes memoization, equational reasoning, and free theorems possible.
  • **Call-by-need**: expressions evaluate only when forced, and share the result. Enables infinite data structures and demand-driven computation. Space leaks (unevaluated thunk towers) are the risk; seq and bang patterns are the countermeasure.
  • **Monads**: sequence effectful computations while tracking the effect in the return type. IO, State, Error, Maybe - all follow the same bind/return laws. Composing multiple monads requires transformer stacks, which scale poorly.
  • **Algebraic effects**: separate effect operations from their handlers. The same computation runs under different handlers (real, mock, tracing) without changing the program. OCaml 5 supports this natively. ZIO and free monads achieve the same in languages without native support.

Вопросы для размышления

  • Haskell enforces purity via the IO monad - all impure operations return IO values. Rust enforces resource safety via ownership. Are these two approaches solving the same underlying problem, and if so, which is more expressive?
  • Lazy evaluation makes space complexity harder to reason about. Is there a type system or analysis that can statically bound thunk chains and prevent space leaks at compile time?
  • Effect systems in OCaml 5 use delimited continuations. How do algebraic effects relate to coroutines and async/await - and why did it take mainstream languages until the 2020s to adopt continuation-based concurrency?

Связанные уроки

  • plt-07-algebraic-types — ADTs are the data model that pure functional programs operate on
  • plt-06-lambda-calculus — Lambda calculus is the formal foundation of pure functional computation
  • plt-10-linear-types — Linear types enforce resource usage - complementary to effect systems for tracking side effects
  • prog-18-fp — Practical FP patterns in Python/JS; this lesson covers the theoretical foundations behind them
  • plt-23-csp — CSP manages concurrency through message passing; effect systems manage it through algebraic abstraction
  • ml-01
Functional Programming in Depth: Purity, Laziness and Effect Systems

0

1

Sign In