Predictive Processing and Active Inference

Lesson objectives

Understand the brain as a hierarchical prediction machine, not a passive receiver
Know the Free Energy Principle: F = complexity - accuracy, and the two ways to minimize it
Explain Active Inference: epistemic value (curiosity) vs pragmatic value (reward)
See the role of Precision in attention and psychopathology
Recognize the parallel between PP and LLMs with tool use

Prerequisites

POMDPs and partially observable environments (lesson 05)
Hierarchical models (lesson 06)
Basic understanding of Bayesian belief updating

80% of visual pathway fibers go top-down. This is not an evolutionary bug - the brain generates a "film" of the world and checks it against reality, rather than constructing an image from pixels.

Why familiar objects go unnoticed - prediction error is near zero
Hallucinations as predictions without correction (prior precision too high)
Anxiety as hypersensitivity to prediction errors (sensory precision too high)
LLMs and transformers as a literal implementation of predictive processing
Claude Code as an Active Inference agent: predict -> act -> observe -> correct

From Helmholtz to Friston

In the 1860s, Helmholtz called perception "unconscious inference" - the brain interprets rather than photographs. Roughly 140 years later, Karl Friston formalized this idea in the Free Energy Principle (2006), unifying neuroscience, statistical physics, and machine learning into a single mathematical framework.

The Brain as a Prediction Machine

**GPT-4 predicts the next token. The brain predicts the next sensory input. This is not a metaphor - in the predictive coding framework that Karl Friston formalized in 2005-2006, both are described by the same mathematics of prediction-error minimization.** The classical view: the brain reacts to stimuli. The new view: the brain continuously generates hypotheses about the world and updates them only on errors. This flips the classical picture, and the anatomy agrees: 80% of visual pathway fibers go top-down, not bottom-up.

Model	What the brain does	Role of sensors
Reactive (classical)	Waits for inputs, then processes	Source of information
Predictive (PP)	Continuously generates forecasts	Source of correction errors
Consequence	Perception is interpretation, not capture	Sensors report only delta

The hierarchy flows in two directions. **Top-down**: higher levels send predictions downward - "I expect to see a face". **Bottom-up**: lower levels send only errors upward - "the nose is slightly different". When the prediction is accurate, the error is zero - no signal at all. This is why familiar objects go unnoticed: prediction error is near zero.
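
The two directions of flow can be sketched in a few lines. This is a toy, single-level illustration rather than a model of cortex: the function name `settle`, the learning rate, and the scalar "observation" are all assumptions made for the example. The higher level sends its belief down as a prediction, only the mismatch travels back up, and the belief is nudged by that mismatch.

```python
def settle(observation, prior_mean, learning_rate=0.1, steps=100):
    """Revise a belief using only the bottom-up prediction error (toy sketch)."""
    belief = prior_mean
    errors = []
    for _ in range(steps):
        prediction = belief               # top-down: what the higher level expects
        error = observation - prediction  # bottom-up: only the mismatch is sent upward
        belief += learning_rate * error   # higher level revises its belief
        errors.append(abs(error))
    return belief, errors


# Unfamiliar input: the first error is large and shrinks as the belief adapts.
belief, errors = settle(observation=1.0, prior_mean=0.0)

# Familiar input: the prediction is already right, so nothing is signalled.
familiar_belief, familiar_errors = settle(observation=1.0, prior_mean=1.0)

print(errors[0], errors[-1] < errors[0])      # large first error, smaller last error
print(all(e == 0.0 for e in familiar_errors))  # True: no error signal at all
```

In the familiar case every error is exactly zero, which is the toy counterpart of a familiar object going unnoticed.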

Neuroscience fact: descending connections in the human visual pathway heavily outnumber ascending ones (about 80% of fibers, roughly 4 to 1, by the figure used in this lesson). The brain generates a "film" and compares it against reality - it does not build an image from pixels.

Myth: The brain first sees the world, then builds a model of it

Reality: The brain builds a model continuously and perceives only the deviations from it

80% of visual fibers are descending. This is not an architectural curiosity - it shows that top-down predictions are the primary process, while sensory data merely corrects it.

Why does a familiar object receive almost no conscious processing?

Free Energy Principle

**Karl Friston proposed a single principle in 2006 that unifies learning, perception, and action - the Free Energy Principle: all living systems minimize free energy F.** The word "energy" comes from physics, but here it is an information-theoretic quantity - an upper bound on "surprise". Minimizing F means minimizing the gap between expectations and reality.

Key insight: there are **two ways** to reduce F, that is, to reduce the gap between model and reality.

Method	What changes	Example
Perceptual update	Model is fitted to the world	Saw there is no milk - updated the belief
Active action	World is fitted to the model	Went to the store - world now matches prediction
Combination	Partially both	Bayesian weighting by precision

Precision is inverse variance: Precision = 1 / Variance. High precision on the model means the agent trusts its predictions and will act to make the world match them. High precision on sensors means the agent trusts observations and will update the model.
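
For Gaussian beliefs, "Bayesian weighting by precision" has a closed form: the posterior mean is the average of the prior mean and the observation, each weighted by its precision. The sketch below assumes a single scalar quantity and Gaussian noise; the function name and the numbers are illustrative only.

```python
def precision_weighted_update(prior_mean, prior_var, obs, obs_var):
    """Combine a Gaussian prior and a Gaussian observation by their precisions."""
    prior_precision = 1.0 / prior_var    # Precision = 1 / Variance
    sensory_precision = 1.0 / obs_var
    posterior_precision = prior_precision + sensory_precision
    posterior_mean = (
        prior_precision * prior_mean + sensory_precision * obs
    ) / posterior_precision
    return posterior_mean, 1.0 / posterior_precision


# The model predicts 0, the sensors report 10.
confident_model, _ = precision_weighted_update(0.0, prior_var=0.1, obs=10.0, obs_var=10.0)
trusted_sensors, _ = precision_weighted_update(0.0, prior_var=10.0, obs=10.0, obs_var=0.1)

print(round(confident_model, 3))  # about 0.099: the belief barely moves
print(round(trusted_sensors, 3))  # about 9.901: the belief follows the observation
```

With high prior precision the observation is nearly ignored, leaving action as the remaining way to close the gap; with high sensory precision the model is updated instead.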

Myth: The Free Energy Principle is thermodynamics applied to the brain

Reality: The term is borrowed, but it refers to an information-theoretic quantity - an upper bound on surprise, built from a KL divergence between the agent's beliefs and what the observations support

Friston deliberately used physics terminology to connect with the principle of minimal energy. In practice F = complexity - accuracy, where both terms are information quantities, not joules.
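
To make F = complexity - accuracy concrete, here is a minimal sketch for a discrete toy problem, reusing the milk example. The probabilities are invented for illustration. Complexity is taken as the KL divergence between the current belief q and the prior, accuracy as the expected log-likelihood of the observation under q; these are the standard definitions for a discrete model, though the lesson itself does not spell them out.

```python
import math


def free_energy(q, prior, likelihood_of_obs):
    """F = complexity - accuracy for a discrete belief q over hidden states."""
    complexity = sum(qs * math.log(qs / ps) for qs, ps in zip(q, prior) if qs > 0)
    accuracy = sum(qs * math.log(ls) for qs, ls in zip(q, likelihood_of_obs) if qs > 0)
    return complexity - accuracy


prior = [0.9, 0.1]                # belief before looking: [milk, no milk]
likelihood_of_obs = [0.05, 0.95]  # p("shelf looks empty" | state), assumed values

evidence = sum(p * l for p, l in zip(prior, likelihood_of_obs))
surprise = -math.log(evidence)    # what F is an upper bound on
posterior = [p * l / evidence for p, l in zip(prior, likelihood_of_obs)]

f_prior = free_energy(prior, prior, likelihood_of_obs)          # belief not yet updated
f_posterior = free_energy(posterior, prior, likelihood_of_obs)  # belief after updating

print(round(f_prior, 3), round(f_posterior, 3), round(surprise, 3))
# approximately 2.701 1.966 1.966
```

Holding on to the old belief leaves F well above the surprise; updating to the exact posterior brings F down to the surprise itself, which is as low as perception alone can take it.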

According to the Free Energy Principle, what happens when prior precision is very high (the model is very confident)?

Active Inference and Precision

**Active Inference is when an agent does not passively update its model but actively changes the world to match its predictions.** Action becomes a self-fulfilling prophecy. Expected Free Energy (G) determines which action to choose: it balances curiosity (epistemic value - learn something new) and reward (pragmatic value - achieve the goal).

Component of G	Question	Agent behavior
Epistemic value	What can be learned?	Exploration under high uncertainty
Pragmatic value	Is the goal being reached?	Exploitation with a known model
Balance	Exploration vs exploitation?	Automatic based on uncertainty level
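
The balance in the table can be illustrated with a small discrete sketch. Everything here is an assumption made for the example: two hidden states, an informative action `look` (a cue that is right 90% of the time), an uninformative but mildly preferred action `act`, and the preference values. Epistemic value is computed as the expected reduction in entropy over hidden states, pragmatic value as the expected log-preference of the outcome, and G as the negative of their sum.

```python
import math


def entropy(dist):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def expected_free_energy(belief, likelihood, log_preferences):
    """Return (G, epistemic, pragmatic) for one action.

    belief[s]: probability of hidden state s
    likelihood[s][o]: p(observation o | state s) under this action
    log_preferences[o]: log of how strongly the agent prefers observing o
    """
    n_states, n_obs = len(belief), len(log_preferences)
    predicted_obs = [
        sum(belief[s] * likelihood[s][o] for s in range(n_states)) for o in range(n_obs)
    ]
    expected_posterior_entropy = 0.0
    for o in range(n_obs):
        if predicted_obs[o] > 0:
            posterior = [belief[s] * likelihood[s][o] / predicted_obs[o] for s in range(n_states)]
            expected_posterior_entropy += predicted_obs[o] * entropy(posterior)
    epistemic = entropy(belief) - expected_posterior_entropy  # expected information gain
    pragmatic = sum(predicted_obs[o] * log_preferences[o] for o in range(n_obs))
    return -(epistemic + pragmatic), epistemic, pragmatic


# Observations: [cue A, cue B, snack]. All numbers below are illustrative.
ACTIONS = {
    "look": [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]],  # informative about the state
    "act": [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],   # same outcome in either state
}
LOG_PREFERENCES = [math.log(0.3), math.log(0.3), math.log(0.4)]


def choose_action(belief):
    """Pick the action with the lowest expected free energy G."""
    scores = {
        name: expected_free_energy(belief, lik, LOG_PREFERENCES)[0]
        for name, lik in ACTIONS.items()
    }
    return min(scores, key=scores.get)


print(choose_action([0.5, 0.5]))    # look: high uncertainty, information is worth more
print(choose_action([0.95, 0.05]))  # act: little left to learn, preference dominates
```

No separate exploration parameter is set anywhere: the switch from `look` to `act` falls out of how much uncertainty remains in the belief.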

**Precision Weighting** is the mechanism for controlling attention. Precision = 1/Variance: high precision on a signal means "trust this", low means "ignore". The brain dynamically adjusts precision at each level of the hierarchy.

Precision imbalance	Result	Proposed clinical association
Too high sensory precision	Every input is alarming	Anxiety, hypervigilance
Too high prior precision	Model overrides reality	Delusions, hallucinations
Unstable precision	Context processing difficulties	Autism spectrum conditions

Attention in transformers (query-key-value) is functionally analogous to Precision Weighting: both mechanisms dynamically weight which signals matter for the current computation. This is not coincidental - Friston actively investigates this parallel.
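
For reference, the weighting step in query-key-value attention looks like this. The sketch is plain scaled dot-product attention for a single query, written without any framework; the vectors are made up, and reading the resulting weights as "precision" is the analogy drawn in this lesson, not something the code itself establishes.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query over a list of key/value pairs."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


query = [1.0, 0.0]
keys = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]  # aligned, unrelated, opposed
values = [[1.0], [2.0], [3.0]]

weights, output = attend(query, keys, values)
print([round(w, 3) for w in weights])  # most of the weight goes to the aligned key
```

The signal whose key matches the query is "trusted" and dominates the output; the others are effectively ignored.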

An agent enters an unfamiliar environment with high uncertainty. What does Active Inference predict?

Predictive Processing and LLMs

**GPT-4 is trained by predicting the next token. Cross-entropy loss is the prediction error. The transformer minimizes "surprise" on a text corpus - exactly as the brain does under Friston's framework.** This is not a metaphor: the mathematics is literally the same. The difference is that LLMs predict tokens while the brain predicts world states; LLMs without tool use cannot change the world, but the brain can.
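
The link between cross-entropy and surprise is easy to check numerically: the loss on a sequence is the average of -log p over the tokens that actually occurred, which is the average surprisal. The sketch below assumes we already have the probability each model assigned to each actual next token; the numbers are invented.

```python
import math


def surprisal(prob):
    """Surprise, in nats, of an event that was assigned probability prob."""
    return -math.log(prob)


def cross_entropy(probs_of_actual_tokens):
    """Average surprisal over the tokens that actually occurred."""
    return sum(surprisal(p) for p in probs_of_actual_tokens) / len(probs_of_actual_tokens)


good_model = [0.9, 0.8, 0.95]  # probability given to each actual next token
poor_model = [0.1, 0.2, 0.05]

print(cross_entropy(good_model) < cross_entropy(poor_model))  # True
print(cross_entropy([1.0, 1.0]))  # 0.0: perfect prediction, no surprise
```

Lowering the training loss and lowering average surprise on the corpus are the same operation.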

Claude Code is an example of an Active Inference LLM: it generates a prediction of the needed code, compares it with the goal, calls tools (bash, edit), observes the result, and corrects its approach. The cycle continues until the prediction error reaches zero (task solved).
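
The predict -> act -> observe -> correct cycle can be sketched as a loop. This is a deliberately abstract toy and not how Claude Code is implemented: `run_tool` is a hypothetical stand-in for a tool call, the "world" is a single number, and the goal plays the role of the prediction the agent wants to make true.

```python
def run_tool(world, action):
    """Hypothetical stand-in for a tool call: change the world, return what is observed."""
    world["value"] += action
    return world["value"]


def agent_loop(world, goal, tolerance=1e-6, max_steps=50):
    """Act until the observed state matches the goal; the goal itself is never revised."""
    observed = world["value"]
    history = []
    for _ in range(max_steps):
        error = goal - observed             # compare the predicted outcome with the observation
        history.append(abs(error))
        if abs(error) <= tolerance:         # prediction error is (near) zero: task solved
            break
        action = 0.5 * error                # correct: choose an action that closes part of the gap
        observed = run_tool(world, action)  # act, then observe the result
    return observed, history


world = {"value": 0.0}
final, history = agent_loop(world, goal=8.0)
print(abs(final - 8.0) <= 1e-6)  # True
print(len(history))              # number of comparisons made before stopping
```

The error shrinks because the world is changed to fit the prediction, not because the prediction is revised. An LLM without tools has no counterpart to `run_tool`.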

Brain (PP) — Predicts world states. Hierarchy of timescales (ms to years). Active inference through muscles. Precision via dopamine/noradrenaline.
LLM (Transformer) — Predicts tokens. Single timescale (forward pass). Active inference through tool use. Precision via attention weights.

Connections to Other Topics

Predictive Processing unifies several concepts from this course.

Global Workspace (lesson 11) — PP explains how beliefs are updated; GWT explains what is consciously broadcast. Large prediction errors win the competition for the workspace.
Self-Models (lesson 9) — The self-model is a predictive model of self. Interoception is prediction of bodily states. Interoception errors are emotions.
POMDP (lesson 5) — PP generalizes Bayesian inference from POMDP across the full hierarchy of perception and action.

Myth: LLMs are just statistical machines over word frequencies, unrelated to the brain

Reality: LLMs implement predictive processing - the same mathematics that Friston formalized for the brain in 2006

Cross-entropy loss = surprise minimization = Free Energy in the information-theoretic sense. This is not a metaphor - it is mathematical equivalence. The difference lies in the substrate and in the presence of active inference through action.

How does an LLM with tool use fundamentally differ from one without it, from an Active Inference perspective?

Questions for reflection

If only prediction errors (surprises) reach consciousness - what does this imply about the nature of routine and habit? How would you change a habit through the lens of PP?

Related lessons

prob-04-bayes