Automata and Cognition

Society of Minds

Lesson objectives

Understand Theory of Mind levels from reactive (0) to Common Knowledge
Know how Inverse RL infers agent goals from observed behavior
Understand Nash and Stackelberg equilibria and their practical applications
See how signaling games model the emergence of language
Know CTDE (centralized training with decentralized execution) as a response to non-stationarity in multi-agent learning

Prerequisites

Self-Models and Introspection (aut-09-self-models)
MDP and Decision Making (aut-04-mdp)
Bayesian Inference and HMM (aut-03-hmm)

A chess player thinks: "He thinks I'll move the queen, but he doesn't know that I know he thinks this". In 2017, the poker bot Libratus put this kind of reasoning to work and won about USD 1.77 million in chips from professionals.

**AlphaStar (2019)**: Grandmaster level in StarCraft II with a CTDE-style setup - dozens of units coordinate through a single trained policy
**Libratus (2017)**: poker bot whose play can be read as Level 2 ToM - it accounts for how the opponent models it and continuously rebuilds its strategy
**Autonomous vehicles**: systems such as Waymo's predict pedestrian and driver intentions from historical trajectories, a task closely related to Inverse RL
**YouTube recommendations**: about 2 billion users - observed behavior is treated as evidence of preferences for personalization, in the spirit of IRL
**OpenAI Multi-Agent Particles (2017)**: agents developed their own coordination language without external definition - signaling games in practice

From the Turing Test to Theory of Mind

The Turing Test (1950) can be read as a test of Theory of Mind: can a machine simulate the beliefs and intentions of a human? The term Theory of Mind was introduced by Premack and Woodruff in 1978 while studying chimpanzees. The key question: does a chimpanzee understand that a human has goals different from its own? The classic test is the Sally-Anne task (1985): children under 4 typically do not understand that another person can hold a false belief; after about age 4, they do. This is the critical milestone of ToM development in humans.

Theory of Mind: levels of recursion

**In 2017, the poker bot Libratus defeated four professional players in Heads-Up No-Limit Texas Hold'em, winning about USD 1.77 million in chips. Libratus didn't just calculate odds - it played as if it modeled what opponents thought about its strategy, and systematically exploited their models.** Theory of Mind - the ability to understand that other agents have their own beliefs, desires, and intentions - is the foundation of social intelligence.

**Theory of Mind (ToM)** - the ability to attribute mental states to other agents: beliefs, desires, intentions.

| Level | Description | AI Example |
|---|---|---|
| 0 - Reactive | Other agents = environment objects | Simple bot: sees enemy - shoots |
| 1 - Others' beliefs | "He thinks X" | Poker bot: he thinks I'm bluffing |
| 2 - Model of me | "He thinks that I think Y" | Libratus: opponent thinks I'm aggressive |
| 3+ | "He thinks that I think that he..." | Negotiations, diplomacy |
| Common Knowledge | Everyone knows that everyone knows... | Traffic lights, money, language |

ToM levels: from reactive to recursive

**Level 2 is bluffing in poker.** A player makes a large bet not because of good cards, but to make the opponent think the cards are strong. This is managing someone else's model of you - the key operation of social intelligence. **Common Knowledge** is the limit of this recursion: "Everyone knows that everyone knows that everyone knows". This is why traffic lights, money, and language work.
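The recursion behind these levels can be sketched with level-k reasoning in the "guess 2/3 of the average" game. This game is a standard illustration and is not part of the lesson. The sketch assumes that a level-0 player guesses 50, and that a level-k player best-responds to a population of level k-1 players while ignoring its own effect on the average.

```python
def level_k_guess(k: int, level0_guess: float = 50.0, factor: float = 2 / 3) -> float:
    """Guess of a level-k player in the 'guess factor * average' game.

    Level 0 does not model anyone and guesses `level0_guess` (an assumption).
    Level k assumes everyone else is level k-1 and answers factor * their guess.
    """
    guess = level0_guess
    for _ in range(k):
        guess = factor * guess
    return guess


for k in range(4):
    print(k, round(level_k_guess(k), 2))
```

As k grows, the guess shrinks toward 0, which is the Nash equilibrium of this game. That limit corresponds to common knowledge of rationality, the end point of the recursion in the table.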

The Blue-Eyed Islanders puzzle

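On an island, 100 people have blue eyes. Everyone can see everyone else's eyes but not their own, no one speaks about eye color, and all are perfect logicians. Rule: if you learn your own eye color - leave at midnight. A tourist says: "I see a person with blue eyes". Everyone already knew this! But Common Knowledge changed: the statement is now known at every level of "everyone knows that everyone knows...". With one blue-eyed islander, that person would leave on the first night; with two, each sees that the other stayed and both leave on the second night; by induction, all 100 leave on the 100th night.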
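The puzzle can be checked for small islands with a possible-worlds model. This is a minimal sketch, not a general epistemic-logic engine. The function name and the brute-force enumeration are illustrative choices, and the approach is only practical for a handful of islanders because it enumerates all 2^n eye-color assignments.

```python
from itertools import product


def nights_until_departure(num_islanders: int, num_blue: int) -> int:
    """Return the night on which the blue-eyed islanders leave (num_blue >= 1)."""
    actual = tuple(i < num_blue for i in range(num_islanders))  # True = blue eyes
    # The announcement makes "at least one islander is blue-eyed" common knowledge.
    worlds = {w for w in product([False, True], repeat=num_islanders) if any(w)}

    def leavers(world, candidates):
        """Islanders who, in `world`, can deduce that their own eyes are blue."""
        result = set()
        for i in range(num_islanders):
            # Worlds islander i cannot tell apart from `world`:
            # everyone else's eye color is the same, only i's own may differ.
            possible = [
                v for v in candidates
                if all(v[j] == world[j] for j in range(num_islanders) if j != i)
            ]
            if all(v[i] for v in possible):
                result.add(i)
        return frozenset(result)

    night = 0
    while True:
        night += 1
        observed = leavers(actual, worlds)
        if observed:
            return night
        # Nobody left: publicly discard every world in which somebody would have.
        worlds = {w for w in worlds if leavers(w, worlds) == observed}


for n in range(1, 5):
    # The night printed equals the number of blue-eyed islanders.
    print(n, nights_until_departure(num_islanders=5, num_blue=n))
```

Each quiet night is itself a public signal: it removes the worlds with too few blue-eyed islanders from everyone's shared model.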

Misconception: Theory of Mind is just empathy or reading emotions

Correction: ToM is the formal modeling of beliefs, desires, and intentions of other agents

Empathy is an affective response. ToM is a cognitive operation: building a model of another agent's mental state and using it to predict behavior. This is why ToM can be formalized mathematically and implemented in AI.

A poker player bluffs - makes a large bet with bad cards. What level of Theory of Mind is involved?

Modeling other agents: Inverse RL and game theory

**How do you build a model of another agent?** Observed actions are a projection of a hidden reward function. **Inverse Reinforcement Learning (IRL)** reverses the problem: it infers goals from behavior. Large recommendation systems work in the same spirit: YouTube has about 2 billion users, whose actions are continuously interpreted as evidence of their preferences.

**Inverse RL**: observe behavioral trajectories of an agent → infer the reward function being maximized. Assumption: the agent is approximately optimal with respect to its hidden goal. Applications: imitation learning, human preference modeling, autonomous driving.
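A minimal sketch of this inference, written as a Bayesian comparison of two candidate reward functions rather than a full IRL algorithm. Everything concrete here is hypothetical: the two routes, their durations, the scenery bonus of 6, and the rationality parameter `beta`. The "approximately optimal" assumption is modeled as a softmax (Boltzmann) choice rule.

```python
import math

# Hypothetical route features.
ROUTES = {
    "park": {"minutes": 12, "scenery": 1.0},
    "street": {"minutes": 8, "scenery": 0.0},
}

# Candidate reward functions the observer considers (illustrative).
HYPOTHESES = {
    "only_speed": lambda r: -r["minutes"],
    "values_scenery": lambda r: -r["minutes"] + 6.0 * r["scenery"],
}


def choice_prob(route, reward, beta=1.0):
    """Probability that a softmax-rational agent with this reward picks `route`."""
    weights = {name: math.exp(beta * reward(feat)) for name, feat in ROUTES.items()}
    return weights[route] / sum(weights.values())


def posterior(observed_routes, beta=1.0):
    """Posterior over reward hypotheses given observed choices (uniform prior)."""
    scores = {
        name: math.prod(choice_prob(route, reward, beta) for route in observed_routes)
        for name, reward in HYPOTHESES.items()
    }
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


print(posterior(["park"] * 5))
```

After five trips through the park, nearly all posterior mass sits on the hypothesis that includes scenery. The observer explains the longer route by a hidden term in the reward, not by irrationality.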

Nash and Stackelberg: game theory formalism

**Nash Equilibrium** - a set of strategies where no agent can improve their outcome by unilateral deviation. The classic Prisoner's Dilemma shows the paradox: individually rational behavior leads to a collectively suboptimal outcome. Mutual defection, with payoffs (1,1), is the Nash equilibrium, even though mutual cooperation, with payoffs (3,3), is better for both players.
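The definition can be checked mechanically by testing every action pair for profitable unilateral deviations. In this sketch the payoffs (3,3) and (1,1) come from the text, while the off-diagonal payoffs 0 and 5 are assumed, following the usual textbook values.

```python
from itertools import product

ACTIONS = ["cooperate", "defect"]

# (row player payoff, column player payoff)
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),   # assumed values
    ("defect", "cooperate"): (5, 0),   # assumed values
    ("defect", "defect"): (1, 1),
}


def pure_nash_equilibria(payoffs, actions):
    """Return all action pairs from which neither player gains by deviating alone."""
    equilibria = []
    for a, b in product(actions, repeat=2):
        row_ok = all(payoffs[(a, b)][0] >= payoffs[(alt, b)][0] for alt in actions)
        col_ok = all(payoffs[(a, b)][1] >= payoffs[(a, alt)][1] for alt in actions)
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria


print(pure_nash_equilibria(PAYOFFS, ACTIONS))  # [('defect', 'defect')]
```

Mutual cooperation fails the check because either player can move from 3 to 5 by defecting alone.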

| Game type | Structure | Nash equilibrium |
|---|---|---|
| Prisoner's Dilemma | 2 players, cooperate or defect | Both defect - suboptimal |
| Coordination game | Payoff only if choices match | Multiple equilibria - selection problem |
| Zero-sum game | One's gain = other's loss | Minimax solution - the game value is unique |
| Stackelberg | Leader moves first, follower responds | Leader has commitment advantage |
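The commitment advantage in the Stackelberg row can be seen in a small example. The 2x2 payoff matrix below is purely illustrative and not taken from the lesson. The sketch covers only pure-strategy commitment, and ties in the follower's reply are broken by list order.

```python
# Illustrative payoffs: (leader payoff, follower payoff).
PAYOFFS = {
    ("up", "left"): (1, 1),
    ("up", "right"): (3, 0),
    ("down", "left"): (0, 0),
    ("down", "right"): (2, 1),
}
LEADER_ACTIONS = ["up", "down"]
FOLLOWER_ACTIONS = ["left", "right"]


def follower_best_reply(leader_action):
    """Follower observes the leader's committed action and maximizes its own payoff."""
    return max(FOLLOWER_ACTIONS, key=lambda b: PAYOFFS[(leader_action, b)][1])


def stackelberg_commitment():
    """Return (leader action, follower reply, leader payoff) under pure commitment."""
    options = []
    for a in LEADER_ACTIONS:
        reply = follower_best_reply(a)
        options.append((a, reply, PAYOFFS[(a, reply)][0]))
    return max(options, key=lambda option: option[2])


print(stackelberg_commitment())  # ('down', 'right', 2)
```

In the simultaneous version of this game, "up" strictly dominates "down" for the leader, so the outcome is ("up", "left") with leader payoff 1. By committing to "down" first, the leader changes the follower's best reply and obtains 2.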

Inverse RL observes that an agent always takes the route through the park, even when it's longer. What is the correct conclusion?

Communication: from signals to pragmatics

**Language emerged evolutionarily as a coordination mechanism.** In 2017, OpenAI researchers ran an experiment in which agents in a shared environment had to coordinate their actions. Without any instructions, they developed their own "language" - a signal system that the agents interpret the same way. Signaling games formalize this process.

**Signaling game (Lewis 1969)**: Sender knows the state of the world, Receiver must act. Sender sends a signal. Reward is shared - both benefit from correct interpretation. Through repeated interaction, a convention emerges: shared meaning without external definition.
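A minimal simulation of such a game, assuming two states, two signals, two acts, and simple urn-style (Roth-Erev) reinforcement as the learning rule. The learning rule, the initial weights of 1.0, and the function name are choices made for this sketch; Lewis's definition does not prescribe a learning rule.

```python
import random


def simulate_signaling(rounds=5000, seed=0):
    """Run a 2-state Lewis signaling game with urn-style reinforcement."""
    rng = random.Random(seed)
    options = [0, 1]
    sender = {state: [1.0, 1.0] for state in options}      # sender[state][signal]
    receiver = {signal: [1.0, 1.0] for signal in options}  # receiver[signal][act]
    history = []
    for _ in range(rounds):
        state = rng.choice(options)                                  # nature picks the state
        signal = rng.choices(options, weights=sender[state])[0]      # sender sees the state
        act = rng.choices(options, weights=receiver[signal])[0]      # receiver sees only the signal
        success = act == state
        if success:
            # Shared reward: both players reinforce what they just did.
            sender[state][signal] += 1.0
            receiver[signal][act] += 1.0
        history.append(success)
    return sender, receiver, history


sender, receiver, history = simulate_signaling()
print("early success rate:", sum(history[:200]) / 200)
print("late success rate:", sum(history[-1000:]) / 1000)
print("sender weights:", sender)
```

The success rate starts near chance and typically rises as one signal becomes associated with each state. Which signal ends up meaning which state depends on early random events; the run fixes only the random seed.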

Pragmatics: speaker simulates the listener

People don't speak literally: "Can you pass the salt?" is a request, not a question about capabilities. **Rational Speech Act (RSA)** models this mathematically: the speaker chooses an utterance not only for its literal truth, but for how the listener will interpret it. This requires at least ToM level 1: the speaker must model the listener's beliefs.
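The RSA recursion can be sketched on a scalar implicature ("some" versus "all"), which is a common textbook case. It is swapped in here for the salt example, which would need a richer model of requests. The two-state world, the uniform prior, and `alpha = 1.0` are assumptions of the sketch.

```python
STATES = ["some_but_not_all", "all"]
UTTERANCES = ["some", "all"]
# Literal semantics: the states in which each utterance is true.
TRUE_IN = {"some": {"some_but_not_all", "all"}, "all": {"all"}}


def normalize(scores):
    total = sum(scores.values())
    return {key: value / total for key, value in scores.items()}


def literal_listener(utterance):
    """L0: uniform over the states where the utterance is literally true."""
    return normalize({s: 1.0 if s in TRUE_IN[utterance] else 0.0 for s in STATES})


def speaker(state, alpha=1.0):
    """S1: prefers utterances that lead the literal listener to the intended state."""
    return normalize({u: literal_listener(u)[state] ** alpha for u in UTTERANCES})


def pragmatic_listener(utterance, alpha=1.0):
    """L1: inverts the speaker model with Bayes' rule (uniform prior over states)."""
    return normalize({s: speaker(s, alpha)[utterance] for s in STATES})


print(literal_listener("some"))    # 0.5 for each state
print(pragmatic_listener("some"))  # 0.75 for 'some_but_not_all', 0.25 for 'all'
```

The literal listener treats "some" as compatible with both states. The pragmatic listener reasons that a speaker who meant "all" would probably have said so, and shifts belief toward "some but not all". The speaker model simulates a listener, and the pragmatic listener simulates that speaker.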

Schelling Points: coordination without communication

Thomas Schelling (Nobel Prize 2005) showed: if two people are asked to meet in New York with no specified location, most choose Grand Central Station at noon. No one agreed on this. It is a Schelling Point - a salient focal point that agents choose through mutual modeling: "What would he choose, knowing I'm choosing the same thing?"

Misconception: Agents must agree on a language in advance

Correction: Language as a coordination mechanism emerges evolutionarily through repeated interaction

Lewis (1969) formalized conventions as stable equilibria of signaling games, with no external definition of meaning. Later work (Skyrms and others) showed that simple reinforcement learning can converge to such conventions, and OpenAI's multi-agent particle experiments (2017) observed a similar effect empirically. Language is not a contract - it is a Nash equilibrium of a signaling game.

Why does pragmatic communication require Theory of Mind?

Multi-agent learning: CTDE

**AlphaStar (DeepMind, 2019) reached Grandmaster level in StarCraft II, playing against humans in real time and controlling many units at once.** Viewed as a multi-agent problem, dozens of units act in parallel, each seeing only its own surroundings. The naive approach - running independent Q-learning for each unit - tends to fail to converge.

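**The Independent Learners problem**: each agent learns as if the environment is static. But the environment changes because other agents are also learning. Each sees a "moving target" - non-stationarity breaks the convergence guarantees of Q-learning in the general case. **CTDE (centralized training with decentralized execution)** addresses this: during training, a centralized critic or mixer sees the observations and actions of all agents, so its learning target no longer shifts with the other agents' policies; at execution time, each agent acts on its local observations only.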

| Algorithm | Approach | Use case |
|---|---|---|
| Independent Q-learning | Each agent learns separately | Simple tasks; suffers from non-stationarity |
| MADDPG | CTDE with deterministic policies and centralized critics | Continuous actions, mixed cooperative-competitive settings |
| QMIX | CTDE with monotonic Q-function mixing | Cooperative tasks, value decomposition |
| MAPPO | CTDE with proximal policy optimization and a centralized value function | Complex cooperative tasks |
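A toy illustration of the moving target, and of why conditioning on the joint action (the idea behind the centralized critics in the table) removes it. The two-agent coordination game and its shared reward are hypothetical, and no learning is performed. The sketch only compares what each kind of value estimate would have to track.

```python
# Shared reward for joint actions (agent A, agent B): 1 if the choices match.
REWARD = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}


def independent_value(action_a, policy_b):
    """Expected reward A observes for its own action.

    Agent B is folded into 'the environment', so this value
    changes whenever B's policy changes.
    """
    return sum(policy_b[b] * REWARD[(action_a, b)] for b in (0, 1))


def centralized_value(action_a, action_b):
    """Value conditioned on the joint action; unaffected by B's policy."""
    return REWARD[(action_a, action_b)]


policy_b_early = [0.9, 0.1]  # B mostly plays action 0 early in training
policy_b_late = [0.1, 0.9]   # B has since learned to prefer action 1

print(independent_value(0, policy_b_early))  # 0.9
print(independent_value(0, policy_b_late))   # 0.1
print(centralized_value(0, 0))               # 1.0 under either policy
```

From A's point of view the value of action 0 dropped from 0.9 to 0.1 although the game itself did not change. The joint-action value is a fixed target during training, while each agent can still choose its action from local information at execution time.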

Why is Independent Q-learning unstable in multi-agent environments?

Questions for reflection

When is it beneficial for an agent to deliberately limit the depth of its Theory of Mind recursion - for example, to act like a Level 0 agent? How does this relate to Nash equilibrium in repeated games?

Related lessons

dist-03-fallacies