Game Theory

Zero-Sum Games

1928. Von Neumann. Age 25. Minimax theorem in three pages. Seventy-five years later Goodfellow rediscovers the same principle as GAN - and names it 'minimax game'. The Nash equilibrium of GAN = the optimal generator. Von Neumann's theorem guarantees this equilibrium exists. In practice GAN is unstable. Theory and practice diverge - and that gap is itself a field of active research.

**GAN (Goodfellow 2014):** min_G max_D - literally von Neumann's formula. Nash equilibrium = perfect generator. Problem: gradient descent does not guarantee Nash equilibrium - hence mode collapse and training instability
**AlphaGo / AlphaZero:** the game tree is a minimax tree. MCTS is a stochastic approximation of minimax instead of full enumeration of $250^{150}$ Go nodes
**Poker AI (Libratus, CFR):** Counterfactual Regret Minimization - an iterative algorithm converging to Nash equilibrium. Sandholm 2017 - first AI to beat the world's best players at no-limit poker
**Stockfish / chess engines:** Alpha-Beta pruning gives $O(b^{d/2})$ instead of $O(b^d)$. The difference between computable and not

Предварительные знания

The Prisoner's Dilemma

What Zero-Sum Means

1928. John von Neumann. Age 25. Three pages - and strategic reasoning acquires a mathematical foundation. Seventy-five years later Goodfellow rediscovers the same principle as a neural network architecture and calls it GAN.

A zero-sum game is an antagonistic structure: every dollar won by one player is lost by the other. Payoff matrix $A$: rows are player 1's strategies, columns are player 2's. Player 2's payoff is always $-A$. Chess, poker, Matching Pennies - and, not immediately apparent, GAN training.

Game	Type of antagonism	ML analog
Chess	One wins, one loses	AlphaZero, Stockfish (minimax tree)
Poker	Pot redistributed between players	Libratus, Pluribus (CFR)
Matching Pennies	Pure randomization is optimal	GAN without Nash equilibrium
GAN training	G minimizes, D maximizes log-prob	Minimax game (Goodfellow 2014)

Condition	Equilibrium type	Example
Saddle point exists	Pure strategies	Some chess endgames
No saddle point	Mixed strategies	Matching Pennies, Rock-Paper-Scissors
Multiple saddle points	Any is optimal	Interchangeable equilibria

Algorithm	Complexity	Use
Naive minimax	$O(b^d)$	Toy problems
Alpha-Beta pruning	$O(b^{d/2})$	Stockfish, all chess engines
MCTS + neural net	Stochastic	AlphaGo, AlphaZero
CFR	Iterative	Libratus, Pluribus (poker)

GAN variant	Metric	Stability
GAN (Goodfellow 2014)	JS-divergence	Unstable, mode collapse
WGAN (Arjovsky 2017)	Wasserstein-1	Significantly more stable
WGAN-GP	W-1 + gradient penalty	De facto standard

Zero-Sum Games

Предварительные знания

What Zero-Sum Means

Zero-Sum Games

Предварительные знания

What Zero-Sum Means

The Minimax Theorem and Saddle Points

Algorithms: From Alpha-Beta to CFR

GANs as Zero-Sum Games

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки