Game Theory

Zero-Sum Games

1928. Von Neumann. Age 25. Minimax theorem in three pages. Seventy-five years later Goodfellow rediscovers the same principle as GAN - and names it 'minimax game'. The Nash equilibrium of GAN = the optimal generator. Von Neumann's theorem guarantees this equilibrium exists. In practice GAN is unstable. Theory and practice diverge - and that gap is itself a field of active research.

  • **GAN (Goodfellow 2014):** min_G max_D - literally von Neumann's formula. Nash equilibrium = perfect generator. Problem: gradient descent does not guarantee Nash equilibrium - hence mode collapse and training instability
  • **AlphaGo / AlphaZero:** the game tree is a minimax tree. MCTS is a stochastic approximation of minimax instead of full enumeration of $250^{150}$ Go nodes
  • **Poker AI (Libratus, CFR):** Counterfactual Regret Minimization - an iterative algorithm converging to Nash equilibrium. Sandholm 2017 - first AI to beat the world's best players at no-limit poker
  • **Stockfish / chess engines:** Alpha-Beta pruning gives $O(b^{d/2})$ instead of $O(b^d)$. The difference between computable and not

Предварительные знания

  • The Prisoner's Dilemma

What Zero-Sum Means

1928. John von Neumann. Age 25. Three pages - and strategic reasoning acquires a mathematical foundation. Seventy-five years later Goodfellow rediscovers the same principle as a neural network architecture and calls it GAN.

A zero-sum game is an antagonistic structure: every dollar won by one player is lost by the other. Payoff matrix $A$: rows are player 1's strategies, columns are player 2's. Player 2's payoff is always $-A$. Chess, poker, Matching Pennies - and, not immediately apparent, GAN training.

0

1

Sign In

GameType of antagonismML analog
ChessOne wins, one losesAlphaZero, Stockfish (minimax tree)
PokerPot redistributed between playersLibratus, Pluribus (CFR)
Matching PenniesPure randomization is optimalGAN without Nash equilibrium
GAN trainingG minimizes, D maximizes log-probMinimax game (Goodfellow 2014)

The paradox: in a zero-sum game it seems as though the best strategy is unpredictability. But von Neumann's theorem proves otherwise: there exists a unique optimal mixed strategy. Randomness is not desperation - it is the mathematically optimal choice.

GAN converges to the optimum because von Neumann's theorem guarantees Nash equilibrium

The theorem guarantees existence of equilibrium, not convergence of gradient descent to it. That is exactly why GAN training is unstable.

Gradient descent in a minimax game need not converge to Nash equilibrium. Existence does not imply algorithmic reachability in reasonable time. WGAN partially addresses this by changing the metric.

In GAN training the generator G minimizes a loss and the discriminator D maximizes it. This is an example of:

The Minimax Theorem and Saddle Points

The key question: what happens when each player assumes the opponent knows their strategy and plays optimally against it? Player 1 maximizes the minimum guaranteed payoff. Player 2 minimizes the maximum possible loss. Von Neumann showed: with mixed strategies these two values coincide.

**Minimax theorem (von Neumann, 1928):** for any matrix $A \in \mathbb{R}^{m \times n}$: $$\max_{p \in \Delta_m} \min_{q \in \Delta_n} p^\top A q = \min_{q \in \Delta_n} \max_{p \in \Delta_m} p^\top A q = v$$ where $\Delta_m, \Delta_n$ are probability simplices. The number $v$ is called the **value of the game**. Proof uses Brouwer's fixed-point theorem or LP duality.

A **saddle point** is the geometric image of equilibrium. Element $a_{ij}$ of the matrix is a saddle point when it is the minimum in row $i$ and the maximum in column $j$ - like a mountain pass: minimum along one direction, maximum along the other. When a saddle point exists, equilibrium is achieved in pure strategies.

ConditionEquilibrium typeExample
Saddle point existsPure strategiesSome chess endgames
No saddle pointMixed strategiesMatching Pennies, Rock-Paper-Scissors
Multiple saddle pointsAny is optimalInterchangeable equilibria

The minimax strategy is for cowards afraid to take risks

Minimax is the mathematically optimal strategy against a rational opponent. Any deviation from it is exploitable.

In a zero-sum game against a rational opponent, any deviation from minimax gives the opponent a guaranteed improvement. Minimax is the only non-exploitable strategy.

What does von Neumann's minimax theorem guarantee?

Algorithms: From Alpha-Beta to CFR

Computing minimax naively requires a full traversal of the game tree: $O(b^d)$, where $b$ is branching factor and $d$ is depth. Chess: $b \approx 35, d \approx 80$ - the universe would not last long enough. Alpha-Beta pruning cuts this to $O(b^{d/2})$ through one observation: a branch that is provably worse than one already found can be skipped entirely.

**Alpha-Beta pruning:** two thresholds are maintained - $\alpha$ (best for MAX) and $\beta$ (best for MIN). A branch is pruned when $\alpha \geq \beta$. With optimal move ordering: $O(b^{d/2})$ instead of $O(b^d)$. Used in Stockfish, Komodo, Fritz. This is why a 1990s laptop could beat grandmasters.

Poker is a different beast. Incomplete information: no one knows the opponent's cards. CFR (Counterfactual Regret Minimization, Sandholm 2017) is an iterative algorithm that converges to Nash equilibrium by accumulating regrets. Libratus beat the world's best poker players in 2017. CFR is essentially gradient descent on the game manifold.

AlgorithmComplexityUse
Naive minimax$O(b^d)$Toy problems
Alpha-Beta pruning$O(b^{d/2})$Stockfish, all chess engines
MCTS + neural netStochasticAlphaGo, AlphaZero
CFRIterativeLibratus, Pluribus (poker)

AlphaGo uses pure minimax - just with a better evaluation function

AlphaGo uses MCTS - a stochastic approximation of minimax with a neural network for position evaluation. Exact minimax for Go is physically incomputable.

Go: $b \approx 250, d \approx 150$. $250^{150}$ nodes - a number incomparably larger than the number of atoms in the observable universe. MCTS samples trajectories and evaluates them with a neural network instead of exact enumeration.

Alpha-Beta pruning gives $O(b^{d/2})$ instead of $O(b^d)$. What does this mean for chess ($b=35, d=80$)?

GANs as Zero-Sum Games

Goodfellow 2014 described GANs through von Neumann's formula directly. Generator $G$ minimizes, discriminator $D$ maximizes:

**GAN as minimax game (Goodfellow et al., 2014):** $$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$ Nash equilibrium of this game: $D(x) = 1/2$ everywhere, $G$ reproduces $p_{\text{data}}$. Von Neumann's theorem guarantees this equilibrium exists. In practice: GANs often fail to converge - gradient descent does not guarantee Nash equilibrium.

GAN instability is a direct consequence of the theoretical gap. Gradient descent on a minimax objective can oscillate rather than converge. WGAN (Arjovsky 2017) changes the game itself: instead of JS-divergence it uses the Wasserstein distance - Lipschitz-constrained discriminators provide more stable dynamics.

GAN variantMetricStability
GAN (Goodfellow 2014)JS-divergenceUnstable, mode collapse
WGAN (Arjovsky 2017)Wasserstein-1Significantly more stable
WGAN-GPW-1 + gradient penaltyDe facto standard

WGAN fixes GAN instability by using a better network architecture

WGAN changes the game itself: JS-divergence is replaced by Wasserstein distance, which provides better gradients and more stable dynamics.

JS-divergence can be constant (provide no gradients) when distributions do not overlap. W-distance always gives a non-zero signal. This is a change in the game-theoretic formulation, not in the architecture.

The Nash equilibrium of a GAN is the state where $D(x) = 1/2$ everywhere. Why does training rarely reach it?

Key Ideas

  • **Zero-sum:** payoffs are antagonistic - chess, poker, GAN. Matrix $A$ and $-A$ simultaneously
  • **Von Neumann's theorem (1928):** $\max_p \min_q p^\top Aq = \min_q \max_p p^\top Aq = v$ - every finite game has a value and optimal mixed strategies
  • **Randomness is optimal:** Matching Pennies, Rock-Paper-Scissors - equilibrium only in mixed strategies. 50/50 randomization is mathematically necessary
  • **Alpha-Beta:** $O(b^{d/2})$ instead of $O(b^d)$ - the difference between a laptop and the heat death of the universe for chess
  • **GAN = minimax game:** theorem guarantees Nash equilibrium exists. Gradient descent does not guarantee reaching it - hence instability

Related Topics

Zero-sum games are the foundation of game theory, opening the way to richer structures:

  • Nash Equilibrium — Nash generalized von Neumann's minimax to non-zero-sum games
  • Game Theory in ML — GANs, adversarial examples, multi-agent RL - direct applications of zero-sum minimax
  • Algorithmic Game Theory — Price of Anarchy - how much society loses in non-zero-sum equilibria

Вопросы для размышления

  • GANs are unstable despite von Neumann's theorem. What does this say about the gap between existence of equilibrium and its algorithmic reachability?
  • Poker is zero-sum with incomplete information. Why is CFR harder than Alpha-Beta, and what does 'counterfactual regret' mean from a minimax perspective?
  • AlphaGo uses MCTS instead of exact minimax. What guarantee of von Neumann's theorem is sacrificed - and why is that acceptable in practice?

Связанные уроки

  • gt-02 — Nash equilibrium generalizes minimax to non-zero-sum games
  • gt-12 — GANs are minimax games directly: G minimizes, D maximizes
  • gt-11 — Algorithmic game theory builds on minimax and LP equivalence
  • mt-05 — Von Neumann's proof uses fixed-point arguments from measure theory
  • la-05-matrices-intro
Zero-Sum Games