Game Theory

Game Theory in ML

When Goodfellow wrote down GANs in 2014, he applied an 80-year-old idea - von Neumann's minimax theorem - to neural networks. When OpenAI trained Five for Dota, they used self-play, a concept rooted in Nash's theorem. Game theory isn't a separate field from ML; it's the mathematical foundation for problems with multiple agents and adversarial dynamics.

**GANs and modern image generation:** GANs (StyleGAN, BigGAN) dominated image synthesis until diffusion models (DDPM, Stable Diffusion) took over. Understanding minimax instability is part of why the field moved away from GANs
**Adversarial robustness:** banks, medical AI, and security systems all need to be audited against adversarial attacks. FGSM and PGD are standard audit tools
**OpenAI Five / AlphaStar:** multi-agent learning via self-play - still an open problem in MARL. These systems showed it scales, but the theoretical convergence questions remain

Предварительные знания

Algorithmic Game Theory

GANs as Minimax Games

Goodfellow's 2014 proposal for generating images was, at its core, a game. Generator G tries to produce "fake" data indistinguishable from real data. Discriminator D tries to tell real from generated. They're adversaries: G wants to fool D, and D doesn't want to be fooled. It's a classic zero-sum game in a continuous strategy space.

min_G max_D V(G, D) = E_{x~p_data}[log D(x)] + E_{z~p_z}[log(1 - D(G(z)))] D(x) → 1 for real data, D(G(z)) → 0 for fake. G minimizes: wants D(G(z)) → 1 (fool D). Nash Equilibrium: p_G = p_data (generator reproduces the real distribution), D(x) = 1/2 everywhere.

The theoretical GAN equilibrium is precise: G reproduces the real distribution, D can't distinguish them. In practice, GAN training is unstable: mode collapse (G generates only one mode), vanishing gradients, cyclic oscillations. The root cause: simultaneous gradient descent in a game does not converge to NE by standard methods.

GAN-one of the most influential ideas in ML

Ian Goodfellow came up with GANs after a discussion at a Montreal pub in 2014, and the first prototype was coded that same night. The generator-discriminator game became the foundation for StyleGAN, BigGAN, and a wave of generative image models that preceded the diffusion era. GAN instability is still an active research area, tackled by Wasserstein GAN, spectral normalization, and other tools borrowed from game theory.

GAN training converges to Nash Equilibrium via standard gradient descent

Standard simultaneous gradient descent in zero-sum games doesn't converge to NE - it cycles. GANs need special methods (Wasserstein loss, spectral norm, two-timescale updates).

Simultaneous gradient descent in a continuous game behaves like replicator dynamics with oscillations. Convergence to NE requires extragradient methods or other stabilizers.

The Nash Equilibrium of a GAN is the state where:

Adversarial Machine Learning

An attacker wants to fool a spam filter. The attack model: the attacker is player 1 (adversary), the filter is player 2 (defender). The attacker minimizes detection probability; the filter maximizes it. It's another minimax zero-sum game, but now the stakes are real: credential theft, content moderation bypass, face recognition evasion. Adversarial ML is applied zero-sum game theory.

Adversarial example: x' = x + ε · sign(∇_x L(θ, x, y)) FGSM (Fast Gradient Sign Method, Goodfellow et al. 2015): add a small perturbation along the loss gradient, and the model classifies a bus as a panda. Attack and defense is an iterative game. Every defense creates a new attack surface. There is no perfect defense, only a trade-off between accuracy and robustness.

Attack Type	Method	Example Use
Evasion (white-box)	FGSM, PGD, C&W	Bypassing malware detector
Evasion (black-box)	Transfer attacks, query-based	Bypassing classifier API
Data poisoning	Backdoor in training set	Poisoning a safety model
Model extraction	Oracle queries	Stealing a model via API

Adversarial ML is an arms race: new attacks, new defenses. Game theory predicts the outcome: in zero-sum games, there's no "perfect" defense without giving up accuracy on clean data. This has been formally proved: under mild conditions on the distribution, any classifier has adversarial examples inside an arbitrarily small ε-ball.

Adversarial training completely solves the adversarial examples problem

Adversarial training improves robustness against known attacks but not against new ones. It's an arms race: every defense leads to new attacks.

Formally proved (Tsipras et al., 2019): there is a fundamental trade-off between accuracy on clean data and robustness to adversarial attacks. No free lunch.

FGSM (Fast Gradient Sign Method) creates an adversarial example by:

Multi-Agent Reinforcement Learning

OpenAI Five beat world champions in Dota 2. AlphaStar reached grandmaster level in StarCraft II. AlphaZero dominated chess, Go, and shogi. All of these are Multi-Agent RL systems: multiple RL agents learning by interacting with each other. The core problem: each agent's environment is non-stationary - it depends on the other agents' policies, which are also changing. Standard single-agent RL theory doesn't apply; game theory is required.

An extension of MDPs to multiple agents: • S-states, Aᵢ-actions of agent i • T: S × A₁ × ... × Aₙ → Δ(S)-transitions • Rᵢ: S × A₁ × ... × Aₙ → R-rewards Each agent i maximizes: Σ_t γᵗ rᵢ(t) NE in a Markov game = Nash policy profile: policy π*ᵢ is a best response to π*₋ᵢ in every state.

Self-play is the key method for two-player zero-sum games. Cooperative tasks (multi-robot, team games) require different approaches: Centralized Training Decentralized Execution (CTDE), communication protocols, credit assignment. In cooperative games, agents must learn not just optimal policies but also how to coordinate.

Self-play always works for any game

Self-play works well for zero-sum games (chess, Go, StarCraft). In cooperative or mixed games it can converge to suboptimal equilibria or fail to converge at all.

In zero-sum games, self-play is guaranteed to do no worse than any fixed strategy (minimax). In cooperative games there's no built-in coordination mechanism, so extra tools are required: communication, a centralized critic.

Why is Multi-Agent RL harder than Single-Agent RL?

Minimax Optimization in ML

GANs aren't the only place ML meets minimax. Robust optimization (guard against worst case), distributionally robust ML (guard against distribution shift), meta-learning (inner loop vs outer loop) - all are minimax problems. Game theory reveals the common structure behind methods that look very different on the surface.

min_θ max_φ L(θ, φ) Examples: • GAN: θ = G parameters, φ = D parameters • Adversarial training: θ = model, φ = attack • Distributionally robust: θ = model, φ = worst-case distribution • Bi-level optimization: θ = meta-params, φ = task-specific params Computational problem: simultaneous GD does not converge to saddle points.

Method	Converges to NE	Application
Simultaneous GD	No (cycles)	Naive GAN
Extragradient	Yes (monotone games)	GAN stabilization
Optimistic GD	Yes (bimatrix games)	WGAN, spectral norm
PSRO	Yes (self-play generalization)	OpenAI Five, AlphaStar

Looking at minimax optimization through the lens of game theory builds intuition: why GANs are unstable (simultaneous GD doesn't converge), why Wasserstein GAN is more stable (the Lipschitz constraint regularizes the "game"), why spectral normalization helps. These aren't just hyperparameter tricks; they change the geometry of the game.

GAN instability comes from architecture errors or bad hyperparameters

GAN instability is a fundamental property of simultaneous gradient descent in zero-sum games. Even with a perfect architecture, simultaneous GD cycles around NE.

In the quadratic minimax game (x·y), simultaneous GD traces an elliptical orbit without converging. Extragradient and Optimistic GD have provable convergence for monotone operators.

Why does the Extragradient method stabilize GAN training better than plain Gradient Descent?

Key Ideas

**GAN = minimax game:** G minimizes, D maximizes. NE: p_G = p_data. Simultaneous GD doesn't converge to NE
**Adversarial ML:** attack/defense arms race - zero-sum game with a fundamental accuracy/robustness trade-off
**MARL:** non-stationary environment; self-play → NE in zero-sum; CTDE for cooperative tasks
**Extragradient:** stabilizes minimax training via lookahead. Wasserstein GAN and spectral norm act as game regularization

Вопросы для размышления

GAN training is unstable due to game-theoretic properties. Stable Diffusion sidesteps this by not using GANs. In what sense is diffusion "avoiding the game"? What does it gain and lose?
Adversarial robustness and accuracy are in tension. How does this relate to Price of Anarchy? What's the "social optimum" here, and who is the "selfish agent"?
MARL is used in robotics, autonomous vehicles, and financial trading. Each case is a different game (cooperative, competitive, mixed). Which training approach fits each?

Связанные уроки

prob-04-bayes