Game Theory
The Prisoner's Dilemma
1980. Robert Axelrod announces a computer tournament: submit strategies for the iterated prisoner's dilemma. 14 programs from Nobel laureates and economists. The winner: 4 lines of code from a psychologist. Tit-for-Tat: copy whatever the opponent did last round. Nash equilibrium predicts defection. Repetition and reputation change everything.
- **Climate negotiations:** the Paris Agreement is an attempt at mechanism design - changing the payoff matrix through binding commitments and reputational costs for withdrawal
- **GAN training:** generator and discriminator in a continuous prisoner's dilemma - Nash equilibrium equals realistic data; the engineer deliberately constructs this mechanism
- **Antibiotic resistance:** every doctor rationally prescribes antibiotics at the slightest doubt - and collectively superbugs are created
- **Open source ecosystem:** GitHub stars, contribution graphs, npm download counts - reputation mechanisms that shift equilibrium away from free-riding toward contribution
- **Multi-agent RL:** self-play in AlphaGo, AlphaStar, OpenAI Five - iterated prisoner's dilemma as the learning engine
Предварительные знания
The Prisoner's Dilemma
1950. RAND Corporation, Santa Monica. Mathematicians Merrill Flood and Melvin Dresher are building the US nuclear deterrence strategy against the USSR. They stumble onto a two-player game - and the field of game theory hasn't fully escaped it in 70 years. Albert Tucker wraps it in the prisoner story. The structure stays: two rational agents, perfect knowledge of the payoff matrix - and both arrive at an outcome that is worse for each of them.
C = stay silent (Cooperate), D = betray (Defect) Player 2: C Player 2: D Player 1: C (-1, -1) (-3, 0) Player 1: D ( 0, -3) (-2, -2) D dominates C: 0 > -1 and -2 > -3, for any action by the other. Nash Equilibrium: (D, D) = (-2, -2), even though (C, C) = (-1, -1) is better for both.
The structure reproduces across systems. Arms races: each nation gains from unilateral military expansion - and both spend trillions, ending up in the same power balance. Airline price wars: cutting prices first is advantageous until the competitor cuts too - and both bleed margin. Climate negotiations: unilateral emissions cuts are economically punishing - and global temperatures rise. Same matrix. Same equilibrium.
| Profile | Payoff 1 | Payoff 2 | Result |
|---|---|---|---|
| (C, C) | -1 | -1 | Pareto optimum |
| (D, C) | 0 | -3 | Defector wins |
| (C, D) | -3 | 0 | Silent player suffers |
| (D, D) | -2 | -2 | Nash Equilibrium - worse than optimum |
If players can communicate, they'll agree on (C, C) and the problem disappears
Cheap talk doesn't change the equilibrium: they can agree, but each will rationally defect afterward. Binding commitments with real penalties are needed.
A promise to stay silent doesn't change the payoff matrix. After talking, D still dominates C. Only external mechanisms - contracts, reputation, repeated games - can make cooperation stable.
Why is (D, D) a Nash Equilibrium in the prisoner's dilemma?
Axelrod's Tournament and the Iterated Dilemma
1980. Political scientist Robert Axelrod announces a computer tournament: submit strategies for the iterated prisoner's dilemma. 14 programs from Nobel laureates, economists, mathematicians. The longest entry: 77 lines of code. The winner: 4 lines from psychologist Anatol Rapoport. Strategy name: Tit-for-Tat. Algorithm: open with cooperation, then copy the opponent's last move.
**Nice:** opens with cooperation, never defects first. **Retaliatory:** immediately punishes defection. **Forgiving:** returns to cooperation once the opponent does. **Clear:** simple algorithm - the opponent understands the logic and isn't afraid to experiment. Axelrod ran a second tournament with 62 participants who knew the first results. Tit-for-Tat won again.
AlphaGo learns through self-play - an iterated game against itself, with each new agent as the opponent of the previous one in thousands of rounds. GANs (generative adversarial networks) are a continuous prisoner's dilemma: generator and discriminator co-evolve, neither should win too decisively. The brain solves the dilemma biochemically: oxytocin lowers the barrier to cooperation, dopamine encodes the expectation of reciprocity.
**The Folk Theorem:** in an infinitely repeated game, any outcome giving each player at least their minimax payoff can be sustained as an equilibrium - when the discount factor delta is high enough. Plainer: if players are sufficiently patient (delta close to 1), cooperation becomes stable. Backward induction destroys this in finite games with a known horizon.
Cooperation is only possible among altruists or people with strong emotional bonds
Cooperation evolves among fully selfish agents in repeated interactions - through mutual benefit, not altruism.
Axelrod showed that Tit-for-Tat wins tournaments without any good intentions - purely through the mathematics of repeated games. The evolution of human cooperation works by the same principles.
What makes Tit-for-Tat effective in the repeated prisoner's dilemma?
Social Dilemmas and Mechanism Design
The tragedy of the commons (Garrett Hardin, 1968): a village with a shared pasture. Each herder rationally adds one more cow - the personal gain is full, the damage to the pasture is shared. The pasture degrades. This is the prisoner's dilemma with n players. Deep-sea fishing, server overload, air pollution - one pattern.
| Real dilemma | The defection | Nash equilibrium | Exit mechanism |
|---|---|---|---|
| Climate agreements | Don't cut emissions | Global warming | Carbon markets (price signal) |
| OPEC (oil) | Exceed production quota | Low prices | Monitoring + expulsion |
| Open source | Take without contributing | Project degradation | Reputation + licenses |
| Antibiotics (resistance) | Overuse | Superbugs | Regulation + pricing |
Mechanism design is 'inverse game theory': the designer chooses rules so that Nash equilibrium coincides with the desired outcome. Three instruments: 1. **Change the payoff matrix** - fines, subsidies, taxes. A carbon tax makes D more expensive. 2. **Reputation mechanisms** - history is visible to all. eBay ratings, GitHub contribution graphs. 3. **Commitment devices** - binding contracts with external enforcement. International treaties with sanctions.
GAN training is mechanism design in action. Generator and discriminator are two agents in a prisoner's dilemma. The engineer-designer constructs loss functions so that Nash equilibrium coincides with generating realistic data. When the discriminator can no longer distinguish real from generated - Nash is reached. Not an architectural accident - it is built into the mathematics of the mechanism.
A dominant strategy always leads to the best outcome
A dominant strategy is the best for one specific player, but the equilibrium of dominant strategies can be worse for everyone than other outcomes.
The prisoner's dilemma is the clearest example: D dominates for both, but (D,D) = (-2,-2) is worse for both than (C,C) = (-1,-1). Individual optimality does not equal collective optimality.
A strategy strictly dominates another if it:
Key ideas
- **Prisoner's dilemma:** D strictly dominates C; Nash Equilibrium (D,D) is worse than Pareto-optimal (C,C) - local rationality is globally catastrophic
- **Tit-for-Tat** (Axelrod, 1980): 4 lines of code beats Nobel laureates. Nice, retaliatory, forgiving, clear
- **Repeated games** and the shadow of the future make cooperation rationally beneficial when discount factor delta >= 0.5
- **Mechanism design:** change the rules so Nash = Pareto. GAN loss functions, carbon markets, reputation systems - one pattern
Related topics
The prisoner's dilemma is the foundation for understanding more complex games and mechanisms:
- Nash Equilibrium — (D,D) is the classic example of a Nash Equilibrium that is not Pareto-optimal
- Cooperative Games — How to escape the dilemma through coalitions and negotiating over the split of joint payoffs
- Evolutionary Game Theory — How Tit-for-Tat and cooperation evolve in a population of strategies
Вопросы для размышления
- A carbon tax changes the payoff matrix of the climate dilemma. What other mechanisms can shift Nash equilibrium toward the cooperative outcome - without changing the agents' preferences?
- GAN training is unstable in practice: generator and discriminator often diverge instead of converging to Nash. In the language of the prisoner's dilemma - what is happening and why?
- Grim Trigger creates a powerful deterrent but one mistake breaks cooperation permanently. What real-world Grim Trigger strategies exist in international relations - and how do they differ from Tit-for-Tat?
Связанные уроки
- gt-05 — Social dilemmas: multi-agent version of the same structure
- gt-10 — Evolutionary game theory: Tit-for-Tat as evolutionarily stable strategy
- gt-07 — Cooperative games: escaping the dilemma via coalitions
- prob-03-conditional — Conditional probability: Bayesian reputation updates for agents
- gt-02 — Nash equilibrium: foundation for understanding (D,D) as NE