Blockchain

The Consensus Problem: Why and How

Three people are deciding where to have dinner. One of them always tells different people different things. It seems like an innocent problem, but in 1982 Leslie Lamport proved that with three participants and one liar, agreement is mathematically impossible. It's not a matter of cleverness or algorithm - it's a mathematical law. That same law determines why a Bitcoin transaction takes 60 minutes to confirm, and why Cosmos Hub once fully halted for half an hour.

**Bitcoin** processes $50B+ in transfers monthly without a single arbiter - Nakamoto consensus lets thousands of anonymous nodes agree on who owes what to whom
**Cosmos Hub (Tendermint)** prefers halting over an error: in 2023 the network froze for 30 minutes during a validator failure - safety over liveness
**Ethereum** after The Merge combines both approaches: keeps running during a fork (liveness), but after 2 epochs (~13 min) provides economic finality with billion-dollar penalties for reversal

Lamport and the Generals Who Never Were

Leslie Lamport is a scientist better known as the creator of LaTeX and Paxos. He formulated the Byzantine Generals Problem not out of interest in military history, but to describe failures in NASA computer systems: spacecraft on-board computers must keep running even if one module starts producing incorrect data. Lamport chose the Byzantine general metaphor because a "traitor" accurately describes a faulty computer's behavior: it doesn't just go silent - it actively sends false data, and different data to different recipients.

Without formalizing Byzantine fault tolerance, there would be no PBFT, no Tendermint, and no Bitcoin. Lamport gave us the language to describe the problem - and that's the first step toward solving it.

Предварительные знания

P2P networks and gossip protocols

The Byzantine Generals Problem

Five armies surround a fortress. They can only attack **simultaneously** - otherwise they lose. The generals exchange messengers, but one general is a **traitor**. He sends some generals "attack" and others "retreat". How can the honest generals reach agreement?

This is the **Byzantine Generals Problem**, formulated by Leslie Lamport in 1982. It models a fundamental challenge: how to reach consensus in a network where **participants don't trust each other** and some may **actively sabotage** the process.

In a blockchain, the "generals" are nodes, the "messengers" are network messages, and the "traitors" are malicious or compromised nodes. Every time a Bitcoin node receives a block, it solves the problem: **is this block valid, or is the sending node trying to deceive the network?**

**Lamport's result:** the problem is solvable only if traitors are **strictly fewer than 1/3** of the total. With 3 generals and 1 traitor - **no solution exists**. This is a fundamental constraint, not a flaw of any particular algorithm.

In a network of 9 nodes, what is the maximum number that can be Byzantine (malicious) while consensus is still possible?

Crash Fault vs Byzantine Fault

Not all failures are equal. A server may **crash** (stop responding) - that's a Crash Fault. Or it may **lie** (send different answers to different nodes) - that's a Byzantine Fault. The difference determines how many nodes can fail.

	Crash Fault (CFT)	Byzantine Fault (BFT)
Faulty node behavior	Silently stops responding	Can lie, send different data to different nodes
Threshold	N/2 - need more than half alive	N/3 - need more than 2/3 honest
Minimum nodes for 1 failure	3 (2f+1, f=1)	4 (3f+1, f=1)
Example algorithm	Raft, Paxos, ZooKeeper	PBFT, Tendermint, HotStuff
Where used	Inside a trusted datacenter	Blockchain, open networks

Why does the threshold differ? With a **Crash Fault**, the crashed node simply stays silent - others notice and ignore it. With a **Byzantine Fault**, the malicious node actively deceives: sending node A one value and node B another. Neutralizing such an attack requires more "votes".

**CAP theorem** (Brewer, 2000) adds another constraint. A distributed system cannot simultaneously guarantee all three properties: - **Consistency** - all nodes see the same data - **Availability** - every request gets a response - **Partition tolerance** - the system works despite network splits

In real networks **partitions (splits) are inevitable** - cables break, datacenters lose connectivity. So in practice the choice is always between **CP** (consistency under partition) and **AP** (availability under partition). Bitcoin chose AP - the network keeps running during a split, but temporary forks may occur.

A company launches a private blockchain on 10 nodes in its own datacenters (trusted environment). What type of fault tolerance is sufficient?

Finality: When a Transaction Is Irreversible

After sending 1 BTC and seeing a confirmation, can it be certain the transaction will **never** be reversed? The answer depends on the type of **finality** - the guarantee of irreversibility.

There are two different approaches: **Probabilistic finality** - the probability of reversal decreases with every new block, but theoretically never reaches zero. Bitcoin works this way. **Deterministic (absolute) finality** - once confirmed, a transaction is irreversible **mathematically**. BFT protocols (Tendermint, HotStuff) provide this guarantee.

**Why exactly 6 confirmations in Bitcoin?** With 10% hashrate, the attacker's reversal probability after 6 blocks is less than 0.00002%. This means roughly 60 minutes of waiting (6 blocks × ~10 min). For large amounts, exchanges wait 12-60 confirmations.

Property	Probabilistic (Bitcoin, Ethereum PoW)	Deterministic (Tendermint, HotStuff)
Guarantee	Reversal probability → 0, but never = 0	After 2/3 votes - mathematically irreversible
Time to finality	~60 min (6 blocks × 10 min)	~1-6 seconds
Fork possible?	Yes (temporary fork is normal)	No (fork = protocol violation)
Throughput	~7 tx/s (Bitcoin)	~1,000-10,000 tx/s
Scalability	Thousands of nodes	Tens to hundreds of nodes (communication overhead)

Real attack: double-spend on Bitcoin Gold (2018)

Probabilistic finality in practice

In May 2018, an attacker rented mining power and gained >51% of the Bitcoin Gold network hashrate. They: 1. Sent BTG to an exchange and received other crypto 2. Secretly mined an alternative chain 3. Published the longer chain - the network accepted it as canonical 4. The exchange transaction "disappeared" - a $18M double-spend Exchanges that waited for few confirmations suffered. Those requiring >20 confirmations did not.

**Ethereum** after The Merge (2022) switched to PoS with "economic finality": every 6.4 minutes (1 epoch) validators vote on blocks. After 2 epochs (~13 minutes) a block gains finality - reversing it would require destroying ETH worth billions (slashing).

A seller is accepting 5 BTC for a car. The buyer sent a transaction visible in the mempool (0 confirmations). Should the keys be handed over?

Liveness and Safety: The Impossible Tradeoff

Every consensus protocol balances two fundamental properties: **Safety**: all honest nodes agree on **the same** value. No contradictions, no fork. "Better to make no decision than a wrong one." **Liveness**: the system **keeps running** and processes new transactions. It doesn't halt or stall. "The system always makes progress."

In 1985, Fischer, Lynch, and Paterson proved **FLP impossibility** - one of the most important results in distributed systems: *In an asynchronous network (where message delivery time is not guaranteed), no deterministic consensus protocol can guarantee both safety and liveness if even one node can fail.*

**FLP impossibility** doesn't mean consensus is impossible! It means one of the requirements must be **relaxed**: allow probabilistic resolution (Bitcoin), introduce timeouts (PBFT), or assume partial synchrony (Tendermint).

Protocol	Priority	Safety	Liveness	Tradeoff cost
Bitcoin (Nakamoto)	Liveness	Probabilistic - fork possible	Always running	Wait 6 blocks for finality
Tendermint (Cosmos)	Safety	Absolute - no fork	May halt if >1/3 offline	Network freezes if too many nodes offline
Ethereum PoS (Gasper)	Both (hybrid)	Eventual finality (2 epochs)	Continues during fork	Most complex protocol, attacks at boundaries

Cosmos Hub halt (2023)

Safety > Liveness in practice

In June 2023, Cosmos Hub (Tendermint BFT) stopped for ~30 minutes due to an upgrade bug. More than 1/3 of validators crashed, and the network **stopped producing blocks**. This is Tendermint by design: better to halt (violate liveness) than allow a fork (violate safety). Nobody lost money, no double-spends. Compare to Bitcoin Gold: the network never halted, but an $18M double-spend happened - liveness was preserved, safety was violated.

**FLP impossibility** is not an abstract theorem. Every promise that "our protocol guarantees both safety and availability" hides assumptions. Usually it's "under partial synchrony" or "if fewer than 1/3 of nodes are offline".

Bitcoin doesn't solve the Byzantine Generals Problem because it allows temporary forks

Bitcoin solves the Byzantine Generals Problem **differently**: through probabilistic finality and economic incentives (PoW). A temporary fork is not a bug but a feature - the price of liveness in an open network without pre-registered participants.

Classical BFT (PBFT, Tendermint) assumes a fixed set of validators - everyone knows who participates. Bitcoin allows **anyone** to join (permissionless). For this, Satoshi replaced voting with computational work (PoW) and the longest chain rule - this is Nakamoto consensus, a new approach to solving the Byzantine Generals Problem that doesn't require a fixed set of participants.

In a Tendermint (BFT, safety-first) network of 100 validators, 35 lose connectivity. What happens?

Key Ideas

**The Byzantine Generals Problem** - the fundamental problem of reaching agreement in the presence of traitors. A solution is possible only if honest participants are **more than 2/3** (for BFT) or **more than 1/2** (for CFT)
**Crash Fault vs Byzantine Fault** - a crashed node (silent) vs a malicious node (lying). BFT is stricter: `3f+1` nodes for f failures, CFT is more lenient: `2f+1`
**Finality** - either probabilistic (Bitcoin: reversal probability → 0 with each block) or deterministic (Tendermint: irreversible after 2/3+1 quorum)
**Safety vs Liveness** - FLP impossibility: both properties cannot be guaranteed in an asynchronous network. Bitcoin chooses liveness (always running, but fork possible), Tendermint chooses safety (no fork, but may halt)
Remember the three people and the restaurant from the beginning of the lesson? Three people, one liar - and Lamport proved agreement is impossible. Every blockchain protocol is an engineering answer to that mathematical prohibition: it cannot be circumvented, only traded off

Вопросы для размышления

Bitcoin has existed for 15+ years and has never violated safety (no double-spend in the main chain). Does this mean that probabilistic finality is in practice no worse than deterministic? Or have we simply not yet encountered a sufficiently motivated attacker?
The CAP theorem forces a choice between consistency and availability during a network split. When designing an international bank transfer system - which property takes priority and why?
FLP impossibility is proven for deterministic protocols. Bitcoin circumvents this via randomization (PoW - whoever finds the hash first wins). What other approaches to circumventing FLP exist?

Связанные уроки

bc-04-p2p — Consensus is about nodes in a P2P network reaching agreement; the P2P network structure is the prerequisite
bc-14-pow — Proof of Work is the first concrete consensus mechanism; this lesson provides the abstract foundation
bc-15-pos — Proof of Stake is the second major consensus family and needs this abstract intro
bc-16-bft — BFT consensus (PBFT, Tendermint) is the classical family; the intro lesson is the prerequisite
dist-10-byzantine

Blockchain

The Consensus Problem: Why and How

**Bitcoin** processes $50B+ in transfers monthly without a single arbiter - Nakamoto consensus lets thousands of anonymous nodes agree on who owes what to whom
**Cosmos Hub (Tendermint)** prefers halting over an error: in 2023 the network froze for 30 minutes during a validator failure - safety over liveness
**Ethereum** after The Merge combines both approaches: keeps running during a fork (liveness), but after 2 epochs (~13 min) provides economic finality with billion-dollar penalties for reversal

Lamport and the Generals Who Never Were

Предварительные знания

P2P networks and gossip protocols

The Byzantine Generals Problem

In a network of 9 nodes, what is the maximum number that can be Byzantine (malicious) while consensus is still possible?

Crash Fault vs Byzantine Fault

	Crash Fault (CFT)	Byzantine Fault (BFT)
Faulty node behavior	Silently stops responding	Can lie, send different data to different nodes
Threshold	N/2 - need more than half alive	N/3 - need more than 2/3 honest
Minimum nodes for 1 failure	3 (2f+1, f=1)	4 (3f+1, f=1)
Example algorithm	Raft, Paxos, ZooKeeper	PBFT, Tendermint, HotStuff
Where used	Inside a trusted datacenter	Blockchain, open networks

A company launches a private blockchain on 10 nodes in its own datacenters (trusted environment). What type of fault tolerance is sufficient?

Finality: When a Transaction Is Irreversible

After sending 1 BTC and seeing a confirmation, can it be certain the transaction will **never** be reversed? The answer depends on the type of **finality** - the guarantee of irreversibility.

Property	Probabilistic (Bitcoin, Ethereum PoW)	Deterministic (Tendermint, HotStuff)
Guarantee	Reversal probability → 0, but never = 0	After 2/3 votes - mathematically irreversible
Time to finality	~60 min (6 blocks × 10 min)	~1-6 seconds
Fork possible?	Yes (temporary fork is normal)	No (fork = protocol violation)
Throughput	~7 tx/s (Bitcoin)	~1,000-10,000 tx/s
Scalability	Thousands of nodes	Tens to hundreds of nodes (communication overhead)

Real attack: double-spend on Bitcoin Gold (2018)

Probabilistic finality in practice

A seller is accepting 5 BTC for a car. The buyer sent a transaction visible in the mempool (0 confirmations). Should the keys be handed over?

Liveness and Safety: The Impossible Tradeoff

Protocol	Priority	Safety	Liveness	Tradeoff cost
Bitcoin (Nakamoto)	Liveness	Probabilistic - fork possible	Always running	Wait 6 blocks for finality
Tendermint (Cosmos)	Safety	Absolute - no fork	May halt if >1/3 offline	Network freezes if too many nodes offline
Ethereum PoS (Gasper)	Both (hybrid)	Eventual finality (2 epochs)	Continues during fork	Most complex protocol, attacks at boundaries

Cosmos Hub halt (2023)

Safety > Liveness in practice

Bitcoin doesn't solve the Byzantine Generals Problem because it allows temporary forks

In a Tendermint (BFT, safety-first) network of 100 validators, 35 lose connectivity. What happens?

Key Ideas

**The Byzantine Generals Problem** - the fundamental problem of reaching agreement in the presence of traitors. A solution is possible only if honest participants are **more than 2/3** (for BFT) or **more than 1/2** (for CFT)
**Crash Fault vs Byzantine Fault** - a crashed node (silent) vs a malicious node (lying). BFT is stricter: `3f+1` nodes for f failures, CFT is more lenient: `2f+1`
**Finality** - either probabilistic (Bitcoin: reversal probability → 0 with each block) or deterministic (Tendermint: irreversible after 2/3+1 quorum)
**Safety vs Liveness** - FLP impossibility: both properties cannot be guaranteed in an asynchronous network. Bitcoin chooses liveness (always running, but fork possible), Tendermint chooses safety (no fork, but may halt)
Remember the three people and the restaurant from the beginning of the lesson? Three people, one liar - and Lamport proved agreement is impossible. Every blockchain protocol is an engineering answer to that mathematical prohibition: it cannot be circumvented, only traded off

Вопросы для размышления

Bitcoin has existed for 15+ years and has never violated safety (no double-spend in the main chain). Does this mean that probabilistic finality is in practice no worse than deterministic? Or have we simply not yet encountered a sufficiently motivated attacker?
The CAP theorem forces a choice between consistency and availability during a network split. When designing an international bank transfer system - which property takes priority and why?
FLP impossibility is proven for deterministic protocols. Bitcoin circumvents this via randomization (PoW - whoever finds the hash first wins). What other approaches to circumventing FLP exist?

Связанные уроки

bc-04-p2p — Consensus is about nodes in a P2P network reaching agreement; the P2P network structure is the prerequisite
bc-14-pow — Proof of Work is the first concrete consensus mechanism; this lesson provides the abstract foundation
bc-15-pos — Proof of Stake is the second major consensus family and needs this abstract intro
bc-16-bft — BFT consensus (PBFT, Tendermint) is the classical family; the intro lesson is the prerequisite
dist-10-byzantine

The Consensus Problem: Why and How

Lamport and the Generals Who Never Were

Предварительные знания

The Byzantine Generals Problem

Crash Fault vs Byzantine Fault

Finality: When a Transaction Is Irreversible

Real attack: double-spend on Bitcoin Gold (2018)

Liveness and Safety: The Impossible Tradeoff

Cosmos Hub halt (2023)

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки

The Consensus Problem: Why and How

Lamport and the Generals Who Never Were

Предварительные знания

The Byzantine Generals Problem

Crash Fault vs Byzantine Fault

Finality: When a Transaction Is Irreversible

Real attack: double-spend on Bitcoin Gold (2018)

Liveness and Safety: The Impossible Tradeoff

Cosmos Hub halt (2023)

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки