Blockchain

BFT Consensus: PBFT and variants

Imagine: you are a commander-in-chief, and your generals must attack simultaneously. But communication only goes through messengers, messengers can be intercepted, and some generals are traitors. One will send you a false confirmation; another will intentionally attack too early. How do you coordinate an army when you don't know who the enemy is? Lamport, Shostak, and Pease formalized this problem in 1982. For seventeen years, scientists considered it unsolvable for practical systems. Then in 1999, Castro and Liskov showed: 3f+1 nodes and three rounds of messages are enough to guarantee agreement even with traitors present.

  • **Hyperledger Fabric** - IBM's enterprise blockchain, which in early versions used PBFT-like consensus between member organizations (banks, logistics, healthcare)
  • **Tendermint (Cosmos)** - a BFT engine powering 50+ blockchains (Cosmos Hub, Binance Chain, Terra). Every block is final immediately - no waiting for confirmations
  • **HotStuff (Diem/Libra)** - Facebook/Meta's protocol for their payment system. Linear complexity allowed scaling to hundreds of validators. Now used in Aptos
  • **Aviation and space** - Boeing 787 and SpaceX Dragon use BFT-like protocols to reconcile readings between redundant onboard computers

Castro & Liskov: BFT leaves the lab

In 1999, PhD student Miguel Castro and his advisor Barbara Liskov of MIT published "Practical Byzantine Fault Tolerance". Before this, BFT protocols existed only in theory - they required an exponential number of messages. Castro and Liskov showed that BFT could be implemented with polynomial complexity O(n²), sufficient for dozens of nodes. Liskov later received the Turing Award (2008), and while her main contribution is the Liskov Substitution Principle (LSP), PBFT became one of the most cited papers in distributed systems.

0

1

Sign In

Предварительные знания

  • The Consensus Problem: Why and How

Practical BFT: the problem and the solution

In 1982, Lamport, Shostak, and Pease formalized the **Byzantine Generals Problem**: how can several participants reach a single decision if some of them may lie, stay silent, or send contradictory messages?

**Byzantine Fault** is the strongest type of failure: a node can behave in an arbitrarily malicious way. This is harder to handle than a crash fault (the node simply goes offline).

For a long time, BFT protocols were considered impractical due to exponential complexity. In 1999, **Miguel Castro and Barbara Liskov** (MIT) published **PBFT** - the first BFT protocol suitable for real systems.

The main difference between PBFT and Nakamoto consensus (PoW/PoS): **deterministic finality**. Once a transaction is confirmed - it is confirmed forever. No "wait for 6 confirmations to be sure".

PropertyPBFTNakamoto (PoW)
FinalityInstantProbabilistic
Fault tolerancef < n/3 Byzantine< 50% hashrate
ParticipantsKnown in advance (permissioned)Anyone (permissionless)
EnergyMinimalEnormous
ScaleTens of nodesThousands of nodes

**Hyperledger Fabric** used a PBFT-like consensus in early versions. Modern permissioned blockchains (Hyperledger Sawtooth, Zilliqa) still rely on the BFT family.

A network of 10 nodes uses PBFT. What is the maximum number of Byzantine nodes it can tolerate?

The three phases of PBFT: Pre-Prepare, Prepare, Commit

PBFT operates in **three phases**. This is the minimum required to reach agreement in the presence of Byzantine nodes. Fewer than three phases leaves the protocol vulnerable.

Let's break down each phase:

  1. **Pre-Prepare**: the leader (primary) receives a request from the client, assigns a sequence number, and broadcasts `<PRE-PREPARE, v, n, d>` to all replicas. Replicas verify: is the view number correct, has the sequence number been used before, does the digest match?
  2. **Prepare**: each replica that accepts the pre-prepare sends `<PREPARE, v, n, d, i>` to all others. The node waits for **2f** matching prepare messages (plus its own). This is a guarantee that the majority of honest nodes agree on the ordering.
  3. **Commit**: after receiving 2f prepare messages, the node sends `<COMMIT, v, n, d, i>` to all. It waits for **2f+1** commit messages. After that - executes the operation and sends the result to the client.

**Why can't two phases suffice?** Two phases are enough for crash faults (Paxos), but not for Byzantine faults. Without a commit phase, a malicious leader can send different pre-prepares to different replicas, and they won't be able to detect this before making a decision.

**Quorum of 2f+1** guarantees quorum intersection. Any two quorums of size 2f+1 out of n=3f+1 nodes intersect in at least f+1 nodes. At least one of those is honest. This is what ensures safety.

In PBFT with n=7 and f=2, how many prepare messages must a replica receive before moving to the commit phase?

View Change: replacing the leader

What if the leader (primary) is malicious? It can delay requests, send contradictory messages, or simply crash. PBFT handles this through the **view change** mechanism - replacing the leader.

**View** is the current epoch number. Each view has exactly one leader: `primary = v mod n`, where v is the view number and n is the number of nodes. When the view changes, the leader changes deterministically.

How view change works:

  1. **Timeout**: a replica has not received a response from the leader within the allotted time. It suspects the leader of failure and sends `<VIEW-CHANGE, v+1, ...>` to all nodes.
  2. **Evidence collection**: the view-change message contains the last stable checkpoint and all prepared certificates (proof that messages passed the prepare phase). This ensures the new leader does not lose already-agreed requests.
  3. **New leader**: the node that is primary for view v+1 collects 2f+1 valid view-change messages and forms `<NEW-VIEW, v+1, V, O>`, where V is the set of view-change messages and O is the pre-prepare messages for unfinished requests.
  4. **Resumption**: all replicas verify the correctness of the new-view message, switch to view v+1, and resume processing.

**Safety during view change**: the new leader must include all requests that were already "prepared" in the previous view. Without this, a view change could roll back already-agreed transactions.

**Protection against a malicious new leader**: if the new leader is also Byzantine, the timer triggers again and another view change occurs. Since f < n/3, a honest leader is guaranteed to be elected within at most f+1 view changes.

**Exponential backoff**: each subsequent timer doubles to avoid infinite view changes. A **catch-up** mechanism is also used: if 2f+1 nodes are already in the new view, a lagging node syncs up.

Why does a view-change message contain prepared certificates from the previous view?

Communication Complexity and BFT scaling

PBFT solved the BFT problem for practical systems, but it has a fundamental bottleneck - **the number of messages grows quadratically**.

This is exactly why classic PBFT is practically unusable with n > 20-30 nodes. Blockchains with hundreds of validators (Ethereum: ~900,000 validators) need different approaches.

**Threshold signatures** are the key optimization. Instead of every node sending a message to every other node, signature aggregation is used:

Comparison of BFT protocols by generation:

ProtocolYearComplexityPhasesUsed in
PBFT1999O(n²)3Hyperledger (early versions)
Tendermint2014O(n²)3Cosmos, Binance Chain
HotStuff2018O(n)3 (pipelined)Diem (Libra), Flow
SBFT2019O(n)2 (fast path)JPMorgan Quorum
Narwhal+Tusk2022O(n)DAG-basedSui, Aptos (variant)

**HotStuff** was a breakthrough in 2018. Three key improvements: 1. linear communication complexity via threshold signatures 2. pipelined phases - each new block starts prepare for itself and finishes commit for the previous one 3. simple view change (the hardest part of PBFT) in O(n).

**Trade-off**: reducing complexity to O(n) typically requires trusting the leader for one round (optimistic) or using cryptographic primitives (threshold signatures, BLS aggregation) that are computationally more expensive than plain signatures.

BFT protocols are useless for blockchain because they require knowing all participants and don't scale

Modern BFT protocols (HotStuff, Tendermint) are the foundation of most PoS blockchains. They operate in the permissioned layer (known validators), while entry into the validator set is managed through staking (permissionless). Optimizations (threshold signatures, pipelining) solved the scaling problem for hundreds of nodes.

The confusion arises from conflating PBFT (1999, a research protocol) with modern BFT protocols. PBFT indeed does not scale, but its descendants - HotStuff (Diem), Tendermint (Cosmos), Gasper (Ethereum) - run in production with hundreds and thousands of validators.

Why is PBFT unsuitable for public blockchains with thousands of validators?

Key ideas

  • **PBFT** - the first practical BFT protocol (1999). Tolerates f < n/3 Byzantine nodes and provides **deterministic finality** - remember the generals from the intro? Three rounds of messages are enough for the army to act consistently even when a third of the generals are traitors
  • **Three phases** (pre-prepare → prepare → commit) with a quorum of 2f+1 guarantee both **safety** (no contradictory decisions) and **liveness** (the system makes progress)
  • **View change** - the mechanism for replacing a malicious or failed leader. Prepared certificates preserve already-agreed decisions when the view changes
  • **O(n²) messages** - PBFT's fundamental bottleneck. Threshold signatures (BLS aggregation) reduce complexity to **O(n)**, enabling BFT to scale to hundreds of validators
  • **Evolution**: PBFT (1999) → Tendermint (2014) → HotStuff (2018) - each generation solved the previous one's problem. From "works with 4 nodes in a lab" to "works with hundreds of nodes in production"

Related topics

PBFT is the foundation of the entire BFT consensus family. Understanding the three phases and view change is essential for studying modern protocols:

  • Introduction to consensus — Basic concepts: safety, liveness, fault tolerance. PBFT makes these abstractions concrete for the Byzantine model
  • Tendermint — Direct descendant of PBFT: simplified view change, PoS integration. Keeps O(n²) but adds accountability (slashing)
  • HotStuff — The next generation: linear complexity via threshold signatures, pipelined view change. Solved PBFT's main problem

Вопросы для размышления

  • Why is the BFT fault tolerance threshold (f < n/3) stricter than Nakamoto consensus (f < n/2)? What do we gain in exchange for this higher cost?
  • If all 3f+1 nodes are honest, PBFT still requires three phases. Can the protocol be optimized for the "optimistic" case where there are no Byzantine nodes? (Hint: look at SBFT and fast path)
  • Imagine you are designing consensus for a network of 1000 nodes. PBFT fails due to O(n²). HotStuff reduces this to O(n) via threshold signatures. But threshold signatures are computationally expensive. What other architectural choices could help? (Hint: committee selection, validator sharding)

Связанные уроки

  • bc-15-pos — PoS uses BFT as its internal consensus
  • bc-05-consensus-intro — Consensus introduction - mandatory context
  • bc-17-nakamoto-vs-classical — Nakamoto vs BFT - next comparative lesson
  • bc-18-tendermint — Tendermint is a practical BFT implementation
  • ds-03-consensus — Distributed consensus is the mathematical base of BFT
  • prob-02-combinatorics — 1/3 byzantine bound - combinatorial proof
  • dist-10-byzantine
BFT Consensus: PBFT and variants