HotStuff and Linear BFT
PBFT - the first practical BFT algorithm - works well with about 20 nodes. With 100 nodes it generates roughly 10,000 messages per phase, and when a faulty leader needs replacing, it launches a separate, even heavier protocol. In 2018, a team at VMware Research asked: can BFT be built with linear communication, where leader replacement costs as much as a normal round? The result was HotStuff: a protocol that scales to hundreds of validators and adapts to real network speed. Facebook later adopted it as the basis of consensus for the Libra project.
- **Aptos** (successor to Meta Diem) uses AptosBFT based on HotStuff - 160,000+ tx/s with 100 validators and finality under 1 second
- **Flow** (blockchain by Dapper Labs, creators of CryptoKitties and NBA Top Shot) is built on a HotStuff variant with separated roles between nodes
- **Sui** (from Mysten Labs, former Diem team) uses Narwhal + Bullshark - a DAG extension of HotStuff ideas for parallel transaction ordering
Prerequisites
Three-phase voting and chained QC
In classical PBFT, a block goes through three phases: **pre-prepare**, **prepare**, **commit**. HotStuff reimagines this process: each phase ends by forming a **Quorum Certificate (QC)** - a cryptographic proof that more than 2/3 of validators voted "yes".
The three HotStuff phases:
1. **Prepare** - the leader proposes a block, validators vote. Result: `prepareQC` - proof that the block is under consideration.
2. **Pre-commit** - the leader broadcasts `prepareQC`, validators confirm. Result: `precommitQC` - the block is "locked"; validators will not vote for a conflicting block in this round.
3. **Commit** - the leader broadcasts `precommitQC`, validators finalize. Result: `commitQC` - the block is irreversibly committed.
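The phase sequence above can be sketched in a few lines of Python. This is a minimal illustration, not a real implementation: `QuorumCertificate`, `try_form_qc`, and `run_phases` are hypothetical names, and plain validator ids stand in for the signatures a real QC would carry.

```python
from dataclasses import dataclass

PHASES = ("prepare", "pre-commit", "commit")


def quorum_size(n: int) -> int:
    """Smallest vote count that is strictly more than 2/3 of n validators."""
    return (2 * n) // 3 + 1


@dataclass(frozen=True)
class QuorumCertificate:
    phase: str
    block_id: str
    voters: frozenset  # assumption: ids stand in for real signatures


def try_form_qc(phase, block_id, votes, n):
    """Return a QC if the distinct 'yes' voters reach the quorum, else None."""
    voters = frozenset(votes)
    if len(voters) >= quorum_size(n):
        return QuorumCertificate(phase, block_id, voters)
    return None


def run_phases(block_id, votes_per_phase, n):
    """Walk a block through Prepare -> Pre-commit -> Commit.

    Stops at the first phase that fails to gather a quorum and returns
    the QCs formed so far.
    """
    qcs = []
    for phase in PHASES:
        qc = try_form_qc(phase, block_id, votes_per_phase.get(phase, ()), n)
        if qc is None:
            break
        qcs.append(qc)
    return qcs
```

With 4 validators the quorum is 3, so a block that collects three votes in Prepare and Pre-commit but only two in Commit ends up locked (it has a `precommitQC`) but not yet committed.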
Why can't two phases suffice? Imagine: the leader of round 5 proposed block B, and validators voted in the Prepare phase. The leader crashed; a new leader takes over for round 6. The new leader does not know how many validators have already "locked" on block B. If it proposes a conflicting block B', part of the network may finalize B and part may finalize B'. The **Pre-commit phase** guarantees that before finalization all honest nodes explicitly confirmed the lock - and the new leader can verify this.
**Chained QC** is the key innovation. Each QC references the previous one, forming a chain: `prepareQC → precommitQC → commitQC`. This allows the new leader to reconstruct the full picture just by obtaining the latest QC - no need to re-poll all validators.
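To make the chaining concrete, here is a small sketch in which each QC holds a reference to the one before it, so that holding the latest QC is enough to walk back through the history. `ChainedQC` and `history` are hypothetical names; a real protocol would reference the previous QC by hash and verify signatures at each step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChainedQC:
    phase: str
    block_id: str
    prev: Optional["ChainedQC"] = None  # assumption: direct reference instead of a hash


def history(latest: Optional[ChainedQC]) -> list:
    """Reconstruct the (phase, block) sequence by following prev links."""
    chain = []
    qc = latest
    while qc is not None:
        chain.append((qc.phase, qc.block_id))
        qc = qc.prev
    chain.reverse()  # oldest first
    return chain


prepare_qc = ChainedQC("prepare", "B")
precommit_qc = ChainedQC("pre-commit", "B", prev=prepare_qc)
commit_qc = ChainedQC("commit", "B", prev=precommit_qc)
```

Calling `history(commit_qc)` returns the three phases in order, which is the sense in which a new leader can recover the picture from a single certificate.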
Why does HotStuff use three voting phases instead of two?
Leader, O(n), and threshold signatures
The main pain point of PBFT is **quadratic communication complexity**. In the prepare phase, every validator sends its message to **every** other validator. With 100 validators that's `100 × 99 = 9,900` messages per phase. With 1000 validators - nearly **a million**. This is exactly why PBFT doesn't scale beyond ~20 nodes in practice.
In HotStuff, communication follows a "star" topology: the round **leader** is the central node. Validators send their votes **only to the leader**; the leader aggregates them into a QC and broadcasts the result back. Instead of `O(n²)` messages - `O(n)`: n votes to the leader plus one broadcast from the leader to the n validators, about 2n messages in total.
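The difference between the two topologies is easy to check with arithmetic. The sketch below counts messages per phase under one simplifying assumption: nodes do not send messages to themselves, so the leader exchanges messages with the other n - 1 validators. The function names are illustrative.

```python
def pbft_messages_per_phase(n: int) -> int:
    """All-to-all: every validator sends to every other validator."""
    return n * (n - 1)


def hotstuff_messages_per_phase(n: int) -> int:
    """Star: n - 1 votes to the leader plus n - 1 messages broadcast back."""
    return 2 * (n - 1)


for n in (20, 100, 1000):
    print(n, pbft_messages_per_phase(n), hotstuff_messages_per_phase(n))
```

For 100 validators this gives 9,900 messages for the all-to-all pattern against 198 for the star, and the gap widens linearly with n.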
But there is a problem: if every validator sends a separate signature, the QC will contain n signatures. Verifying n signatures is again `O(n)` per node. The solution: **threshold signatures**.
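A rough size comparison shows why a single combined signature helps. The numbers here are assumptions chosen for illustration only: 64 bytes per individual signature (the size of an Ed25519 signature) and 96 bytes for one combined signature (the size of a BLS12-381 signature in one common encoding). `qc_cost` is a hypothetical helper, and real verification costs differ between schemes.

```python
def qc_cost(n: int, per_sig_bytes: int = 64, threshold_sig_bytes: int = 96) -> dict:
    """Compare a QC made of individual signatures with a single threshold signature.

    Assumes the QC is formed from the minimum quorum (more than 2/3 of n).
    """
    signers = (2 * n) // 3 + 1
    return {
        "naive": {"bytes": signers * per_sig_bytes, "verifications": signers},
        "threshold": {"bytes": threshold_sig_bytes, "verifications": 1},
    }


cost = qc_cost(100)
print(cost["naive"], cost["threshold"])
```

Under these assumptions a 100-validator QC shrinks from 4,288 bytes and 67 signature checks to 96 bytes and one check, and the threshold variant stays constant as n grows.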
Facebook/Meta Diem (Libra): HotStuff in production
The first industrial application of HotStuff
In 2019, Facebook announced Libra (later renamed Diem) - a blockchain-based payment system. For consensus, it adopted HotStuff, the protocol that the VMware Research team had created and published as a research paper.
Why HotStuff?
- **100 validators** (partner organizations) - PBFT with O(n²) would work, but at its limit
- **Linear communication** - scales to hundreds of nodes
- **Simple view change** - leader replacement in one round, without PBFT's complex view change
Diem was shut down in 2022 due to regulatory pressure, but the technology lives on: **Aptos** uses the derived protocol AptosBFT (formerly DiemBFT v4), processing ~160,000 tx/s.
**View change** - replacing the leader on failure - is HotStuff's main advantage over PBFT. In PBFT, view change is a separate, heavy protocol: `O(n²)` messages, each carrying `O(n)` certificates, for `O(n³)` communication in total. In HotStuff, view change is **built into** a normal round: the new leader simply starts a new round, collecting the latest QC from validators. Cost: `O(n)` - like a regular round.
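The leader-replacement step can be sketched as follows: each validator reports the highest QC it knows, and the new leader continues from the highest one once a quorum has reported. This is a simplified model with hypothetical names (`QC`, `start_new_view`); signature checks and timeouts are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QC:
    view: int
    block_id: str


def start_new_view(reports: dict, n: int) -> Optional[QC]:
    """Pick the QC to extend in the new view.

    reports maps validator id -> the highest QC that validator has seen.
    Returns None until more than 2/3 of the n validators have reported.
    Each validator sends one message, so the cost is O(n).
    """
    quorum = (2 * n) // 3 + 1
    if len(reports) < quorum:
        return None
    return max(reports.values(), key=lambda qc: qc.view)


reports = {"v1": QC(5, "A"), "v2": QC(6, "B"), "v3": QC(5, "A")}
high_qc = start_new_view(reports, n=4)
```

Here `high_qc` is the view-6 certificate for block B, so the new leader proposes on top of B rather than on a conflicting branch.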
A HotStuff network with 200 validators. How many messages are sent in one voting phase?
Pipelining: overlapping phases
Three phases per block means three communication rounds. With 100 ms latency between nodes, each block finalizes in ~300 ms. Can we do faster? Yes - through **pipelining**: overlapping phases of different blocks in the same round.
The pipelining idea: when the leader proposes a new block `B_{k+1}`, this block simultaneously serves as a **pre-commit vote for block `B_k`**. The next block `B_{k+2}` is a commit vote for `B_k` and a pre-commit vote for `B_{k+1}`. Each new block advances the previous blocks through the pipeline.
In Chained HotStuff the leader **rotates every round**. The round-k leader proposes a block, collects votes, and forms a QC. The round-(k+1) leader includes this QC in their new block - thereby "pushing" previous blocks through the pipeline toward finalization.
**The cost of pipelining**: finalizing a block still requires 3 rounds (three-chain), but **throughput** increases: once the pipeline is "warmed up", every round finalizes one block. The latency of an individual block hasn't decreased, but overall throughput has tripled.
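A toy simulation makes the throughput claim visible. It assumes the simplified model used in this section: the votes collected in round r act as the prepare vote for `B_r`, the pre-commit vote for `B_{r-1}`, and the commit vote for `B_{r-2}`, and every round succeeds. Both function names are illustrative.

```python
def pipeline_commits(rounds: int) -> dict:
    """Map each round to the block whose commit vote completes in that round."""
    commits = {}
    for r in range(1, rounds + 1):
        committed = r - 2  # B_{r-2} gathers its third QC in round r
        commits[r] = f"B{committed}" if committed >= 1 else None
    return commits


def blocks_finalized(rounds: int, pipelined: bool) -> int:
    """Blocks finalized after a number of rounds, with and without pipelining."""
    if pipelined:
        return max(0, rounds - 2)  # two warm-up rounds, then one block per round
    return rounds // 3  # each block occupies three full rounds


print(pipeline_commits(5))
print(blocks_finalized(30, pipelined=True), blocks_finalized(30, pipelined=False))
```

After 30 rounds the pipelined variant has finalized 28 blocks against 10 without pipelining, approaching the threefold gain as the warm-up cost is amortized. Other nodes learn of each commit when the next proposal carries the corresponding QC.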
In Chained HotStuff, the round-10 leader proposes block B₁₀. Which block is finalized at this moment?
Optimistic responsiveness
Classical BFT protocols are tied to **timeouts**. A validator waits a fixed amount of time (e.g., 5 seconds), and if it hasn't received a message - it considers the leader faulty. Problem: the timeout must be set conservatively because network behavior is unpredictable. Result: the protocol runs at the speed of the timeout, not the speed of the network.
HotStuff introduces the **optimistic responsiveness** property: with an honest leader, the protocol's speed is determined by the **actual network delay** (δ), not a pre-set timeout (Δ). If messages arrive in 50 ms - a block finalizes in ~150 ms (3 × 50 ms). If in 10 ms - in 30 ms.
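The latency arithmetic can be written as a one-line model. It follows the simplification used in this section: one network delay δ per phase for a responsive protocol, and, as a deliberately pessimistic assumption, one full timeout Δ per phase for a timeout-bound one. `finalization_time_ms` is a hypothetical helper.

```python
def finalization_time_ms(delta_ms: float, timeout_ms: float,
                         responsive: bool, phases: int = 3) -> float:
    """Time to finalize one block under the simplified per-phase model."""
    step = delta_ms if responsive else timeout_ms
    return phases * step


# Responsive protocol: speed tracks the actual network delay.
fast = finalization_time_ms(delta_ms=50, timeout_ms=5000, responsive=True)
faster = finalization_time_ms(delta_ms=10, timeout_ms=5000, responsive=True)

# Timeout-bound protocol: the configured timeout dominates.
slow = finalization_time_ms(delta_ms=50, timeout_ms=5000, responsive=False)
```

With a 50 ms network the responsive case gives 150 ms and with a 10 ms network 30 ms, independent of the configured timeout, while the timeout-bound case is governed entirely by Δ.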
| Property | PBFT | Tendermint | HotStuff |
|---|---|---|---|
| Communication (normal case) | O(n²) | O(n²) | O(n) |
| Communication (view change) | O(n³) | O(n²) | O(n) |
| Responsiveness | No | No (timeout-based) | Yes (optimistic) |
| Phases to commit | 3 | 2 (but with pre-vote) | 3 |
| View change | Separate heavy protocol | Built-in, but timeout-bound | Built into normal round |
| Scalability (nodes) | ~20 | ~100-200 | ~500+ |
| Example | Hyperledger Fabric (early versions) | Cosmos Hub, Polygon PoS | Aptos, Sui, Flow |
**Aptos** (successor to Diem) uses AptosBFT - an optimized HotStuff. In 2024 benchmarks, Aptos showed **160,000+ tx/s** with 100 validators and finality of ~0.9 seconds. For comparison: Tendermint (Cosmos Hub) - ~10,000 tx/s with ~6-second finality.
Misconception: HotStuff is faster than PBFT because it has fewer voting phases
HotStuff uses the same number of phases (three) as PBFT. Its advantage lies in **linear communication** O(n) instead of O(n²), **simple view change** (also O(n), not a separate O(n³) protocol), and **optimistic responsiveness** (network speed, not timeout). The number of phases is the same - what differs is the cost of each phase.
The number of phases is determined by the theoretical safety requirement for leader change (three-chain for safe view change). HotStuff's speedup comes not from simplifying the protocol, but from optimizing the communication pattern: star topology instead of all-to-all, plus threshold signatures for compact QCs.
A HotStuff network with 100 validators. The network is fast: messages arrive in 20 ms. Timeout is set to 3 seconds. The leader is honest. How long does it take for a block to finalize?
Key ideas
- **Three-phase voting** with chained QC: Prepare → Pre-commit → Commit. The Pre-commit phase solves the safe leader-change problem that two-phase protocols cannot
- **Leader-centric star** instead of all-to-all: `O(n)` communication instead of PBFT's `O(n²)`. Threshold signatures (BLS) aggregate n signatures into one, making the QC compact and fast to verify
- **Pipelining**: three blocks in flight simultaneously. Each new block is a vote for the previous one. Throughput: 1 finalization per round in steady state
- **Optimistic responsiveness**: with an honest leader, speed = actual network latency (δ), not timeout (Δ). View change costs O(n) - same as a regular round, not a separate heavy protocol
- Remember the 10,000 PBFT messages with 100 nodes from the intro? HotStuff compresses this to ~200 messages per phase, and makes leader replacement as cheap as a regular round. This is exactly why projects like Aptos and Flow chose HotStuff over PBFT
Related topics
HotStuff is the evolution of BFT consensus, standing between classical PBFT and modern hybrid protocols:
- BFT Consensus: PBFT and variants — HotStuff solves PBFT's main problem - quadratic communication and heavy view change - while preserving BFT safety and liveness guarantees
- Tendermint and Cosmos consensus — Tendermint is a timeout-based BFT with two phases. HotStuff adds a third phase for safe view change and optimistic responsiveness instead of timeout-binding
- Gasper: Ethereum 2.0 consensus — Gasper combines Casper FFG (finality) and LMD-GHOST (fork choice) - a hybrid approach, while HotStuff is pure BFT with linear communication
Questions for reflection
- HotStuff relies on the leader to aggregate votes. If the leader is a single point of failure, doesn't this make the protocol vulnerable to targeted DDoS attacks on the current leader? How can this be defended against?
- Aptos claims 160,000 tx/s while Cosmos Hub achieves about 10,000 tx/s. How much of this difference is due to consensus choice (HotStuff vs Tendermint), and how much to other factors (execution engine, parallelism, hardware requirements)?
- DAG-based protocols (Narwhal, Bullshark) extend HotStuff's ideas by removing the single leader. If you can manage without a leader - why did HotStuff introduce one? What trade-off do DAG protocols make?