Blockchain

Merkle Trees: the tree of trust

Цели урока

  • Explain how a Merkle Tree folds thousands of transactions into one root hash
  • Build a Merkle Proof and understand why its size is O(log N)
  • Compare a full node and an SPV client by data volume and security level
  • Name applications of Merkle Trees beyond blockchain

Предварительные знания

  • Hash Functions: the foundation of blockchain

A Bitcoin block holds ~2500 transactions. Ethereum processes 1+ million transactions per day. A mobile wallet stores 60 MB instead of 500 GB and cryptographically verifies every transaction. No trust needed - just proof. The trick: a data structure patented in 1979 - 30 years before Bitcoin. The reviewers back then could not see the point.

  • **Bitcoin SPV** - mobile wallets verify transactions via Merkle Proof, storing only 60 MB instead of 500+ GB
  • **Git** - every commit contains a Merkle Tree of project files. `git diff` works efficiently precisely because of this structure
  • **Amazon DynamoDB** - uses Merkle Trees for replica synchronization (anti-entropy protocol)

Ralph Merkle - more than a tree

Ralph Merkle published the hash tree idea in his Stanford doctoral dissertation. It was part of broader work on **public-key cryptography** - Merkle independently of Diffie and Hellman proposed a key exchange protocol (Merkle's Puzzles). The first scientific paper got **rejected** by reviewers - they could not grasp the idea. Merkle later became one of the pioneers of nanotechnology and cryonics.

Without Merkle Trees, blockchain would be impractical: every transaction verification would require downloading the entire block.

Merkle Root: one hash for all data

1979. Ralph Merkle defends his dissertation at Stanford. Reviewers reject the paper - the idea goes over their heads. 2009. Satoshi Nakamoto bakes the Merkle tree into Bitcoin. 2024. Mobile wallets verify transactions on a phone without pulling 500 GB of blockchain. An idea nobody understood in 1979 became the backbone of the digital economy.

SHA-256 produces a fingerprint of data. But how does one fingerprint **thousands of transactions** at once while keeping **each one individually verifiable**? A **Merkle Tree** folds any number of elements into **one hash** - the **Merkle Root**. That root hash goes into the block header. 80 bytes wrap around thousands of transactions.

Built bottom-up: 1. Hash each transaction - leaves 2. Hash pairs of leaves - next level 3. Repeat until only **one hash** is left - the root

**In Bitcoin** every block carries the Merkle Root of all its transactions. The block header is only **80 bytes**, yet through that single root it covers thousands of transactions.

A block contains 2048 transactions. How many levels will the Merkle Tree have?

Proof of Inclusion: verification without downloading

The Merkle Tree's main superpower: **prove** that a specific transaction sits inside a block **without revealing the other transactions** and without downloading the entire block. Like proving a particular tree exists in a forest by walking the path to it - no clear-cutting required.

A **Merkle Proof** (or Merkle Path) is the set of sibling hashes along the path from the leaf to the root. Given the proof, anyone can recompute the Merkle Root independently and check it against the one in the block header.

**Efficiency:** for a block with 4096 transactions, a Merkle Proof is **12 hashes** (log₂(4096) = 12). That is 12 * 32 = **384 bytes** instead of pulling the entire block (~1-2 MB).

A block contains 1,000,000 transactions. How many hashes are needed in a Merkle Proof to verify one transaction?

Binary hash tree: why a tree?

Why a **binary tree** instead of simply concatenating and hashing all transactions? `SHA256(Tx1 + Tx2 + ... + TxN)` - one hash, done. The catch: with naive hashing, proving inclusion of a single transaction without revealing all the others is **impossible**. Verification means pulling **every transaction**.

ApproachProof sizeVerifying one Tx
Naive hash of all TxsO(N) - all transactionsDownload the entire block
Merkle TreeO(log N) - path from leaf to root~20 hashes for 1M transactions
No hashO(N) - all transactionsDownload entire block + verify signatures

**Why binary?** Each node combines exactly two child hashes. The payoff: - **Minimum tree height**: $\log_2 N$ - **Simple navigation**: at each level pick left or right - **Efficient updates**: swapping one transaction touches only $\log_2 N$ hashes

Updating a transaction in a Merkle Tree

What happens when a single leaf changes

Say Tx C changed. Three things must be recomputed: 1. H(C) - new leaf hash 2. Hash(CD) - parent changed 3. Merkle Root - root changed Total: 3 recomputations for a tree with 4 leaves. For a tree with 1,000,000 leaves: ~20 recomputations. The rest of the structure stays untouched.

**Variations:** beyond binary trees there are **Patricia Tries** (Ethereum stores its state in a Modified Merkle Patricia Trie) and **Verkle Trees** (in the pipeline for Ethereum 2.0, even more compact).

Why is a Merkle Tree better than naive hashing of all transactions (SHA256(Tx1+Tx2+...+TxN))?

Light Clients (SPV)

A full Bitcoin node stores the **entire** blockchain history - over **500 GB**. This is where Merkle Trees turn theory into a real product. **SPV (Simplified Payment Verification)** - introduced in Satoshi Nakamoto's original Bitcoin paper. An SPV client pulls only **block headers** (80 bytes each) and requests Merkle Proofs for the transactions it cares about.

How SPV works: 1. Pulls all block headers (each carries a Merkle Root) 2. When a transaction needs verification - asks a full node for a Merkle Proof 3. Verifies the proof independently - walks the path up to the Root 4. Matches against the Merkle Root in the header - match means the transaction is valid

Ethereum and light clients

How Merkle Trees are used beyond Bitcoin

Ethereum stores in its Merkle Trie not just transactions, but also: - State Trie - balances and data of all accounts - Storage Trie - smart contract data - Receipts Trie - transaction execution results An Ethereum light client can request: "What is the balance of address 0xABC?" with a Merkle Proof, without downloading the entire state database (~200 GB).

**SPV limitation:** a light client must trust the full node not to hide data. SPV verifies transaction inclusion but cannot guarantee the node showed **every** transaction. Full security demands a full node.

Merkle Trees are only needed for blockchain

Merkle Trees are used in Git (object trees), BitTorrent (chunk verification), Amazon DynamoDB (anti-entropy), ZFS (data integrity), Certificate Transparency (SSL certificates) and many other systems.

A Merkle Tree is a universal data structure for efficient verification. Ralph Merkle patented it in 1979 - 30 years before Bitcoin. Blockchain merely made it famous.

Why can a mobile Bitcoin wallet work without downloading 500 GB of blockchain?

Key Ideas

  • **Merkle Root** - a single hash that embraces all transactions in a block via a binary hash tree. The block header is 80 bytes - but the root knows about thousands of transactions
  • **Merkle Proof** - a set of ~20 hashes to verify one transaction among a million (O(log N)). This is exactly what lets a mobile wallet work without hundreds of gigabytes of data
  • **Binary tree** gives logarithmic proof complexity and efficient updates when a transaction changes
  • **SPV clients** download only block headers (~60 MB) and use Merkle Proofs for verification. 500 GB -> 60 MB - an 8000x difference

Related topics

Merkle Trees connect cryptography with efficient data structures:

  • Hash functions — SHA-256 is the building block of every node in a Merkle Tree
  • P2P networks — Merkle Trees allow verifying data from untrusted peers
  • Ethereum State — Modified Merkle Patricia Trie stores the entire Ethereum state

Вопросы для размышления

  • If Merkle Trees are so efficient, why do not all databases use them?
  • What happens when there is an odd number of transactions in the tree - how is a binary tree built?
  • How would a Merkle Tree be used outside blockchain - for example, to synchronize files between two computers?

Связанные уроки

  • bc-02-hashing — SHA-256 is the building block of every Merkle Tree node
  • bc-04-p2p — P2P networks use Merkle Proof to verify data from untrusted peers
  • alg-08 — Trees as data structures - common foundation for understanding Merkle Tree
  • alg-02 — O(log N) search in binary trees - same logic as Merkle Proof
  • ds-14 — Distributed storage uses Merkle Trees for anti-entropy synchronization
  • sec-03 — Certificate Transparency Log is built on Merkle Trees
  • crypto-19-hash-functions
  • alg-12-bfs
Merkle Trees: the tree of trust

0

1

Sign In