Blockchain
P2P networks and gossip protocols
October 31, 2008. Satoshi Nakamoto posts a 9-page PDF to a cryptography mailing list. No company, no central server. Within months, Bitcoin nodes in dozens of countries sync a shared ledger with no coordinator. Two ideas from peer-to-peer networking make serverless networks like this possible: gossip (spread everything to everyone), which Bitcoin uses to relay transactions and blocks, and the Kademlia DHT (find anything in O(log N) hops), which BitTorrent popularized and Ethereum adopted for peer discovery. 15,000 nodes, 100+ countries, zero servers.
- **Bitcoin** - 15,000+ nodes use a gossip protocol to spread transactions across the world in seconds
- **BitTorrent** - Kademlia DHT allows 200+ million users to find files without a central tracker
- **Ethereum** - the discv5 protocol (based on Kademlia) provides peer discovery for the nodes behind 500,000+ network validators
BitTorrent, DHT, and the Birth of Serverless Networks
In 2001 Bram Cohen built BitTorrent with centralized trackers - one server knew who had which file. When a tracker went down, the swarm died. In 2002 Petar Maymounkov and David Mazières published the Kademlia paper, describing a DHT where every node is both client and directory. By 2005, Kademlia was integrated into BitTorrent as Mainline DHT. By 2013 the Mainline DHT held 25 million nodes with no server anywhere. Ethereum borrowed the same architecture for its discv5 protocol. The DHT-based discovery in modern blockchains descends directly from that 2002 paper, while gossip has older roots in epidemic algorithms for replicated databases.
Prerequisites
Peer Discovery: how nodes find each other
When a Bitcoin node launches for the first time, it faces a problem: **who to talk to?** There is no central server, no directory of participants. Only the node and the internet. How does it find at least one neighbor?
The process of finding other nodes in a P2P network is called **Peer Discovery**. Several strategies exist, and most blockchains use a combination of them.
Analogy: moving to a new city
How Peer Discovery works in real life
A newcomer arrives in a city knowing nobody.

1. DNS Seeds = city directory: a call yields a few addresses
2. Bootstrap Nodes = city hall: a known place to go
3. Peer Exchange = word of mouth: new acquaintances introduce their own friends

In the end, within a couple of hours dozens of people are known, although the start was from scratch.
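The bootstrap-then-word-of-mouth process can be sketched as a small simulation. This is a toy model, not Bitcoin's actual address manager: the network layout, the `discover_peers` and `random_network` helpers and all parameters are hypothetical and exist only to show how a short bootstrap list grows into a large address book.

```python
import random

def discover_peers(network, bootstrap, rounds=3, ask_per_round=4, seed=0):
    """Simulate peer exchange: start from bootstrap addresses, then
    repeatedly ask known peers for the peers they know.

    network: dict mapping node id -> set of node ids that node knows.
    """
    rng = random.Random(seed)
    known = set(bootstrap)          # addresses learned so far
    history = [len(known)]
    for _ in range(rounds):
        # Ask a few already-known peers for their address lists.
        asked = rng.sample(sorted(known), min(ask_per_round, len(known)))
        for peer in asked:
            known |= network[peer]  # "word of mouth": merge their contacts
        history.append(len(known))
    return known, history


def random_network(n=200, contacts=10, seed=1):
    """Hypothetical network where every node knows `contacts` random others."""
    rng = random.Random(seed)
    nodes = list(range(n))
    return {node: set(rng.sample(nodes, contacts)) for node in nodes}


network = random_network()
known, history = discover_peers(network, bootstrap=[0, 1, 2])
print(history)  # size of the address book after each round
```

Starting from three bootstrap addresses, the address book grows every round because each contacted peer contributes its own contacts.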
**Bitcoin by default** maintains 8 outgoing and up to 117 incoming connections. A node knows about thousands of peers but is actively connected to only a few. This is a balance between network connectivity and node load.
A new Bitcoin node is launched for the first time. How does it learn the addresses of other nodes?
Gossip Protocol: word of mouth in the network
Nodes have found each other. Now the main task needs to be solved: how to **spread a new transaction** across the entire network of thousands of nodes? Sending directly to each is impossible (a node doesn't know all addresses). A mechanism is needed that works **without a central coordinator**.
**Gossip Protocol** - each node forwards new information to **several random neighbors**, who forward it further, and within a few "rounds" the information reaches everyone. Like a rumor in an office: tell three colleagues, each tells three more - in an hour the whole floor knows.
**Epidemic Broadcast:** a gossip protocol is also called **epidemic broadcast**, because information spreads through the network much like a virus through a population. Models from epidemiology such as SIR (Susceptible-Infected-Recovered) give a good approximation of how a transaction "infects" the network.
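The epidemic analogy can be made concrete with a simplified push model. This is an assumption for illustration, not Bitcoin's actual relay logic: in each round every one of the $I_t$ informed nodes sends the message to $f$ nodes chosen uniformly at random out of $N$, and $S_t = N - I_t$ nodes are still uninformed. A given uninformed node stays uninformed only if all $f I_t$ pushes miss it, so:

$$
\mathbb{E}\left[S_{t+1} \mid I_t\right] = S_t \left(1 - \frac{1}{N}\right)^{f I_t} \approx S_t \, e^{-f I_t / N}
$$

While $I_t \ll N$ almost every push reaches a new node, so $I_{t+1} \approx (1 + f)\, I_t$ and the informed set grows exponentially. Covering most of the network therefore takes roughly $\log_{1+f} N$ rounds, and the last few uninformed nodes add a further $O(\log N)$ rounds.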
The main property of gossip protocol is **reliability**: even if 30% of nodes are down, information will still reach the rest. A few nodes didn't respond? No problem - others will "spread the news".
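The effect of fanout and node failures can be explored with a small simulation. This is a sketch under simplifying assumptions (synchronous rounds, uniformly random targets, informed nodes keep pushing every round); `gossip_rounds` and its parameters are hypothetical names, not part of any real client.

```python
import random

def gossip_rounds(n, fanout=3, down_fraction=0.0, seed=0, max_rounds=100):
    """Push gossip: every informed live node pushes to `fanout` random nodes
    each round. Returns (rounds used, informed live nodes, total live nodes)."""
    rng = random.Random(seed)
    down = set(rng.sample(range(n), int(n * down_fraction)))
    live = [node for node in range(n) if node not in down]
    informed = {live[0]}            # the node that first sees the transaction
    rounds = 0
    while len(informed) < len(live) and rounds < max_rounds:
        newly = set()
        for _ in informed:
            for target in rng.sample(range(n), fanout):
                if target not in down:      # messages to dead nodes are lost
                    newly.add(target)
        informed |= newly
        rounds += 1
    return rounds, len(informed), len(live)


for down_fraction in (0.0, 0.3):
    rounds, informed, live = gossip_rounds(10_000, down_fraction=down_fraction)
    print(f"down={down_fraction:.0%}: {informed}/{live} live nodes in {rounds} rounds")
```

In this model, taking 30% of the nodes offline costs a few extra rounds, but the message still reaches every live node.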
**Redundancy is the price of reliability.** Each node receives the same transaction multiple times from different neighbors. Bitcoin solves this with a check: "Do I already know this transaction? → Don't forward it." This turns naive gossip into **lazy push gossip** - first only the hash is sent (an `inv` message), and full data is requested on demand.
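The announce-then-request pattern can be sketched as follows. This is a toy model rather than Bitcoin's wire protocol: the `Node` class and `relay` function are hypothetical, the `inv` and `getdata` exchange is reduced to a boolean return value, and nodes announce to all peers, including the one they received the transaction from.

```python
from collections import deque

class Node:
    """Toy relay node: announces hashes first, sends full data only on request."""

    def __init__(self, name):
        self.name = name
        self.peers = []
        self.mempool = {}           # tx_hash -> tx payload

    def on_inv(self, tx_hash):
        """Return True if this node wants the full transaction (a 'getdata')."""
        return tx_hash not in self.mempool


def relay(origin, tx_hash, payload):
    """Spread one transaction; returns (inv messages, full-payload transfers)."""
    origin.mempool[tx_hash] = payload
    queue = deque([origin])
    inv_count = tx_count = 0
    while queue:
        sender = queue.popleft()
        for peer in sender.peers:
            inv_count += 1                       # cheap announcement (hash only)
            if peer.on_inv(tx_hash):             # peer answers with getdata
                peer.mempool[tx_hash] = payload  # full data sent exactly once
                tx_count += 1
                queue.append(peer)               # peer now announces it onward
    return inv_count, tx_count


# Hypothetical 5-node full mesh: everyone is connected to everyone else.
nodes = [Node(f"n{i}") for i in range(5)]
for node in nodes:
    node.peers = [other for other in nodes if other is not node]

inv_count, tx_count = relay(nodes[0], "ab12", b"raw transaction bytes")
print(inv_count, tx_count)  # 20 4
```

Twenty small announcements are exchanged, but the full payload crosses the network only four times, once per node that did not yet have it.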
A network consists of 10,000 nodes. Each node forwards a message to 3 random neighbors. About how many rounds does it take for the message to reach everyone?
Kademlia: smart routing via XOR
Gossip is great for spreading information to **everyone**. But what if **specific data** needs to be found in a network of millions of nodes? For example: "where is the file with hash `a3f7...` stored?" Asking everyone is too slow.
**Kademlia** is a **DHT** (Distributed Hash Table) where each node is responsible for storing certain keys. The unique idea of Kademlia: the "distance" between nodes is computed via the **XOR** (exclusive OR) operation on their identifiers.
Why XOR? It satisfies the properties of a distance metric:

1. `d(A, A) = 0` - distance to self is zero
2. `d(A, B) = d(B, A)` - symmetry
3. `d(A, C) ≤ d(A, B) + d(B, C)` - triangle inequality

Plus: XOR is extremely cheap to compute, taking only a handful of bitwise CPU instructions even for 256-bit identifiers.
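These properties are easy to check by brute force on a small ID space. The sketch below treats IDs as plain integers and uses a hypothetical `xor_distance` helper. Real implementations work on 160-bit or 256-bit identifiers, but the arithmetic is the same.

```python
from itertools import product

def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: XOR of the two IDs, read as an unsigned integer."""
    return a ^ b


# Brute-force check of the metric properties over every 4-bit ID.
ids = range(16)
for a, b, c in product(ids, repeat=3):
    assert xor_distance(a, a) == 0
    assert xor_distance(a, b) == xor_distance(b, a)
    assert xor_distance(a, c) <= xor_distance(a, b) + xor_distance(b, c)

print(xor_distance(0b0000, 0b1111))  # 15
print(xor_distance(0b1100, 0b1111))  # 3
```

The more leading bits two IDs share, the smaller the distance, which is what lets a lookup close in on the target one prefix bit at a time.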
**Ethereum uses Kademlia** (discv4 and discv5 protocols) for peer discovery. Every Ethereum node has a 256-bit Node ID, and the routing table contains 256 k-buckets of 16 contacts each. Finding any node in a network of 10,000+ nodes takes ~14 steps.
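A routing table with 256 buckets of 16 contacts might be organized as in the sketch below. It assumes IDs are plain 256-bit integers and that bucket `i` holds contacts at XOR distance between `2**i` and `2**(i+1) - 1`. The `RoutingTable` class is hypothetical and omits details of real clients, such as hashing node keys before measuring distance and liveness checks.

```python
K = 16          # contacts per bucket, as in the text
ID_BITS = 256   # Node ID length, as in the text

def bucket_index(own_id: int, other_id: int) -> int:
    """Bucket i holds contacts whose XOR distance d satisfies 2**i <= d < 2**(i+1)."""
    distance = own_id ^ other_id
    if distance == 0:
        raise ValueError("a node does not store itself")
    return distance.bit_length() - 1


class RoutingTable:
    def __init__(self, own_id: int):
        self.own_id = own_id
        self.buckets = [[] for _ in range(ID_BITS)]

    def add(self, node_id: int) -> bool:
        """Insert a contact; returns False if its bucket is already full."""
        bucket = self.buckets[bucket_index(self.own_id, node_id)]
        if node_id in bucket:
            return True
        if len(bucket) >= K:
            # The Kademlia paper pings the least-recently-seen contact
            # before deciding; this sketch simply rejects the newcomer.
            return False
        bucket.append(node_id)
        return True


table = RoutingTable(own_id=0)
print(bucket_index(0, 1))         # 0: differs only in the last bit
print(bucket_index(0, 1 << 255))  # 255: differs in the first bit
```

Half of the ID space falls into the farthest bucket, a quarter into the next one, and so on, so a node keeps detailed knowledge of its close neighborhood and only a sample of the rest.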
Finding data in O(log N) steps
How Kademlia finds the node responsible for a key
Node A (ID: 0000) is looking for the key with ID: 1111

1. A checks its table → closest to 1111 is node B (1000)
2. A asks B: "who else is closer to 1111?"
3. B replies: "I know node C (1100)"
4. A asks C: "who else is closer?"
5. C replies: "I know node D (1110)"
6. A asks D → D knows node E (1111)
7. Node E is responsible for the key!

4-bit space → 4 steps. 256-bit (Ethereum) → maximum 256 steps, in practice ~20.
In Kademlia the XOR distance between nodes 10110001 and 10110100 is:
Network topology and attacks
The way nodes are connected to each other is called **network topology**. Topology determines how resilient, fast and secure the network is. Different blockchains choose different approaches.
Blockchain networks use an **unstructured mesh** (Bitcoin) or a **structured overlay** for peer discovery (Ethereum's Kademlia-based discv4/discv5). The key property is **no single point of failure**: disconnecting 30% of nodes does not disrupt network operation.
But decentralization does not mean invulnerability. There are two classes of attacks that exploit features of P2P networks:
Eclipse Attack: isolating a victim
How an attacker can "surround" a node
An attacker controls 100 nodes. The victim is connected to 8 peers.

Strategy:

1. The attacker floods the victim with connection requests
2. Gradually all 8 of the victim's connections are to the attacker's nodes
3. The victim is isolated from the real network

What the attacker can do:

- Show the victim a fake block chain
- Hide transactions (double spend against the victim)
- Delay block delivery

Bitcoin defenses:

- Limiting connections from a single /16 subnet
- Persisting verified peers across restarts
- Separating incoming and outgoing connections
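The /16 subnet limit can be illustrated with a short sketch. It is a simplified model, not Bitcoin Core's actual peer selection: `netgroup` and `pick_outbound` are hypothetical helpers, only IPv4 is handled, and the address book is made up.

```python
import ipaddress
import random

def netgroup(ip: str) -> str:
    """Group an IPv4 address by its /16 prefix, e.g. '203.0.113.7' -> '203.0.0.0/16'."""
    return str(ipaddress.ip_network(f"{ip}/16", strict=False))


def pick_outbound(candidates, slots=8, seed=0):
    """Choose up to `slots` peers, never two from the same /16 group."""
    rng = random.Random(seed)
    shuffled = list(candidates)
    rng.shuffle(shuffled)
    chosen, used_groups = [], set()
    for ip in shuffled:
        group = netgroup(ip)
        if group not in used_groups:
            chosen.append(ip)
            used_groups.add(group)
        if len(chosen) == slots:
            break
    return chosen


# Hypothetical address book: 100 attacker IPs in one /16, 10 honest IPs elsewhere.
attacker = [f"198.51.{i}.1" for i in range(100)]
honest = [f"{20 + i}.0.0.1" for i in range(10)]
peers = pick_outbound(attacker + honest)
print(sum(ip in attacker for ip in peers))  # at most 1 attacker peer
```

Even though the attacker owns over 90% of the address book, addresses from one subnet can fill only one outbound slot, so the attacker would need addresses spread across many different /16 ranges.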
Sybil Attack: an army of fakes
How creating many fake nodes threatens the network
An attacker creates 10,000 fake nodes (Sybil nodes). Goal: occupy a majority in Peer Exchange, so that new nodes connect only to the attacker.

Why it works: in a P2P network there are no "passports" - anyone can create a node. If 80% of known addresses are fake, most of a new node's connections will likely go to the attacker, and the more addresses the attacker controls, the likelier the node ends up fully in the "trap".

Defenses:

- Proof of Work: influence requires computing power, not number of nodes
- Proof of Stake: influence requires staked ETH, fake nodes accomplish nothing
- Reputation systems: trust is earned over time
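A rough estimate of the risk is possible with one simplifying assumption: each of the node's 8 outbound slots is filled by an independent, uniformly random pick from its address book. Real clients select peers less naively, so the figures are illustrative only, and the function names are hypothetical.

```python
def p_all_slots_sybil(fake_fraction: float, slots: int = 8) -> float:
    """Probability that every one of `slots` independent picks is a fake address."""
    return fake_fraction ** slots


def p_at_least_one_honest(fake_fraction: float, slots: int = 8) -> float:
    """Probability that at least one outbound connection reaches an honest node."""
    return 1.0 - p_all_slots_sybil(fake_fraction, slots)


for fraction in (0.5, 0.8, 0.99):
    print(f"{fraction:.0%} fake -> fully surrounded with p = "
          f"{p_all_slots_sybil(fraction):.3f}")
```

Under this model one honest connection is enough to escape total isolation. An attacker therefore needs to dominate the address book almost completely, which is why eclipse attacks focus on poisoning a single victim's peer list.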
| Attack | Mechanism | Target | Defense |
|---|---|---|---|
| Eclipse | Surround victim with controlled nodes | Isolate one node | Limit connections from a single subnet |
| Sybil | Create many fake nodes | Control routing | Proof of Work / Proof of Stake |
| Routing | BGP hijacking of traffic between nodes | Split the network | P2P traffic encryption, Tor |
**Practical takeaway:** this is exactly why running an independent node matters rather than relying on a third party. Connecting through someone else's node (e.g., through Infura) is effectively **voluntarily entering an Eclipse** - one intermediary controls all information about the network.
Misconception: "A P2P network is fully decentralized - all nodes are absolutely equal and equally important"
In practice, nodes are not equal: mining pools control most of the hashrate, Infura serves ~50% of Ethereum requests, and a few ISPs (Internet Service Providers) route most P2P traffic. This creates "bottlenecks" of centralization within a formally decentralized architecture.
P2P protocols provide the technical possibility of decentralization, but economics and infrastructure create a natural tendency toward centralization. Understanding this gap between protocol and reality is key to assessing the true resilience of blockchain networks.
An Eclipse Attack on a Bitcoin node allows the attacker to:
Key ideas
- **Peer Discovery** - new nodes find the network via DNS Seeds, bootstrap nodes and Peer Exchange (PEX), gradually expanding the contact list
- **Gossip Protocol** - each node forwards information to several neighbors, reaching the entire network in O(log N) rounds - like an epidemic
- **Kademlia DHT** - XOR distance metric and k-buckets allow finding any key in the network in O(log N) steps, used by Ethereum for peer discovery
- **Mesh topology** ensures network resilience, but **Eclipse** and **Sybil attacks** remind us that a P2P network is not absolutely decentralized. 15,000 Bitcoin nodes operate without a single server, yet each must know how to protect itself from being surrounded by attackers
Related topics
P2P networks are the transport layer of blockchain, connecting cryptography with consensus:
- What is blockchain — A distributed ledger requires a P2P network for synchronization between nodes
- Consensus — Gossip delivers transactions and blocks, and consensus determines which ones to accept
- Digital signatures — Signatures protect transactions, but the P2P network determines how they spread
Questions for reflection
- If DNS Seeds are controlled by specific people (e.g., Bitcoin Core developers), isn't this a hidden point of centralization? How can this risk be mitigated?
- Why did Ethereum choose Kademlia (structured overlay) for peer discovery, while Bitcoin manages with simple random gossip? What is the trade-off between these approaches?
- When designing a P2P network for a new blockchain, how would Eclipse Attacks be defended against without sacrificing the network's openness to new nodes?
Related lessons
- bc-01-intro — A distributed ledger needs P2P transport to sync between nodes
- bc-05-consensus-intro — Gossip delivers blocks and transactions; consensus decides which ones to accept
- bc-06-digital-signatures — Signatures protect transactions, but P2P defines how they propagate
- ds-09-gossip-protocols — Gossip in distributed systems (Cassandra, Dynamo) uses the same O(log N) math, different context
- ds-12-service-discovery — Service discovery in microservices solves the same find-nodes problem but in a trusted, controlled environment
- net-28-dynamic-routing