Blockchain
Nodes: Full, Light, Archive
When you call eth_getBalance in your dApp, the answer comes back in 200 milliseconds. But behind that simple number lies a question that defines the entire architecture of the blockchain: who exactly computed that balance? An Infura server you trust? Your own node that verified every block? Or a light client on a phone that verified a Merkle proof? The answer depends on the type of node - and on how much sovereignty you are willing to trade for convenience.
- **Ethereum** - more than 6,000 full nodes independently validate every transaction, ensuring network decentralization. Client diversity (Geth, Nethermind, Erigon, Besu, Reth) protects against catastrophic bugs in any single client
- **Alchemy, Infura, QuickNode** - RPC providers running hundreds of archive nodes and handling billions of requests per day. More than 80% of dApps connect to the blockchain through these intermediaries, not through their own nodes
- **Helios (a16z)** - an Ethereum light client that allows verifying RPC responses on a regular laptop or smartphone using only ~60 MB instead of 500 GB. A compromise between the full sovereignty of a full node and blind trust in a provider
Prerequisites
Full Node: sovereignty through verification
When you send a transaction through MetaMask, who checks that it is valid? Who guarantees that the sender's balance is sufficient, that the signature is valid, that the nonce is correct? If you rely on someone else's server - you are **trusting**. But the whole idea of blockchain is to **verify**, not trust.
A **full node** is a network node that independently validates every block and every transaction according to the protocol rules. It stores the **current state** - balances of all accounts, smart contract code, storage variables - and applies new blocks to it, checking every step.
The key word is **pruned**. A full node in pruned mode stores the current state and recent history, but deletes old state tries. It can verify any **new** transaction, but it cannot answer the question "what was vitalik.eth's balance at block 15,000,000?" - the state for that block has already been deleted.
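To make the difference concrete, here is a minimal Python sketch of the two `eth_getBalance` requests: one for the latest block and one for block 15,000,000. The method name and the hex-encoded block parameter follow the standard Ethereum JSON-RPC API; the address, the helper names, and the canned replies are hypothetical and only illustrate the shape of the exchange. No network call is made.

```python
import json


def balance_request(address: str, block="latest", request_id: int = 1) -> str:
    """Build the JSON body of an eth_getBalance call."""
    # A block height must be sent as a hex string; tags such as "latest" pass through.
    block_param = hex(block) if isinstance(block, int) else block
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getBalance",
        "params": [address, block_param],
    })


def parse_balance(response_body: str) -> int:
    """Return the balance in wei, or raise if the node reported an error."""
    reply = json.loads(response_body)
    if "error" in reply:
        raise RuntimeError(reply["error"].get("message", "unknown RPC error"))
    return int(reply["result"], 16)


ADDRESS = "0x" + "ab" * 20  # hypothetical address, for illustration only

current = balance_request(ADDRESS)                 # any synced full node can answer
historical = balance_request(ADDRESS, 15_000_000)  # needs state a pruned node has deleted

# Canned replies (assumptions): a successful answer, and the kind of error a
# pruned node might return. The exact error text differs between clients.
ok_reply = '{"jsonrpc": "2.0", "id": 1, "result": "0xde0b6b3a7640000"}'
pruned_reply = '{"jsonrpc": "2.0", "id": 1, "error": {"code": -32000, "message": "missing trie node"}}'

print(parse_balance(ok_reply))
```

The only difference between the two requests is the second parameter, yet the historical one can be served only by a node that still holds the state trie for that block.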
Analogy: your own accountant vs. someone else's
Why run your own full node
Imagine you have a large business.
- Option 1 (someone else's node): you call an outside accountant and ask: "How much is in my account?" They might make a mistake, lie, or be unavailable.
- Option 2 (your own full node): you have your own accountant who personally verifies EVERY document. No one can deceive you.
Three reasons to run your own node:
1. Sovereignty - you verify the rules yourself
2. Privacy - no one sees your queries
3. Censorship resistance - no one can block your access to the network
**Minimum requirements for an Ethereum full node (2025):** 4-core CPU, 16 GB RAM, 2 TB NVMe SSD (SATA won't work - too slow for state access), stable internet at 25+ Mbit/s. Hardware cost: ~$500-700. This is not server-grade hardware - an ordinary desktop or even a mini-PC like an Intel NUC will work.
Why can't a full node in pruned mode answer the question "what was the balance of address 0xABC at block 10,000,000"?
Light Client: verification without terabytes
500 GB on disk, 16 GB RAM, a constant internet connection - a full node is not accessible to everyone. What should a smartphone user do if they want to verify their transaction without trusting Infura? The answer is a **light client**: a node that verifies data without downloading the entire chain.
A light client stores only **block headers** - roughly 500 bytes per block instead of ~100 KB for a full block. For Ethereum, this is ~60 MB instead of 500 GB. When it needs to verify specific data (a balance, a transaction, a receipt), it requests a **Merkle proof** from a full node and checks that proof locally against the root stored in the header.
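The idea behind a Merkle proof can be sketched in a few lines of Python. This is a simplified illustration, not Ethereum's actual format: it assumes a plain binary Merkle tree with SHA-256, whereas Ethereum commits to state with a Merkle Patricia trie and Keccak-256. All function names and the sample leaves are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Root of a binary Merkle tree; the last node is duplicated on odd levels."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves, index):
    """Sibling hashes from leaf to root, each tagged with 'sibling is on the left'."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1  # the neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify(leaf: bytes, proof, root: bytes) -> bool:
    """What the light client does: rehash along the path and compare with the root."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


# The full node holds every leaf; the light client holds only the root.
leaves = [b"alice:5", b"bob:7", b"carol:1"]
root = merkle_root(leaves)
proof = merkle_proof(leaves, 1)

print(verify(b"bob:7", proof, root))  # honest answer
print(verify(b"bob:8", proof, root))  # tampered answer
```

The verifier needs only the leaf, a logarithmic number of sibling hashes, and the trusted root, which is why a device that stores headers alone can still check a balance.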
**Helios** - an Ethereum light client from a16z, written in Rust. It connects to the consensus layer, receives block headers attested by the sync committee (512 validators, rotating every ~27 hours), and uses them to verify the responses it gets from an RPC provider. Helios converts **trust in Infura** into **trust in 512 validators with collateral**.
**Portal Network** - an experimental Ethereum Foundation project that goes even further: instead of a single full node, a light client gets data from a **peer network** via DHT (similar to BitTorrent). Each peer stores a small fragment of data, and the light client assembles the needed fragments and verifies them through proofs. This eliminates the need for a trusted full-node server.
**Trust tradeoff:** full node = zero trust (you verify everything yourself). Light client = minimal trust (you trust a sync committee of 512 validators with $16M+ at stake, but you verify Merkle proofs). RPC provider without verification = full trust in one company. A light client is a pragmatic compromise for mobile devices and applications where a full node is not feasible.
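The sync committee figures quoted above can be rechecked with a little arithmetic. The timing constants below are the consensus-layer values as I understand them (12-second slots, 32 slots per epoch, 256 epochs per sync committee period); the 32 ETH per validator and the ETH price are assumptions for illustration, and the dollar figure scales directly with the price.

```python
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32
EPOCHS_PER_SYNC_COMMITTEE_PERIOD = 256
SYNC_COMMITTEE_SIZE = 512
STAKE_PER_VALIDATOR_ETH = 32  # assumption: each member stakes 32 ETH

# How long one committee serves before it is rotated.
period_seconds = SECONDS_PER_SLOT * SLOTS_PER_EPOCH * EPOCHS_PER_SYNC_COMMITTEE_PERIOD
period_hours = period_seconds / 3600

# Total collateral behind one committee, in ETH.
committee_stake_eth = SYNC_COMMITTEE_SIZE * STAKE_PER_VALIDATOR_ETH


def committee_stake_usd(eth_price_usd: float) -> float:
    """Collateral in USD for a given (hypothetical) ETH price."""
    return committee_stake_eth * eth_price_usd


print(f"rotation period: {period_hours:.1f} hours")
print(f"committee stake: {committee_stake_eth} ETH")
print(f"at $1,000 per ETH: ${committee_stake_usd(1_000):,.0f}")
```

Under these assumptions the committee holds 16,384 ETH, so the "$16M+" figure corresponds to an ETH price of about $1,000 and grows with any higher price.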
A light client requests an account balance from a full node. How does it verify that the response is correct?
Archive Node: all of history on one disk
A full node stores the current state, a light client stores only headers. But what if you need to know how much ETH was at the Tornado Cash address **at the moment of OFAC sanctions** (block 15,300,000, August 2022)? Or debug a smart contract that crashed on a specific block six months ago? For that you need a node that remembers **everything**: every state, at every block, since genesis.
An **archive node** stores the full network state at **every block**: all intermediate state tries, all historical balances, all smart contract storage at every point in time. It is the complete "time machine" of the blockchain.
Different clients organize archive storage differently. **Geth** in archive mode stores every state trie in full - this is redundant and consumes disk space. **Erigon** uses a flat state model: instead of trees, it uses flat tables (key → value at each block), which is significantly more compact. An Erigon archive node takes ~3 TB compared to ~15+ TB for Geth.
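The flat "key → value at each block" idea can be illustrated with a small Python sketch. This is not Erigon's actual schema, only a toy model of the principle: store one entry per change instead of a full trie per block, and answer a historical query by finding the last change at or before the requested block. The class and method names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


class FlatHistory:
    """Toy flat history: per key, the blocks where the value changed."""

    def __init__(self):
        self._blocks = defaultdict(list)  # key -> ascending block numbers
        self._values = defaultdict(list)  # key -> values, parallel to _blocks

    def record(self, key: str, block: int, value: int) -> None:
        """Record a change; changes must arrive in ascending block order per key."""
        blocks = self._blocks[key]
        if blocks and block < blocks[-1]:
            raise ValueError("changes must be recorded in block order")
        if blocks and block == blocks[-1]:
            self._values[key][-1] = value  # later write in the same block wins
        else:
            blocks.append(block)
            self._values[key].append(value)

    def value_at(self, key: str, block: int, default: int = 0) -> int:
        """Value of `key` as of `block`: the last change at or before it."""
        blocks = self._blocks.get(key, [])
        i = bisect_right(blocks, block)
        return self._values[key][i - 1] if i else default


history = FlatHistory()
history.record("0xabc", 100, 5)
history.record("0xabc", 250, 9)

print(history.value_at("0xabc", 99))    # before the first change
print(history.value_at("0xabc", 249))   # between the two changes
print(history.value_at("0xabc", 1000))  # after the last change
```

Storage grows with the number of changes rather than with the number of blocks, which is the intuition behind the smaller archive footprint.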
Analogy: library vs. newsstand
The difference between a full node and an archive node
- **Full node (pruned)** - a newsstand: you have today's newspaper and maybe a few from yesterday. You know the current news, but can't find out what was on the front page 3 years ago.
- **Archive node** - a national library: stores EVERY issue of EVERY newspaper from day one. You can find any article from any date. But it requires a huge building (15+ TB of disk).
- **RPC providers (Alchemy, Infura)** - a delivery service: they have the library, and you can request any old newspaper. But you're trusting the courier to bring you the real paper, not a fake.
**Most developers don't run archive nodes themselves.** Instead they use RPC providers: Alchemy (free tier: 300M compute units/month), Infura, QuickNode, Ankr. These companies operate large fleets of archive nodes and sell API access to them. The tradeoff: convenience and speed in exchange for trusting the provider.
A DeFi analyst wants to build a chart of TVL (Total Value Locked) changes in Aave over the last 2 years. What type of node do they need?
State Sync: how a node catches up with the network
You decided to run your own full node. Downloaded Geth, configured the hardware, started it up. And here's the question: Ethereum has been running since 2015, and there are already **20+ million blocks** in the chain. How do you get the current state? Replaying all transactions from genesis would take **weeks**. There must be a faster way.
**State sync** is the process by which a new node obtains the current network state. Several strategies exist, each with different trade-offs between speed, security, and data volume.
**Snap Sync** (default in Geth since 2021) is the main method for new nodes. Instead of replaying all of history, the node downloads a **snapshot** of the current state and verifies its integrity through the state root in the block header. The state root is the Merkle root of the entire state trie, and if even one byte of the snapshot is wrong, the root won't match.
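The check at the heart of this process can be sketched in Python. This is a toy model under stated assumptions: the commitment is a binary SHA-256 Merkle root over sorted `address:balance` leaves, whereas a real client verifies ranges of a Merkle Patricia trie against a Keccak-256 state root. The function names and account data are hypothetical.

```python
import hashlib


def state_root(accounts: dict) -> bytes:
    """Toy state commitment: Merkle root over sorted address:balance leaves."""
    level = [
        hashlib.sha256(f"{address}:{balance}".encode()).digest()
        for address, balance in sorted(accounts.items())
    ]
    if not level:
        return hashlib.sha256(b"").digest()
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def accept_snapshot(snapshot: dict, header_state_root: bytes) -> bool:
    """Accept the downloaded state only if it hashes to the root in the header."""
    return state_root(snapshot) == header_state_root


# The root comes from a block header the node already trusts.
honest = {"0xaa": 10, "0xbb": 20, "0xcc": 30}
trusted_root = state_root(honest)

# A peer that alters a single balance produces a different root.
tampered = dict(honest)
tampered["0xbb"] = 21

print(accept_snapshot(honest, trusted_root))
print(accept_snapshot(tampered, trusted_root))
```

The node never has to trust the peers that served the snapshot; it only has to trust the header that carries the root.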
Analogy: a new accountant at a company
The difference between synchronization strategies
Imagine you've been hired as an accountant at a company that has been operating for 10 years.
- **Full Sync** = re-verify ALL transactions for the past 10 years. Reliable, but will take months.
- **Snap Sync** = receive an audited balance for today and start working from the current moment. The balance is certified (state root), so it is trustworthy.
- **Checkpoint Sync** = receive an "all is OK" certificate from an auditor and start fresh. Fastest of all, but you trust the auditor.
**Practical advice:** for a home full node, use Snap Sync (this is the default in Geth). On an NVMe SSD, synchronization takes 6-12 hours. On a SATA SSD it may take days or fail to complete because of the I/O bottleneck. On a hard disk drive, don't even try. A good combination to get started is an execution-layer client (Geth or Reth) paired with a consensus-layer client (Lighthouse or Prysm) that uses Checkpoint Sync.
**Misconception:** To fully participate in the Ethereum network, you need to download and verify all history from the genesis block - otherwise you can't be sure of the current state's correctness.
**Reality:** Snap Sync allows you to obtain and cryptographically verify the current state without replaying all of history. The state root in the block header, which the network's validators attest to, guarantees the snapshot's integrity. A full sync from genesis is only needed for archive nodes or paranoid verification - for an ordinary full node it's excessive.
This misconception keeps people from running their own nodes: "I need 15 TB and two weeks - I'd better use Infura." In reality, you can launch a full node in 6-12 hours on an ordinary computer with a 2 TB SSD. Snap Sync is designed so that cryptographic integrity guarantees are no weaker than full sync - the only difference is access to historical states.
Snap Sync allows a new Ethereum node to sync in hours instead of weeks. How does the node verify that the downloaded state snapshot is correct?
Key ideas
- **Full node** stores the current state (~500 GB) and validates every block independently - this is maximum sovereignty: you don't trust, you verify. Remember the question from the start of the lesson: who computed your balance? A full node answers: "I did, by verifying every block"
- **Light client** stores only headers (~60 MB) and verifies data through Merkle proofs - a pragmatic compromise for devices where a full node is not feasible. Helios converts trust in Infura into trust in 512 validators with collateral
- **Archive node** stores all historical states (~15+ TB) - the blockchain's "time machine" for analytics, debugging, and forensics. Most developers access archive nodes through RPC providers
- **Snap Sync** lets you launch a full node in 6-12 hours instead of weeks, verifying the state snapshot through the Merkle root - which is why the excuse "I need 15 TB and two weeks" no longer holds. Running your own node is easier than it seems, and the question "who computed my balance" is worth answering
Related topics
Node types form the infrastructure layer of the blockchain, connecting the network layer to data access:
- P2P networks and gossip protocols — Full, light, and archive nodes use the P2P network to discover peers, receive blocks, and exchange data - without the P2P layer no node can function
- Ethereum accounts — Account balances and states are stored in the state trie, which all node types work with - full nodes store the current state, archive nodes store all historical states, light clients verify through Merkle proofs
- Hash-based data structures — Merkle tries are the foundation of data storage in nodes: state trie, storage trie, transactions trie. Merkle proofs allow light clients to verify data without downloading the entire tree
- Indexing and querying data — Archive nodes provide 'raw' data, while indexing services (The Graph, Dune) structure it for fast queries - the next step from storing to using data
Questions for reflection
- More than 80% of dApps connect to Ethereum through Alchemy or Infura. If one of these providers goes down, what happens to 'decentralized' applications? Is dependence on RPC providers a hidden form of centralization?
- Helios (light client) trusts a sync committee of 512 validators. A full node trusts no one. What level of trust is acceptable for an ordinary user sending 100 USDC? What about a protocol with $1B TVL?
- Snap Sync relies on the state root in the block header. If an attacker controlled >2/3 of validators and signed a header with a fake state root, snap sync would accept the fake state. How does this relate to economic security and the cost of attacking Proof of Stake?