Blockchain
Hash Functions: The Foundation of Blockchain
Цели урока
- Understand how SHA-256 works and its application in blockchain
- Explain collision resistance and why SHA-256 collisions are practically unfindable
- Distinguish hashing from encryption as two different tools
- Understand the avalanche effect and its role in Proof of Work
SHA-256 - 32 bytes of output. Bitcoin: 1 hash computed in ~20 microseconds. The whole world runs 600 exahashes per second searching for a single block. Every second. That is more than all computers from the year 2000 combined. Behind that power - one mathematical function.
- **Bitcoin mining** - 600 exahashes of SHA-256 every second searching for a nonce for one block every ~10 minutes
- **Git** - every commit is identified by a SHA-1 hash (migrating to SHA-256). `git log` shows these hashes
- **HTTPS** - Certificate Transparency logs: hashes of all TLS certificates are publicly verified through Merkle trees
- **Digital signatures** - ECDSA and Ed25519 sign the hash of a message, not the message itself: SHA-256 first, then the signature
- **Integrity verification** - Linux ISO files ship with SHA-256 checksums: changing one byte produces a completely different hash
- **Passwords** - sites store Argon2id(password), not the password. Raw SHA-256 is insecure for passwords: a GPU computes 10^10 SHA-256/s
Ralph Merkle, NSA and the history of SHA
1979. Ralph Merkle publishes the Merkle-Damgard construction - the foundation of most hash functions including SHA-256. He also describes Merkle trees - structures that 30 years later become the foundation of Bitcoin. 1993: NSA develops SHA-0, published through NIST. A vulnerability is found quickly. 1995 - SHA-1. 2001 - SHA-2 (including SHA-256). 2012 - NIST open competition, winner Keccak, standard SHA-3. 2017 - Google SHAttered: first SHA-1 collision achieved with 6500 CPU-years and 110 GPU-years of compute, cost ~USD 110,000. Bitcoin relies on SHA-256 - if SHA-256 were broken, the entire USD 1.5 trillion cryptocurrency market would be at risk.
Предварительные знания
SHA-256: Digital Fingerprint
SHA-256 - 32 bytes of output. Bitcoin: one hash computed in ~20 microseconds. The whole world runs 600 exahashes per second searching for a single block. Every second. That is more than all computers from the year 2000 combined. A **hash function** turns data of **any size** into a string of **fixed length**. The word 'hello' or the full text of 'War and Peace' - the output is always **64 hexadecimal characters** (256 bits).
**Why 256 bits?** $2^{256}$ possible hashes. That is $\approx 10^{77}$ - exceeding the estimated number of atoms in the observable universe ($\approx 10^{80}$). The probability of a random hash collision is effectively zero.
SHA-256 is used everywhere: **Bitcoin** computes the hash of every block (twice - SHA-256d), **Git** identifies commits by hash (migrating from SHA-1 to SHA-256), **HTTPS** verifies certificate integrity. One function - three different systems, one mathematical foundation.
What happens when a 10 GB file is passed to SHA-256?
Collision Resistance
A **collision** is two different inputs with the same hash. Mathematically, collisions are inevitable (infinite inputs, finite hash space - the pigeonhole principle). But a good hash function makes them **practically impossible** to find. **Collision resistance** means: no one can find m1 != m2 such that H(m1) = H(m2) in any reasonable time.
The birthday paradox
With 23 people in a room, the probability of a shared birthday exceeds 50%. Similarly for hashes: finding a collision in an n-bit hash requires roughly 2^(n/2) attempts. For SHA-256: 2^128 ~= 3.4 * 10^38 attempts. A supercomputer computing 10^18 hashes/sec would search for 10^13 years. The age of the universe is 1.4 * 10^10 years.
**Not all hash functions are secure:** - **MD5** - broken in 2004, collisions in seconds on a standard CPU - **SHA-1** - Google SHAttered 2017: first collision, 6500 CPU-years + 110 GPU-years, cost ~USD 110,000 - **SHA-256** - not broken as of today - **SHA-3 (Keccak)** - NIST 2012 competition winner, backup standard
Why does collision resistance matter in blockchain? An attacker could craft a malicious transaction with the same hash as a legitimate one. The substitution would be undetectable - verification would pass. Collision resistance closes this attack vector.
SHA-256 collisions mathematically exist. Why is this not a security problem?
Preimage Resistance
**Preimage resistance** - given a hash h, it is impossible to find m such that H(m) = h. **A hash function is a one-way street**: computing a hash from data is instant. Recovering data from a hash is computationally impossible. Operations required: $2^{256}$ for SHA-256. Every computer ever built working for the entire life of the universe would not be enough.
The meat grinder analogy
A meat grinder turns a piece of meat into mince. The original piece cannot be recovered from the mince. SHA-256 does the same with data: hello -> 2cf24dba5fb0a30e26e83b2ac5b9e29e... Knowing 2cf24dba..., it is mathematically impossible to recover the word "hello" - only brute force works.
Brute force works for short passwords. 'password123' is in rainbow tables. That is why password storage requires not just SHA-256 but **bcrypt / Argon2** - functions deliberately designed to be slow. SHA-256 takes ~20 microseconds, bcrypt takes ~100 ms. A factor of 5000 makes GPU attacks impractical.
**Password storage - right and wrong:** Wrong: storing SHA-256(password) Why: a GPU computes 10^10 SHA-256/s. Brute-forcing an 8-character password takes minutes. Right: storing bcrypt(password, cost=12) or Argon2id(password) Why: bcrypt is deliberately slow - 100 ms/hash. Brute-forcing an 8-character password takes years.
Hash: 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8. Can the original message be recovered?
Avalanche Effect
The **avalanche effect** - changing one bit of the input **completely changes** the output hash. The hashes of 'hello' and 'hellp' look entirely unrelated. There is no correlation - no way to 'approach' a target hash gradually.
One changed letter - ~50% of output bits changed. This is not a coincidence - it is a cryptographic requirement. An ideal hash function: every output bit depends on every input bit. SHA-256 is close to this ideal.
Proof of Work and the avalanche effect
A miner searches for a nonce where the block hash starts with N zeros: nonce=0 -> hash: 8a3f7b2c... (not valid) nonce=1 -> hash: c91e0d4f... (not valid) nonce=834571 -> hash: 000000a8f3... (valid!) The avalanche effect makes predicting the right nonce impossible - only brute force works. The entire world runs 600 exahashes/s for one block every ~10 minutes.
**Ideal hash function criterion:** changing 1 input bit causes each output bit to change with probability 50%. This is called the Strict Avalanche Criterion (SAC). SHA-256 satisfies SAC statistically.
Hashing and encryption are the same thing
Encryption is reversible (a key exists for decryption). Hashing is irreversible (data cannot be recovered).
Encryption protects confidentiality: data can be read with the key. Hashing ensures integrity: data cannot be recovered. Encryption is a lock with a key. Hashing is a meat grinder. In blockchain both tools are used together but for different purposes.
The hash of 'cat' = 77af778b... What will the hash of 'cats' look like?
Key ideas
- **SHA-256** - any input -> 256 bits (64 hex characters). 600 exahashes/s in Bitcoin mining. Ralph Merkle 1979 + NSA 1993 = foundation of all blockchain
- **Collision resistance** - finding two inputs with the same hash: 2^128 operations, 10^13 years. Google SHAttered broke SHA-1 for USD 110,000 - SHA-256 is intact
- **Preimage resistance** - data cannot be recovered from a hash. Passwords need bcrypt/Argon2id, not raw SHA-256: a GPU computes 10^10 SHA-256/s
- **Avalanche effect** - 1 changed input bit changes ~50% of output. Without avalanche, Proof of Work loses meaning: miners could 'approach' a target hash
- **Hashing != encryption:** meat grinder vs lock with a key. Both are used in blockchain for different purposes
Related topics
Hash functions are the foundation of many technologies in blockchain and beyond:
- Merkle Trees — Use hashes to build a transaction verification tree - the next lesson
- Proof of Work — Miners search for a nonce so the block hash starts with N zeros - the avalanche effect in action
- Digital Signatures — ECDSA: the message hash is signed with a private key
Вопросы для размышления
- If SHA-256 were ever broken, what would happen to Bitcoin? Which parts of the system would be affected first?
- Why is raw SHA-256 insufficient for password storage (hint: rainbow tables and GPU speed)?
- Do quantum computers threaten SHA-256? Grover's algorithm halves effective security - from 256 bits to 128. Is 128 bits enough?
Связанные уроки
- bc-03-merkle — Merkle trees are built on top of hash functions - the next lesson
- bc-14-pow — Proof of Work - finding a nonce through billions of SHA-256 calls
- bc-06-digital-signatures — Digital signatures: a message hash is signed with a private key
- crypto-19-hash-functions — Deep dive into hash function constructions in cryptography
- crypto-22-password-hashing — bcrypt/Argon2 - password hashing with GPU-attack protection
- alg-25-rabin-karp — Rolling hash in string algorithms - the same idea of a deterministic fingerprint
- crypto-20-sha-family