Blockchain

Fuzzing and Smart Contract Testing

A developer writes 200 unit tests, all green. Auditors review the contract and find a couple of minor issues. The contract is deployed to mainnet. Months later, an attacker finds a short sequence of calls that nobody thought of and drains $197 million. This is not a hypothetical scenario - it is Euler Finance, March 2023. Unit tests check scenarios the developer thought of. Fuzzing checks scenarios nobody thought of: it generates thousands of random inputs and call sequences, trying to break the contract's invariants. A single fuzzing campaign can surface a bug that manual review would miss.

**Trail of Bits (Echidna)** discovered critical bugs in Compound, MakerDAO, and dozens of DeFi protocols through property-based fuzzing. One bug in Compound allowed infinite cToken minting - Echidna found the `totalSupply` invariant violation in 12 seconds, which manual auditing had missed.
**Paradigm (Foundry)** made fuzzing accessible to every Solidity developer. Uniswap V4, OpenSea Seaport, Optimism - all use Foundry fuzz and invariant tests as a mandatory part of CI/CD. Seaport's fuzz tests found an edge case in partial fill handling that had slipped past three audit teams.
**Euler Finance (March 2023, $197M)** had 100% line coverage and passed six audits. The bug was in the combination of `donateToReserves()` + self-liquidation, which unit tests did not cover. After the incident, the team introduced invariant tests verifying solvency after arbitrary call sequences.

Prerequisites

Reentrancy and Classic Attacks

Echidna: property-based fuzzing for smart contracts

Imagine hiring a thousand testers, each endlessly trying different inputs against your contract to break its invariants. One enters zero, another the maximum uint256, a third passes a contract address instead of an EOA. That is exactly how **fuzzing** works: a generator automatically creates a huge number of random inputs and checks that certain properties are never violated. **Echidna** is a property-based fuzzer from Trail of Bits, written in Haskell, that specializes in Solidity smart contracts.

Echidna does not just generate random bytes. It understands the contract's ABI and generates **valid sequences of function calls** with typed arguments. If a contract has `deposit(uint256)` and `withdraw(uint256)`, Echidna generates chains like: `deposit(42)` → `withdraw(10)` → `deposit(0)` → `withdraw(type(uint256).max)`, searching for a sequence that violates the specified property. When a violation is found, **shrinking** kicks in - reducing the counterexample to the minimal reproducible scenario.
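The generate, check, and shrink loop described above can be modelled in a few lines. This is a language-neutral Python sketch, not Echidna's implementation or API: the `Vault` class, its planted bug, and the helper names `property_holds`, `fuzz`, and `shrink` are all hypothetical. It shrinks only by dropping calls, whereas Echidna also shrinks argument values.

```python
import random


class Vault:
    """Toy stand-in for a Solidity vault; the accounting bug is planted on purpose."""

    def __init__(self):
        self.balance = 0          # tokens actually held
        self.total_deposits = 0   # internal accounting that should mirror balance

    def deposit(self, amount):
        self.balance += amount
        self.total_deposits += amount

    def withdraw(self, amount):
        if amount > self.balance:
            return  # models a revert: state is left unchanged
        self.balance -= amount
        if self.balance > 0:      # planted bug: a full withdrawal skips the accounting update
            self.total_deposits -= amount


def property_holds(calls):
    """Replay a call sequence on a fresh vault, checking the property after every call."""
    vault = Vault()
    for name, amount in calls:
        getattr(vault, name)(amount)
        if vault.total_deposits != vault.balance:
            return False
    return True


def fuzz(seed, runs=2000, seq_len=10):
    """Generate random typed call sequences; return the first one that breaks the property."""
    rng = random.Random(seed)
    for _ in range(runs):
        calls = [(rng.choice(["deposit", "withdraw"]), rng.randrange(8))
                 for _ in range(seq_len)]
        if not property_holds(calls):
            return calls
    return None


def shrink(calls):
    """Drop one call at a time for as long as the shortened sequence still fails."""
    shrunk = True
    while shrunk:
        shrunk = False
        for i in range(len(calls)):
            candidate = calls[:i] + calls[i + 1:]
            if not property_holds(candidate):
                calls, shrunk = candidate, True
                break
    return calls
```

Calling `shrink(fuzz(seed=1))` yields a failing sequence from which no single call can be removed without the failure disappearing. No single call can break this vault on its own, so the counterexample always contains at least a deposit followed by a withdrawal.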

**Corpus management** is one of Echidna's key strengths. The `corpusDir` parameter saves transactions that increased code coverage. On subsequent runs, Echidna starts from these saved sequences, significantly speeding up the discovery of deeply hidden bugs. Teams use the corpus in CI: each run adds to the database, and over time the fuzzer "learns" your contract.

In Echidna, a property for testing is defined as a function with the `echidna_` prefix that returns `bool`. Which result causes Echidna to consider a bug found?

Foundry Fuzz: fuzzing in the Forge ecosystem

Echidna is a powerful specialized tool, but it requires separate installation and configuration. **Foundry** (Forge) integrates fuzzing directly into its standard test framework: for any test function that accepts parameters, Forge automatically supplies random argument values. This lowers the barrier to entry: you write a test with arguments instead of hardcoded values.

Foundry follows **convention over configuration**: if a test function accepts parameters, Forge automatically runs it in fuzz mode. By convention the function name starts with `testFuzz_`, but any `test`-prefixed function with parameters is fuzzed. Forge generates random values for each parameter and runs the test a configurable number of times (256 by default). To filter out invalid inputs, use `vm.assume(condition)`: if the condition is false, that run is discarded and a new set of inputs is generated.
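Discarding inputs becomes expensive when the accepted range is a tiny slice of the input space. The Python sketch below models that effect under loud assumptions: inputs are drawn uniformly from the `uint256` range (Forge's real generator is biased toward edge cases), `assume_style` and `bound_style` are hypothetical helper names, and `bound_style` is a simplified modulo mapping, not forge-std's exact `bound()` algorithm.

```python
import random

UINT256_MAX = 2**256 - 1


def assume_style(rng, lo, hi, runs, max_rejects):
    """Rejection filtering: inputs outside [lo, hi] are thrown away and redrawn."""
    accepted = 0
    rejected = 0
    while accepted < runs:
        x = rng.randint(0, UINT256_MAX)
        if lo <= x <= hi:
            accepted += 1
        else:
            rejected += 1
            if rejected >= max_rejects:
                raise RuntimeError("too many rejected inputs")
    return accepted


def bound_style(x, lo, hi):
    """Range mapping: every raw input is folded into [lo, hi], so nothing is wasted."""
    return lo + x % (hi - lo + 1)
```

With a range such as 1 to 99, a uniformly drawn 256-bit value is accepted with probability 99 / 2^256, so `assume_style` exhausts its reject budget almost immediately, while `bound_style` turns every draw into a usable input.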

A fuzz test in Foundry contains `vm.assume(amountA > 0 && amountA < 100)`. When running with runs = 10000, Forge reports: "Too many rejected inputs (max_test_rejects reached)". What is the best fix?

Invariant Testing: stateful fuzzing

A regular fuzz test checks one function with different inputs. But bugs in smart contracts often arise from **sequences** of calls: deposit → borrow → price change → liquidate. **Invariant testing** (stateful fuzzing) is the next level: the fuzzer generates not individual inputs but **chains of calls** to different functions, and checks after each call in the chain that the contract's global invariants still hold.

In Foundry, invariant tests use **handler contracts** - wrappers that call functions of the target contract on behalf of different users with prepared data. A handler contract solves the problem of "blind" fuzzing: instead of passing a random address (which has no balance), the handler first calls `deal()` or `mint()`, then calls the target function. `targetContract()` tells Forge which contracts to fuzz. `targetSelector()` restricts the set of functions.

**Ghost variables** are a critical pattern for invariant testing. Without them it is impossible to verify accounting invariants: the contract stores only the current state (totalDeposits), not the history (total ever deposited and withdrawn). Ghost variables maintain independent tracking in the handler contract and allow detecting discrepancies between expected and actual state.
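The handler and ghost-variable pattern can be modelled outside Solidity to show the moving parts. The following Python sketch is an illustration under assumptions, not Foundry code: `Pool`, `Handler`, `run_invariant`, the user names, and the planted rounding bug are all hypothetical. The handler clamps raw fuzzer values into valid inputs and keeps its own running totals, and the invariant compares the pool's accounting against those totals.

```python
import random


class Pool:
    """Toy lending pool. With buggy=True, withdraw() under-decrements its accounting."""

    def __init__(self, buggy):
        self.buggy = buggy
        self.balances = {}
        self.total_deposits = 0

    def deposit(self, user, amount):
        self.balances[user] = self.balances.get(user, 0) + amount
        self.total_deposits += amount

    def withdraw(self, user, amount):
        if amount > self.balances.get(user, 0):
            raise ValueError("insufficient balance")
        self.balances[user] -= amount
        # Planted bug: odd amounts are rounded down in the accounting.
        accounted = amount - (amount % 2) if self.buggy else amount
        self.total_deposits -= accounted


class Handler:
    """Wraps the pool: turns raw fuzzer values into valid calls and tracks ghost totals."""

    def __init__(self, pool, users):
        self.pool = pool
        self.users = users
        self.ghost_total_deposited = 0
        self.ghost_total_withdrawn = 0

    def deposit(self, user_seed, amount_seed):
        user = self.users[user_seed % len(self.users)]
        amount = 1 + amount_seed % 1000   # clamp into a sensible range
        self.pool.deposit(user, amount)   # a real handler would fund the user first
        self.ghost_total_deposited += amount

    def withdraw(self, user_seed, amount_seed):
        user = self.users[user_seed % len(self.users)]
        balance = self.pool.balances.get(user, 0)
        if balance == 0:
            return                        # nothing to withdraw: skip instead of reverting
        amount = 1 + amount_seed % balance
        self.pool.withdraw(user, amount)
        self.ghost_total_withdrawn += amount


def run_invariant(seed, buggy, runs=50, depth=20):
    """Return the first call trace that breaks the accounting invariant, or None."""
    rng = random.Random(seed)
    for _run in range(runs):
        pool = Pool(buggy)
        handler = Handler(pool, users=["alice", "bob", "carol"])
        trace = []
        for _step in range(depth):
            action = rng.choice([handler.deposit, handler.withdraw])
            args = (rng.randrange(2**32), rng.randrange(2**32))
            action(*args)
            trace.append((action.__name__, args))
            expected = handler.ghost_total_deposited - handler.ghost_total_withdrawn
            if pool.total_deposits != expected:
                return trace
    return None
```

The pool's own `total_deposits` cannot be checked against itself; the ghost totals give the invariant an independent reference value. In this model, `run_invariant(seed, buggy=False)` finds nothing, while `buggy=True` returns a trace ending in the withdrawal that desynchronized the accounting.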

In a lending protocol invariant test, the handler contract has ghost_totalDeposited and ghost_totalWithdrawn. Why are these ghost variables needed when the contract already has totalDeposits?

Coverage: code metrics and testing strategy

You have written 200 unit tests, 50 fuzz tests, and 10 invariant tests. All green. Is the contract safe? **No** - not if you don't know what portion of the code those tests actually exercise. **Code coverage** is a metric showing what percentage of the code was executed during testing. Forge provides a report across four metrics: line coverage, branch coverage (if/else branches), function coverage, and statement coverage.

**100% coverage does not guarantee safety.** Euler Finance had 100% line coverage and passed multiple audits before being exploited for $197M in March 2023. The bug was in the business logic: donateToReserves() allowed destroying collateral without liquidating the debt. Tests covered every line but did not check the **combination** of calls that violated the solvency invariant. This is exactly why invariant tests are critical: unit tests verify individual functions, while invariant tests verify system properties after arbitrary call sequences.

**Mutation testing** is a technique that checks the quality of the tests themselves. A tool (for example, vertigo-rs for Solidity or gambit by Certora) automatically introduces small changes to the contract code - **mutations**: replacing `>` with `>=`, `+` with `-`, removing lines. Then it runs the tests against each mutated version (a mutant). If the tests still pass, the mutant has survived, which means the tests do not catch that class of error. **Mutation score** is the percentage of mutants that the tests killed. High coverage with a low mutation score means the tests execute code but do not verify the results.
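A miniature version of this process shows how full line coverage and a poor mutation score can coexist. The Python sketch below is a toy model, not how vertigo-rs or gambit work internally: the function `can_withdraw`, the three textual mutations, and the two test suites are all hypothetical.

```python
ORIGINAL = """
def can_withdraw(balance, amount):
    return amount > 0 and amount <= balance
"""

# Each pair is (text to find, replacement); only the first match is mutated.
MUTATIONS = [("<=", "<"), ("> 0", ">= 0"), (" and ", " or ")]


def load(source):
    """Compile a source string and return the function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace["can_withdraw"]


def weak_tests(fn):
    """Executes every line of the function but checks one comfortable case only."""
    return fn(100, 50) is True


def strong_tests(fn):
    """Also pins down the boundaries: exact balance, one over, and zero."""
    return (fn(100, 50) is True
            and fn(100, 100) is True
            and fn(100, 101) is False
            and fn(100, 0) is False)


def mutation_score(tests):
    """Fraction of mutants on which the test suite fails (that is, mutants killed)."""
    assert tests(load(ORIGINAL)), "the suite must pass on the unmutated code"
    killed = 0
    for old, new in MUTATIONS:
        mutant = load(ORIGINAL.replace(old, new, 1))
        if not tests(mutant):
            killed += 1
    return killed / len(MUTATIONS)
```

Both suites execute the single line of `can_withdraw`, so both report full line coverage. `mutation_score(weak_tests)` is 0.0 because all three mutants survive, while `mutation_score(strong_tests)` is 1.0 because each boundary assertion kills one mutant.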

Myth: 100% code coverage means the contract is fully tested and safe to deploy to production.

Reality: coverage only shows which code was executed, not that it was verified for correctness. Euler Finance had 100% coverage and passed multiple audits, yet lost $197M due to a business logic bug that individual unit tests could not catch - only invariant testing or formal verification could have detected the dangerous combination of calls.

Unit tests check individual functions in isolation with specific inputs. But smart contracts live in an environment where anyone can call any public function in any order. The Euler bug was in a sequence: donateToReserves() destroyed collateral, after which self-liquidation created bad debt. Each function worked correctly in isolation - the problem was in their combination. This is exactly why the testing pyramid has five levels: static analysis, unit tests, fuzz tests, invariant tests, and formal verification. No single level replaces the others.

A project has 100% line coverage and 95% branch coverage. All 300 unit tests pass. Mutation testing shows a mutation score of 40%. What does this mean?

Key takeaways

**Echidna** is a property-based fuzzer from Trail of Bits. It generates typed call sequences, checks properties via `echidna_*` functions (return false = bug found), automatically shrinks counterexamples, and accumulates a corpus to increase coverage.
**Foundry fuzz** integrates fuzzing into the standard test framework: a test with parameters automatically becomes a fuzz test. Use `bound()` for numeric ranges, `vm.assume()` for logical filters, and differential testing to compare implementations.
**Invariant testing** (stateful fuzzing) verifies global system properties after random call sequences. Handler contracts prepare valid inputs, and ghost variables maintain independent tracking for verifying accounting invariants.
**Coverage** is a code coverage metric (line, branch, function, statement). `forge coverage` generates the report. Mutation testing checks the quality of the tests themselves: a surviving mutant means the tests do not catch that class of error.
Euler Finance lost $197M with 100% coverage and six audits - unit tests did not find the dangerous call combination. The security pyramid (static analysis → unit → fuzz → invariant → formal verification) is the only path to real protection, because each level catches a class of bugs that the others miss.

Questions for reflection

Echidna generates random call sequences, but real attacks often require a precise sequence involving flash loans, oracle manipulation, and reentrancy. How would you augment standard fuzzing to increase the chances of finding such complex attacks? Which properties should be formulated first?
Mutation testing shows a mutation score of 40% with 100% line coverage. This means 60% of the generated mutants go undetected by the tests. How would you prioritize fixing this: by function criticality (withdraw > view) or by mutation type (arithmetic > logic)? What mutation score would you consider sufficient for a mainnet deployment?
Invariant tests with handler contracts are more complex to write and maintain than unit tests. For a small project (1 contract, 5 functions), is it worth investing time in invariant tests, or are unit + fuzz tests sufficient? Where is the threshold after which invariant tests become mandatory?

Related lessons

sec-05

