Programming Language Theory

Model Checking

Amazon caught a DynamoDB bug that could have caused data loss using a TLA+ specification. That bug would not have surfaced through thousands of unit tests or code review. Model checking is the only way to verify concurrent systems with a proof.

  • **Amazon AWS**: TLA+ used to verify DynamoDB, S3, EBS. Found 16 critical bugs, including potential data loss in DynamoDB replication.
  • **Microsoft Azure**: model checking for the Cosmos DB consensus protocol (a Paxos variant) with a formal correctness guarantee.
  • **Intel**: SPIN and model checking for CPU cache coherency protocols. Silicon bugs cannot be fixed after manufacturing.

Temporal Logic

Temporal logic extends classical logic with time operators: G (globally / always), F (finally / eventually), X (next state), U (until). It is a language for formally describing system properties.

Safety and liveness are complementary classes. A deadlock violates both safety (G(!deadlock)) and liveness (G(F(progress))). TLA+ separates invariants (safety) from temporal properties (liveness).

What does the formula G(request -> F(response)) mean in temporal logic?

CTL: Computation Tree Logic

CTL (Computation Tree Logic) is a branching-time temporal logic. The quantifiers A (all paths) and E (exists a path) combine with temporal operators: AG, AF, AX, EU, AU, EG, EF, EX. A model checker verifies CTL formulas in O(|M| * |phi|).

What is the difference between AGp and EGp?

LTL and SPIN

LTL (Linear Temporal Logic) is a linear-time temporal logic that reasons over a single trace (sequence of states). SPIN is a model checker for LTL that uses the Promela language to describe concurrent systems.

What happens when SPIN finds a violation of an LTL property?

TLA+

TLA+ (Temporal Logic of Actions) is a language for formally specifying and verifying distributed systems. It was created by Leslie Lamport (the author of Paxos). AWS, Microsoft, and Intel use TLA+ to verify critical algorithms.

Amazon used TLA+ to verify S3, DynamoDB, and EBS. From the AWS blog: 'We have used TLA+ to find 16 bugs including a bug in DynamoDB replication that could cause data loss.' Seven out of those ten bugs would not have surfaced in code review.

Formal verification is too complex and academic for real projects

AWS, Microsoft Azure, Intel CPU verification, and the Curiosity Mars Rover all use formal methods. TLA+ found bugs in DynamoDB, which handles 10 trillion requests per day.

Formal methods are expensive for 100% of code, but invaluable for concurrency algorithms, consensus protocols, and cryptography. The cost of a DynamoDB bug far exceeds the cost of a TLA+ specification.

What does count' mean in TLA+?

Summary

  • **Temporal logic**: G (always), F (eventually), X (next), U (until). Two classes: safety (nothing bad happens) and liveness (something good eventually happens).
  • **CTL**: A (all paths) and E (exists a path). AGp is an invariant on all paths. Model checking runs in O(|M|*|phi|).
  • **SPIN + Promela**: concurrent systems plus LTL properties produce a counterexample with a concrete error scenario.
  • **TLA+**: a specification language for distributed systems. AWS uses it for DynamoDB and S3. It catches bugs that code review misses.

Related topics

Model checking is one piece of formal verification:

  • Property-based testing — Property testing is a lighter approach: it checks properties without exhaustive enumeration.
  • Theorem provers — Proof assistants (Coq, Isabelle) give stronger guarantees but require manual proofs.

Вопросы для размышления

  • Model checking suffers from state explosion: 10 boolean variables means 2^10 states. How does symbolic model checking (BDD, SAT) tackle this problem?
  • TLA+ found AWS bugs that code review missed. What does this say about the limits of traditional code review for distributed systems?
  • Paxos is proven correct for consensus, yet implementations like Multi-Paxos and Raft still have bugs. Why does a formally specified algorithm not guarantee a correct implementation?

Связанные уроки

  • fl-33-formal-verification
Model Checking

0

1

Sign In