Logic in Verification and AI

In 2024, DeepMind's AlphaProof earns an IMO silver medal using formal proofs in Lean. That same year, Boeing certifies an autopilot through formal methods. And every `npm install` runs a SAT solver - the same technology Intel uses to verify processor designs.

npm/yarn dependency resolution - SAT under the hood of every `npm install`
AWS S2N TLS: Z3 found real bugs before production deployment
AlphaProof (DeepMind 2024) - IMO silver medal via Lean + LLM

SAT Solvers: NP-completeness in Production

SAT (Boolean Satisfiability) is the first NP-complete problem (Cook 1971): given a formula in CNF, does an assignment of variables exist that makes it true? NP-completeness seemed to doom SAT to impracticality. **DPLL** (Davis-Putnam-Logemann-Loveland, 1962) changed that picture: backtracking search augmented with two simplification rules.

**Unit propagation:** when a clause contains exactly one unassigned literal, its value is forced. **Pure literal elimination:** when a literal appears with only one polarity, the assignment is immediate. These two rules drastically shrink the search space before any branching occurs.

**MiniSAT** (Een & Sorenson, 2003) - 600 lines of C++, rewrote the SAT competition landscape. It showed that a clean CDCL implementation outperforms elaborate heuristics. Kissat (2020, Biere et al.) is the modern SAT Race champion and a direct descendant of MiniSAT.

**ML application:** npm/yarn dependency resolution is a SAT problem. Every `npm install` runs a MiniSAT-style solver: package dependency constraints are encoded as clauses, the solver finds a compatible version assignment. Intel uses SAT for chip design verification (Cadence Formal). AWS uses SAT in hardware synthesis tools.

What does CDCL improve over DPLL?

SMT: SAT Plus Arithmetic

SAT operates on Boolean variables. Real programs involve integers, arrays, and pointers. **SMT** (Satisfiability Modulo Theories) extends SAT with background theories: linear arithmetic, bit-vectors, arrays, uninterpreted functions. The formula $x + y > 5 \land x < 3 \land y < 3$ is a typical SMT query.

**DPLL(T) architecture:** the SAT core works with abstract Boolean variables representing atomic theory formulas. At each partial assignment, a theory solver (T-solver) checks T-consistency. On a T-conflict, the T-solver generates an explanation - a new clause for the SAT core. The SAT layer handles combinatorics; the T-solver handles the theory.

**Z3** (Leonardo de Moura & Nikolaj Bjorner, Microsoft Research, 2008) is the industry standard. Open source with Python/C++/Java/C# bindings. Multiple SMT-COMP champion. **CVC5** (Stanford/Iowa/NYU) is the main competitor, particularly strong on string theories.

**ML application:** AWS formally verified the S2N TLS library using Z3 - the solver found real bugs before production deployment. The Ethereum Certora Prover uses SMT for smart contract verification (billions of dollars protected by formal methods). Python mypy uses SMT-like reasoning in type narrowing.

What is Z3?

Automated Theorem Proving

SMT decides satisfiability. Sometimes the goal is to prove a theorem - to construct a formal proof checkable by a machine. **Interactive theorem provers** (proof assistants): Lean 4, Coq, Isabelle/HOL. A mathematician writes the proof; the system verifies every step. A proof and a program are the same artifact.

**Curry-Howard correspondence:** proof = program, proposition = type. When the type `A -> B` is inhabited (a program of that type exists), B is provable from A. This is not an analogy - it is a literal isomorphism. In Coq, a single expression is simultaneously a function and a proof. In Lean 4: `theorem add_comm (n m : Nat) : n + m = m + n`

**AlphaProof** (DeepMind, 2024): an LLM generates proof candidates in Lean 4; the formal verifier rejects incorrect ones. Result: silver medal at IMO 2024 - four of six problems solved. The first automated system to reach olympiad level with formal guarantees.

**ML application:** Lean Copilot (Stanford) is GitHub Copilot for formal proofs, using an LLM to suggest proof tactics interactively. Microsoft verifies Windows device drivers with Dafny (a language with a built-in SMT verifier). The Mathlib project contains 100,000+ formally verified theorems in Lean 4.

What does the Curry-Howard correspondence state?

Neural Network Verification

Formal methods verify programs against a specification. **Neural network verification** is a new frontier: checking properties such as $\epsilon$-robustness rigorously. Given a network $f$, input domain $D$, and property $P$ - the question is whether $P(f(x))$ holds for all $x \in D$.

**Reluplex** (Katz et al., 2017) is the first SMT solver targeting networks with ReLU activations. It extends the simplex method with a ReLU-handling rule. Applied to ACAS Xu (airborne collision avoidance): Reluplex found a real safety property violation in one of the 45 advisory neural networks.

**Marabou** (NeuralNetworkVerification.com) is the open-source successor to Reluplex with support for convolutional networks. alpha-beta-CROWN is the current champion (VNN-COMP 2021-2023), combining branch-and-bound with LP relaxations. A certification gap remains between what can be verified and what is actually deployed in safety-critical systems.

Formal verification guarantees that **the model matches the specification** - not that the specification itself is correct. ACAS Xu: a safety property violation was found, but the property was written by engineers and may not have covered all real scenarios. Verification shifts risk from implementation to specification.

**ML application:** the Boeing 787 autopilot is certified using formal methods (DO-178C Level A - the highest software criticality level). Tesla FSD uses bounded model checking for safety properties. The FAA requires formal verification for DO-178C Level A software. ISO 26262 (automotive functional safety) requires formal analysis for ASIL-D systems.

Formal verification guarantees that a system works correctly

Formal verification guarantees that a system matches its specification - but the specification itself may be incomplete or wrong

ACAS Xu: Reluplex found violations of formally stated safety properties, but those properties were written by engineers and did not cover all real scenarios. The specification is itself an artifact that requires scrutiny.

What does Reluplex verify?

Key Ideas

SAT is NP-complete, but CDCL (non-chronological backtracking + clause learning) makes it tractable in practice for instances with millions of variables
SMT extends SAT with theories for arithmetic and data structures: DPLL(T) = SAT core + T-solver. Z3 is the industry standard
Curry-Howard correspondence: proof = program, proposition = type. AlphaProof (2024) applies LLM + Lean to reach IMO-level theorem proving
NN verification (Reluplex, Marabou) checks safety/robustness formally, but verifies conformance to a specification - not the correctness of the specification itself

Вопросы для размышления

SAT is NP-complete, yet CDCL solvers handle industrial instances with millions of variables. What does this reveal about the structure of practical SAT instances compared to worst-case inputs?
Formal verification of the Boeing 787 guarantees conformance to DO-178C specifications. Who verifies the specification itself, and how?
AlphaProof generates proof candidates in Lean that are then checked by a formal verifier. How does this differ fundamentally from generating proofs without formal checking?

Связанные уроки

ml-12 — Temporal logic LTL/CTL from modal logic is the foundation of model checking in program verification
dm-01 — Discrete mathematics underpins SAT and the graph algorithms inside CDCL
lt-01-pac-intro — Formal verification of ML models uses PAC-style guarantees as specifications
it-01 — Information-theoretic reasoning runs parallel to logical proof construction
prob-04-bayes — Probabilistic models are verified via probabilistic SMT extensions
fl-33-formal-verification