Programming Language Theory

Static Program Analysis

Facebook Infer runs on every pull request at Facebook, Instagram, WhatsApp, and Messenger. 75% of the bugs it reports are confirmed by developers as real. That is a 75% signal at the scale of billions of lines of code.

  • **Airbus A380**: Astree (an abstract interpreter) verified the flight control software. Zero false positives on 130K lines of C, a practically unheard-of result.
  • **Facebook Infer**: 75% precision when analyzing PRs. Catches resource leaks and null dereferences before code review at some of the largest mobile apps in the world.
  • **Clang Static Analyzer**: built into Xcode. Apple uses it to find memory leaks in Objective-C and Swift. Every macOS/iOS developer relies on it daily.

Abstract Interpretation

Abstract interpretation is the mathematical foundation of static analysis. Instead of concrete values (x=42), it uses abstract domains (x > 0, x in [1..100]). The analysis is sound: if it does not warn, the error is not there. The price: false positives.

Astree is an abstract interpreter for C. It verified the Airbus A380 primary flight control software (130K lines of C) with zero false positives. A real-world application of abstract interpretation.

Why does static analysis based on abstract interpretation produce false positives?

Dataflow Analysis

Dataflow analysis computes information about data at every program point. Forward analysis (what is true on entry to a basic block) and backward analysis (what is needed on exit). Iterative computation runs to a fixed point over the CFG.

Why does live variable analysis use a backward traversal of the CFG?

Pointer Analysis

Pointer analysis (alias analysis) determines whether two pointers can point to the same memory. Critical for optimizations. Andersen-style analysis (flow-insensitive, context-insensitive) is the baseline algorithm, with O(n^3) complexity.

Why is alias analysis critical for SIMD vectorization?

Linters and Static Analyzers

Industrial static analyzers: Clang Static Analyzer (Cocoa, C++), Infer (Facebook, Java/C/Obj-C), Semgrep (multilingual). They apply limited forms of dataflow and pointer analysis to find real bugs with an acceptable false-positive rate.

Static analysis replaces testing

Static analysis and testing complement each other. Analysis finds classes of bugs (null deref, resource leak) across all paths. Tests check specific behavior.

Infer can find a null dereference that only triggers in a specific sequence of 10 calls. Those paths are hard to cover with tests. But tests verify business logic that the analysis cannot understand.

How does Semgrep differ from traditional static analyzers?

Summary

  • **Abstract interpretation**: abstract domains in place of concrete values. Sound (no false negatives) but with false positives.
  • **Dataflow analysis**: forward (reaching definitions) and backward (live variables). Iterate to a fixed point over the CFG.
  • **Pointer/alias analysis**: can two pointers refer to the same cell? Critical for SIMD optimizations and restrict.
  • **Linters**: Infer (bi-abduction, heap), Semgrep (AST patterns), Clang SA. They complement tests, not replace them.

Related topics

Static analysis applies theory to practice:

  • IR and optimizations — Alias analysis and dataflow drive compiler optimizations.
  • Property-based testing — Static analysis and property testing are complementary approaches to verification.

Вопросы для размышления

  • Abstract interpretation is sound but not complete. Astree avoids false negatives. Why is complete analysis (no false positives) impossible by Rice's theorem?
  • Rust's borrow checker is a form of static analysis with zero runtime overhead. How does it avoid false positives when checking lifetimes?
  • Facebook Infer runs on every PR. How does the company address developers who ignore analyzer warnings?

Связанные уроки

  • alg-12-bfs
Static Program Analysis

0

1

Sign In