Formal Languages
NP-Completeness: Reduction Proofs
1972. Richard Karp publishes a list of 21 problems, all NP-complete through reduction chains. A year earlier Stephen Cook proved SAT NP-complete. Karp showed that thousands of problems mathematicians and engineers had worked on are connected, solving one solves them all. It is the single most important result in the theory of algorithms.
- **Intel formal verification:** SAT solvers (CaDiCaL, Kissat) verify processor chips. Each new Intel Core generation is checked against the specification with thousands of SAT queries. An NP-complete problem solved in production.
- **Google OR-Tools:** a library for NP-hard optimization (vehicle routing, job scheduling, bin packing). Used in Google Maps routing, Waymo planning, and manufacturing schedules.
- **Bioinformatics (protein-protein interaction):** finding a maximum clique in protein interaction graphs. Clique is NP-complete. In practice: dedicated algorithms for biological graphs (typically sparse) and FPT algorithms.
- **Compiler register allocation:** coloring the interference graph, 3-Coloring is NP-complete. LLVM uses the Chaitin algorithm (greedy heuristic) plus spilling. It works in practice for real programs despite NP-completeness.
SAT and the Cook-Levin Theorem
**SAT** (satisfiability): given a boolean formula phi(x_1, ..., x_n), is there a variable assignment that makes phi true? In 1971, Stephen Cook proved that SAT is NP-complete: every NP problem polynomial-time reduces to SAT. So SAT is the hardest problem in NP.
**NP-completeness:** a problem X is NP-complete if (1) X is in NP and (2) for every Y in NP, Y <=_pm X (every NP problem reduces to X). A problem X is NP-hard if (2) holds without the (1) requirement. NP-hard problems may live outside NP.
SAT is NP-complete means that...
3-SAT and Reductions
**3-SAT** is SAT where every clause has exactly 3 literals. SAT <=_pm 3-SAT (poly-time reduction). 3-SAT is easier to reduce to other problems, a standard intermediate step in reduction chains. Note that 2-SAT is in P (solved in O(n+m) via SCC in the implication graph).
The boundary between 2-SAT (P) and 3-SAT (NP-complete) is one of the cleanest examples of a tiny change pushing a problem from P into NP. Similar pairs: graph reachability (P) vs Hamiltonian path (NP-complete); 2-clique, an edge (P) vs k-clique (NP-complete when k is part of the input).
2-SAT is solvable in O(n+m), 3-SAT is NP-complete. What changes as clause width grows from 2 to 3?
Classic NP-Complete Problems
Karp's list (1972): 21 NP-complete problems, including Vertex Cover, Clique, Independent Set, TSP, Hamiltonian Path, Set Cover, and Graph Coloring. All are linked by chains of poly-time reductions. Today databases catalog thousands of NP-complete problems from combinatorics, biology, physics, and logic.
| Problem | Question | Practical use |
|---|---|---|
| 3-SAT | Is a 3-CNF formula satisfiable? | SAT solvers, circuit verification |
| Vertex Cover | Is there a vertex cover of size k? | Network monitoring, network design |
| Clique | Is there a clique of size k? | Community detection, bioinformatics |
| TSP (decision) | Is there a tour of length <= L? | Logistics, routing |
| 3-Coloring | Can the graph be 3-colored? | Scheduling, register allocation |
| Knapsack | Can value >= V be reached with weight <= W? | Resource planning, finance |
| Set Cover | Can U be covered by <= k sets? | Network coverage, advertising |
**Independent Set is equivalent to Vertex Cover is equivalent to Clique** by complement: S is an independent set in G iff S is a clique in the complement of G. Vertex Cover + Independent Set = V (one is the complement of the other). That is why all three are NP-complete: the same hardness in different clothing.
Vertex Cover and Independent Set are NP-complete on general graphs. What about trees?
How to Prove NP-Completeness
Standard recipe to prove a problem X is NP-complete: (1) Show X is in NP (a poly-time verifier). (2) Pick a known NP-complete problem Y. (3) Build a poly-time reduction Y <=_pm X. (4) Prove correctness: y in Y iff f(y) in X.
**Parameterized complexity** is a modern refinement: a problem X is **FPT** (fixed-parameter tractable) if there is an algorithm running in O(f(k) * poly(n)) for parameter k. Vertex Cover is FPT in k (an O(2^k * n) algorithm). This matters for real problems: if k is small (a small vertex cover), the problem is solved quickly.
NP-complete problems always require exponential time, they cannot be solved quickly
NP-complete problems are often solved effectively in practice via SAT solvers (CDCL), approximations, parameterized algorithms, or special input structure.
SAT solvers (Z3, MiniSAT, CaDiCaL) solve industrial SAT instances with millions of variables in seconds. This is not a contradiction: NP-completeness is about worst-case behavior. Practical instances often have structure that CDCL exploits. TSP is solved exactly on thousands of cities via branch-and-bound.
You need to prove a new problem X is NP-complete. The first step is showing X is in NP. Why?
Summary
- **NP-completeness:** X in NP plus X NP-hard (every NP problem poly-reduces to X). SAT is the first NP-complete problem (Cook, 1971).
- **Proof recipe:** (1) X in NP (poly verifier), (2) Y <=_pm X for a known NP-complete Y, (3) prove the reduction is correct.
- **2-SAT vs 3-SAT:** the P/NP boundary by clause width. 2-SAT in P via SCC, 3-SAT NP-complete.
- **In practice:** NP-complete does not mean 'always slow'. SAT solvers, FPT algorithms, and special structure enable solving NP problems in real systems.
Related Topics
NP-completeness is the central concept of complexity theory:
- P vs NP — NP-complete problems are the hardest in NP. A poly algorithm for any of them implies P = NP.
- Reductions — Poly-time mapping reductions are the tool for proving NP-completeness.
- Approximation Algorithms — If a problem is NP-complete, approximation is the practical alternative.
Вопросы для размышления
- Vertex Cover on trees is solvable in O(n) DP, but NP-complete on general graphs. What about a tree's structure makes the problem easy?
- SAT solvers handle industrial SAT instances with millions of variables. Does that mean SAT is effectively in P for practical problems?
- If P = NP were proved with an O(n^10^100) algorithm, what would change in cryptography? In formal verification?