Causal Calculus
Causal Discovery Algorithms
Can the causal graph structure be recovered from observational data alone - without knowing whether X causes Y or Y causes X?
- **Bioinformatics:** recovering gene regulatory networks from expression data - DREAM Challenge uses PC and FCI algorithms as baseline methods
- **Finance:** discovering causal links between financial instruments for risk modeling in investment banks
- **Climatology:** recovering causal links between climate variables (CO2, temperature, circulation) without controlled experiments
- **MLOps (Microsoft Azure):** FCI algorithm to discover causal dependencies between microservices from transaction logs at scale
Предварительные знания
- d-separation and Bayesian networks
- Statistical independence tests
- Graph theory: CPDAG, MPDAG
Causal discovery is the task of recovering causal DAG structure from observational data alone. From data alone it is impossible to distinguish X->Y from Y->X (they produce identical marginal distributions), so the PC algorithm recovers the Markov equivalence class - a CPDAG where some edges are oriented uniquely, while others remain ambiguous.
Faithfulness assumption: the PC algorithm assumes all conditional independencies in the data are explained by the DAG structure, not by accidental cancellations of path effects. Faithfulness violations lead to incorrect edge removal. In practice this is rare, but requires caution with small samples or near-linear relationships.
PC Algorithm: structure discovery via independence tests
PC (Peter-Clark, Spirtes-Glymour 1991) is the classical constraint-based algorithm for causal structure discovery. It runs in two phases - skeleton construction through conditional independence tests, followed by edge orientation from detected v-structures. Under causal sufficiency (no hidden confounders), PC recovers the equivalence class of the true DAG.
PC is sensitive to errors in independence tests: one false independence at an early step can cascade through the structure. The stable variant (PC-stable, Colombo-Maathuis 2014) reorders the tests so that the result no longer depends on the random order of variables.
PC assumes causal sufficiency - all common causes are observed. With hidden confounders the algorithm produces incorrect orientations. FCI generalizes PC for that setting.
Which assumption must hold for PC to be correct?
FCI Algorithm: latent variables and MAGs
FCI (Fast Causal Inference, Spirtes-Meek-Richardson 1995) generalizes PC to allow hidden variables and selection bias. Instead of a DAG it recovers a PAG (Partial Ancestral Graph) - a graph with four edge-end types (arrowhead, circle, tail) that encodes uncertainty about causal structure.
A bidirected edge X <-> Y in a PAG says: a hidden common ancestor of X and Y exists, but no direct causal link. This is critical in epidemiology - a spurious link between two symptoms can come from a hidden third variable (genetic factor, exposure).
Modern extensions: GFCI (Greedy FCI) combines FCI with local score-based search, speeding up edge orientation on big data. ICA-LiNGAM removes orientation ambiguity under non-Gaussian noise - extra information on noise shape allows unique DAG recovery.
What does a bidirected edge X <-> Y mean in a PAG?
Score-Based Methods: GES, GFCI, scoring functions
Score-based methods discover structure by maximizing a global scoring function that measures how well a graph fits the data. Unlike constraint-based methods, they do not rely on a sequence of independence tests, are more robust to test errors, and scale to thousands of nodes. GES (Greedy Equivalence Search, Chickering 2002) is the canonical example.
Decomposability of the scoring function is the key requirement for efficient search. The score must split into a sum over each node and its parents. This lets the engine compute score differences incrementally on local changes, avoiding full graph rescoring.
Why is decomposability of the scoring function critical for GES?
Family of causal discovery algorithms
Different algorithms address different sub-problems depending on assumptions about the data and graph structure.
- PC and FCI (constraint-based) — Related topic
- GES and FGES (score-based) — Related topic
- LiNGAM (functional causal models) — Related topic
- Neural causal discovery — Related topic
Итоги
- PC algorithm: skeleton via conditional independence tests, then v-structure orientation via separating sets
- Result of PC is a CPDAG (Markov equivalence class), not a unique DAG
- FCI extends PC to latent confounders - result is a PAG with bi-directed edges X<->Y
- LiNGAM uniquely identifies the DAG under linear non-Gaussian noise via ICA
- Faithfulness: PC assumes independencies in data reflect DAG structure, not accidental cancellations
- Score-based methods (GES): maximize BIC in CPDAG space; more robust to faithfulness violations
Why does the PC algorithm recover only a Markov equivalence class rather than a unique DAG?
X->Y and Y->X (with no other variables) have identical marginal distributions. Only v-structures X->Z<-Y are uniquely identifiable because they create a unique dependence pattern (the collider). For unique identification beyond the equivalence class, additional assumptions are needed (non-Gaussianity in LiNGAM) or experimental interventions.