Causal Calculus
Backdoor Criterion: Blocking the Confounding Trap
2020. Observational data from multiple countries: patients on ibuprofen die less often from COVID. Headlines: 'ibuprofen treats COVID'. Months later, an analysis controlling for disease severity found no effect at all. Severely ill patients had been moved straight to heavier drugs, and they died more often. Disease severity drove both the ibuprofen decision and the outcome. That was a backdoor path. The backdoor criterion is the algorithm that identifies such traps from the graph's arrows alone, without a single number.
- **Debiasing in ML**: models trained on observational data pick up spurious correlations carried by backdoor paths. DoWhy and CausalML estimate effects by adjusting for a set of confounders that blocks those paths, so the estimate reflects the causal effect rather than the confounding
- **A/B testing**: randomization is the physical destruction of all backdoor paths. The backdoor criterion explains why randomization works: the coin flip severs the edge X <- confounder, making backdoor paths inactive
- **ML fairness audits**: Microsoft Research applies the backdoor criterion to detect protected-attribute bias. When a protected attribute (gender, race) sits on a backdoor path, controlling for it is not censorship - it is a mathematical requirement for honest effect estimation
- **Epidemiology**: G-computation (Robins, 1986) is the backdoor adjustment formula under a different name. Robins' g-formula also extends it to time-varying treatments. Much of modern ATE estimation in clinical research builds on it
Prerequisites
Backdoor Paths: Where Confounding Comes From
2020. Observational data from multiple countries: patients on ibuprofen die less often from COVID. Headlines read: ibuprofen treats COVID. The mechanism nobody mentioned: severely ill patients were less likely to receive ibuprofen - they were moved to heavier drugs immediately. Disease severity drove both treatment choice and death. That is a fork - a classic backdoor path.
A backdoor path from X to Y is any path that begins with an arrow pointing into X. Typically it has the form X <- ... -> Y through a common ancestor. When such a path is open (not blocked), it transmits correlation between X and Y that is not causal. This is not noise and not measurement error. It is a structural feature of the DAG, visible without a single number.
**Definition of a backdoor path**: a path $\pi$ between $X$ and $Y$ is a backdoor path if the first edge on the path points into $X$ (that is, $\pi$ begins with $\cdot \to X$). The observational distribution $P(Y \mid X)$ mixes the causal effect with association flowing along open backdoor paths.
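The definition is purely structural, so backdoor paths can be listed mechanically from an edge list. Below is a minimal sketch in plain Python. It assumes the DAG is given as a set of `(parent, child)` tuples, and `backdoor_paths` is an illustrative helper, not a library function. The helper lists every backdoor path, whether open or blocked. Deciding which of them are open is the job of d-separation.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)


def backdoor_paths(edges: Set[Edge], x: str, y: str) -> List[List[str]]:
    """Every simple path between x and y whose first edge points INTO x."""
    # Undirected adjacency: a path may traverse edges in either direction.
    nbrs: Dict[str, Set[str]] = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)

    paths: List[List[str]] = []

    def extend(path: List[str]) -> None:
        if path[-1] == y:
            paths.append(list(path))
            return
        for nxt in sorted(nbrs.get(path[-1], ())):
            if nxt not in path:  # simple paths only
                path.append(nxt)
                extend(path)
                path.pop()

    # The first step must go from x to one of its parents (an edge parent -> x).
    for parent in sorted(p for p, c in edges if c == x):
        extend([x, parent])
    return paths


# The ibuprofen story as a DAG (hypothetical node names).
ibuprofen_dag = {("Severity", "Ibuprofen"), ("Severity", "Death"), ("Ibuprofen", "Death")}
print(backdoor_paths(ibuprofen_dag, "Ibuprofen", "Death"))
# [['Ibuprofen', 'Severity', 'Death']]
```

The direct edge Ibuprofen -> Death is not reported, because its first edge points out of X.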
DoWhy (Microsoft) analyzes backdoor paths automatically when `model.identify_effect()` is called. Before any effect is estimated, it looks for an adjustment set that blocks all active backdoor paths. If no such set exists, the effect is not identified via the backdoor criterion.
In the DAG: $\text{Education} \leftarrow \text{IQ} \to \text{Salary}$, $\text{Education} \to \text{Salary}$. Is the path Education - IQ - Salary a backdoor path when estimating the effect of Education on Salary?
Backdoor Criterion: What Is Sufficient to Control
Pearl, 1993. One theorem - and the question of which variables to control became algorithmic rather than intuitive. Before this, econometricians argued for years about what belongs in a regression. After - it is checkable in minutes.
**Backdoor criterion (Pearl, 1993)**: a set $Z$ satisfies the backdoor criterion relative to $(X, Y)$ in DAG $G$ if:
1. no node in $Z$ is a descendant of $X$;
2. $Z$ blocks every backdoor path between $X$ and $Y$ (i.e., $Z$ d-separates $X$ from $Y$ in the subgraph $G_{\underline{X}}$ obtained by removing all outgoing edges of $X$).

Condition (1), no descendants of X, is critical. A descendant of X may be a mediator, and conditioning on it blocks part of the causal effect. It may also be a collider or a descendant of one, and conditioning on it opens a path that was closed. Either way, adding the variable introduces bias instead of removing it. This is exactly where the naive heuristic 'control for everything correlated with X and Y' breaks down.
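Both conditions can be checked mechanically. The sketch below is a small, exponential-time illustration. It assumes a DAG given as `(parent, child)` tuples, and all helper names are hypothetical. It enumerates paths in $G_{\underline{X}}$ and applies the standard blocking rules:

- A chain or fork is blocked by conditioning on its middle node.
- A collider blocks unless it, or one of its descendants, is conditioned on.

Practical tools use faster graph algorithms, but on small graphs the verdicts should agree.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)


def descendants(edges: Set[Edge], node: str) -> Set[str]:
    """Nodes reachable from `node` along directed edges (excluding `node`)."""
    children: Dict[str, Set[str]] = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
    seen: Set[str] = set()
    stack = [node]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def all_paths(edges: Set[Edge], x: str, y: str) -> List[List[str]]:
    """All simple paths between x and y, ignoring edge direction."""
    nbrs: Dict[str, Set[str]] = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    out: List[List[str]] = []

    def walk(path: List[str]) -> None:
        if path[-1] == y:
            out.append(list(path))
            return
        for n in nbrs.get(path[-1], ()):
            if n not in path:
                path.append(n)
                walk(path)
                path.pop()

    walk([x])
    return out


def is_blocked(edges: Set[Edge], path: List[str], z: Set[str]) -> bool:
    """Apply the d-separation rules to one path, given conditioning set z."""
    for a, m, b in zip(path, path[1:], path[2:]):
        if (a, m) in edges and (b, m) in edges:  # m is a collider on this path
            if m not in z and not (descendants(edges, m) & z):
                return True
        elif m in z:  # chain or fork with its middle node conditioned on
            return True
    return False


def satisfies_backdoor(edges: Set[Edge], x: str, y: str, z: Iterable[str]) -> bool:
    z = set(z)
    # Condition 1: Z contains no descendant of X.
    if z & descendants(edges, x):
        return False
    # Condition 2: Z d-separates X and Y once X's outgoing edges are deleted.
    g_under_x = {(a, b) for a, b in edges if a != x}
    return all(is_blocked(g_under_x, p, z) for p in all_paths(g_under_x, x, y))


ibuprofen_dag = {("Severity", "Ibuprofen"), ("Severity", "Death"), ("Ibuprofen", "Death")}
print(satisfies_backdoor(ibuprofen_dag, "Ibuprofen", "Death", {"Severity"}))  # True
print(satisfies_backdoor(ibuprofen_dag, "Ibuprofen", "Death", set()))         # False

m_bias = {("U1", "X"), ("U1", "B"), ("U2", "B"), ("U2", "Y"), ("X", "Y")}
print(satisfies_backdoor(m_bias, "X", "Y", set()))  # True: path already blocked at B
print(satisfies_backdoor(m_bias, "X", "Y", {"B"}))  # False: conditioning on B opens it
```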
| Graph | Backdoor path | Valid adjustment set Z | Cannot include in Z |
|---|---|---|---|
| $X \leftarrow C \to Y$ | $X \leftarrow C \to Y$ | $\{C\}$ | descendants of $X$ or $Y$ |
| $X \leftarrow C \to M \to Y$ | $X \leftarrow C \to M \to Y$ | $\{C\}$, $\{M\}$, or $\{C, M\}$ | descendants of $X$ ($M$ is not one here: it lies on the backdoor path, not on a causal path from $X$) |
| $X \leftarrow U_1 \to B \leftarrow U_2 \to Y$ | through $U_1, U_2$ (hidden), already blocked at the collider $B$ | $\varnothing$ (no adjustment needed) | $B$: conditioning on the collider opens the path, and $U_1, U_2$ cannot be measured to close it again |
Libraries such as DoWhy (Microsoft) can search for a minimal adjustment set: the smallest Z that blocks all backdoor paths. A smaller set means fewer variables to measure in an observational study. It is not automatically the lowest-variance choice, though. Adding variables that predict only Y can make the estimate more precise.
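The smallest valid Z can be found by brute force: try candidate sets in order of size and return the first one that d-separates X from Y in $G_{\underline{X}}$. This is a hypothetical sketch, not the algorithm used by any particular library. It is exponential in the number of candidates and meant only for small graphs. The d-separation test uses the moralization criterion:

1. Restrict the graph to the ancestors of $\{X, Y\} \cup Z$.
2. Connect ("marry") parents that share a child.
3. Drop edge directions.
4. Delete $Z$ and test whether $X$ can still reach $Y$.

Unobserved variables are simply left out of `candidates`.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)


def reach(start: Iterable[str], step: Dict[str, Set[str]]) -> Set[str]:
    """All nodes reachable from `start` (inclusive) by repeatedly following `step`."""
    seen = set(start)
    stack = list(seen)
    while stack:
        for nxt in step.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def d_separated(edges: Set[Edge], x: str, y: str, z: Set[str]) -> bool:
    """Moralization test: is x d-separated from y given z in this DAG?"""
    parents: Dict[str, Set[str]] = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    keep = reach({x, y} | z, parents)  # ancestral set of {x, y} and z
    moral: Dict[str, Set[str]] = {n: set() for n in keep}
    for child in keep:
        ps = sorted(parents.get(child, set()))
        for p in ps:  # keep each edge, now undirected
            moral[p].add(child)
            moral[child].add(p)
        for p, q in combinations(ps, 2):  # marry parents that share a child
            moral[p].add(q)
            moral[q].add(p)
    # Delete the conditioning set, then ask whether x can still reach y.
    pruned = {n: nbrs - z for n, nbrs in moral.items() if n not in z}
    return y not in reach({x}, pruned)


def minimal_adjustment_set(
    edges: Set[Edge], x: str, y: str, candidates: Iterable[str]
) -> Optional[FrozenSet[str]]:
    """Smallest subset of `candidates` satisfying the backdoor criterion, or None."""
    children: Dict[str, Set[str]] = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
    forbidden = reach({x}, children)  # x itself and all its descendants
    g_under_x = {(a, b) for a, b in edges if a != x}  # drop X's outgoing edges
    allowed = sorted(set(candidates) - forbidden - {y})
    for size in range(len(allowed) + 1):
        for z in combinations(allowed, size):
            if d_separated(g_under_x, x, y, set(z)):
                return frozenset(z)
    return None


dag = {("A", "X"), ("A", "C"), ("C", "Y"), ("X", "M"), ("M", "Y"), ("X", "Y")}
print(minimal_adjustment_set(dag, "X", "Y", {"A", "C", "M"}))  # frozenset({'A'})
hidden = {("U", "X"), ("U", "Y"), ("X", "Y")}
print(minimal_adjustment_set(hidden, "X", "Y", set()))  # None: U is unobserved
```

Either $\{A\}$ or $\{C\}$ would be valid in the first example. The search returns $\{A\}$ only because candidates are tried in sorted order. $M$ is never considered, because it is a descendant of $X$.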
DAG: $X \leftarrow C \to Y$, $X \to M \to Y$, $X \to Y$. The goal is to estimate the **total effect** of $X$ on $Y$. Which adjustment set is valid?
Adjustment Formula: P(Y|do(X)) from Observational Data
Once Z satisfies the backdoor criterion, the problem is solved. The do-probability is expressible in terms of the observed distribution. No experiment required - just the right set of measured variables and one formula.
**Backdoor adjustment formula**: if $Z$ satisfies the backdoor criterion for $(X, Y)$, then: $$P(Y = y \mid \mathrm{do}(X = x)) = \sum_{z} P(Y = y \mid X = x, Z = z) \cdot P(Z = z)$$ For continuous $Z$: integral over $z$ instead of sum. This is a population-weighted conditional average.
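Here is a worked instance of the formula with a binary confounder. The probabilities below are hypothetical, chosen only to mirror the ibuprofen story: severe patients are less likely to get the drug and more likely to die, and the drug itself has no effect. The two functions differ only in how each stratum is weighted. The naive conditional uses $P(z \mid x)$, while the adjustment formula uses $P(z)$.

```python
# Hypothetical numbers for illustration only.
P_Z = {0: 0.7, 1: 0.3}             # P(severity = z); 1 = severe
P_X1_GIVEN_Z = {0: 0.8, 1: 0.2}    # P(ibuprofen = 1 | severity = z)
P_Y1_GIVEN_XZ = {                  # P(death = 1 | ibuprofen = x, severity = z)
    (0, 0): 0.02, (1, 0): 0.02,    # identical across x: no causal effect
    (0, 1): 0.30, (1, 1): 0.30,
}


def p_x_given_z(x: int, z: int) -> float:
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1 - p1


def naive(x: int) -> float:
    """P(Y=1 | X=x): strata weighted by P(z | x)."""
    joint = {z: p_x_given_z(x, z) * P_Z[z] for z in P_Z}
    total = sum(joint.values())
    return sum(P_Y1_GIVEN_XZ[(x, z)] * joint[z] / total for z in P_Z)


def adjusted(x: int) -> float:
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) * P(Z=z)."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * P_Z[z] for z in P_Z)


for x in (0, 1):
    print(f"x={x}: naive={naive(x):.3f}  adjusted={adjusted(x):.3f}")
# x=0: naive=0.197  adjusted=0.104
# x=1: naive=0.047  adjusted=0.104
```

The naive comparison suggests the drug cuts mortality by about three quarters. The adjusted values are identical, which matches the zero effect built into these numbers.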
The adjustment formula is exactly what regression adjustment computes: estimate $E[Y \mid X=x, Z=z]$ from data, then average over the marginal $P(Z)$. Epidemiologists call this G-computation. Econometricians call it regression adjustment for the average treatment effect (ATE). Same result, different names across communities.
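With continuous variables, the sum over $z$ becomes an average over the sample. Below is a G-computation sketch in NumPy. It assumes a hypothetical linear data-generating process with an $X \cdot Z$ interaction, chosen so that the ATE ($2 + 1.5\,E[Z] = 3.5$) differs from every single regression coefficient. The outcome model is correctly specified here by construction. With real data, a misspecified model for $E[Y \mid X, Z]$ biases the result.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process: Z -> X, Z -> Y, X -> Y, plus an X*Z interaction.
z = rng.normal(loc=1.0, scale=1.0, size=n)
x = rng.binomial(1, 1 / (1 + np.exp(-2 * z)))  # higher Z -> more likely treated
y = 1 + 2 * x + 3 * z + 1.5 * x * z + rng.normal(size=n)
true_ate = 2 + 1.5 * 1.0  # E[Y | do(X=1)] - E[Y | do(X=0)] = 2 + 1.5 * E[Z]


def design(xv, zv):
    """Columns: intercept, X, Z, X*Z (the outcome model assumed here)."""
    return np.column_stack([np.ones_like(zv), xv, zv, xv * zv])


# Step 1: fit E[Y | X, Z] by least squares.
beta, *_ = np.linalg.lstsq(design(x, z), y, rcond=None)

# Step 2: predict every unit's outcome with X forced to 1 and to 0, then average.
# Averaging over the sample's own Z values plays the role of sum_z ... P(Z = z).
ate_gcomp = np.mean(design(np.ones(n), z) @ beta - design(np.zeros(n), z) @ beta)
naive = y[x == 1].mean() - y[x == 0].mean()
print(f"true={true_ate:.2f}  g-computation={ate_gcomp:.2f}  naive={naive:.2f}")
```

The naive difference in means is inflated because treated units have systematically higher $Z$. G-computation removes that imbalance by giving both arms the same $Z$ distribution.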
Why Randomization Works: Backdoor Through the do-Operator
An RCT is the physical implementation of the do-operator
Observational study of ibuprofen: $P(\text{death}=1 \mid \text{ibuprofen}=1) \neq P(\text{death}=1 \mid \mathrm{do}(\text{ibuprofen}=1))$, because the probability of receiving ibuprofen depends on disease severity.

RCT: a coin flip assigns ibuprofen. Now $P(\text{ibuprofen}=1 \mid \text{severity}) = 0.5$ at every severity level. The edge severity -> ibuprofen disappears from the graph, and the backdoor path ibuprofen <- severity -> death is structurally blocked.

The backdoor adjustment formula achieves the same result statistically. It estimates the effect separately at each severity level, then weights by the real severity distribution in the population:
$$\text{ATE} = \sum_z P(\text{death}=1 \mid \text{ibuprofen}=1, \text{severity}=z)\, P(\text{severity}=z) - \sum_z P(\text{death}=1 \mid \text{ibuprofen}=0, \text{severity}=z)\, P(\text{severity}=z)$$

This explains why randomization is the gold standard. It blocks all backdoor paths, including those the researcher is unaware of, whereas the adjustment formula blocks only the paths that run through measured variables in Z.
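A simulation makes this comparison concrete. Everything here is hypothetical:

- Severity is binary.
- Death depends only on severity, so the true effect of ibuprofen is zero.
- There are two assignment mechanisms: one driven by severity, one a fair coin.

The stratified estimator implements the ATE formula above using sample frequencies.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

# Hypothetical population: 30% severe cases; death risk depends on severity only,
# so the true causal effect of ibuprofen on death is zero.
severity = rng.binomial(1, 0.3, size=n)
death_risk = np.where(severity == 1, 0.30, 0.02)


def run_study(treat_prob):
    """Assign ibuprofen with per-patient probability `treat_prob`, then draw deaths."""
    ibuprofen = rng.binomial(1, treat_prob)
    death = rng.binomial(1, death_risk)  # does not depend on ibuprofen
    return ibuprofen, death


def naive_diff(x, y):
    return y[x == 1].mean() - y[x == 0].mean()


def adjusted_diff(x, y, z):
    """ATE = sum_z [P(Y=1 | X=1, z) - P(Y=1 | X=0, z)] * P(z), from sample frequencies."""
    total = 0.0
    for level in np.unique(z):
        in_stratum = z == level
        p1 = y[in_stratum & (x == 1)].mean()
        p0 = y[in_stratum & (x == 0)].mean()
        total += (p1 - p0) * in_stratum.mean()
    return total


# Observational: severe patients rarely get ibuprofen (backdoor path open).
x_obs, y_obs = run_study(np.where(severity == 1, 0.2, 0.8))
# RCT: a fair coin, independent of severity (edge severity -> ibuprofen removed).
x_rct, y_rct = run_study(np.full(n, 0.5))

print(f"observational, naive:    {naive_diff(x_obs, y_obs):+.3f}")
print(f"observational, adjusted: {adjusted_diff(x_obs, y_obs, severity):+.3f}")
print(f"RCT, naive:              {naive_diff(x_rct, y_rct):+.3f}")
```

The naive observational difference shows a large spurious benefit. The adjusted observational estimate and the plain RCT comparison both land near zero.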
Misconception: the backdoor criterion says to control for everything that influences X or Y
Correct: control only for variables that block backdoor paths and are not descendants of X
Descendants of X include mediators, colliders, and their descendants. Conditioning on a mediator blocks the part of the effect that flows through it, so the estimate no longer measures the total effect. Conditioning on a collider opens new, non-causal paths. The backdoor criterion is not 'add variables' - it is 'select the right variables from the graph structure'.
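The mediator case is easy to see in a linear simulation. The structural coefficients below are hypothetical. In a linear model, the total effect of X on Y is the direct coefficient plus the product of the coefficients along X -> M -> Y, here $0.5 + 1.0 \times 2.0 = 2.5$. An OLS coefficient on X estimates that total effect only when the other regressors form a valid backdoor set.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical linear SCM: C -> X, C -> Y, X -> M -> Y, and a direct X -> Y.
c = rng.normal(size=n)
x = 0.8 * c + rng.normal(size=n)
m = 1.0 * x + rng.normal(size=n)
y = 0.5 * x + 2.0 * m + 1.0 * c + rng.normal(size=n)
total_effect = 0.5 + 1.0 * 2.0  # direct + indirect (through M) = 2.5


def coef_on_x(*covariates):
    """OLS coefficient on x when regressing y on an intercept, x, and the covariates."""
    design = np.column_stack([np.ones(n), x, *covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]


print(f"Z = {{C}}:    {coef_on_x(c):.2f}")     # valid backdoor set: should be near 2.5
print(f"Z = {{C, M}}: {coef_on_x(c, m):.2f}")  # M is a descendant of X: near 0.5 only
print(f"Z = {{}}:     {coef_on_x():.2f}")      # X <- C -> Y left open: biased upward
```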
A researcher estimates the effect of a new drug (X) on survival (Y). Age (A) is known to affect both drug assignment and survival. How should backdoor adjustment be applied correctly?
Key Ideas
- **Backdoor path**: any path from X to Y beginning with an incoming arrow into X. When open, it carries confounding: correlation without causation that naive observation mistakes for an effect
- **Backdoor criterion**: a set Z is valid if (1) it contains no descendants of X, and (2) it d-separates X from Y in the graph with all outgoing edges of X removed. Checked from the DAG structure, not from data
- **Adjustment formula**: $P(Y \mid \mathrm{do}(X)) = \sum_z P(Y \mid X, Z=z) \cdot P(Z=z)$ - a population-weighted conditional average. G-computation, regression adjustment, and IPW are different estimators of this same quantity
- **RCT as do-operator**: randomization physically removes backdoor paths. The backdoor criterion explains why. It also shows that observational analysis with the right Z identifies the same causal effect as an RCT, provided the DAG is correct and every variable in Z is measured
- **Descendants of X are forbidden**: conditioning on a mediator strips the indirect effect out of the total-effect estimate, and conditioning on a collider opens new paths. The backdoor criterion guards against both mistakes
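G-computation and IPW reach the adjustment formula by different routes. Suppose $Z$ is discrete and the propensities are estimated within each stratum. Then the two estimates coincide exactly on any sample, because $\frac{1}{n}\sum_i \frac{\mathbf{1}[X_i=x]\,Y_i}{\hat P(X=x \mid Z_i)}$ regroups into $\sum_z \hat P(z)\,\hat E[Y \mid X=x, Z=z]$. Below is a sketch on hypothetical simulated data.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

# Hypothetical data: three-level confounder Z; the true effect of X on P(Y=1) is +0.1.
z = rng.integers(0, 3, size=n)
x = rng.binomial(1, np.array([0.2, 0.5, 0.8])[z])
y = rng.binomial(1, 0.1 + 0.1 * x + 0.2 * z)  # probabilities stay within [0.1, 0.6]


def g_computation(x_val):
    """sum_z P(Y=1 | X=x_val, Z=z) * P(Z=z), using sample frequencies."""
    return sum(y[(z == s) & (x == x_val)].mean() * np.mean(z == s) for s in range(3))


def ipw(x_val):
    """Mean of 1[X=x_val] * Y / P(X=x_val | Z), propensity estimated per stratum."""
    p_x = np.array([np.mean(x[z == s] == x_val) for s in range(3)])
    return np.mean((x == x_val) * y / p_x[z])


print(g_computation(1) - g_computation(0), ipw(1) - ipw(0))  # equal up to rounding
```

With a continuous $Z$ and fitted models, the two estimators no longer agree exactly. They rely on different modeling assumptions (an outcome model versus a propensity model) while targeting the same do-quantity.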
Where to Go Next
The backdoor criterion works when every confounder on a backdoor path can be measured. The lessons below cover what to do when that is not the case.
- Frontdoor Criterion — Effect estimation via mediator when backdoor confounders are hidden
- do-Operator — Formalization of intervention - what the adjustment formula actually computes
- Identifiability — When do-probabilities are expressible from observational data - the general theory
- Mediation Analysis — Direct and indirect effects - extending backdoor to mediators
Questions for Reflection
- Ibuprofen and COVID: after controlling for disease severity, the effect vanished. The backdoor criterion does not forbid observation - it specifies what to control for. What variables in current projects might be hidden confounders on such backdoor paths?
- If a team runs an A/B test but users self-select into variants (self-selection bias) - what backdoor path does this create, and what adjustment set is needed?
- The backdoor criterion requires measurable Z. When the confounder is unobservable (e.g., 'motivation' or 'management quality') - what remains? Instrumental variables, frontdoor, difference-in-differences - all address exactly this problem.
Related Lessons
- cc-02-d-separation — d-separation is the foundation; backdoor criterion is its direct corollary
- cc-04-frontdoor — Frontdoor adjustment handles cases where backdoor sets don't exist
- cc-05-do-operator — The do-operator formalizes what the adjustment formula actually computes
- prob-03-conditional — Conditional probability is the math behind the adjustment formula
- stat-09-regression — Regression adjustment is the numerical implementation of backdoor formula
- cc-11-causal-discovery — PC algorithm finds the graph on which backdoor sets are identified
- stat-01-sampling