Causal Calculus

Backdoor Criterion: Blocking the Confounding Trap

2020. Observational data from multiple countries: patients on ibuprofen die less from COVID. Headlines: 'ibuprofen treats COVID'. Months later, analysis controlling for disease severity found: no effect at all. Severely ill patients were shifted to heavier drugs immediately - they died more. Disease severity drove both the ibuprofen decision and the outcome. That was a backdoor path. The backdoor criterion is the algorithm that identifies such traps from the graph's arrows alone, without a single number.

**Debiasing in ML**: algorithms trained on observational data learn spurious correlations (backdoor paths). Libraries such as DoWhy and CausalML implement backdoor adjustment, so that the estimate reflects the causal effect rather than the confounders.
**A/B testing**: randomization is the physical destruction of all backdoor paths. The backdoor criterion explains why randomization works: the coin flip severs the edge X <- confounder, making backdoor paths inactive.
**ML fairness audits**: Microsoft Research applies the backdoor criterion to detect protected-attribute bias. When a protected attribute (gender, race) sits on a backdoor path, controlling for it is not censorship - it is a mathematical requirement for honest effect estimation.
**Epidemiology**: G-computation (Robins, 1986) is the backdoor adjustment formula under a different name. Many ATE estimates in clinical research are built on it.

Prerequisites

d-separation: chains, forks, colliders

Backdoor Paths: Where Confounding Comes From

2020. Observational data from multiple countries: patients on ibuprofen die less often from COVID. Headlines read: ibuprofen treats COVID. The mechanism nobody mentioned: severely ill patients were less likely to receive ibuprofen - they were moved to heavier drugs immediately. Disease severity drove both treatment choice and death. That is a fork - a classic backdoor path.

A backdoor path from X to Y is any path that begins with an arrow pointing into X. That is: a path of the form X <- ... -> Y via a common ancestor. Such a path transmits correlation between X and Y without being causal. Not noise, not measurement error - a structural feature of the DAG that is visible without a single number.

**Definition of a backdoor path**: a path $\pi$ between $X$ and $Y$ is a backdoor path if the first edge on the path points into $X$ (that is, $\pi$ begins with $\cdot \to X$). The observed correlation $P(Y \mid X)$ mixes the causal effect with information flowing along backdoor paths.
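The definition can be turned into a short search over the graph. The sketch below is a minimal illustration, not any library's implementation: the function name `backdoor_paths` and the edge-list representation are assumptions made for this example, and the graph is the ibuprofen story from above. It enumerates paths only; it does not yet check whether a path is active or blocked.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child), i.e. parent -> child


def backdoor_paths(edges: List[Edge], x: str, y: str) -> List[List[str]]:
    """Return every simple path from x to y whose first edge points into x."""
    parents: Dict[str, Set[str]] = {}
    neighbours: Dict[str, Set[str]] = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
        # Paths ignore edge direction, so neighbours are symmetric.
        neighbours.setdefault(parent, set()).add(child)
        neighbours.setdefault(child, set()).add(parent)

    found: List[List[str]] = []

    def walk(path: List[str]) -> None:
        node = path[-1]
        if node == y:
            found.append(path)
            return
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in path:  # keep the path simple (no repeated nodes)
                walk(path + [nxt])

    # A backdoor path must leave x through one of x's parents.
    for parent in sorted(parents.get(x, ())):
        walk([x, parent])
    return found


# Hypothetical encoding of the ibuprofen example.
edges = [
    ("severity", "ibuprofen"),
    ("severity", "death"),
    ("ibuprofen", "death"),
]
print(backdoor_paths(edges, "ibuprofen", "death"))
```

The direct path ibuprofen -> death is not reported, because its first edge leaves X rather than entering it.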

DoWhy (Microsoft) automatically enumerates all backdoor paths when calling `model.identify_effect()`. Before issuing any effect estimate, it builds the list of active backdoor paths and checks whether the proposed adjustment set blocks all of them. If not - the effect is flagged as unidentified.

In the DAG: $\text{Education} \leftarrow \text{IQ} \to \text{Salary}$, $\text{Education} \to \text{Salary}$. Is the path Education - IQ - Salary a backdoor path when estimating the effect of Education on Salary?

Backdoor Criterion: What Is Sufficient to Control

Pearl, 1993. One theorem - and the question of which variables to control became algorithmic rather than intuitive. Before this, econometricians argued for years about what belongs in a regression. After - it is checkable in minutes.

**Backdoor criterion (Pearl, 1993)**: a set $Z$ satisfies the backdoor criterion relative to $(X, Y)$ in DAG $G$ if:
1. no node in $Z$ is a descendant of $X$;
2. $Z$ blocks all backdoor paths between $X$ and $Y$ (i.e., d-separates $X$ from $Y$ in the subgraph $G_{\underline{X}}$ with all outgoing edges of $X$ removed).

Condition (1) - no descendants of X - is critical. A variable that X influences is typically a mediator, a collider, or a descendant of one. Conditioning on a mediator blocks part of the causal effect; conditioning on a collider opens a new non-causal path instead of closing an old one. This is exactly where the naive heuristic 'control everything correlated with X and Y' breaks down.
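Both conditions can be checked mechanically. The following is a minimal sketch under stated assumptions: the helper names (`descendants`, `simple_paths`, `is_blocked`, `satisfies_backdoor`) are invented for this example, the graph is a small edge list, and paths are enumerated exhaustively, which is only practical for small DAGs. The node `D` is a hypothetical descendant of `X` added to exercise condition (1).

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child), i.e. parent -> child


def descendants(edges: List[Edge], node: str) -> Set[str]:
    """All nodes reachable from `node` along directed edges (excluding `node`)."""
    children: Dict[str, Set[str]] = {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
    seen: Set[str] = set()
    stack = [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def simple_paths(edges: List[Edge], x: str, y: str) -> List[List[str]]:
    """All simple paths between x and y, ignoring edge direction."""
    nbrs: Dict[str, Set[str]] = {}
    for parent, child in edges:
        nbrs.setdefault(parent, set()).add(child)
        nbrs.setdefault(child, set()).add(parent)
    out: List[List[str]] = []

    def walk(path: List[str]) -> None:
        if path[-1] == y:
            out.append(path)
            return
        for n in sorted(nbrs.get(path[-1], ())):
            if n not in path:
                walk(path + [n])

    walk([x])
    return out


def is_blocked(edges: List[Edge], path: List[str], z: Set[str]) -> bool:
    """d-separation rule for a single path given the conditioning set z."""
    edge_set = set(edges)
    for a, m, b in zip(path, path[1:], path[2:]):
        collider = (a, m) in edge_set and (b, m) in edge_set
        if collider:
            # A collider blocks unless it, or one of its descendants, is in z.
            if m not in z and not (descendants(edges, m) & z):
                return True
        elif m in z:
            # A chain or fork node blocks when it is conditioned on.
            return True
    return False


def satisfies_backdoor(edges: List[Edge], x: str, y: str, z: Iterable[str]) -> bool:
    z = set(z)
    if z & descendants(edges, x):  # condition (1)
        return False
    edge_set = set(edges)
    for path in simple_paths(edges, x, y):
        first_edge_into_x = (path[1], x) in edge_set
        if first_edge_into_x and not is_blocked(edges, path, z):  # condition (2)
            return False
    return True


# Fork C plus a hypothetical descendant D of X.
graph = [("C", "X"), ("C", "Y"), ("X", "Y"), ("X", "D")]
for candidate in (set(), {"C"}, {"C", "D"}):
    print(sorted(candidate), satisfies_backdoor(graph, "X", "Y", candidate))
```

In this toy graph only `{C}` passes: the empty set leaves the fork open, and `{C, D}` violates condition (1). Exhaustive enumeration grows exponentially with graph size, so this is meant for reading, not for production use.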

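Graph	Backdoor path	Valid adjustment set Z	Must not be in Z
$X \leftarrow C \to Y$	$X \leftarrow C \to Y$	$\{C\}$	descendants of $X$ (these include descendants of $Y$ whenever $X \to Y$)
$X \leftarrow C \to M \to Y$	$X \leftarrow C \to M \to Y$	$\{C\}$, $\{M\}$, or $\{C, M\}$	nothing in this graph; $M$ is excluded only if $X \to M$ is also present, making it a mediator of the total effect
$X \leftarrow U_1 \to B \leftarrow U_2 \to Y$	through $U_1, U_2$ (hidden), already blocked at the collider $B$	$\emptyset$ (no adjustment needed)	$B$: conditioning on the collider opens the path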

CausalML (Uber) and DoWhy (Microsoft) implement minimum adjustment set search via the O'Shaughnessy (2020) algorithm: find the smallest Z that blocks all backdoor paths. A smaller set means fewer covariates to measure and, typically, lower variance in observational estimates.

DAG: $X \leftarrow C \to Y$, $X \to M \to Y$, $X \to Y$. The goal is to estimate the **total effect** of $X$ on $Y$. Which adjustment set is valid?

Adjustment Formula: P(Y|do(X)) from Observational Data

Once Z satisfies the backdoor criterion, the problem is solved. The do-probability is expressible in terms of the observed distribution. No experiment required - just the right set of measured variables and one formula.

**Backdoor adjustment formula**: if $Z$ satisfies the backdoor criterion for $(X, Y)$, then: $$P(Y = y \mid \mathrm{do}(X = x)) = \sum_{z} P(Y = y \mid X = x, Z = z) \cdot P(Z = z)$$ For continuous $Z$: integral over $z$ instead of sum. This is a population-weighted conditional average.

This is exactly what regression adjustment does: estimate $E[Y \mid X=x, Z=z]$ from data, then average over the marginal $P(Z)$. The method is called G-computation in epidemiology and average treatment effect (ATE) estimation in econometrics. The same result - different names across communities.
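A small numeric example may help separate $P(Y \mid X)$ from $P(Y \mid \mathrm{do}(X))$. All probabilities below are hypothetical, chosen so that treatment has no effect within either severity stratum; the variable and function names are assumptions for this sketch. With $Z$ = severity, $X$ = ibuprofen, $Y$ = death:

```python
# Hypothetical distribution: Z = severe illness, X = ibuprofen, Y = death.
p_z = {0: 0.7, 1: 0.3}            # P(Z = z)
p_x1_given_z = {0: 0.8, 1: 0.2}   # P(X = 1 | Z = z): severe patients rarely get it
p_y1_given_xz = {                 # P(Y = 1 | X = x, Z = z), keyed by (x, z)
    (0, 0): 0.05, (1, 0): 0.05,   # mild: same risk with or without the drug
    (0, 1): 0.40, (1, 1): 0.40,   # severe: same risk with or without the drug
}


def p_x_given_z(x: int, z: int) -> float:
    return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]


def naive(x: int) -> float:
    """P(Y = 1 | X = x): conditions on X, so Z is weighted by P(Z | X = x)."""
    num = sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] for z in p_z)
    den = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    return num / den


def adjusted(x: int) -> float:
    """P(Y = 1 | do(X = x)) by backdoor adjustment: Z is weighted by P(Z)."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


print("naive difference   :", naive(1) - naive(0))
print("adjusted difference:", adjusted(1) - adjusted(0))
```

The only difference between the two functions is the weight on each stratum: $P(Z = z \mid X = x)$ in the naive version, $P(Z = z)$ in the adjusted one. Under these assumed numbers the naive comparison shows a large apparent benefit, while the adjusted difference is zero.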

Why Randomization Works: Backdoor Through the do-Operator

An RCT is the physical implementation of the do-operator

Observational study of ibuprofen:

$$P(\text{death}=1 \mid \text{ibuprofen}=1) \neq P(\text{death}=1 \mid \mathrm{do}(\text{ibuprofen}=1))$$

because $P(\text{ibuprofen}=1)$ depends on disease severity.

RCT: a coin flip assigns ibuprofen. Now $P(\text{ibuprofen}=1 \mid \text{severity}) = 0.5$ for every severity level. The edge severity -> ibuprofen disappears from the graph. Backdoor path X <- severity -> death is structurally blocked.

The backdoor adjustment formula achieves the same statistically: it estimates the effect separately at each severity level, then weights by the real severity distribution in the population.

$$\mathrm{ATE} = \sum_z P(\text{death}=1 \mid \text{ibuprofen}=1, \text{severity}=z) \cdot P(\text{severity}=z) - \sum_z P(\text{death}=1 \mid \text{ibuprofen}=0, \text{severity}=z) \cdot P(\text{severity}=z)$$

This explains why randomization is the gold standard: it automatically blocks all backdoor paths, including those the researcher is unaware of.
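The contrast can be simulated. The sketch below is an illustration with made-up parameters, not a model of real COVID data: severity raises the death risk, the drug has no effect by construction, and the only thing that changes between the two runs is how treatment is assigned. Function names and all probabilities are assumptions.

```python
import random
from typing import List, Tuple

Row = Tuple[bool, bool, bool]  # (severe, treated, died)


def simulate(n: int, randomized: bool, seed: int = 0) -> List[Row]:
    rng = random.Random(seed)
    rows: List[Row] = []
    for _ in range(n):
        severe = rng.random() < 0.3
        if randomized:
            treated = rng.random() < 0.5  # coin flip: no edge severity -> treatment
        else:
            treated = rng.random() < (0.2 if severe else 0.8)  # confounded assignment
        died = rng.random() < (0.40 if severe else 0.05)  # treatment has no effect
        rows.append((severe, treated, died))
    return rows


def risk_difference(rows: List[Row]) -> float:
    """Naive comparison: death rate among treated minus death rate among untreated."""
    def rate(t: bool) -> float:
        group = [died for _, treated, died in rows if treated == t]
        return sum(group) / len(group)
    return rate(True) - rate(False)


def adjusted_risk_difference(rows: List[Row]) -> float:
    """Backdoor adjustment: per-stratum differences weighted by P(severity)."""
    ate = 0.0
    for level in (False, True):
        stratum = [(t, d) for s, t, d in rows if s == level]
        weight = len(stratum) / len(rows)
        treated = [d for t, d in stratum if t]
        control = [d for t, d in stratum if not t]
        ate += weight * (sum(treated) / len(treated) - sum(control) / len(control))
    return ate


observational = simulate(100_000, randomized=False)
trial = simulate(100_000, randomized=True)
print("observational, naive   :", risk_difference(observational))
print("observational, adjusted:", adjusted_risk_difference(observational))
print("randomized, naive      :", risk_difference(trial))
```

With these assumed parameters, the naive observational difference is strongly negative, while both the adjusted observational estimate and the randomized comparison sit near zero, up to sampling noise.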

Misconception: the backdoor criterion says to control for everything that influences X or Y.

Correct: control only for variables that block backdoor paths and are not descendants of X.

Descendants of X are typically mediators, colliders, or variables downstream of them. Including a mediator blocks part of the causal effect, which deflates the total effect estimate when the direct and indirect effects share a sign. Including a collider opens new unwanted paths. The backdoor criterion is not 'add variables' - it is 'select the right variables from the graph structure'.
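A worked linear example may make the mediator case concrete. The structural equations are an illustrative assumption: the coefficients $a$, $b$, $c$ are hypothetical, the noise terms are independent with mean zero, and there is no confounding.

$$M = aX + \varepsilon_M, \qquad Y = cX + bM + \varepsilon_Y$$

Substituting the first equation into the second gives $Y = (c + ab)X + b\,\varepsilon_M + \varepsilon_Y$, so the total effect is

$$\frac{d}{dx} E[Y \mid \mathrm{do}(X = x)] = c + ab$$

Conditioning on the mediator instead yields

$$E[Y \mid X = x, M = m] = cx + bm$$

whose coefficient on $x$ is only $c$. Under these assumptions, adjusting for $M$ removes the indirect part $ab$ and reports the direct effect in place of the total effect.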

A researcher estimates the effect of a new drug (X) on survival (Y). Age (A) is known to affect both drug assignment and survival. How should backdoor adjustment be applied correctly?

Key Ideas

**Backdoor path**: any path from X to Y beginning with an incoming arrow into X. This is confounding - correlation without causation that naive observation mistakes for an effect.
**Backdoor criterion**: a set Z is valid if (1) it contains no descendants of X, and (2) it d-separates X from Y in the graph with all outgoing edges of X removed. Checked from the DAG structure, not from data.
**Adjustment formula**: $P(Y \mid \mathrm{do}(X)) = \sum_z P(Y \mid X, Z=z) \cdot P(Z=z)$ - population-weighted conditional average. G-computation and regression adjustment compute it directly; IPW targets the same quantity by reweighting observations instead.
**RCT as do-operator**: randomization physically removes backdoor paths. The backdoor criterion explains why - and shows that observational analysis with the right Z identifies the same quantity an RCT would, provided the DAG is correct and Z is fully measured.
**Descendants of X are forbidden**: mediators deflate the total effect; colliders open new paths. The backdoor criterion guards against both mistakes.

Where to Go Next

The backdoor criterion works when all confounders are measurable. Beyond that - methods for when they are not.

Frontdoor Criterion — Effect estimation via mediator when backdoor confounders are hidden
do-Operator — Formalization of intervention - what the adjustment formula actually computes
Identifiability — When do-probabilities are expressible from observational data - the general theory
Mediation Analysis — Direct and indirect effects - extending backdoor to mediators

Questions for Reflection

Ibuprofen and COVID: after controlling for disease severity, the effect vanished. The backdoor criterion does not forbid observation - it specifies what to control for. Which variables in your current projects might be hidden confounders on such backdoor paths?
If a team runs an A/B test but users self-select into variants (self-selection bias) - what backdoor path does this create, and what adjustment set is needed?
The backdoor criterion requires measurable Z. When the confounder is unobservable (e.g., 'motivation' or 'management quality') - what remains? Instrumental variables, frontdoor, difference-in-differences - all address exactly this problem.

Related Lessons

cc-02-d-separation — d-separation is the foundation; backdoor criterion is its direct corollary
cc-04-frontdoor — Frontdoor adjustment handles cases where backdoor sets don't exist
cc-05-do-operator — The do-operator formalizes what the adjustment formula actually computes
prob-03-conditional — Conditional probability is the math behind the adjustment formula
stat-09-regression — Regression adjustment is the numerical implementation of backdoor formula
cc-11-causal-discovery — PC algorithm finds the graph on which backdoor sets are identified
stat-01-sampling