Causal Calculus

Counterfactual Reasoning

What would have happened if the patient had not taken the drug? Observational statistics alone cannot answer this question, because only one outcome is ever observed for each patient. Pearl's counterfactual analysis gives a rigorous mathematical answer through structural causal models (SCMs), enabling reasoning about individual-level causal effects rather than just population averages.

Legal liability: 'was the harm caused by this specific action?'
Personalized medicine: expected treatment effect for a specific patient
Algorithmic fairness: would the outcome change if a protected attribute were different?
Insurance: was the loss caused by the insured event?
Explainable AI: counterfactual explanations ('which feature changes would flip the decision?')

Lesson objectives

Compute counterfactuals using the three-step procedure: abduction, action, prediction
Distinguish the three rungs of Pearl's ladder of causation: association, intervention, counterfactuals
Estimate ATE and ATT using potential outcomes and connect to the SCM formalism

Prerequisites

Structural causal models and do-calculus
Conditional distributions and Bayesian updating
Rubin potential outcomes: $Y_i(1)$, $Y_i(0)$

The ladder of causation

Pearl identifies three levels of causal reasoning. First, association: $P(Y|X)$, i.e. correlation. Second, intervention: $P(Y|\mathrm{do}(X))$, what will happen if we change $X$. Third, counterfactuals: $P(Y_x|X=x', Y=y)$, what would have happened under $X=x$ given that $X=x'$ and $Y=y$ were actually observed. Only the third level requires the functional equations of the SCM.
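The gap between rung 1 and rung 2 can be made concrete by exact enumeration. The sketch below assumes a hypothetical binary model with a confounder $Z$ ($Z \to X$, $Z \to Y$, $X \to Y$); all probabilities and the names `p_y1`, `rung1_association`, `rung2_intervention` are illustrative assumptions, not taken from the lesson.

```python
# Hypothetical binary model with a confounder: Z -> X, Z -> Y, X -> Y.
P_Z1 = 0.5                         # P(Z=1), assumed
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}    # P(X=1 | Z=z), assumed

def p_z(z):
    return P_Z1 if z == 1 else 1 - P_Z1

def p_x_given_z(x, z):
    return P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]

def p_y1(x, z):
    """P(Y=1 | X=x, Z=z): outcome depends on both treatment and confounder (assumed)."""
    return 0.1 + 0.3 * x + 0.5 * z

def rung1_association(x):
    """P(Y=1 | X=x): seeing X=x also shifts belief about Z to P(Z | X=x)."""
    joint = {z: p_z(z) * p_x_given_z(x, z) for z in (0, 1)}
    p_x = sum(joint.values())
    return sum(joint[z] / p_x * p_y1(x, z) for z in (0, 1))

def rung2_intervention(x):
    """P(Y=1 | do(X=x)): setting X cuts Z -> X, so Z keeps its prior (back-door formula)."""
    return sum(p_z(z) * p_y1(x, z) for z in (0, 1))

print(round(rung1_association(1), 3), round(rung2_intervention(1), 3))  # 0.8 0.65
```

In this toy model the associational contrast $P(Y=1|X=1) - P(Y=1|X=0)$ is 0.6, while the interventional contrast is only 0.3: half of the apparent effect is confounding through $Z$.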

Potential outcomes and ATE

Rubin's framework: $Y_i(d)$ is the potential outcome for subject $i$ under treatment $D=d$. Only $Y_i = D_i Y_i(1) + (1-D_i)Y_i(0)$ is observed. ATE $= E[Y(1)-Y(0)]$. Under randomization $D \perp (Y(0), Y(1))$: ATE is identified as $E[Y|D=1] - E[Y|D=0]$.
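A small simulation illustrates why randomization identifies the ATE. It is a sketch under assumed distributions: both potential outcomes are generated for every unit (possible only inside a simulation), a random $D$ reveals one of them, and the simple difference in means is compared with the true ATE.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical potential outcomes; in real data only one of them is ever seen.
y0 = rng.normal(loc=10.0, scale=2.0, size=n)
effect = rng.normal(loc=1.5, scale=1.0, size=n)   # heterogeneous unit-level effects
y1 = y0 + effect

true_ate = np.mean(y1 - y0)

# Randomized assignment: D is independent of (Y(0), Y(1)).
d = rng.integers(0, 2, size=n)
y_obs = d * y1 + (1 - d) * y0          # Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)

diff_in_means = y_obs[d == 1].mean() - y_obs[d == 0].mean()
print(f"true ATE approx {true_ate:.3f}, difference in means approx {diff_in_means:.3f}")
```

The two numbers agree up to sampling noise; replacing the random `d` with one that depends on `y0` or `effect` breaks this agreement.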

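ATT (average treatment effect on the treated) $= E[Y(1)-Y(0)|D=1]$ can differ from ATE when treatment effects are heterogeneous and those who receive treatment respond differently from the population as a whole (for example, when patients who expect to benefit are more likely to opt in). Under randomization the two coincide. Confusing ATT and ATE is a common error when interpreting observational studies.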
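The following sketch shows ATT diverging from ATE under self-selection. The logistic selection rule, in which units with larger gains are more likely to take treatment, is an assumption chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

y0 = rng.normal(10.0, 2.0, size=n)
gain = rng.normal(1.0, 1.0, size=n)      # unit-level effect Y(1) - Y(0)
y1 = y0 + gain

# Hypothetical self-selection: the larger the gain, the likelier the unit opts in.
p_treat = 1.0 / (1.0 + np.exp(-2.0 * (gain - 1.0)))
d = rng.random(n) < p_treat

ate = gain.mean()          # E[Y(1) - Y(0)] over everyone
att = gain[d].mean()       # E[Y(1) - Y(0) | D=1] over the treated only
print(f"ATE approx {ate:.2f}, ATT approx {att:.2f}")
```

Here the ATT is noticeably larger than the ATE because the treated are disproportionately the units who benefit most; if `d` were drawn independently of `gain`, the two would match.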

Defining Counterfactuals

Insurance pricing relies on counterfactuals: "Would this driver have crashed without alcohol?" UK courts have required causal models for such questions since 2019. The counterfactual $Y_{X=x'}(u)$ asks: what would outcome $Y$ have been for individual $u$ had $X$ been set to $x'$, given that $X=x$ was actually observed?

Counterfactuals require knowledge of the SCM (structural equations), not just the distribution. This is why they occupy the third, highest rung of the ladder of causation.
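A classic way to see why the distribution is not enough is to build two SCMs that agree on every observational and interventional quantity yet give opposite counterfactual answers. In the sketch below (both models are hypothetical), $X$ and $U$ are independent fair coins; model A says the treatment does nothing ($Y := U$), model B says it flips every outcome ($Y := X \oplus U$).

```python
from itertools import product

P_U1 = 0.5  # P(U=1); X ~ Bernoulli(0.5) is independent of U in both models

def model_a(x, u):
    """Hypothetical SCM A: the treatment has no effect, Y := U."""
    return u

def model_b(x, u):
    """Hypothetical SCM B: the treatment flips every unit's outcome, Y := X xor U."""
    return x ^ u

def p(u):
    return P_U1 if u == 1 else 1 - P_U1

def joint(f):
    """Observational P(X=x, Y=y) (rung 1)."""
    out = {}
    for x, u in product((0, 1), repeat=2):
        key = (x, f(x, u))
        out[key] = out.get(key, 0.0) + 0.5 * p(u)
    return out

def do(f, x):
    """Interventional P(Y=1 | do(X=x)) (rung 2)."""
    return sum(p(u) for u in (0, 1) if f(x, u) == 1)

def counterfactual(f, x_obs, y_obs, x_new):
    """P(Y_{x_new}=1 | X=x_obs, Y=y_obs) (rung 3)."""
    # Abduction: since X is independent of U, keep the u values consistent with the evidence.
    consistent = [u for u in (0, 1) if f(x_obs, u) == y_obs]
    total = sum(p(u) for u in consistent)
    # Action + prediction: re-evaluate Y with X forced to x_new and the same u.
    return sum(p(u) for u in consistent if f(x_new, u) == 1) / total

print(joint(model_a) == joint(model_b))                 # True
print(do(model_a, 1), do(model_b, 1))                   # 0.5 0.5
print(counterfactual(model_a, 1, 1, 0), counterfactual(model_b, 1, 1, 0))  # 1.0 0.0
```

No amount of data on $(X, Y)$, observational or experimental, can tell the two models apart, yet for a treated unit that recovered, model A says it would certainly have recovered anyway and model B says it certainly would not.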

How does the counterfactual $Y_{x'}(u)$ differ from the interventional $P(Y|\mathrm{do}(X=x'))$?

$P(Y|\mathrm{do}(X=x'))$ is a population-level interventional distribution (rung 2). $Y_{x'}(u)$ is an individual counterfactual for a specific unit $u$ with fixed exogenous noise (rung 3), and computing it requires abduction.
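The difference can be sketched with an assumed linear SCM $Y := \beta X + U_Y$, $U_Y \sim N(0, \sigma^2)$ (the coefficient `BETA` and noise scale `SIGMA_Y` are hypothetical). For a unit observed at $X=1, Y=5$, abduction pins down its noise exactly, so its counterfactual is a single number, whereas $P(Y|\mathrm{do}(X=0))$ is a whole distribution centred on 0.

```python
import numpy as np

BETA = 2.0      # hypothetical structural coefficient in Y := BETA * X + U_Y
SIGMA_Y = 1.0   # hypothetical standard deviation of the exogenous noise U_Y

def f_y(x, u_y):
    return BETA * x + u_y

def counterfactual_y(x_obs, y_obs, x_new):
    """Rung 3: Y_{x_new}(u) for the unit actually observed at (x_obs, y_obs)."""
    u_y = y_obs - BETA * x_obs   # abduction: this unit's noise is recovered exactly
    return f_y(x_new, u_y)       # action + prediction with the same u_y

def interventional_samples(x_new, n, rng):
    """Rung 2: draws from P(Y | do(X=x_new)); U_Y keeps its prior distribution."""
    return f_y(x_new, rng.normal(0.0, SIGMA_Y, size=n))

rng = np.random.default_rng(0)
print(counterfactual_y(x_obs=1.0, y_obs=5.0, x_new=0.0))   # 3.0
print(interventional_samples(0.0, 100_000, rng).mean())    # close to 0.0
```

The individual's own noise ($u_Y = 3$) carries over into its counterfactual, which is exactly the information a population-level do-distribution averages away.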

Pearl's Ladder of Causation

Pearl identifies three levels of causal knowledge: association (seeing), intervention (doing), and counterfactual (imagining). Each level requires strictly stronger assumptions and cannot be reached from lower rungs alone.

Machine learning operates primarily on rung 1. Most causal inference methods reach rung 2. Rung 3 is accessible only with full SCM knowledge or strong parametric assumptions.

Which rung of the ladder of causation requires full SCM knowledge (structural equations)?

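Counterfactuals require abducting the exogenous variables from the observed facts, which is impossible without the structural equations. A causal DAG plus observational data suffices for rung 2 when the effect is identifiable (for example, by back-door adjustment); observational data alone suffices for rung 1.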

Average Treatment Effect (ATE) and Potential Outcomes

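Rubin's potential outcomes framework formalizes ATE through the pair $Y(1)$, $Y(0)$ for each unit. The fundamental problem of causal inference is that only one potential outcome is observed per individual. A fully specified SCM, however, determines both: given a unit's exogenous values $u$, $Y(1) = Y_{X=1}(u)$ and $Y(0) = Y_{X=0}(u)$ follow directly from the structural equations.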

$\mathrm{ATE} = E[Y|\mathrm{do}(X=1)] - E[Y|\mathrm{do}(X=0)]$ connects the Rubin potential outcomes framework with Pearl's do-calculus. Under SUTVA and ignorability, both frameworks yield the same identification result.
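The agreement between the two views can be checked in a simulated SCM. The sketch assumes a hypothetical confounded model $Z \to D$, $Z \to Y$, $D \to Y$ with a true unit-level effect of 1.0. The Rubin-style ATE is computed from both potential outcomes of every unit, and the Pearl-style estimate uses back-door adjustment over $Z$; ignorability holds here conditional on $Z$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000

# Hypothetical SCM with exogenous noise u_d, u_y and a binary confounder z.
z = rng.integers(0, 2, size=n)
u_d = rng.random(n)
u_y = rng.normal(0.0, 1.0, size=n)

def f_y(d, z, u_y):
    return 1.0 * d + 2.0 * z + u_y   # true effect of D on Y is 1.0 for every unit

d = (u_d < np.where(z == 1, 0.8, 0.2)).astype(int)   # confounded assignment
y = f_y(d, z, u_y)

# Rubin view: both potential outcomes from the SCM, using each unit's own noise.
ate_po = np.mean(f_y(1, z, u_y) - f_y(0, z, u_y))

# Pearl view: naive contrast vs back-door adjustment for E[Y|do(D=1)] - E[Y|do(D=0)].
naive = y[d == 1].mean() - y[d == 0].mean()
adjusted = sum(
    np.mean(z == k) * (y[(d == 1) & (z == k)].mean() - y[(d == 0) & (z == k)].mean())
    for k in (0, 1)
)
print(f"ATE from potential outcomes: {ate_po:.3f}")
print(f"naive difference:            {naive:.3f}")
print(f"back-door adjusted:          {adjusted:.3f}")
```

The naive contrast (about 2.2 in this model) is biased by $Z$; the adjusted estimate recovers the potential-outcomes ATE up to sampling noise.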

Why is the individual treatment effect $\mathrm{ITE} = Y(1) - Y(0)$ unobservable?

The fundamental problem of causal inference: a unit either receives treatment ($Y(1)$ observed) or does not ($Y(0)$ observed). The other potential outcome is inherently unobservable, so estimating the ATE requires randomization (or other identifying assumptions), and recovering an individual effect requires an SCM.

Three-step counterfactual: patient took the drug and recovered

Step 1 (abduction): compute the posterior $P(U|X=1, Y=1)$, i.e. which background factors are consistent with the observation. Step 2 (action): replace the equation for $X$ with $X \leftarrow 0$ ($\mathrm{do}(X=0)$). Step 3 (prediction): compute $Y$ from the modified equations using the updated distribution of $U$. Result: $P(Y_{X=0}=1 | X=1, Y=1)$, the probability that this patient would have recovered without the drug.
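The three steps can be sketched for the drug example with a discrete $U$ that encodes each patient's response type, i.e. the pair $(Y(0), Y(1))$. The type probabilities in `RESPONSE_TYPES` are hypothetical, and the code assumes $X$ is independent of $U$ (as in a randomized trial) so that abduction reduces to conditioning on the consistent types.

```python
# Hypothetical response types U for a binary drug X and recovery Y.
# Each type fixes the pair (Y(0), Y(1)); the probabilities are assumptions, not data.
RESPONSE_TYPES = {
    "never":  ((0, 0), 0.2),   # does not recover either way
    "helped": ((0, 1), 0.3),   # recovers only with the drug
    "hurt":   ((1, 0), 0.1),   # recovers only without the drug
    "always": ((1, 1), 0.4),   # recovers either way
}

def f_y(x, u):
    """Structural equation Y := f(X, U): look up the type's potential outcome Y(x)."""
    return RESPONSE_TYPES[u][0][x]

def counterfactual(x_obs, y_obs, x_new):
    """P(Y_{x_new}=1 | X=x_obs, Y=y_obs), assuming X is independent of U."""
    # Step 1, abduction: P(U | X=x_obs, Y=y_obs) keeps only the consistent types.
    posterior = {u: p for u, (_, p) in RESPONSE_TYPES.items() if f_y(x_obs, u) == y_obs}
    total = sum(posterior.values())
    posterior = {u: p / total for u, p in posterior.items()}
    # Step 2, action: replace the equation for X with the constant x_new.
    # Step 3, prediction: push the updated U through Y := f(x_new, U).
    return sum(p for u, p in posterior.items() if f_y(x_new, u) == 1)

p_recover_without = counterfactual(x_obs=1, y_obs=1, x_new=0)
print(round(p_recover_without, 3))        # 0.571
print(round(1 - p_recover_without, 3))    # 0.429
```

Observing $X=1, Y=1$ rules out the "never" and "hurt" types; of the remaining mass, only "always" recovers without the drug, giving $0.4/0.7$. The complement is the probability that the drug was necessary for this patient's recovery. Note that the answer depends on the assumed type distribution, which observational data alone does not pin down.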

Summary

Ladder of causation: association ($P(Y|X)$) < intervention ($P(Y|\mathrm{do}(X))$) < counterfactuals ($Y_x(u)$)
Counterfactual computed via abduction (update $U$) + action (do) + prediction (compute $Y$)
ATE $= E[Y(1)-Y(0)]$ identified under randomization; observational data requires additional assumptions

Connections to other topics

Counterfactual fairness (Kusner et al., 2017) uses this framework to audit algorithms: a predictor is counterfactually fair if its prediction for an individual would have been the same had the protected attribute been different. Mediation analysis (natural direct and indirect effects, NDE/NIE) is the natural next step, decomposing a causal effect into direct and indirect components.
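A single-unit counterfactual fairness check can be sketched on an assumed linear SCM $X := \alpha A + U_X$ with protected attribute $A \in \{0, 1\}$. The coefficient `ALPHA` and both predictors are hypothetical. Because $U_X$ is recovered exactly by abduction in this deterministic toy model, comparing one factual and one counterfactual prediction suffices; the general definition compares prediction distributions over the posterior of $U$.

```python
# Hypothetical linear SCM: protected attribute A -> feature X := ALPHA * A + U_X.
ALPHA = 1.5

def counterfactual_x(a_obs, x_obs, a_new):
    u_x = x_obs - ALPHA * a_obs      # abduction: recover this individual's U_X
    return ALPHA * a_new + u_x       # action (set A = a_new) + prediction of X

def predictor_uses_x(x, a):
    return 0.8 * x                   # hypothetical score built on X (a descendant of A)

def predictor_uses_u(x, a):
    return 0.8 * (x - ALPHA * a)     # hypothetical score built on the abducted U_X only

def is_counterfactually_fair(predictor, a_obs, x_obs, tol=1e-9):
    """Compare the factual prediction with the one under the flipped attribute."""
    a_new = 1 - a_obs
    x_cf = counterfactual_x(a_obs, x_obs, a_new)
    return abs(predictor(x_obs, a_obs) - predictor(x_cf, a_new)) < tol

print(is_counterfactually_fair(predictor_uses_x, a_obs=1, x_obs=2.0))  # False
print(is_counterfactually_fair(predictor_uses_u, a_obs=1, x_obs=2.0))  # True
```

The score built on $X$ changes when $A$ is flipped because $X$ inherits the attribute's influence; the score built only on the exogenous $U_X$, which is not a descendant of $A$, does not.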


Questions for reflection

Why do counterfactuals require functional SCM equations rather than just the joint distribution?
The SUTVA assumption prohibits interference between subjects. How is this violated in social networks?
Can you compute a counterfactual for a nonlinear SCM with unobserved exogenous variables?