Statistics

Causal Inference

'Users who saw the ad buy 3× more often - advertising works!' But maybe they were simply more inclined to buy in the first place? Causal inference is the most important and underappreciated skill in data science. It separates 'what correlates' from 'what actually works'.

Medicine: the effect of a new drug (RCT - the standard of evidence-based medicine)
Economics: the effect of minimum wage on employment (Card & Krueger; Card shared the 2021 Nobel Prize)
Product analytics: the impact of a feature on a metric (A/B test = a small RCT)
Policy: the effect of education on income (IV: distance to college)
Platforms: the effect of content on engagement (DiD with a staged rollout)

Prerequisites

Linear Regression

Correlation vs Causation: Confounders and Simpson's Paradox

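Observational studies long suggested that hormone replacement therapy (HRT) protected against heart disease, yet the randomised WHI trial (2002) reported a 29% increase in coronary heart disease risk. The earlier protective signal is widely attributed to confounding: women who chose HRT differed systematically from those who did not, for example in baseline health. Later reanalyses also suggested that the risk depends on age and time since menopause when HRT is started. A **confounder** (confounding variable) is a variable associated with both the treatment and the outcome, distorting the estimated effect. Classic example: ice cream and drownings. Both correlate with summer (the confounder). **Simpson's paradox:** a trend observed in every subgroup reverses when the groups are combined, because of an unaccounted confounder.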
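Simpson's paradox is easier to believe with numbers in hand. The sketch below uses hypothetical recovery counts (invented for illustration, not taken from any study) in which treatment A is given mostly to severe cases, so severity acts as the confounder.

```python
# Hypothetical counts, chosen for illustration only:
# (recoveries, patients) for two treatments, split by case severity.
counts = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}


def recovery_rate(recovered, patients):
    return recovered / patients


# Recovery rate within each severity subgroup.
by_subgroup = {
    treatment: {group: recovery_rate(*cell) for group, cell in groups.items()}
    for treatment, groups in counts.items()
}

# Recovery rate after pooling the subgroups together.
pooled = {
    treatment: recovery_rate(
        sum(recovered for recovered, _ in groups.values()),
        sum(patients for _, patients in groups.values()),
    )
    for treatment, groups in counts.items()
}

for group in ("mild", "severe"):
    print(group, {t: round(by_subgroup[t][group], 3) for t in counts})
print("pooled", {t: round(pooled[t], 3) for t in counts})
```

With these counts A has the higher rate in both subgroups while B has the higher pooled rate, simply because A's patients are concentrated in the harder subgroup.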

**Three types of spurious associations:** 1. Confounding - a common cause (temperature → ice cream AND drownings) 2. Reverse causality - Y causes X, not the other way around 3. Chance - test enough variable pairs and some will correlate by luck alone. A DAG (Directed Acyclic Graph), a graph of assumed causal relationships, is the tool for making the first two explicit.
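The ice cream example can be simulated to show what a common cause does to a correlation. This is a minimal sketch with made-up coefficients: temperature drives both variables, neither affects the other, and the association vanishes once the linear effect of temperature is removed from each.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    return covariance(xs, ys) / (covariance(xs, xs) * covariance(ys, ys)) ** 0.5


def residuals(ys, xs):
    """Remove the part of ys explained by a straight-line fit on xs."""
    slope = covariance(xs, ys) / covariance(xs, xs)
    intercept = mean(ys) - slope * mean(xs)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


n = 5000
# Assumed data-generating process: temperature is the only common cause.
temperature = [rng.gauss(20, 8) for _ in range(n)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 3) for t in temperature]

raw_corr = correlation(ice_cream, drownings)
adjusted_corr = correlation(
    residuals(ice_cream, temperature),
    residuals(drownings, temperature),
)

print("raw correlation:", round(raw_corr, 3))
print("after adjusting for temperature:", round(adjusted_corr, 3))
```

The adjustment works here only because the confounder is observed and its effect is linear by construction; real data rarely offers either guarantee.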

A study finds that cities with more hospitals have higher mortality rates. How does one explain this 'paradox'?

Potential Outcomes: The Rubin Causal Model

The **Rubin potential outcomes model** is the formal language of causal inference. Y(1) - outcome under treatment; Y(0) - outcome without treatment. **ATE** (Average Treatment Effect) = E[Y(1) − Y(0)]. **Fundamental problem of causal inference:** for one person we can never simultaneously observe Y(1) and Y(0) - one is always counterfactual. **Randomisation solves this:** under random assignment, Y(1) and Y(0) are independent of T → E[Y(1)−Y(0)] = E[Y|T=1] − E[Y|T=0].
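A simulation can do what reality cannot: hold both potential outcomes for every unit. The sketch below, with assumed distributions, compares the difference in observed means under random assignment and under self-selection, where units with a higher Y(0) are more likely to take the treatment.

```python
import random

rng = random.Random(1)  # fixed seed so the run is reproducible
n = 20000

# Both potential outcomes exist in the simulation; only one is ever "observed".
y0 = [rng.gauss(50, 10) for _ in range(n)]
y1 = [base + rng.gauss(5, 2) for base in y0]  # assumed average effect of 5
true_ate = sum(b - a for a, b in zip(y0, y1)) / n


def difference_in_means(treated):
    """E[Y | T=1] - E[Y | T=0], using only the outcome each unit reveals."""
    observed = [b if t else a for a, b, t in zip(y0, y1, treated)]
    treated_y = [y for y, t in zip(observed, treated) if t]
    control_y = [y for y, t in zip(observed, treated) if not t]
    return sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)


# Random assignment: T is independent of (Y(0), Y(1)).
randomised = [rng.random() < 0.5 for _ in range(n)]
# Self-selection: units with a high Y(0) tend to opt in.
self_selected = [a + rng.gauss(0, 10) > 50 for a in y0]

randomised_estimate = difference_in_means(randomised)
self_selected_estimate = difference_in_means(self_selected)

print("true ATE:", round(true_ate, 2))
print("randomised estimate:", round(randomised_estimate, 2))
print("self-selected estimate:", round(self_selected_estimate, 2))
```

Under self-selection the treated group would have scored higher even without treatment, so the naive difference mixes the effect with that baseline gap.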

**Key assumptions for causal inference:** 1. SUTVA - stable unit treatment value: no interference between subjects 2. Ignorability - conditional on observed covariates X, treatment T is independent of potential outcomes: (Y(0), Y(1)) ⊥ T | X 3. Overlap - every subject has a non-zero probability of receiving either treatment: 0 < P(T=1|X) < 1.
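When the three assumptions hold, adjusting for X recovers the effect from non-randomised data. The sketch below assumes a single binary covariate X that drives both treatment uptake and the outcome, estimates P(T=1|X) within each stratum, checks overlap, and applies inverse probability weighting (IPW). All numbers are illustrative.

```python
import random

rng = random.Random(2)  # fixed seed so the run is reproducible
n = 20000
true_effect = 2.0
# Assumed P(T=1 | X): strictly between 0 and 1, so overlap holds.
treat_prob = {0: 0.2, 1: 0.8}

rows = []
for _ in range(n):
    x = 1 if rng.random() < 0.5 else 0
    # Treatment depends only on X, so ignorability holds given X.
    t = 1 if rng.random() < treat_prob[x] else 0
    # Each outcome depends only on the unit's own treatment (SUTVA).
    y = 10.0 * x + true_effect * t + rng.gauss(0, 1)
    rows.append((x, t, y))


def mean(values):
    return sum(values) / len(values)


# Naive comparison ignores X and absorbs its effect on the outcome.
naive = mean([y for _, t, y in rows if t == 1]) - mean(
    [y for _, t, y in rows if t == 0]
)

# Estimated propensity score: share of treated units in each stratum of X.
propensity = {x: mean([t for xi, t, _ in rows if xi == x]) for x in (0, 1)}
assert all(0 < e < 1 for e in propensity.values()), "overlap violated"

# IPW: weight each unit by the inverse probability of the arm it landed in.
ipw = mean(
    [
        t * y / propensity[x] - (1 - t) * y / (1 - propensity[x])
        for x, t, y in rows
    ]
)

print("naive:", round(naive, 2))
print("IPW:", round(ipw, 2))
```

If some stratum had a propensity of exactly 0 or 1, the weights would be undefined, which is the practical meaning of the overlap condition.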

Fundamental problem of causal inference: for person A we observe Y(1)=80 (took the pill) and don't know Y(0). For person B we observe Y(0)=90 (didn't take it) and don't know Y(1). How does one estimate the ATE?

RCTs and Methods for Observational Data

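**RCT (Randomised Controlled Trial)** - the gold standard: randomisation eliminates confounding in expectation. But RCTs are costly, slow, and sometimes unethical. For observational data: **Instrumental Variables (IV)** - a variable that affects treatment, influences the outcome only through treatment, and is unrelated to unobserved confounders. **Diff-in-Diff (DiD)** - comparing changes (before/after) across treated and untreated groups, which removes time-invariant confounders provided the groups would otherwise have followed parallel trends. **RDD** (Regression Discontinuity Design) - comparing observations just above and just below a threshold that determines treatment.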
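DiD reduces to arithmetic on four group means. The sketch below assumes two regions and two periods, with a permanent gap between regions and a shared time trend (parallel trends by construction); the effect size and noise level are invented for illustration.

```python
import random

rng = random.Random(3)  # fixed seed so the run is reproducible
n = 5000  # units per (region, period) cell
region_gap, time_trend, true_effect = 10.0, 3.0, 2.0  # assumed values


def simulate_cell(treated_region, post):
    """Outcomes for one region in one period."""
    return [
        region_gap * treated_region
        + time_trend * post
        + true_effect * treated_region * post
        + rng.gauss(0, 1)
        for _ in range(n)
    ]


def mean(values):
    return sum(values) / len(values)


cell_means = {
    (region, period): mean(simulate_cell(region, period))
    for region in (0, 1)
    for period in (0, 1)
}

# Comparing regions after launch mixes the effect with the permanent gap.
post_only = cell_means[1, 1] - cell_means[0, 1]
# Comparing before and after in the treated region mixes it with the trend.
before_after = cell_means[1, 1] - cell_means[1, 0]
# DiD subtracts the control region's change to cancel both.
did = before_after - (cell_means[0, 1] - cell_means[0, 0])

print("post-only comparison:", round(post_only, 2))
print("before/after in treated region:", round(before_after, 2))
print("difference-in-differences:", round(did, 2))
```

The estimate is only as good as the parallel-trends assumption: if the treated region had been trending differently before launch, that difference would be counted as effect.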

**Hierarchy of causal evidence:** 1. RCT (gold standard) 2. Quasi-experiments: DiD, RDD, IV 3. Propensity score methods (PSM, IPW, doubly robust) 4. Regression with confounder control 5. Correlation (least reliable). DAG (Directed Acyclic Graph) - the formal tool for identification and method selection.
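Among the quasi-experimental methods in this hierarchy, IV is the one that copes with an unobserved confounder. The sketch below assumes a binary instrument Z that shifts treatment and has no other path to the outcome, then compares an ordinary regression slope with the simple ratio (Wald) IV estimate. The coefficients are made up.

```python
import random

rng = random.Random(4)  # fixed seed so the run is reproducible
n = 20000
true_effect = 2.0

# Z: instrument, U: unobserved confounder, T: treatment, Y: outcome.
z = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
t = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [true_effect * ti + 3.0 * ui + rng.gauss(0, 1) for ti, ui in zip(t, u)]


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)


# Regressing Y on T absorbs the confounder's effect into the slope.
ols_slope = covariance(t, y) / covariance(t, t)
# IV uses only the variation in T that is induced by Z.
iv_estimate = covariance(z, y) / covariance(z, t)

print("OLS slope:", round(ols_slope, 2))
print("IV estimate:", round(iv_estimate, 2))
```

The IV estimate is valid only if Z really has no direct path to Y and is unrelated to U, which the simulation guarantees and real data does not.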

A company wants to estimate the effect of employee training on productivity. Those who requested training are sent to it. Why is the naive comparison (trained vs untrained) biased?

Key Ideas

Correlation ≠ causation: confounders create spurious associations
Simpson's paradox: a trend reverses sign when subgroups are combined
Rubin model: Y(1), Y(0) - potential outcomes; ATE = E[Y(1)−Y(0)]
Fundamental problem: Y(1) and Y(0) cannot be observed simultaneously
RCT solves this via randomisation; for observational data - IV, DiD, PSM
Propensity score: models treatment probability to correct selection bias
DoWhy: Python library for formalising and estimating causal effects

Causal Inference and the Entire Statistics Course

Causal inference is the crown of statistical thinking. It connects regression (a description tool), hypothesis testing, experimental design (RCT), and the Bayesian approach (priors on causal structure). It is the transition from 'what we observe' to 'what will happen under intervention'.

Linear Regression — Regression describes associations; causality requires RCT or quasi-experiments
Bayesian Statistics — Bayesian causal models encode expert knowledge about the DAG

Questions for Reflection

Take any correlational finding from the news. Name possible confounders. How would one design an RCT to test causality?
Why did the 2021 Nobel Prize in Economics (Card, Angrist, Imbens) go to work on 'natural experiments'? How is this better than ordinary observational studies?
In your product, a new feature was launched in one region first. How would you use DiD to estimate its effect without running a classic A/B test?

Related Lessons

aie-42-ai-system-design
