Data Science

Causal Inference

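In 2021, David Card, Joshua Angrist and Guido Imbens received the Nobel Prize in Economics for work central to the 'credibility revolution'. Card was recognized for empirical contributions to labour economics, and Angrist and Imbens for methodological contributions to the analysis of causal relationships. Together this work showed how natural experiments in observational data can answer causal questions without randomized experiments. Card's study with Alan Krueger found that New Jersey's 1992 minimum wage increase did not reduce fast-food employment, challenging the standard textbook prediction that higher minimum wages destroy jobs.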

  • **Netflix A/B tests**: every algorithm change goes through a randomized experiment on 1% of users before global rollout, with over 250 concurrent tests at any time
  • **Google Causal Impact**: open-source tool for measuring advertising campaign effects via Bayesian structural time series without running an A/B test
  • **Mendelian randomization**: genetic variants as instruments allow studying disease causes without unethical experiments - standard in modern epidemiology

A/B Testing

Microsoft ran 20,000 A/B tests in 2012, and only about one third of them showed a positive result. This is exactly the value of experiments: most intuitive improvements don't work. **A/B testing** is the gold standard for measuring causal effects. Randomly assigning users to groups (control/treatment) balances all confounders, observed and unobserved, in expectation: smart and less smart, wealthy and poor, active and passive users end up distributed equally across groups on average. The metric difference between groups is therefore an unbiased estimate of the causal effect, up to sampling noise.
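A small simulation makes this concrete. It is a sketch under invented assumptions: a hypothetical 'activity' confounder drives the metric, the treatment adds a fixed effect, and all coefficients are made up. Under coin-flip assignment the groups end up with nearly equal average activity and the difference in means lands near the true effect. When active users self-select into treatment, the same calculation is badly biased.

```python
import random
import statistics

def simulate_ab_test(n_users=20_000, true_effect=2.0, self_select=False, seed=42):
    """Simulate a metric driven by a hidden confounder ('activity') plus a treatment.

    Hypothetical data-generating process with made-up coefficients.
    self_select=False: assignment is a fair coin flip (an A/B test).
    self_select=True:  more active users are more likely to opt in (observational data).
    Returns (difference in mean metric, difference in mean activity) between groups.
    """
    rng = random.Random(seed)
    metric = {True: [], False: []}
    activity_by_group = {True: [], False: []}
    for _ in range(n_users):
        activity = rng.gauss(10.0, 3.0)  # unobserved confounder
        if self_select:
            # Assignment depends on the confounder: active users opt in more often
            treated = rng.random() < (0.8 if activity > 10.0 else 0.2)
        else:
            # Random assignment ignores activity entirely
            treated = rng.random() < 0.5
        value = 5.0 + 1.5 * activity + (true_effect if treated else 0.0) + rng.gauss(0.0, 1.0)
        metric[treated].append(value)
        activity_by_group[treated].append(activity)
    effect_estimate = statistics.fmean(metric[True]) - statistics.fmean(metric[False])
    activity_gap = statistics.fmean(activity_by_group[True]) - statistics.fmean(activity_by_group[False])
    return effect_estimate, activity_gap

est, gap = simulate_ab_test()
biased_est, biased_gap = simulate_ab_test(self_select=True)
print(f"randomized:    effect={est:.2f}, activity gap={gap:.2f}")
print(f"self-selected: effect={biased_est:.2f}, activity gap={biased_gap:.2f}")
```

In the self-selected run the treated group is more active on average, and that gap is added to the apparent treatment effect.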

Key A/B test concepts: (1) Sample size: calculated via power analysis (typically power = 0.8, alpha = 0.05, and MDE, the minimum detectable effect); (2) p-value: the probability of observing a difference at least as extreme as the one measured, assuming the null hypothesis (no effect) is true; (3) Multiple testing: with 20 independent metrics and no real effects, about one is expected to show p < 0.05 by chance, so corrections such as Bonferroni or Benjamini-Hochberg (BH) are needed; (4) Novelty effect: a new design attracts attention on its own, so wait for the metric to stabilize; (5) Network effects: in social networks one user's treatment can affect another user's outcome, violating independence between users, so cluster randomization is required.
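The sample-size and multiple-testing ideas in (1) and (3) can be sketched with the Python standard library. This assumes a conversion-rate metric and the common normal-approximation formula for a two-sided two-proportion z-test. The baseline rate, MDE and p-values below are illustrative, not real data, and `sample_size_per_group`, `bonferroni` and `benjamini_hochberg` are hypothetical helper names.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_baseline, mde_abs, alpha=0.05, power=0.8):
    """Approximate users per group for a two-sided two-proportion z-test.

    Normal approximation:
    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / mde^2
    """
    p1, p2 = p_baseline, p_baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / mde_abs ** 2)

def bonferroni(p_values, alpha=0.05):
    """Reject H0_i if p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, q=0.05):
    """Step-up BH procedure: controls the false discovery rate at level q
    (under independence or positive dependence of the tests)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that the k-th smallest p-value is <= k/m * q
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected

# Chance that at least one of 20 independent null metrics has p < 0.05
p_any_false_positive = 1 - 0.95 ** 20  # about 0.64

n = sample_size_per_group(p_baseline=0.10, mde_abs=0.01)  # 10% -> 11% conversion
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(n, sum(bonferroni(p_vals)), sum(benjamini_hochberg(p_vals)))
```

With a 10% baseline and a 1 percentage point MDE this gives roughly 14.7 thousand users per group. On the ten example p-values, five are below 0.05 uncorrected, Bonferroni keeps one and BH keeps two, which illustrates that BH (false discovery rate) is less conservative than Bonferroni (family-wise error rate).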

Why does random assignment in A/B testing make groups comparable without explicitly controlling all confounders?

Difference-in-Differences

Randomly assigning some states to a minimum wage increase is impossible. But in April 1992 New Jersey raised its minimum wage (from $4.25 to $5.05 per hour) while neighboring Pennsylvania did not. **Difference-in-Differences (DiD)** exploits this 'natural experiment': Card and Krueger (1994) compared the before/after change in fast-food employment in NJ with the change in eastern PA over the same period. If the two trends would have been parallel without the intervention (the parallel trends assumption), the difference of these differences equals the causal effect.

DiD formula: ATT = (Y_treat_post - Y_treat_pre) - (Y_control_post - Y_control_pre). Key assumption, parallel trends: without the intervention, both groups would have changed by the same amount over time (their levels may differ). It cannot be tested directly for the post-treatment period; the usual plausibility check is to compare trends before treatment (a pre-treatment parallel trends plot). DiD in regression: Y = b0 + b1*Treat + b2*Post + b3*(Treat*Post) + e, where b3 = ATT. Staggered DiD, where the intervention occurs at different times for different units, requires care: the standard two-way fixed effects regression can be biased when effects vary over time, and estimators such as Callaway-Sant'Anna address this.
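The 2x2 DiD formula can be checked on simulated data. In this hypothetical setup (invented coefficients), the treated group sits 3 units above the control group and both share a common post-period shock of -2. A naive post-period comparison would mix the level gap into the effect, and a treated-only before/after comparison would mix in the shock, while DiD differences both away. The hypothetical `extra_treated_trend` parameter breaks parallel trends to show what happens then.

```python
import random
import statistics

def did_estimate(rows):
    """2x2 DiD: (Y_treat_post - Y_treat_pre) - (Y_control_post - Y_control_pre).

    rows: iterable of (treat, post, y) with treat, post in {0, 1}.
    """
    cells = {(t, p): [] for t in (0, 1) for p in (0, 1)}
    for treat, post, y in rows:
        cells[(treat, post)].append(y)
    m = {cell: statistics.fmean(values) for cell, values in cells.items()}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])

def simulate_2x2(n_per_cell=2_000, true_att=1.5, extra_treated_trend=0.0, seed=0):
    """Hypothetical data: groups differ in level and share a common time shock.

    extra_treated_trend != 0 breaks parallel trends: the treated group would have
    drifted by this amount even without treatment.
    """
    rng = random.Random(seed)
    rows = []
    for treat in (0, 1):
        for post in (0, 1):
            for _ in range(n_per_cell):
                y = (20.0
                     + 3.0 * treat   # level gap between groups: allowed by DiD
                     - 2.0 * post    # common shock hits both groups
                     + (true_att + extra_treated_trend) * treat * post
                     + rng.gauss(0.0, 2.0))
                rows.append((treat, post, y))
    return rows

print(did_estimate(simulate_2x2()))                         # near the true ATT of 1.5
print(did_estimate(simulate_2x2(extra_treated_trend=1.0)))  # near 2.5: drift absorbed into the estimate
```

When parallel trends fail, the extra drift of the treated group is indistinguishable from the treatment effect and biases the estimate by exactly that amount. In the regression form, OLS on Y = b0 + b1*Treat + b2*Post + b3*(Treat*Post) reproduces the four cell means exactly, so b3 equals the value `did_estimate` returns.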

What happens to the DiD estimate when the parallel trends assumption is violated?

Instrumental Variables

Does education increase income? More able people both get more education and earn more, so ability is a confounder. A randomized education experiment is impossible, and without a 'natural' policy change DiD is unavailable too. **Instrumental Variables (IV)** solve this: find a variable Z that (1) affects education (relevance), (2) affects income ONLY through education, not directly (exclusion restriction), and (3) is not associated with unobserved confounders (exogeneity). Classic instruments include Angrist and Krueger's quarter of birth, which shifts years of schooling through compulsory schooling laws, and Card's proximity to a college. Each acts as an as-good-as-random 'nudge' toward more education.

2SLS method (Two-Stage Least Squares): Stage 1: regress Treatment on Instrument to get a predicted 'clean' treatment; Stage 2: regress Outcome on the predicted values from Stage 1. When effects differ across people, and under an additional monotonicity assumption (the instrument pushes no one away from treatment), this gives the LATE (Local Average Treatment Effect): the effect for 'compliers', those who change their treatment decision because of the instrument. A common rule of thumb: a Stage 1 F-statistic above 10 indicates the instrument is not weak. The exclusion restriction requires an economic argument, not a statistical test: it is fundamentally untestable.

What is the Local Average Treatment Effect (LATE) estimated by instrumental variables?

Causal Graphs (DAG)

Judea Pearl received the Turing Award in 2011 for formalizing causality. His **Directed Acyclic Graph (DAG)** is a map of cause-and-effect relationships: nodes = variables, edges = causal arrows. A DAG formally defines what to control for (adjustment set), what not to control for (colliders), and whether causal identification from observational data is possible at all. Without a DAG, choosing regression covariates is guesswork.

Key DAG concepts: (1) Confounders: common causes of X and Y, must be controlled; (2) Mediators: X -> M -> Y, do not control when estimating the total effect (controlling blocks the causal path); (3) Colliders: X -> C <- Y, do not control (conditioning opens a spurious association); (4) Backdoor criterion: a set S is a valid adjustment set if it blocks every backdoor path from X to Y (paths that start with an arrow into X) and contains no descendant of X; (5) do-calculus: a set of rules for deciding whether, and how, P(Y|do(X=x)) can be computed from observational data. Python libraries: dowhy, pgmpy.

Myth: controlling for more variables in a regression always improves the causal estimate.

Reality: incorrectly chosen covariates (mediators or colliders) can introduce bias, sometimes worse than including no controls at all.

A DAG shows this formally: controlling a mediator blocks the causal path through it, and controlling a collider opens a spurious association. Adding variables without understanding the DAG is one of the most common mistakes in applied causal analysis.

Why can controlling a collider harm causal analysis?

Key Ideas

  • **A/B testing** is the gold standard: randomization balances all confounders in expectation, so the metric difference between groups is an unbiased estimate of the causal effect
  • **DiD** and **IV** are quasi-experimental methods for situations without randomization: DiD exploits natural experiments over time, IV uses an external instrument as a source of randomness
  • **DAG** formalizes the causal structure: it defines what to control, what not to control, and whether causal identification is possible from available data

Related Topics

Causal inference intersects with time series analysis and classical ML methods:

  • Time Series Analysis — Causal Impact uses Bayesian time series for causal effect estimation; DiD works with panel data (time series for multiple units)
  • Ensemble Methods — Causal forests (Wager & Athey, 2018) extend random forests to estimate heterogeneous treatment effects (CATE) across different user subgroups

Questions for Reflection

  • A company wants to measure the effect of an email campaign on conversions, but regulations do not allow sending it to a random subset of customers. Which causal inference methods could be applied?
  • Card & Krueger's DiD study found no evidence that New Jersey's minimum wage increase reduced employment. Name two possible violations of the parallel trends assumption in that study.
  • In a causal DAG: should a mediator (M: X->M->Y) be controlled when the total effect of X on Y is of interest? What if only the direct effect is needed?

Related Lessons

  • stat-39-causal-confounders