Data Science

Causal Inference

In 2021, David Card, Joshua Angrist and Guido Imbens received the Nobel Prize in Economics for their contributions to the 'credibility revolution': showing that, under explicit assumptions, causal relationships can be estimated from observational data using natural experiments rather than randomized trials. Card, working with Alan Krueger, found that a minimum wage increase did not reduce fast-food employment - contradicting the prediction of the standard competitive labor-market model.

**Netflix A/B tests**: every algorithm change goes through a randomized experiment on 1% of users before global rollout, with over 250 concurrent tests at any time
**Google CausalImpact**: open-source R package for estimating the effect of an intervention such as an advertising campaign via Bayesian structural time series when no A/B test was run
**Mendelian randomization**: genetic variants as instruments allow studying disease causes without unethical experiments - standard in modern epidemiology

A/B Testing

Microsoft ran 20,000 A/B tests in 2012. Only about one third showed a positive result. This is exactly the value of experiments: most intuitive improvements don't work. **A/B testing** is the gold standard for measuring causal effects. Random assignment of users to groups (control/treatment) makes group membership independent of every user characteristic, observed or not: smart and less smart, wealthy and poor, active and passive are all distributed equally across groups in expectation, and chance imbalances shrink as the sample grows. The metric difference between groups is therefore an unbiased estimate of the average causal effect, up to sampling noise.
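The balancing property of randomization can be illustrated with a small simulation. This is a minimal sketch on made-up data: the `activity` score, the outcome model and the `true_effect` value are assumptions chosen for illustration, not figures from any real experiment.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical population: each user has an unobserved "activity" score
# that strongly drives the outcome metric.
n_users = 20_000
true_effect = 2.0  # assumed treatment effect, chosen for illustration

activity = [random.gauss(50, 10) for _ in range(n_users)]
# Coin-flip assignment, independent of every user characteristic.
treated = [random.random() < 0.5 for _ in range(n_users)]

# Outcome = confounder contribution + treatment effect + noise.
outcome = [
    0.5 * a + (true_effect if t else 0.0) + random.gauss(0, 5)
    for a, t in zip(activity, treated)
]

def group_mean(values, flags, wanted):
    """Mean of `values` over users whose assignment flag equals `wanted`."""
    return mean(v for v, f in zip(values, flags) if f == wanted)

# The unobserved covariate ends up nearly balanced without being controlled.
activity_gap = group_mean(activity, treated, True) - group_mean(activity, treated, False)
# The simple difference in means recovers the effect up to sampling noise.
estimated_effect = group_mean(outcome, treated, True) - group_mean(outcome, treated, False)

print(f"activity gap between groups: {activity_gap:.3f}")
print(f"estimated effect: {estimated_effect:.3f} (true: {true_effect})")
```

The script never uses `activity` when estimating the effect. The gap between groups is small only because assignment ignored it.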

Key A/B test concepts: (1) Sample size: calculated via power analysis (power=0.8, alpha=0.05, MDE - minimum detectable effect); (2) p-value: probability of observing a difference at least as extreme as the one measured, assuming the null hypothesis is true; (3) Multiple testing: with 20 independent metrics and no real effect, one is expected to show p<0.05 by chance, and the probability of at least one false positive is about 64% (Bonferroni, BH corrections); (4) Novelty effect: a new design attracts attention on its own - wait for stabilization; (5) Network effects: social networks violate user independence - cluster randomization is required.
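The arithmetic behind points (1) and (3) can be sketched with the standard library alone. The sample-size function uses the usual normal approximation for comparing two conversion rates. The function names, the 10% baseline rate, the 2-point MDE and the p-values are hypothetical, chosen only to have something to compute.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(baseline_rate, mde_abs, alpha=0.05, power=0.8):
    """Approximate users per group for a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance_sum / mde_abs ** 2)

def family_wise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent true-null tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni(p_values, alpha=0.05):
    """Indices rejected when each test is held to alpha / number of tests."""
    return [i for i, p in enumerate(p_values) if p <= alpha / len(p_values)]

def benjamini_hochberg(p_values, q=0.05):
    """Indices rejected by the BH step-up procedure at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank  # keep the largest rank that passes its threshold
    return sorted(order[:cutoff])

n_per_group = sample_size_per_group(baseline_rate=0.10, mde_abs=0.02)
fwer_20 = family_wise_error(20)
example_p = [0.001, 0.012, 0.025, 0.039, 0.20]  # made-up p-values

print("users per group:", n_per_group)
print(f"P(at least one false positive in 20 tests): {fwer_20:.3f}")
print("Bonferroni rejects:", bonferroni(example_p))
print("BH rejects:", benjamini_hochberg(example_p))
```

On the example p-values Bonferroni keeps only the smallest one, while BH keeps four. This reflects the trade-off between controlling the family-wise error rate and controlling the false discovery rate.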

Why does random assignment in A/B testing make groups comparable without explicitly controlling all confounders?

Difference-in-Differences

Randomly assigning some states to a minimum wage increase is impossible. But in April 1992 New Jersey raised its minimum wage while neighboring Pennsylvania did not (Card and Krueger published their analysis in 1994). **Difference-in-Differences (DiD)** exploits this 'natural' situation: it compares the change in NJ employment before/after with the change in PA over the same period. If trends would have been parallel without the intervention (parallel trends assumption), the difference-in-differences equals the causal effect on the treated group.

DiD formula: ATT = (Y_treat_post - Y_treat_pre) - (Y_control_post - Y_control_pre), where each Y is a group mean. Key assumption - parallel trends: without the intervention both groups would have changed by the same amount on average. The assumption concerns an unobserved counterfactual, so it cannot be proven; its plausibility is checked by comparing historical trends before treatment (pre-treatment parallel trends plot). DiD in regression: Y = b0 + b1*Treat + b2*Post + b3*(Treat*Post) + e, where b3 = ATT. Staggered DiD: the intervention occurs at different times for different units, and the simple regression can be biased when effects vary across units or over time - requires care (Callaway-Sant'Anna estimator).
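The equivalence between the four-means formula and the interaction coefficient b3 can be checked on simulated data. This is a minimal sketch: the data-generating numbers, `TRUE_ATT` and the small `ols` helper are assumptions for illustration, and parallel trends holds here by construction.

```python
import random
from statistics import mean

random.seed(1)
TRUE_ATT = -0.5  # assumed effect, chosen only for illustration

# Simulated 2x2 design: a fixed group gap (3.0) and a common time trend (1.5).
rows = []
for treat in (0, 1):
    for post in (0, 1):
        for _ in range(2000):
            y = (20.0 + 3.0 * treat + 1.5 * post
                 + TRUE_ATT * treat * post + random.gauss(0, 2))
            rows.append((treat, post, y))

def cell_mean(treat, post):
    return mean(y for t, p, y in rows if t == treat and p == post)

# ATT from the four group means.
att_means = ((cell_mean(1, 1) - cell_mean(1, 0))
             - (cell_mean(0, 1) - cell_mean(0, 0)))

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Same estimate from Y = b0 + b1*Treat + b2*Post + b3*(Treat*Post) + e.
X = [[1.0, t, p, t * p] for t, p, _ in rows]
b0, b1, b2, b3 = ols(X, [y for _, _, y in rows])

print(f"ATT from means: {att_means:.4f}")
print(f"b3 from regression: {b3:.4f}")
```

In this saturated two-group, two-period case the two numbers agree up to floating-point error. The regression form becomes useful once covariates or clustered standard errors are added.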

What happens to the DiD estimate when the parallel trends assumption is violated?

Instrumental Variables

Does education increase income? Higher-ability people both get more education and earn more - ability is a confounder, and it is usually unobserved. A randomized education experiment is impossible. DiD is also unavailable without a 'natural' policy change. **Instrumental Variables (IV)** solve this: find a variable Z that (1) affects education (relevance), (2) affects income ONLY through education, not directly (exclusion restriction), (3) is not associated with unobserved confounders (exogeneity). Card's instrument: growing up near a college as an as-if random 'nudge' toward education. Angrist and Krueger used quarter of birth for the same purpose.

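2SLS method (Two-Stage Least Squares): Stage 1: regress Treatment on Instrument (get predicted 'clean' treatment); Stage 2: regress Outcome on predicted Stage 1 values. Under the additional monotonicity assumption (the instrument never pushes anyone away from treatment), this gives LATE (Local Average Treatment Effect) - the effect for 'compliers' (those who change their decision because of the instrument), not for the whole population. First-stage F-statistic > 10 is a common rule of thumb that the instrument is strong enough; weak instruments bias 2SLS toward the OLS estimate. The exclusion restriction requires an economic argument, not a statistical test - with a single instrument it is fundamentally untestable.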
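The two stages can be written out by hand for one instrument and one treatment. This is a sketch on simulated data in which every coefficient is an assumption chosen for illustration. The simulated effect is constant across people, so LATE and the average effect coincide here, which need not hold in real data.

```python
import random
from statistics import mean

random.seed(2)
N = 20_000
TRUE_RETURN = 0.10  # assumed effect of one year of schooling on log income

def slope_intercept(x, y):
    """Simple least-squares fit of y on x; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

ability = [random.gauss(0, 1) for _ in range(N)]                 # unobserved confounder
near_college = [float(random.random() < 0.5) for _ in range(N)]  # instrument Z
schooling = [12 + 1.0 * z + 0.8 * u + random.gauss(0, 1)
             for z, u in zip(near_college, ability)]
log_income = [1.0 + TRUE_RETURN * d + 0.5 * u + random.gauss(0, 0.5)
              for d, u in zip(schooling, ability)]

# Naive regression: biased upward because ability raises both variables.
naive_slope, _ = slope_intercept(schooling, log_income)

# Stage 1: regress treatment on the instrument and keep the fitted values.
pi1, pi0 = slope_intercept(near_college, schooling)
schooling_hat = [pi0 + pi1 * z for z in near_college]

# Stage 2: regress the outcome on the fitted treatment.
# Note: standard errors from a hand-rolled second stage are not valid.
iv_slope, _ = slope_intercept(schooling_hat, log_income)

# First-stage F-statistic for a single instrument.
mean_d = mean(schooling)
explained_ss = sum((h - mean_d) ** 2 for h in schooling_hat)
resid_ss = sum((d - h) ** 2 for d, h in zip(schooling, schooling_hat))
first_stage_f = explained_ss / (resid_ss / (N - 2))

print(f"naive OLS slope: {naive_slope:.3f}")
print(f"2SLS slope:      {iv_slope:.3f} (true: {TRUE_RETURN})")
print(f"first-stage F:   {first_stage_f:.0f}")
```

For real analyses a dedicated IV routine is preferable, since it reports correct standard errors. The manual version is only meant to show where the estimate comes from.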

What is the Local Average Treatment Effect (LATE) estimated by instrumental variables?

Causal Graphs (DAG)

Judea Pearl received the Turing Award in 2011 for formalizing causality. His **Directed Acyclic Graph (DAG)** is a map of cause-and-effect relationships: nodes = variables, edges = causal arrows. A DAG formally defines what to control for (adjustment set), what not to control for (colliders), and whether causal identification from observational data is possible at all. Without a DAG, choosing regression covariates is guesswork.

Key DAG concepts: (1) Confounders: common causes of X and Y - must control; (2) Mediators: X -> M -> Y - do not control when the total effect is of interest (controlling blocks part of the causal path); (3) Colliders: X -> C <- Y - do not control (opens a spurious association); (4) Backdoor criterion: a set S that contains no descendants of X and blocks all backdoor paths from X to Y is a valid adjustment set; (5) do-calculus: formal language for computing P(Y|do(X=x)) from observational data. Python libraries: dowhy, pgmpy.
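Backdoor adjustment for a single binary confounder can be sketched by stratification: estimate the effect within each level of C, then average over the distribution of C. The DAG assumed here is C -> X, C -> Y, X -> Y. All probabilities and coefficients are made up for illustration.

```python
import random
from statistics import mean

random.seed(3)
N = 20_000
TRUE_EFFECT = 1.0  # assumed causal effect of X on Y

# Simulate from the assumed DAG: C causes both X and Y.
data = []
for _ in range(N):
    c = random.random() < 0.5
    x = random.random() < (0.8 if c else 0.2)
    y = TRUE_EFFECT * x + 2.0 * c + random.gauss(0, 1)
    data.append((c, x, y))

def mean_y(x_val, c_val=None):
    """Mean outcome for X == x_val, optionally restricted to C == c_val."""
    return mean(y for c, x, y in data
                if x == x_val and (c_val is None or c == c_val))

# Naive contrast mixes the effect of X with the effect of C.
naive = mean_y(True) - mean_y(False)

# Backdoor adjustment: sum over c of P(C=c) * (E[Y|X=1,c] - E[Y|X=0,c]).
adjusted = 0.0
for c_val in (False, True):
    share = sum(1 for c, _, _ in data if c == c_val) / N
    adjusted += share * (mean_y(True, c_val) - mean_y(False, c_val))

print(f"naive difference:    {naive:.3f}")
print(f"backdoor-adjusted:   {adjusted:.3f} (true: {TRUE_EFFECT})")
```

The adjustment is valid only because C is the sole backdoor path in the assumed graph. With a different graph the same code could adjust for the wrong variable.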

Myth: Controlling for more variables in a regression always improves the causal estimate

Reality: Incorrectly chosen covariates (controlling mediators or colliders) can introduce bias worse than including no controls at all

A DAG formally shows: controlling a mediator blocks the causal path, controlling a collider opens a spurious association. Adding variables without understanding the DAG is the most common mistake in applied causal analysis.
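Collider bias is easy to reproduce in simulation. In the sketch below X and Y are generated independently, and C is caused by both. The variable names, noise level and selection threshold are arbitrary choices for illustration.

```python
import random
from statistics import mean

random.seed(4)
N = 20_000

def correlation(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5

# X and Y have no causal or statistical relationship.
x = [random.gauss(0, 1) for _ in range(N)]
y = [random.gauss(0, 1) for _ in range(N)]
# Collider: C is caused by both X and Y (X -> C <- Y).
c = [a + b + random.gauss(0, 0.5) for a, b in zip(x, y)]

corr_all = correlation(x, y)

# "Controlling" for C by keeping only units with a high value of it.
selected = [(a, b) for a, b, k in zip(x, y, c) if k > 1.0]
corr_selected = correlation([a for a, _ in selected],
                            [b for _, b in selected])

print(f"correlation in full sample:      {corr_all:.3f}")
print(f"correlation among units with C>1: {corr_selected:.3f}")
```

Within the selected group a high X makes a high Y less necessary to reach the threshold, so a negative association appears between two variables that are unrelated in the full population.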

Why can controlling a collider harm causal analysis?

Key Ideas

**A/B testing** is the gold standard: randomization balances all confounders, observed and unobserved, in expectation, so the metric difference between groups is an unbiased estimate of the average causal effect
**DiD** and **IV** are quasi-experimental methods for situations without randomization: DiD exploits natural experiments over time, IV uses an external instrument as a source of randomness
**DAG** formalizes the causal structure: it defines what to control, what not to control, and whether causal identification is possible from available data

Questions for Reflection

A company wants to measure the effect of an email campaign on conversions. Random sending to a subset violates regulations. Which causal inference methods could be applied?
The DiD study by Card & Krueger found that a minimum wage increase did not reduce employment. Name two possible violations of the parallel trends assumption in that study.
In a causal DAG: should a mediator (M: X->M->Y) be controlled when the total effect of X on Y is of interest? What if only the direct effect is needed?

Related Lessons

stat-39-causal-confounders
