Causal Calculus

Mediation Analysis: NDE and NIE

Aspirin reduces the risk of heart attack. Everyone knows that. But HOW: directly, by thinning the blood, or by suppressing inflammation? The answer changes everything. If the effect runs through inflammation, ibuprofen is better and cheaper. If it runs directly through platelets, ibuprofen is useless. Two mechanisms, identical total effect, opposite clinical decisions. Mediation analysis answers this from data without a second randomized trial, provided its identification assumptions hold. Pearl 2001 provided the mathematics. Vanderbilt, NIH, and Google Health already use it in production.

**Clinical trials (Pearl 2001 mediation formula):** does the drug work through biomarker M or directly? If NDE is close to zero and NIE is close to ATE - a cheap generic that raises the biomarker is sufficient. The FDA Biomarker Qualification Program uses mediation arguments when validating surrogate endpoints
**Algorithmic fairness (Nabi-Shpitser 2018):** path-specific causal effects formalize discrimination. The direct path race->salary is unlawful discrimination. The indirect path race->profession->salary is systemic inequality. Fairness constraints in ML models are built by suppressing the direct path while leaving legitimate paths intact
**RLHF mediation:** X=RLHF fine-tuning, M=chain-of-thought (extended reasoning), Y=answer quality. NDE - improvement through style/safety directly. NIE - improvement through better reasoning. If NIE >> NDE - RLHF teaches reasoning, not just alignment surface behavior
**Uplift modeling (ad platforms):** does the ad affect purchase directly (immediate response) or through brand awareness (bought a week later)? Different media channels are optimal for different paths. Without mediation, budget optimization targets only the total effect and can allocate spend inefficiently

Prerequisites

do-operator: $\text{do}(X=x)$ as an intervention distinct from observation $X=x$
Counterfactual notation: $Y(x)$ - potential outcome under $\text{do}(X=x)$
Backdoor adjustment: identification of $P(Y \mid \text{do}(X))$ through observed confounders

The Mediation Idea: Direct vs Indirect Path

Aspirin reduces the risk of heart attack. Everyone knows that. But HOW - directly by thinning the blood and blocking platelets, or by suppressing inflammation? The answer changes everything. If through inflammation, ibuprofen gives the same effect at lower cost. If directly through platelets, ibuprofen is useless. Two different mechanisms. Identical total effect. Opposite clinical decisions.

Mediation analysis is the mathematical framework for this distinction. The structure is simple: there is a treatment $X$, an outcome $Y$, and a variable $M$ - the mediator through which $X$ partially influences $Y$.

The intuition: the direct effect is what happens when $X$ changes but the mediator is frozen at the level it would naturally take under the baseline treatment. The indirect effect is what happens when $X$ is held fixed but the mediator is allowed to shift to the level it would take in response to the change in $X$.

**Algorithmic fairness:** Nabi-Shpitser 2018 used path-specific causal effects to formalize discrimination. The direct path $X_{race} \to Y_{salary}$ is direct discrimination. The indirect path $X_{race} \to M_{profession} \to Y_{salary}$ is discrimination through occupational sorting. The separation allows building fairness constraints that suppress the illegal path without touching legitimate mediating paths.

A study shows: a drug reduces mortality. Total effect = -0.15. A decision must be made: whether to approve a cheap generic that acts only through mediator M and does not reproduce the direct X->Y path. What information is needed?

NDE and NIE: Formal Definitions

Pearl 2001 introduced rigorous counterfactual definitions of direct and indirect effects. The key object is $Y(x, m)$: the potential outcome under $\text{do}(X=x)$ and $\text{do}(M=m)$ simultaneously. The second object is $M(x')$: the potential value of the mediator under $\text{do}(X=x')$.

**Natural Direct Effect (NDE)** - the change in $Y$ when $X$ moves from $x'$ to $x$, holding the mediator fixed at the level $M(x')$ - the level it would have taken under $X=x'$:

$$\text{NDE} = \mathbb{E}\big[Y(x, M(x')) - Y(x', M(x'))\big]$$

**Natural Indirect Effect (NIE)** - the change in $Y$ only through the mediator, when $X$ is fixed at $x$ but $M$ shifts from $M(x')$ to $M(x)$:

$$\text{NIE} = \mathbb{E}\big[Y(x, M(x)) - Y(x, M(x'))\big]$$

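**Total Effect (ATE)** decomposes exactly, because the cross-world term $Y(x, M(x'))$ cancels:

$$\text{ATE} = \mathbb{E}\big[Y(x, M(x)) - Y(x', M(x'))\big] = \text{NDE} + \text{NIE}$$

Here $Y(x, M(x)) = Y(x)$: setting $X=x$ and letting the mediator respond naturally gives the ordinary potential outcome.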
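Because every term is a counterfactual, the definitions can be checked directly in a simulated structural causal model where each unit's noise is shared across worlds. The sketch below is a minimal illustration under assumed structural equations (the coefficients, the threshold mediator, and the function names are hypothetical, not taken from any study). It deliberately includes an $X \times M$ interaction to show that NDE + NIE = ATE still holds exactly, unit by unit.

```python
import random

# Hypothetical toy SCM (assumed for illustration only):
#   M = 1[U_M < 0.3 + 0.4*x],  U_M ~ Uniform(0, 1)      (binary mediator)
#   Y = 0.2*x + 0.5*m + 0.3*x*m + U_Y,  U_Y ~ N(0, 1)    (outcome with X*M interaction)

def mediator(x, u_m):
    return 1 if u_m < 0.3 + 0.4 * x else 0

def outcome(x, m, u_y):
    return 0.2 * x + 0.5 * m + 0.3 * x * m + u_y

def simulate_effects(n=100_000, x=1, x_ref=0, seed=0):
    """Monte Carlo NDE, NIE, ATE, reusing each unit's noise in every world."""
    rng = random.Random(seed)
    nde = nie = ate = 0.0
    for _ in range(n):
        u_m, u_y = rng.random(), rng.gauss(0.0, 1.0)
        m_ref = mediator(x_ref, u_m)          # M(x')
        m_trt = mediator(x, u_m)              # M(x)
        y_ref = outcome(x_ref, m_ref, u_y)    # Y(x', M(x'))
        y_cross = outcome(x, m_ref, u_y)      # Y(x, M(x')): the cross-world term
        y_trt = outcome(x, m_trt, u_y)        # Y(x, M(x)) = Y(x)
        nde += y_cross - y_ref
        nie += y_trt - y_cross
        ate += y_trt - y_ref
    return nde / n, nie / n, ate / n

nde, nie, ate = simulate_effects()
print(f"NDE={nde:.3f}  NIE={nie:.3f}  ATE={ate:.3f}  NDE+NIE={nde + nie:.3f}")
```

In this model the analytic values are NDE = 0.2 + 0.3 · 0.3 = 0.29 and NIE = 0.8 · 0.4 = 0.32, so the estimates should land near them. The simulation can evaluate $Y(x, M(x'))$ only because the model is fully written down; real data never reveal it.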

**RLHF application:** $X$ = RLHF fine-tuning (on/off), $M$ = chain-of-thought - whether the model shows extended reasoning, $Y$ = downstream task quality. NDE - direct improvement in quality (style, safety). NIE - improvement through better reasoning chains. If NIE >> NDE, RLHF is teaching the model to reason, not just patching alignment surface behavior.

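One critical detail: $Y(x, M(x'))$ is not just conditioning. It is a cross-world counterfactual. $Y$ is evaluated under $X=x$ while the mediator takes the value it would have had under $X=x'$, and no unit is ever observed in both worlds. This is why mediation analysis is strictly harder than total effect identification: NDE and NIE require stronger assumptions than ATE, and randomizing $X$ alone does not identify them.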

An RLHF study finds: $\text{NDE} = 0.05$, $\text{NIE} = 0.35$, $\text{ATE} = 0.40$. What follows from these numbers?

Pearl 2001 Mediation Formula: Identification and Assumptions

The NDE and NIE definitions are elegant, but how are they computed from data? $Y(x, M(x'))$ is a counterfactual that simultaneously requires $\text{do}(X=x)$ and $\text{do}(M=M(x'))$, and it is never directly observed. Pearl 2001 derived the mediation formula, together with the conditions under which NDE and NIE are expressible through the observational distribution.

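**Mediation formula for NDE (Pearl 2001):**

$$\text{NDE} = \sum_{w} \sum_{m} \big[\mathbb{E}(Y \mid x, m, w) - \mathbb{E}(Y \mid x', m, w)\big]\, P(m \mid x', w)\, P(w)$$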

**Three identifiability conditions (no unmeasured confounders),** with $W$ a set of pre-treatment covariates:

1. No hidden confounders for $X \to Y$: $W$ blocks all backdoor paths between $X$ and $Y$.
2. No hidden confounders for $X \to M$: $W$ blocks all backdoor paths between $X$ and $M$.
3. No hidden confounders for $M \to Y$: $W$ (together with $X$) blocks all backdoor paths between $M$ and $Y$, and no confounder of $M$ and $Y$ is itself a descendant of $X$.

Condition 3 is the most demanding. It rules out any $M \leftrightarrow Y$ confounder that itself depends on $X$, even a measured one: adjusting for it would block part of the effect of $X$, while leaving it out leaves $M$ and $Y$ confounded.
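With discrete $W$, $X$, $M$, every term of the mediation formula is a conditional mean or a conditional frequency, so a plug-in estimator is just counting. The sketch below assumes a hypothetical data-generating process in which $W$ is the only confounder, so the three conditions hold by construction. It computes $\text{NDE} = \sum_w \sum_m [\mathbb{E}(Y \mid x, m, w) - \mathbb{E}(Y \mid x', m, w)]\, P(m \mid x', w)\, P(w)$ and the companion formula $\text{NIE} = \sum_w \sum_m \mathbb{E}(Y \mid x, m, w)\,[P(m \mid x, w) - P(m \mid x', w)]\, P(w)$. The helper names are illustrative only.

```python
import random
from collections import defaultdict

def simulate(n, seed=1):
    """Hypothetical data: W confounds X->M, X->Y and M->Y; nothing else is hidden."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        x = int(rng.random() < 0.3 + 0.4 * w)
        m = int(rng.random() < 0.2 + 0.4 * x + 0.2 * w)
        y = 1.0 * x + 2.0 * m + 1.5 * w + 0.5 * x * m + rng.gauss(0.0, 1.0)
        rows.append((w, x, m, y))
    return rows

def mediation_formula(rows, x=1, x_ref=0):
    """Plug-in NDE and NIE for discrete W, X, M via the mediation formula."""
    n = len(rows)
    n_w, n_wx, n_wxm = defaultdict(int), defaultdict(int), defaultdict(int)
    sum_y = defaultdict(float)
    for w, xi, m, y in rows:
        n_w[w] += 1
        n_wx[w, xi] += 1
        n_wxm[w, xi, m] += 1
        sum_y[w, xi, m] += y

    def e_y(xv, m, w):   # estimate of E(Y | X=xv, M=m, W=w)
        return sum_y[w, xv, m] / n_wxm[w, xv, m]

    def p_m(m, xv, w):   # estimate of P(M=m | X=xv, W=w)
        return n_wxm[w, xv, m] / n_wx[w, xv]

    nde = nie = 0.0
    for w, count in n_w.items():
        p_w = count / n
        for m in (0, 1):
            nde += (e_y(x, m, w) - e_y(x_ref, m, w)) * p_m(m, x_ref, w) * p_w
            nie += e_y(x, m, w) * (p_m(m, x, w) - p_m(m, x_ref, w)) * p_w
    return nde, nie

nde, nie = mediation_formula(simulate(100_000))
print(f"NDE={nde:.3f} (true 1.15)  NIE={nie:.3f} (true 1.00)")
```

In this assumed model the true values are NDE = 1 + 0.5 · 0.3 = 1.15 and NIE = 2.5 · 0.4 = 1.0. With continuous covariates the same formula is usually applied with fitted outcome and mediator models in place of cell counts.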

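When all three conditions hold, the formula replaces $\text{do}$ with ordinary conditional probabilities. For a linear model with no $X \times M$ interaction, it reduces to two-stage regression (the product-of-coefficients method): NDE is the coefficient on $X$ in the outcome regression, and NIE is the product of the $X \to M$ and $M \to Y$ coefficients. But the structural logic is prior to any parametric choice.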
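A minimal sketch of the linear case, assuming hypothetical structural equations $M = aX + 0.3W + \varepsilon_M$ and $Y = cX + bM + 0.7W + \varepsilon_Y$ with $a = 0.5$, $b = 0.6$, $c = 0.4$, so the truth is NDE = 0.4 and NIE = 0.3 per unit of $X$. The `ols` helper is a small hand-rolled normal-equations solver written for this example, not a library function.

```python
import random

def ols(y, *cols):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0, *r] for r in zip(*cols)]
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]  # X'X
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]            # X'y
    for i in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[p], b[i], b[p] = a[p], a[i], b[p], b[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [v - f * u for v, u in zip(a[r], a[i])]
                b[r] -= f * b[i]
    return [b[i] / a[i][i] for i in range(k)]

rng = random.Random(2)
n = 20_000
w = [rng.gauss(0, 1) for _ in range(n)]                       # observed confounder
x = [0.8 * wi + rng.gauss(0, 1) for wi in w]
m = [0.5 * xi + 0.3 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]
y = [0.4 * xi + 0.6 * mi + 0.7 * wi + rng.gauss(0, 1) for xi, mi, wi in zip(x, m, w)]

a_hat = ols(m, x, w)[1]                  # stage 1: X -> M, adjusted for W
_, c_hat, b_hat, _ = ols(y, x, m, w)     # stage 2: X -> Y (direct) and M -> Y
nde_hat, nie_hat = c_hat, a_hat * b_hat
print(f"NDE={nde_hat:.3f} (true 0.40)  NIE={nie_hat:.3f} (true 0.30)")
```

The two regressions recover the truth here only because the conditions hold by construction. Once an $X \times M$ term enters the outcome model, NDE and NIE depend on the reference level and the simple product no longer applies.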

**Robins-Richardson (2010) sharp null:** a total effect of zero does not imply NDE = 0 and NIE = 0 separately. NDE and NIE can be non-zero and opposite in sign, so that they cancel. This means that even at zero ATE, mediation analysis is informative: it can reveal opposing paths that counteract each other.
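A linear toy model makes the cancellation concrete. Assume, hypothetically, a randomized $X$, a direct coefficient $c = 0.5$, and an indirect path with $a = 1.0$ and $b = -0.5$. The total effect is $c + ab = 0$, while NDE = 0.5 and NIE = -0.5. The sketch estimates all three with the product-of-coefficients method, which is valid here because the model has no $X \times M$ interaction and no hidden confounding by construction.

```python
import random

def cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

def two_regressor_ols(y, x1, x2):
    """Coefficients on x1 and x2 in y ~ 1 + x1 + x2, from the centered normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(3)
n = 20_000
x = [rng.gauss(0, 1) for _ in range(n)]                              # randomized treatment
m = [1.0 * xi + rng.gauss(0, 1) for xi in x]                         # a = 1.0
y = [0.5 * xi - 0.5 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]  # c = 0.5, b = -0.5

total_hat = cov(x, y) / cov(x, x)         # ATE: 0.5 + 1.0 * (-0.5) = 0 in this model
a_hat = cov(x, m) / cov(x, x)
nde_hat, b_hat = two_regressor_ols(y, x, m)
nie_hat = a_hat * b_hat
print(f"ATE={total_hat:.3f}  NDE={nde_hat:.3f}  NIE={nie_hat:.3f}")
```

A study that estimated only `total_hat` would report no effect, even though both pathways are active.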

**Clinical trials example:** does the drug act through a biomarker (M = protein level) or directly (X->Y)? If NIE is close to ATE - a cheap generic that raises the biomarker suffices. If NDE >> NIE - the original mechanism is needed. The mediation formula answers this from observational data without a separate RCT for each pathway.

A researcher finds ATE = 0.3, NDE = 0.4, NIE = -0.1. Is this possible?

VanderWeele Decomposition and ML Applications

Pearl 2001 gave the two-way decomposition: NDE + NIE = ATE. VanderWeele 2014 went further. When there is interaction between $X$ and $M$, the two-way decomposition hides structure. The total effect decomposes into four components.

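**VanderWeele 4-way decomposition (2014),** for binary $X$ and $M$, comparing $x=1$ with $x'=0$ and using $m^*=0$ as the reference mediator level:

$$
\begin{aligned}
\text{CDE} &= \mathbb{E}\big[Y(1,0) - Y(0,0)\big] \\
\text{INT}_{\text{ref}} &= \mathbb{E}\big[\{Y(1,1) - Y(1,0) - Y(0,1) + Y(0,0)\}\, M(0)\big] \\
\text{INT}_{\text{med}} &= \mathbb{E}\big[\{Y(1,1) - Y(1,0) - Y(0,1) + Y(0,0)\}\{M(1) - M(0)\}\big] \\
\text{PIE} &= \mathbb{E}\big[\{Y(0,1) - Y(0,0)\}\{M(1) - M(0)\}\big] \\
\text{ATE} &= \text{CDE} + \text{INT}_{\text{ref}} + \text{INT}_{\text{med}} + \text{PIE}
\end{aligned}
$$

Here CDE is the controlled direct effect, $\text{INT}_{\text{ref}}$ the reference interaction, $\text{INT}_{\text{med}}$ the mediated interaction, and PIE the pure indirect effect. Pearl's components regroup them: $\text{NDE} = \text{CDE} + \text{INT}_{\text{ref}}$ and $\text{NIE} = \text{INT}_{\text{med}} + \text{PIE}$.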
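For binary $X$ and $M$, the four VanderWeele components (controlled direct effect, reference interaction, mediated interaction, pure indirect effect) can be computed exactly once the joint distribution of $(M(0), M(1))$ is known. The sketch below uses a hypothetical toy model with a threshold mediator and outcome $Y = 0.2x + 0.5m + 0.3xm$, all coefficients assumed. It checks that the four pieces add up to the total effect and regroup into NDE and NIE.

```python
# Hypothetical binary-X, binary-M model with an X*M interaction (assumed for illustration).
# Additive outcome noise U_Y would cancel in every contrast below, so it is omitted.
def y(x, m):
    return 0.2 * x + 0.5 * m + 0.3 * x * m

# Joint law of (M(0), M(1)) implied by M = 1[U_M < 0.3 + 0.4*x], U_M ~ Uniform(0, 1),
# listed as (probability, M(0), M(1)).
STRATA = [(0.3, 1, 1), (0.4, 0, 1), (0.3, 0, 0)]

def four_way(y, strata):
    """VanderWeele 4-way components for binary X and M, reference mediator level 0."""
    inter = y(1, 1) - y(1, 0) - y(0, 1) + y(0, 0)   # X*M interaction contrast
    cde = int_ref = int_med = pie = te = 0.0
    for p, m0, m1 in strata:
        cde += p * (y(1, 0) - y(0, 0))
        int_ref += p * inter * m0
        int_med += p * inter * (m1 - m0)
        pie += p * (y(0, 1) - y(0, 0)) * (m1 - m0)
        te += p * (y(1, m1) - y(0, m0))             # total effect, computed directly
    return cde, int_ref, int_med, pie, te

cde, int_ref, int_med, pie, te = four_way(y, STRATA)
nde, nie = cde + int_ref, int_med + pie
print(f"CDE={cde:.2f} INT_ref={int_ref:.2f} INT_med={int_med:.2f} PIE={pie:.2f} TE={te:.2f}")
```

In this model the components work out to CDE = 0.2, INT_ref = 0.09, INT_med = 0.12, PIE = 0.2, summing to TE = 0.61, with NDE = 0.29 and NIE = 0.32. The nonzero interaction terms are exactly the structure that the two-way split folds into NDE and NIE. The joint law of $(M(0), M(1))$ is available here only because the model is written down.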

The 4-way decomposition matters in medicine: interaction between treatment and biomarker is a separate clinical question. If the drug only works when M is high - this is effect modification, not mediation. The distinction determines which patients to treat.

**Algorithmic fairness (Nabi-Shpitser 2018):** path-specific causal effects for fairness. Let $X$ = race, $M$ = profession, $Y$ = salary. The direct path $X \to Y$ - unlawful discrimination. The indirect path $X \to M \to Y$ - historically mediated inequality, but contested. The regulatory requirement: suppress the direct path without touching legitimate paths. Without mediation analysis, formalizing this as a fairness constraint is not possible.

**Uplift modeling:** the advertiser wants to know: does an ad affect purchase directly (saw and bought immediately) or through awareness (remembered the brand, bought a week later)? The direct effect is immediate response. The indirect is brand building. Different media channels are optimal for different paths. Without mediation, optimization targets only the total effect and can misallocate budget.

Common misconception: mediation analysis is just regression with the mediator added as a control variable.

Regression with M on the right-hand side gives a biased estimate of the direct effect when there is an M-Y confounder. Pearl's mediation formula requires separate identification of two causal paths.

Including M in the regression blocks the path X->M->Y, but it does not control for M<->Y confounders. Worse, M is a collider on the path X->M<-U->Y: if an unobserved U influences both M and Y, conditioning on M opens that path, and the coefficient on X in the regression Y~X+M+W is biased. Pearl's formula requires a separate backdoor adjustment for the M->Y path. The Baron-Kenny procedure from 1986 did not do this, leaving two decades of mediation analyses in psychology and medicine open to this bias before Pearl's framework made the requirement explicit.
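A small simulation shows the mechanism. Under hypothetical structural equations with a randomized $X$ and an unobserved $U$ that drives both $M$ and $Y$ (true direct effect 0.4, true indirect effect $0.5 \cdot 0.6 = 0.3$), the Baron-Kenny style regression $Y \sim X + M$ gives a direct coefficient near 0.15 and a product-of-coefficients "indirect" effect near 0.55. The total effect stays unbiased. The limits 0.15 and 0.55 come from projecting $U$ onto $(X, M)$ in this particular model; they are not general constants.

```python
import random

def cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

def two_regressor_ols(y, x1, x2):
    """Coefficients on x1 and x2 in y ~ 1 + x1 + x2, from the centered normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(4)
n = 20_000
x = [rng.gauss(0, 1) for _ in range(n)]                  # randomized treatment
u = [rng.gauss(0, 1) for _ in range(n)]                  # unobserved M-Y confounder
m = [0.5 * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
y = [0.4 * xi + 0.6 * mi + ui + rng.gauss(0, 1) for xi, mi, ui in zip(x, m, u)]

total_hat = cov(x, y) / cov(x, x)                # unbiased since X is randomized (truth 0.7)
a_hat = cov(x, m) / cov(x, x)                    # unbiased (truth 0.5)
direct_hat, b_hat = two_regressor_ols(y, x, m)   # Baron-Kenny style Y ~ X + M
print(f"total={total_hat:.2f}  'direct'={direct_hat:.2f} (truth 0.40)  "
      f"'indirect'={a_hat * b_hat:.2f} (truth 0.30)")
```

Randomizing $X$ protects only the total effect. The decomposition still fails, because conditioning on $M$ links $X$ to $U$.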

Why is the VanderWeele 4-way decomposition needed if Pearl's NDE + NIE = ATE already gives a complete decomposition?

Key Ideas

**NDE** (Natural Direct Effect) = $\mathbb{E}[Y(x, M(x')) - Y(x', M(x'))]$ - effect of $X$ with the mediator frozen at the control level. The direct path $X \to Y$
**NIE** (Natural Indirect Effect) = $\mathbb{E}[Y(x, M(x)) - Y(x, M(x'))]$ - effect only through the mediator with $X$ fixed at $x$. The path $X \to M \to Y$
**ATE = NDE + NIE** - a mathematical identity given counterfactual definitions. NDE and NIE can have opposite signs (inconsistent mediation)
**Pearl 2001 three identifiability conditions:** no unmeasured confounders for $X \to Y$, $X \to M$, $M \to Y$. The third is the hardest: it forbids any $M \leftrightarrow Y$ confounder that is itself a descendant of $X$, even an observed one
**VanderWeele 2014 4-way:** with $X \times M$ interaction, NDE splits into controlled direct effect + reference interaction, and NIE splits into mediated interaction + pure indirect effect. Critical for treatment personalization and effect modification
**ML applications:** fairness constraints via path-specific effects, RLHF mechanism via CoT mediator, uplift modeling, clinical surrogate endpoints - all require mediation, not just total effect

What Comes Next

Mediation analysis opens pathway analytics and advanced fairness methods:

Counterfactual semantics — Y(x, M(x')) requires full SCM semantics - the next lesson builds this mathematics
Transportability — Transporting NDE/NIE across populations - a direct extension of mediation identification
Double ML and CATE — CATE is a conditional total effect; adding mediation yields pathway-specific CATE

Questions for Reflection

RLHF fine-tuning: NIE (through chain-of-thought) >> NDE (direct). How does this change the strategy for improving the model - is it better to invest in CoT quality or in alignment data directly?
The Baron-Kenny procedure from 1986 (regression with the mediator) produced biased results in the presence of M-Y confounders. Two decades of psychological and medical papers used it without correction. How can one distinguish valid mediation results from biased ones in the historical literature?
Path-specific fairness: a regulator requires 'zero direct discrimination' (NDE = 0) while allowing indirect inequality through occupation. Is this requirement sufficient, and under what DAG structures does it fail to achieve the stated goal?

Related Lessons

cc-05-do-operator — do-operator and counterfactual notation - the language of NDE and NIE definitions
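cc-06-do-calculus — The interventional terms in the mediation formula are identified with the three do-calculus rules; the NDE additionally requires a cross-world independence assumption
cc-07-identifiability — Identification of NDE and NIE builds on the same graphical identification machinery as the ID-algorithm, plus extra no-confounding conditions on the mediator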
cc-09-counterfactuals — Formal semantics of NDE/NIE is built on the counterfactual world Y(x, M(x'))
cc-12-double-ml-cate — Double ML and CATE estimate total effect; mediation adds pathway decomposition on top
stat-01-sampling