Causal Calculus

Identifiability: when a causal effect is unique

On 5M examples a correlation is significant with $p < 10^{-6}$, and a regression yields a coefficient of $0.42$. Ready to ship? If a hidden confounder lurks in the DAG, the true causal effect might be $-0.2$ or $+1.0$, and observational data cannot tell those worlds apart. Identifiability is the formal test that answers "is there anything meaningful to estimate at all?" BEFORE any regression is run.

**Microsoft DoWhy:** the library starts every analysis with an ID-algorithm check. On FAIL it returns an explanation rather than an estimate; dozens of internal Microsoft teams build their causal pipelines on top of this guardrail.
**Stripe Radar:** fraud-model failures in production frequently come from observational correlations diverging from causal effects under distribution shift. An ID check during development screens out unidentifiable target metrics before release.
**Booking.com:** for UX experiments they publish the point estimate, CI, and Manski bounds; the feature decision is frozen if the bounds straddle zero, even when the p-value is significant.

Prerequisites

Backdoor and frontdoor criteria (cc-03, cc-04)
$d$-separation and path blocking in a DAG (cc-02)
The three rules of do-calculus (cc-06)

Identifiability

Identifiability is the central question of causal inference: can $P(Y \mid \text{do}(X))$ be computed from the observational distribution $P(Y, X, Z, \ldots)$ of the measured variables, without an actual intervention? If YES, the effect is identifiable. Every causal model compatible with the DAG and the observed distribution then gives the same answer, whatever the unmeasured confounders do. If NO, the same observational distribution is consistent with different causal stories. In that case no point estimate of the effect of $\text{do}(X)$ can be obtained without an experiment (such as an RCT) or additional assumptions; at best the data bound it.

The simplest counterexample is a confounder $U \to X$, $U \to Y$ (alongside a possible direct effect $X \to Y$), where $U$ is unobserved. The conditional $P(Y \mid X)$ is always estimable from data, yet $P(Y \mid \text{do}(X))$ is not. Different distributions of $U$, and different strengths of its effects, can reproduce one and the same joint $P(Y, X)$ while giving different interventional answers. This is why correlation and causation can diverge dramatically here, even flipping sign (as in Simpson's paradox).
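Non-identifiability can be made concrete with two toy structural models. Both are hypothetical and chosen only for illustration. Both fit the diagram $U \to X$, $U \to Y$, $X \to Y$, and each one simply switches some edges off. In model A the hidden $U$ drives both variables and $X$ has no effect. In model B there is no confounding and $Y$ copies $X$. Enumerating the exogenous noise, assumed here to be two independent fair coins, shows that the observational joint is identical while $P(Y=1 \mid \text{do}(X=1))$ differs.

```python
from itertools import product
from collections import Counter

# Exogenous noise: U and E are independent fair coins (assumption for this toy example).
NOISE = [(u, e, 0.25) for u, e in product((0, 1), repeat=2)]

def model_a(u, e, do_x=None):
    """Hidden confounder U -> X, U -> Y; X has no effect on Y."""
    x = u if do_x is None else do_x
    y = u
    return x, y

def model_b(u, e, do_x=None):
    """No confounding: X is driven by E, and Y copies X."""
    x = e if do_x is None else do_x
    y = x
    return x, y

def observational_joint(model):
    """P(X, Y) as a dict {(x, y): probability}."""
    joint = Counter()
    for u, e, p in NOISE:
        joint[model(u, e)] += p
    return dict(joint)

def p_y1_do_x(model, x):
    """P(Y = 1 | do(X = x)): override X's mechanism, keep everything else."""
    return sum(p for u, e, p in NOISE if model(u, e, do_x=x)[1] == 1)

print(observational_joint(model_a) == observational_joint(model_b))  # True
print(p_y1_do_x(model_a, 1), p_y1_do_x(model_b, 1))                    # 0.5 1.0
```

In both models $P(Y=1 \mid X=1) = 1$. No estimator that sees only $(X, Y)$ can tell which interventional answer is correct.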

**ML application - drug efficacy:** an observational study shows a positive correlation between drug intake and recovery. Without an identifiability check, there is no way to separate "the drug works" from "healthier patients are more likely to consent to therapy" (selection bias through $U$ = baseline health). Microsoft DoWhy makes identification an explicit step that comes before any estimation. If the effect is not identifiable, the library refuses to return a point estimate.

Identifiability is a property of the pair (DAG, observed distribution), not of the data alone. Given the very same data table, an effect can be identifiable under one structural model and non-identifiable under another. Justifying the DAG is therefore part of validation, and that argument comes from domain knowledge, not from a statistical test.

What does it mean for the causal effect $P(Y \mid \text{do}(X))$ to be identifiable from observational data?

Tian-Pearl ID algorithm and the three criteria

Identifiability in a DAG is verified through known criteria: three classical sufficient ones, plus one general, complete procedure.

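**Backdoor:** find a set $Z$ of observed variables, none of them a descendant of $X$, that blocks every backdoor path from $X$ to $Y$ (every path that starts with an arrow into $X$). Then $P(Y \mid \text{do}(X)) = \sum_z P(Y \mid X, Z=z) P(Z=z)$.
**Frontdoor:** when no backdoor set exists because of a hidden confounder, route through an observed mediator $M$. The mediator must intercept every directed path from $X$ to $Y$ and have no unblocked backdoor path from $X$, and all its backdoor paths to $Y$ must be blocked by $X$. Then $P(Y \mid \text{do}(X=x)) = \sum_m P(m \mid x) \sum_{x'} P(Y \mid x', m) P(x')$.
**Instrumental variable:** an external source of variation $Z$ that affects $X$, is independent of the hidden confounder, and influences $Y$ only through $X$. Unlike the other two criteria, it yields a point estimate only under extra assumptions, such as linear effects or monotonicity for a local average effect. Without them it gives bounds.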
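Both adjustment formulas can be checked on exact probability tables. The sketch below uses two hypothetical binary models with made-up parameters. In the backdoor model, $Z$ confounds $X$ and $Y$, and the true value is $P(Y=1 \mid \text{do}(X=1)) = 0.3 + 0.2 + 0.4 \cdot 0.5 = 0.7$. In the frontdoor model, $U$ is hidden and marginalised out before the formula is applied, and the true value is $0.1 + 0.5 \cdot 0.9 + 0.3 \cdot 0.5 = 0.7$. The helper names `bern`, `prob`, `backdoor` and `frontdoor` belong to this example only.

```python
from itertools import product

def bern(p, v):
    """Probability that a Bernoulli(p) variable takes value v."""
    return p if v == 1 else 1.0 - p

def prob(joint, pred):
    """Sum a joint probability table over all value tuples satisfying pred."""
    return sum(p for key, p in joint.items() if pred(*key))

# Hypothetical backdoor model: Z -> X, Z -> Y, X -> Y, all binary, Z observed.
bd = {
    (z, x, y): bern(0.5, z) * bern(0.8 if z else 0.2, x) * bern(0.3 + 0.2 * x + 0.4 * z, y)
    for z, x, y in product((0, 1), repeat=3)
}

def backdoor(joint, x, y=1):
    """P(Y=y | do(X=x)) = sum_z P(y | x, z) P(z), table keyed by (z, x, y)."""
    total = 0.0
    for z in (0, 1):
        p_z = prob(joint, lambda z_, x_, y_: z_ == z)
        p_xz = prob(joint, lambda z_, x_, y_: z_ == z and x_ == x)
        p_yxz = prob(joint, lambda z_, x_, y_: (z_, x_, y_) == (z, x, y))
        total += p_yxz / p_xz * p_z
    return total

# Naive conditional P(Y=1 | X=1), biased by the confounder Z.
naive = prob(bd, lambda z, x, y: x == 1 and y == 1) / prob(bd, lambda z, x, y: x == 1)

# Hypothetical frontdoor model: U -> X -> M -> Y, U -> Y, with U hidden.
# Marginalise U away so that only the observed table over (x, m, y) remains.
fd = {}
for u, x, m, y in product((0, 1), repeat=4):
    p = (bern(0.5, u) * bern(0.8 if u else 0.2, x)
         * bern(0.9 if x else 0.1, m) * bern(0.1 + 0.5 * m + 0.3 * u, y))
    fd[(x, m, y)] = fd.get((x, m, y), 0.0) + p

def frontdoor(joint, x, y=1):
    """P(Y=y | do(X=x)) = sum_m P(m|x) sum_x' P(y|x',m) P(x'), table keyed by (x, m, y)."""
    total = 0.0
    for m in (0, 1):
        p_m_given_x = (prob(joint, lambda x_, m_, y_: x_ == x and m_ == m)
                       / prob(joint, lambda x_, m_, y_: x_ == x))
        inner = 0.0
        for xp in (0, 1):
            p_y_given_xpm = (prob(joint, lambda x_, m_, y_: (x_, m_, y_) == (xp, m, y))
                             / prob(joint, lambda x_, m_, y_: x_ == xp and m_ == m))
            inner += p_y_given_xpm * prob(joint, lambda x_, m_, y_: x_ == xp)
        total += p_m_given_x * inner
    return total

print(round(naive, 3), round(backdoor(bd, 1), 3))  # 0.82 0.7
print(round(frontdoor(fd, 1), 3))                  # 0.7
```

The naive conditional overstates the effect in both models ($0.82$ and $0.79$ respectively), while each adjustment formula recovers $0.7$ from observed quantities alone.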

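**Pearl's do-calculus** consists of three transformation rules for expressions involving $\text{do}(\cdot)$. Completeness (Shpitser, Pearl 2006): an effect is nonparametrically identifiable iff repeated application of the three rules, together with ordinary probability calculus, reduces it to a $\text{do}$-free expression. Consequence: if no such derivation exists, no estimation strategy can rescue the situation without going beyond the DAG, either through experimental data or through extra assumptions such as parametric (e.g. linear) restrictions. Under the DAG alone, there simply is no identifiability.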
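For reference, here are the three rules in Pearl's standard notation. $G_{\overline{X}}$ is the graph with all edges into $X$ deleted, and $G_{\underline{Z}}$ is the graph with all edges out of $Z$ deleted. $Z(W)$ is the set of $Z$-nodes that are not ancestors of any $W$-node in $G_{\overline{X}}$.

```latex
\begin{aligned}
&\text{Rule 1 (insertion/deletion of observations):} \\
&\quad P(y \mid \text{do}(x), z, w) = P(y \mid \text{do}(x), w)
  \quad \text{if } (Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}}} \\
&\text{Rule 2 (action/observation exchange):} \\
&\quad P(y \mid \text{do}(x), \text{do}(z), w) = P(y \mid \text{do}(x), z, w)
  \quad \text{if } (Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}\underline{Z}}} \\
&\text{Rule 3 (insertion/deletion of actions):} \\
&\quad P(y \mid \text{do}(x), \text{do}(z), w) = P(y \mid \text{do}(x), w)
  \quad \text{if } (Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}\,\overline{Z(W)}}}
\end{aligned}
```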

**The Tian-Pearl ID algorithm** (Tian, Pearl 2002; shown complete by Shpitser, Pearl 2006) automates the check. The input is a causal diagram and the pair $(X, Y)$. The output is either a formula in terms of $P(\text{observed})$ or a hedge that witnesses non-identifiability. The algorithm is complete: if it returns FAIL, no identification formula exists for that diagram.
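The full ID algorithm is too long for a short sketch. As a hedged illustration of the kind of purely graphical test such tools automate, the code below checks just one sufficient criterion, the backdoor criterion. It tests d-separation with the moralised ancestral graph method. The graph encoding (node mapped to its list of parents) and all function names are assumptions of this example, not any library's API. A `None` result means only that no backdoor set exists among the observed variables. The effect may still be identifiable by other routes, such as frontdoor, which is the gap a complete procedure closes.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """Return `nodes` together with all their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen

def descendants(parents, node):
    """Return all strict descendants of `node`."""
    children = {}
    for v, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(v)
    seen, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def d_separated(parents, xs, ys, zs):
    """Test (xs _||_ ys | zs) on the moralised ancestral graph."""
    keep = ancestors(parents, set(xs) | set(ys) | set(zs))
    adj = {v: set() for v in keep}
    for v in keep:
        ps = list(parents.get(v, ()))
        for p in ps:                      # drop edge directions
            adj[v].add(p)
            adj[p].add(v)
        for a, b in combinations(ps, 2):  # "marry" parents of a common child
            adj[a].add(b)
            adj[b].add(a)
    blocked = set(zs)
    frontier = [x for x in xs if x not in blocked]
    reached = set(frontier)
    while frontier:
        v = frontier.pop()
        if v in ys:
            return False                  # an unblocked path reaches ys
        for w in adj[v] - blocked - reached:
            reached.add(w)
            frontier.append(w)
    return True

def satisfies_backdoor(parents, x, y, zs):
    """No member of zs descends from x, and zs d-separates x from y
    in the graph with all edges leaving x removed."""
    if set(zs) & descendants(parents, x):
        return False
    cut = {v: [p for p in ps if p != x] for v, ps in parents.items()}
    return d_separated(cut, [x], [y], zs)

def find_backdoor_set(parents, x, y, observed):
    """Smallest observed adjustment set satisfying the backdoor criterion, or None."""
    candidates = [v for v in observed if v not in (x, y)]
    for k in range(len(candidates) + 1):
        for zs in combinations(candidates, k):
            if satisfies_backdoor(parents, x, y, zs):
                return set(zs)
    return None

# Quiz DAG from this lesson: U -> X -> M -> Y, U -> Y.
dag = {"X": ["U"], "M": ["X"], "Y": ["M", "U"]}
print(find_backdoor_set(dag, "X", "Y", observed=["X", "M", "Y"]))       # None
print(find_backdoor_set(dag, "X", "Y", observed=["U", "X", "M", "Y"]))  # {'U'}
```

The brute-force search over subsets is exponential. It is acceptable for a teaching sketch, but real tools use smarter enumeration.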

**ML application - production:** Microsoft DoWhy and Stripe Radar use the ID algorithm to auto-select estimators. Input is a DAG from expert knowledge or causal discovery, output is either an identification formula or a refusal. Booking.com reports that their causal pipeline ALWAYS starts with the ID check, before any regression or ML model; the estimator is then chosen based on the resulting formula.

Given the DAG: $U \to X \to M \to Y$, $U \to Y$, where $U$ is unobserved and $M$ is observed. Which criterion applies?

Hedges, non-identifiability, and bounds

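When does the ID algorithm refuse? Shpitser and Pearl (2006) gave a structural criterion: $P(Y \mid \text{do}(X))$ is non-identifiable iff the causal diagram contains a **hedge** for it (or for a sub-problem with $X' \subseteq X$, $Y' \subseteq Y$). A hedge is a pair of nested C-forests $F' \subset F$ with the same root set of ancestors of $Y$, where $F$ contains vertices of $X$ and $F'$ contains none. A C-forest is a set of vertices connected through bidirected edges (shared hidden causes) whose directed edges form a forest. A hedge is a formal obstruction: no combination of the three do-calculus rules can express $P(Y \mid \text{do}(X))$ in observational terms.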

The simplest hedge is the **bow arc**: $X \to Y$ together with a bidirected edge $X \leftrightarrow Y$. The bidirected edge stands for an unobserved common cause of $X$ and $Y$, which therefore cannot be conditioned on. The observational $P(Y \mid X)$ is then consistent with a wide range of values for $P(Y \mid \text{do}(X))$.

What to do when the effect is non-identifiable? Do not give up: a point estimate is ruled out, but useful quantitative answers are not. There are alternatives.

**Manski bounds (partial identification):** report a range $[\text{lower}, \text{upper}]$ that is guaranteed to contain the causal effect whatever the values of the unobserved quantities, assuming only a bounded outcome. Without stronger assumptions the bounds may be wide, but they are correct. Often the width itself answers the business question ("even the upper bound is below the cost").
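For a binary treatment and outcome, the no-assumption bounds follow from one observation. $E[Y(1)]$ is observed for the treated share and could be anything in $[0, 1]$ for the untreated share, and symmetrically for $E[Y(0)]$. The minimal sketch below takes the three observational probabilities as known, which amounts to assuming a large sample. The function name and the example numbers are hypothetical.

```python
def manski_ate_bounds(p_x1, q1, q0):
    """
    No-assumption bounds on ATE = E[Y(1)] - E[Y(0)] for binary X and Y.
    p_x1 = P(X=1), q1 = P(Y=1 | X=1), q0 = P(Y=1 | X=0).
    """
    p_x0 = 1.0 - p_x1
    # E[Y(1)]: observed for the treated share, unknown in [0, 1] for the untreated.
    ey1_lo, ey1_hi = q1 * p_x1, q1 * p_x1 + p_x0
    # E[Y(0)]: observed for the untreated share, unknown in [0, 1] for the treated.
    ey0_lo, ey0_hi = q0 * p_x0, q0 * p_x0 + p_x1
    return ey1_lo - ey0_hi, ey1_hi - ey0_lo

lo, hi = manski_ate_bounds(p_x1=0.4, q1=0.7, q0=0.3)
print(round(lo, 2), round(hi, 2))  # -0.3 0.7
```

For a binary outcome the width always equals $P(X=0) + P(X=1) = 1$, and the interval always contains zero. Bounds without any assumptions can therefore never sign the effect. Narrower bounds need weak extra assumptions, such as monotone treatment response.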

**Sensitivity analysis:** how strong must the hidden confounding be to flip the conclusion? Tools include the E-value and related methods of VanderWeele and colleagues, as well as Rosenbaum bounds. If overturning the effect requires a confounder explaining $80\%$ of variance, the conclusion is robust; if $5\%$ suffices, it is fragile.
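The E-value of VanderWeele and Ding is a one-line formula on the risk-ratio scale. Note that this is a different scale from the variance-explained phrasing above. For an observed risk ratio $RR \ge 1$, $E = RR + \sqrt{RR\,(RR - 1)}$. It is the minimum strength of association, as a risk ratio, that an unmeasured confounder would need with both treatment and outcome to fully explain away the observed effect. Protective effects are inverted first. The helper names below are hypothetical.

```python
import math

def e_value(rr):
    """E-value for an observed risk ratio (VanderWeele & Ding formula)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr  # protective effect: work on the inverted scale
    return rr + math.sqrt(rr * (rr - 1.0))

def e_value_ci(rr_lo, rr_hi):
    """E-value for the CI limit closest to the null; 1 if the CI already includes 1."""
    if rr_lo <= 1.0 <= rr_hi:
        return 1.0
    return e_value(rr_lo if rr_lo > 1.0 else rr_hi)

print(round(e_value(2.0), 2))  # 3.41
```

An observed $RR = 2$ can only be explained away by a confounder associated with both treatment and outcome at $RR \approx 3.41$ or more.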

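**Bayesian partial identification:** a prior on the unobserved parameters plus observed data yields a posterior whose credible interval stays wide. The data never update the unidentified parameters, so the interval does not collapse to a point as the sample grows. The decision is then made with explicit uncertainty quantification.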
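Here is a toy version for binary $X$ and $Y$, built on several assumptions. The observational quantities are treated as known (the large-sample limit). The two never-observed counterfactual means, $E[Y(1) \mid X=0]$ and $E[Y(0) \mid X=1]$, get independent Uniform(0, 1) priors. Because the likelihood does not involve them, their posterior equals their prior. Monte Carlo then gives the induced distribution of the ATE. The function names and numbers are hypothetical.

```python
import random

def ate_posterior_draws(p_x1, q1, q0, n_draws=100_000, seed=0):
    """
    Draws of ATE = E[Y(1)] - E[Y(0)] for binary X, Y.
    q1 = P(Y=1 | X=1), q0 = P(Y=1 | X=0), treated as known.
    The unidentified means get Uniform(0, 1) priors that data cannot update.
    """
    rng = random.Random(seed)
    p_x0 = 1.0 - p_x1
    draws = []
    for _ in range(n_draws):
        a = rng.random()  # E[Y(1) | X=0]: never observed
        b = rng.random()  # E[Y(0) | X=1]: never observed
        ey1 = q1 * p_x1 + a * p_x0
        ey0 = q0 * p_x0 + b * p_x1
        draws.append(ey1 - ey0)
    return draws

def credible_interval(draws, level=0.95):
    """Equal-tailed empirical interval."""
    s = sorted(draws)
    lo_i = int((1 - level) / 2 * len(s))
    hi_i = int((1 + level) / 2 * len(s)) - 1
    return s[lo_i], s[hi_i]

draws = ate_posterior_draws(p_x1=0.4, q1=0.7, q0=0.3)
print(credible_interval(draws))  # roughly (-0.19, 0.59)
```

Every draw lies inside the no-assumption bounds for the same inputs, $[-0.3, 0.7]$. The prior only concentrates mass within them, so the result depends on the prior as much as on the data.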

**ML application - Booking.com:** for UX experiments they publish the ATE point estimate, a $95\%$ confidence interval, AND Manski bounds whenever identifiability is in question. If the bounds straddle zero, the decision is deferred to an RCT even when the point estimate is significant. This guards against business failures caused by hidden confounding.

Misconception: with a sufficiently large sample any causal effect can be estimated. Just collect more data and use a richer model.

Identifiability is a structural property of the DAG, not a statistical one. If a hedge is present, no amount of data and no ML model produces an unbiased estimate without further assumptions; bounds and sensitivity analysis are the correct response.

A causal effect is a functional of the interventional distribution, not the observational one. Observational data determine only the observational distribution, and mapping it to the interventional one requires the structural assumptions encoded in the DAG. Infinite samples eliminate variance but not bias, because the bias is fixed by the causal structure, not by the sample size.
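A quick simulation shows bias that survives any sample size. It uses a hypothetical linear model with a hidden confounder: $U \sim N(0,1)$, $X = U + \varepsilon_X$, $Y = 1.0 \cdot X + 2.0 \cdot U + \varepsilon_Y$, with standard normal noise. The true causal effect of $X$ on $Y$ is $1.0$. The naive OLS slope, however, converges to $\text{Cov}(X, Y) / \text{Var}(X) = (2 + 2) / 2 = 2$.

```python
import random

def naive_slope(n, seed=0):
    """OLS slope of Y on X in a toy linear SCM with a hidden confounder U."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)                        # hidden confounder
        x = u + rng.gauss(0.0, 1.0)
        y = 1.0 * x + 2.0 * u + rng.gauss(0.0, 1.0)    # true effect of X on Y: 1.0
        xs.append(x)
        ys.append(y)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

for n in (1_000, 10_000, 200_000):
    print(n, round(naive_slope(n), 3))  # settles near 2.0, never near 1.0
```

As $n$ grows, the estimate gets more precise around the wrong value. A larger sample narrows the confidence interval but does not move it toward the causal effect.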

The ID algorithm returned FAIL: the effect is non-identifiable due to a hedge. The sample size is 5M and a conditional correlation has p-value $< 0.001$. What is the most correct next step?

Key ideas

**Identifiability** is the unambiguous computability of $P(Y \mid \text{do}(X))$ from the observational $P(\text{observed})$ given DAG structure; a property of the model, not of the data.
**Three criteria** give sufficient conditions: backdoor (an observed blocking set), frontdoor (via a mediator), instrumental variable (external variation; point identification needs extra parametric assumptions). **Do-calculus** is the complete axiomatic basis, and the Tian-Pearl ID algorithm is its algorithmic incarnation.
**A hedge** is a structural obstruction: a pair of nested C-forests that makes the effect impossible to compute from observational data in principle. Without additional assumptions, no sample size and no model removes the bias.
**When non-identifiable** - Manski bounds, sensitivity analysis, and Bayesian partial identification provide an honest range instead of a misleading point estimate.

Questions for reflection

A working dataset has $5{,}000{,}000$ records and a regression returns $\hat{\beta} = 0.42$ with a tight CI. Which steps must precede interpreting $\hat{\beta}$ as a causal effect?
Why is identifiability a property of the pair (DAG, distribution) rather than of the data alone? What follows for validating causal models in production?
When is it more rational to publish Manski bounds instead of a point estimate, and how should such a result be communicated to a stakeholder used to seeing a single number?

Related lessons

cc-03-backdoor — Backdoor criterion is the first and most basic sufficient condition for identifiability
cc-04-frontdoor — Frontdoor criterion delivers identification where backdoor fails due to a hidden confounder
cc-06-do-calculus — The three rules of do-calculus form a complete axiomatic basis for DAG identifiability
cc-08-mediation — Identification of NDE and NIE is a special case of conditional identifiability (IDC)
cc-11-causal-discovery — The ID algorithm assumes a known DAG; causal discovery recovers structure from data
stat-01-sampling
