Causal Calculus
Synthetic Control Method
How can the effect of a unique policy intervention be estimated when the counterfactual - what would have happened without the policy - is never observed?
- **Health policy:** Proposition 99 in California - synthetic control estimated that per-capita cigarette sales fell by roughly 25 packs per year relative to the synthetic counterfactual
- **Macroeconomics:** estimating the economic cost of terrorism in the Basque Country through a synthetic Basque Country, without conflict, built from other Spanish regions
- **Finance:** evaluating the impact of euro adoption on GDP of individual countries via synthetic counterfactuals from non-adopters
- **Technology policy:** estimating the effect of regional TikTok bans through synthetic control built from unaffected markets
Prerequisites
- Potential outcomes framework (PO)
- Linear algebra: norms and projections
- Constrained optimization basics
The synthetic control method (Abadie and Gardeazabal, 2003; Abadie, Diamond, Hainmueller, 2010) addresses causal inference for aggregate units - countries, regions, firms - where traditional methods like DiD require many comparable units. Synthetic control works with a single treated unit and a small donor pool, constructing the counterfactual as a convex combination of donors.
Synthetic control is preferable to DiD when: (1) treatment is received by a single aggregate unit, (2) the pre-treatment period is long enough to calibrate weights, (3) the parallel trends assumption is questionable. Pre-treatment fit quality (R-squared close to 1) is directly verifiable from the data.
Synthetic Control Method
In 1988 California passed Proposition 99, a large-scale tobacco tax and control program. Abadie, Diamond, and Hainmueller (2010) used synthetic control to quantify the causal effect: per-capita cigarette consumption dropped by roughly 25 packs per year relative to a synthetic California built as a weighted combination of other states. The method addresses a fundamental problem - for an aggregate treated unit (a state, country, firm) the counterfactual is never observed, and classical DiD or regression need many comparable units.
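Key assumption: the synthetic unit closely tracks the treated unit during the pre-treatment period t <= T0. Unlike the parallel-trends assumption in DiD, this fit is directly verifiable from data via pre-treatment R-squared. What remains untestable is that the fit would have persisted after T0 in the absence of treatment.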
The method shines when treatment affects a single aggregate unit, when the pre-period is long enough to calibrate weights, and when parallel trends are doubtful. Nuclear-norm regularization in matrix completion extends the framework to multiple treated units and partial observations.
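The weight calibration described above can be sketched as a least-squares problem over the simplex. The following is a minimal illustration, not the authors' implementation: it matches pre-treatment outcomes only (no covariates, no predictor-importance matrix), uses projected gradient descent as the solver, and runs on made-up data. The names `project_to_simplex`, `fit_weights`, `Y0`, and `y1` are assumptions introduced here.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                       # sort descending
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]   # last index kept positive
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def fit_weights(Y0_pre, y1_pre, n_iter=5000):
    """Minimise 0.5 * ||y1_pre - Y0_pre @ w||^2 subject to w >= 0, sum(w) = 1.

    Y0_pre: (T0, J) donor outcomes, y1_pre: (T0,) treated outcomes.
    """
    J = Y0_pre.shape[1]
    w = np.full(J, 1.0 / J)                          # start from equal weights
    step = 1.0 / np.linalg.norm(Y0_pre, 2) ** 2      # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = Y0_pre.T @ (Y0_pre @ w - y1_pre)
        w = project_to_simplex(w - step * grad)
    return w

# Toy data (illustrative only): the treated unit is an exact convex
# combination of donors 0 and 2, with an effect of -5 after T0.
rng = np.random.default_rng(0)
T0, T1, J = 20, 5, 4
Y0 = rng.normal(size=(T0 + T1, J))
true_w = np.array([0.6, 0.0, 0.4, 0.0])
y1 = Y0 @ true_w
y1[T0:] -= 5.0

w = fit_weights(Y0[:T0], y1[:T0])
gap_post = y1[T0:] - Y0[T0:] @ w   # tau_t = Y_1t - sum_j w_j * Y_jt
```

On this toy example the fitted weights recover the combination used to build the treated series, and the post-treatment gap is the injected effect. With real data the pre-treatment fit is only approximate, and a dedicated quadratic-programming solver would be a natural replacement for the hand-written loop.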
Why are the constraints w >= 0 and sum(w) = 1 essential to the method?
Donor Pool Selection
The quality of the synthetic control hinges on the donor pool. Ideal donors are units similar to the treated unit in pre-treatment characteristics and untouched by closely related shocks. Including donors that are too heterogeneous invites interpolation bias and overfitting; too narrow a pool cannot achieve a good fit.
Extrapolation bias: if the treated unit lies outside the convex hull of donors in covariate space, the synthetic unit cannot reproduce it under w >= 0. Remedies are widening the pool, allowing extrapolation (synthetic DiD), or declaring the method inapplicable.
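A quick first screen for this problem is a range check: if the treated unit falls outside the donors' minimum-to-maximum range on any predictor, it is certainly outside their convex hull. The sketch below is a hedged illustration with invented numbers; the function name `outside_donor_range` is hypothetical. Passing the check is necessary but not sufficient, since a point can sit inside every per-predictor range and still lie outside the hull.

```python
import numpy as np

def outside_donor_range(x_treated, X_donors):
    """Flag predictors on which the treated unit lies outside the donor range.

    x_treated: (K,) predictor values of the treated unit.
    X_donors:  (J, K) predictor values of the J donors.
    Returns a boolean array of length K.
    """
    lo = X_donors.min(axis=0)
    hi = X_donors.max(axis=0)
    return (x_treated < lo) | (x_treated > hi)

# Toy predictors (illustrative only): three donors, two predictors.
X_donors = np.array([[1.0, 10.0],
                     [2.0, 12.0],
                     [3.0, 11.0]])

flags_inside = outside_donor_range(np.array([2.0, 11.0]), X_donors)
flags_outside = outside_donor_range(np.array([3.5, 11.0]), X_donors)
```

Any flagged predictor means no set of weights with w >= 0 and sum(w) = 1 can reproduce the treated unit exactly on that predictor. A large pre-treatment fit error after optimization is the complementary signal.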
Abadie's practical rule: exclude units that received similar treatment in the pre or post period; units hit by strong idiosyncratic shocks (natural disasters); and units with a structurally different economic regime.
Which diagnostic signals an extrapolation bias problem?
Inference and Robustness
Statistical significance in synthetic control is assessed through placebo tests rather than classical standard errors. The idea is to apply the method to each untreated donor as a pseudo-treated unit. The resulting placebo effects form a null distribution against which the true effect is compared.
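The placebo comparison can be summarized with the ratio of post-treatment to pre-treatment RMSPE and a permutation-style p-value. The sketch below assumes the gaps (actual minus synthetic outcome) have already been computed for the treated unit and for every placebo run; the gap values and the function names `rmspe_ratio` and `placebo_p_value` are invented for illustration.

```python
import numpy as np

def rmspe_ratio(gaps, T0):
    """Post/pre RMSPE ratio per unit.

    gaps: (N, T) array of actual minus synthetic outcomes;
    the first T0 columns are pre-treatment periods.
    """
    pre = np.sqrt(np.mean(gaps[:, :T0] ** 2, axis=1))
    post = np.sqrt(np.mean(gaps[:, T0:] ** 2, axis=1))
    return post / pre

def placebo_p_value(ratios, treated_index=0):
    """Share of units (treated included) with a ratio at least as large as the treated unit's."""
    return float(np.mean(ratios >= ratios[treated_index]))

# Toy gaps (illustrative only): row 0 is the treated unit, rows 1-4 are placebos.
gaps = np.array([
    [ 1.0, -1.0,  1.0, -1.0, -10.0, -12.0],
    [ 1.0, -1.0,  1.0, -1.0,   1.0,  -2.0],
    [ 2.0, -2.0,  2.0, -2.0,   2.0,  -2.0],
    [ 1.0,  1.0, -1.0, -1.0,   3.0,  -1.0],
    [-1.0,  1.0, -1.0,  1.0,   0.5,   0.5],
])
ratios = rmspe_ratio(gaps, T0=4)
p_value = placebo_p_value(ratios)
```

With one treated unit and J donors the smallest attainable p-value is 1 / (J + 1), so a small donor pool limits how strong the evidence can be.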
A robustness battery includes leave-one-out over donors, varying pre-period length, changing covariate sets, and placebo-in-time tests on pre-treatment data. A causal interpretation is credible only when the estimate remains stable across the full set of checks.
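The leave-one-out check can be sketched as a loop that drops each donor, refits the weights, and records the average post-treatment gap. This is an illustration under stated assumptions: weights are fit by simplex-constrained least squares on pre-treatment outcomes only, the helper `simplex_ls` is a compact projected-gradient solver written for this example, and the data are simulated.

```python
import numpy as np

def simplex_ls(A, b, n_iter=3000):
    """Least squares over {w >= 0, sum(w) = 1} via projected gradient descent."""
    n = A.shape[1]
    w = np.full(n, 1.0 / n)
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(n_iter):
        v = w - step * (A.T @ (A @ w - b))
        # Euclidean projection of v onto the simplex
        u = np.sort(v)[::-1]
        css = np.cumsum(u) - 1.0
        rho = np.nonzero(u - css / np.arange(1, n + 1) > 0)[0][-1]
        w = np.maximum(v - css[rho] / (rho + 1), 0.0)
    return w

def leave_one_out_effects(Y0, y1, T0):
    """Average post-treatment gap after dropping each donor in turn.

    Y0: (T, J) donor outcomes, y1: (T,) treated outcomes,
    T0: number of pre-treatment periods.
    """
    effects = {}
    for j in range(Y0.shape[1]):
        keep = [k for k in range(Y0.shape[1]) if k != j]
        w = simplex_ls(Y0[:T0, keep], y1[:T0])
        gap = y1[T0:] - Y0[T0:, keep] @ w
        effects[j] = float(gap.mean())
    return effects

# Toy data (illustrative only): donors 3 and 4 carry zero weight,
# so dropping them should leave the estimated effect of -4 unchanged.
rng = np.random.default_rng(0)
T0 = 20
Y0 = rng.normal(size=(30, 5))
y1 = Y0 @ np.array([0.5, 0.3, 0.2, 0.0, 0.0])
y1[T0:] -= 4.0

effects = leave_one_out_effects(Y0, y1, T0)
```

If the estimate moves sharply when one particular donor is removed, the result depends on that donor and deserves closer scrutiny.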
What does leave-one-out analysis reveal in synthetic control?
Connections to other causal methods
Synthetic control extends and complements classical policy evaluation methods for aggregate data.
- Difference-in-Differences
- Regression Discontinuity Design
- Matching Methods
- Matrix Completion
Summary
- Synthetic control builds the counterfactual as a convex combination of donors: w >= 0, sum(w) = 1
- Weights are optimized to match pre-treatment covariates and outcome trajectories
- Effect tau_t = Y_1t - sum(w_j * Y_jt) is estimated for each post-treatment period separately
- Statistical inference via placebo tests: apply the method to each donor in turn
- Pre-treatment R-squared close to 1 is a necessary, though not sufficient, diagnostic for method validity
- Limitation: requires a sufficiently large donor pool and a long pre-treatment period for reliable weight calibration
What is the purpose of placebo tests in synthetic control?
Placebo tests apply synthetic control to each donor unit as if it were treated. If the treated unit's post/pre RMSPE ratio is unusually large relative to the distribution of donor ratios, this provides evidence of a true causal effect.