library(Synth)
library(SCtools)
library(tidyverse)
library(skimr)
Basic synthetic control tutorial
Video overview
AI summary
The lecture discusses synthetic controls, a method of causal inference for analyzing policy changes when only one unit is treated. A data-driven counterfactual is constructed using an algorithm. Inference is challenging due to the lack of a conventional control group.
00:00 Synthetic controls is a method of causal inference for policy changes
05:39 Synthetic control method matches pre-trends to estimate treatment effect
10:38 Synthetic control studies assume treated units remain treated and spillovers are excluded.
15:38 Weights in synthetic controls are determined by an algorithm
20:40 Synthetic controls is a standard minimization procedure used for causal inference.
25:12 Synthetic control method matches characteristics of Basque country well
29:47 Synthetic control studies require careful consideration of methodological refinements.
01:21 Conventional standard errors don’t work for synthetic controls
Detailed Summary for Causal Inference – 22/23 – Synthetic Control I by Merlin
00:00 Synthetic controls is a method of causal inference for policy changes
- Synthetic controls is used when only one unit has been treated
- The method constructs a data-driven counterfactual using an algorithm
05:39 Synthetic control method matches pre-trends to estimate treatment effect
- Weighted average of control units chosen by algorithm
- Assumption of parallel trends in absence of treatment
10:38 Synthetic control studies assume treated units remain treated and spillovers are excluded.
- Counterfactuals are weighted averages of actual outcomes of units in the donor pool.
- Treatment effects are estimated by comparing actual outcomes of treated units to synthetic controls.
15:38 Weights in synthetic controls are determined by an algorithm
- Characteristics of treated unit and control units are matched based on multiple variables
- Vector of weights is found to minimize distance between target variables
20:40 Synthetic controls is a standard minimization procedure used for causal inference.
- Weighted variables are counted as separate variables.
- Simple minimization algorithm typically matches pre-trends well.
25:12 Synthetic control method matches characteristics of Basque country well
- Synthetic control algorithm finds weights based on variables like GDP and education
- Weight only rests on two regions: Catalonia and Madrid
29:47 Synthetic control studies require careful consideration of methodological refinements.
- Matching on pre-treatment outcomes is crucial for finding a good match.
- Statistical inference in synthetic control studies is challenging and requires careful consideration.
01:21 Conventional standard errors don’t work for synthetic controls
- Synthetic controls have high serial correlation and only two clusters
- Non-parametric methods like placebo synthetic controls are needed for meaningful standard errors
AI summary
The lecture discusses the difficulty in drawing inference in synthetic control studies and suggests using non-parametric methods based on permutation tests. An example of the tobacco control program in California is used to illustrate this.
00:00 Non-parametric methods based on permutation tests are used for inference in synthetic control studies.
05:50 Synthetic California is a meaningful counterfactual
11:16 Graphs show the difference between treated and synthetic control unit.
16:18 California is a clear outlier in placebo estimates
01:26 When running a synthetic control study, it’s important to add one to each numerator and denominator to avoid false statistical significance.
01:06 Brexit vote caused UK GDP to fall below its doppelganger
31:08 Synthetic control method is a powerful tool for policy evaluation
36:10 Synthetic control meets machine learning through matrix completion methods.
Detailed Summary for Causal Inference – 23/23 – Synthetic Control II by Merlin
00:00 Non-parametric methods based on permutation tests are used for inference in synthetic control studies.
- Synthetic control studies compare the time series of the outcome of the treated unit to that of the synthetic control unit.
- Matching is done based on a restricted donor pool and a few pre-periods and cross-sectional variables.
05:50 Synthetic California is a meaningful counterfactual
- Synthetic California has a small difference compared to actual California
- Matching on a pre-period considerably before treatment kicks in provides a plausible research design
11:16 Graphs show the difference between treated and synthetic control unit.
- 38 states were used to estimate synthetic controls and treatment effects.
- Observations with huge pre-treatment differences were trimmed.
16:18 California is a clear outlier in placebo estimates
- Empirical p-value is computed by counting how many placebo estimates are greater than California’s
- Consideration of values in opposite direction makes it equivalent to a two-sided test
01:26 When running a synthetic control study, it’s important to add one to each numerator and denominator to avoid false statistical significance.
- Adding one to each numerator and denominator gives a more accurate p-value.
- Many papers get this wrong, so it’s important to be aware of this issue.
01:06 Brexit vote caused UK GDP to fall below its doppelganger - Using placebo, timing of Brexit vote was analyzed - Robustness checks and expectation augmented VAR model were used for analysis
31:08 Synthetic control method is a powerful tool for policy evaluation
- Synthetic control method compares treated unit with a weighted average of control units
- Results are robust to different donor pool restrictions and can complement other methods
36:10 Synthetic control meets machine learning through matrix completion methods.
- Abadi and Jeremy Luer devised a method to choose between multiple optimal synthetic controls using machine learning.
- Methodological refinements include dealing with imperfect pre-treatment matches and covariate imbalance, with augmented estimators providing bounds on estimates.
Video source: Ben Elsner
This note is based on the article “Synth: An R Package for Synthetic Control Methods in Comparative Case Studies” by Alberto Abadie, Alexis Diamond, and Jens Hainmueller, which introduces the Synth package in R. This package implements synthetic control methods used in comparative case studies. These methods construct a synthetic control unit as weighted average of available control units that best approximate the relevant characteristics of the treated unit prior to the treatment. The Synth package includes functions such as synth()
, which constructs the synthetic control unit, and dataprep()
, which organizes the data. Other functions like synth.tables()
, path.plot()
, and gaps.plot()
help in summarizing and illustrating the results. The authors demonstrate the use of these functions with an example from a study on the economic effects of conflict in the Basque Country using other Spanish regions as potential control units. They conclude by discussing future extensions to the Synth package
Introduction
Introduction to Social Science Research Methods
- Social science often explores causal questions about the impacts of historical events and policy interventions on aggregate units like cities, regions, and countries.
- Comparative case studies are a common method for answering these questions, comparing outcomes for affected units (treated group) with unaffected units (control group).
Limitations of Traditional Comparative Case Studies
- Traditional methods leave control unit selection to the analyst, raising questions about selection arbitrariness and the credibility of control units as proxies for treated units’ counterfactual outcomes.
Introduction to Synthetic Control Methods
- Synthetic control methods, introduced by Abadie and Gardeazabal (2003) and Abadie et al. (2010), address these issues by proposing a data-driven control-group selection procedure, a framework for assessing the chosen control group’s suitability, and a means of producing quantitative inference.
- A synthetic control unit is defined as a weighted average of available control units that approximates the treated unit’s relevant characteristics prior to treatment.
- The synthetic control method guards against extrapolation outside the convex hull of the data, as weights from all control units can be chosen to be positive and sum to one.
The Synth Package in R
- The Synth package in R implements synthetic control methods, with the central function being
synth()
, which constructs the synthetic control unit by solving an optimization problem to identify a set of weights assigned to potential control units. - Other functions like
dataprep()
,synth.tables()
,path.plot()
, andgaps.plot()
help organize data, summarize results, and create visual representations.
Case Study: The Basque Country
- The data example from Abadie and Gardeazabal (2003) uses synthetic control methods to investigate the effects of the terrorist conflict in the Basque Country on the Basque economy, using other Spanish regions as potential control units.
Synthetic control methods
Synthetic Control Methods Overview
- Synthetic control methods involve creating synthetic control units from multiple control units. The weights defining the synthetic control unit are chosen to best approximate the characteristics of the treated unit during the pre-treatment period.
- The synthetic control method is used to estimate the outcomes that would have been observed for the treated unit in the absence of the intervention.
Theoretical Properties
- The method is based on an econometric model that generalizes the usual difference-in-differences model commonly applied in empirical literature.
Construction of Synthetic Control Units
- The synthetic control unit is constructed from a set of control units, known as the donor pool. The intervention occurs at a specific time period, dividing the timeline into pre-intervention and post-intervention periods.
- Two potential outcomes are defined: the outcome if the unit is not exposed to the intervention (denoted as \(Y_{i t}^{N}\)), and the outcome if the unit is exposed to the intervention (denoted as \(Y_{i t}^{I}\)). The goal is to estimate the effect of the intervention on the outcome for the treated unit in the post-intervention period, formally defined as the difference between the two potential outcomes \(\alpha_{1 t}=Y_{1 t}^{I}-Y_{1 t}^{N}\) for periods \(T_{0}+1, T_{0}+2, \ldots, T\).
Estimation of Missing Potential Outcome
- The synthetic control method aims to construct a synthetic control group that provides a reasonable estimate for the missing potential outcome \(Y_{i t}^{N}\) for the treated unit in the post-intervention period.
Weight Selection
- The synthetic control unit is defined by a vector of weights, each representing a particular weighted average of control units. The weights are chosen to best approximate the unit exposed to the intervention with respect to the outcome predictors and linear combinations of pre-intervention outcomes.
- In empirical applications, the weights are chosen so that the identity conditions hold approximately. The user can easily check how similar a particular synthetic control unit is to the treated unit.
Numerical Implementation
- The synth() function is used to numerically implement the synthetic control estimator. It defines a distance between the synthetic control unit and the treated unit, and chooses the vector of weights to minimize this distance.
- The synth() function allows for flexibility in the choice of the matrix V, which allows different weights to the variables depending on their predictive power on the outcome. An optimal choice of V minimizes the mean square error of the synthetic control estimator.
Inferential Techniques
- Synthetic control methods facilitate inferential techniques akin to permutation tests that are well-suited to comparative case studies in which the number of units in the comparison group and the number of periods in the sample are relatively small.
- The method proposes conducting placebo studies by iteratively applying the synthetic control method by randomly reassigning the intervention in time or across units to produce a set of placebo effects. These effects are then compared to the effect estimated for the actual intervention.
- The placebo tests are akin to permutation inference, where a test statistic is iteratively computed under random permutations of the assignment vector that determines whether a unit is in the treatment or the control group.
Implementation of the Synth
package
- Abadie and Gardeazabal (2003) study the economic impacts of conflict in the Basque Country.
- A synthetic Basque Country was constructed using a combination of other Spanish regions, which closely resembled the Basque Country’s economic characteristics before the onset of political terrorism in the 1970s.
- The data, spanning from 1955 to 1997, covers 17 Spanish regions, excluding the small autonomous towns of Ceuta and Melilla on the African coast.
- The data includes per-capita GDP (the outcome variable), as well as population density, sectoral production, investment, and human capital (the predictor variables).
- Missing data in the dataset are denoted by NA.
data("basque")
85:89, 1:4] basque[
regionno regionname year gdpcap
85 2 Andalucia 1996 5.995930
86 2 Andalucia 1997 6.300986
87 3 Aragon 1955 2.288775
88 3 Aragon 1956 2.445159
89 3 Aragon 1957 2.603399
head(basque)
regionno regionname year gdpcap sec.agriculture sec.energy sec.industry
1 1 Spain (Espana) 1955 2.354542 NA NA NA
2 1 Spain (Espana) 1956 2.480149 NA NA NA
3 1 Spain (Espana) 1957 2.603613 NA NA NA
4 1 Spain (Espana) 1958 2.637104 NA NA NA
5 1 Spain (Espana) 1959 2.669880 NA NA NA
6 1 Spain (Espana) 1960 2.869966 NA NA NA
sec.construction sec.services.venta sec.services.nonventa school.illit
1 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
4 NA NA NA NA
5 NA NA NA NA
6 NA NA NA NA
school.prim school.med school.high school.post.high popdens invest
1 NA NA NA NA NA NA
2 NA NA NA NA NA NA
3 NA NA NA NA NA NA
4 NA NA NA NA NA NA
5 NA NA NA NA NA NA
6 NA NA NA NA NA NA
glimpse(basque)
Rows: 774
Columns: 17
$ regionno <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ regionname <chr> "Spain (Espana)", "Spain (Espana)", "Spain (Espa…
$ year <dbl> 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, …
$ gdpcap <dbl> 2.354542, 2.480149, 2.603613, 2.637104, 2.669880…
$ sec.agriculture <dbl> NA, NA, NA, NA, NA, NA, 19.54, NA, 19.05, NA, 16…
$ sec.energy <dbl> NA, NA, NA, NA, NA, NA, 4.71, NA, 4.31, NA, 4.31…
$ sec.industry <dbl> NA, NA, NA, NA, NA, NA, 26.42, NA, 26.05, NA, 27…
$ sec.construction <dbl> NA, NA, NA, NA, NA, NA, 6.27, NA, 6.83, NA, 7.64…
$ sec.services.venta <dbl> NA, NA, NA, NA, NA, NA, 36.62, NA, 38.00, NA, 38…
$ sec.services.nonventa <dbl> NA, NA, NA, NA, NA, NA, 6.44, NA, 5.77, NA, 6.48…
$ school.illit <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 2863.278, 28…
$ school.prim <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 18679.10, 18…
$ school.med <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 1064.246, 11…
$ school.high <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 359.7457, 37…
$ school.post.high <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 212.1434, 21…
$ popdens <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ invest <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 18.36018, 20…
skim(basque)
Name | basque |
Number of rows | 774 |
Number of columns | 17 |
_______________________ | |
Column type frequency: | |
character | 1 |
numeric | 16 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
regionname | 0 | 1 | 6 | 28 | 0 | 18 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
regionno | 0 | 1.00 | 9.50 | 5.19 | 1.00 | 5.00 | 9.50 | 14.00 | 18.00 | ▇▆▇▆▇ |
year | 0 | 1.00 | 1976.00 | 12.42 | 1955.00 | 1965.00 | 1976.00 | 1987.00 | 1997.00 | ▇▇▇▇▇ |
gdpcap | 0 | 1.00 | 5.39 | 2.24 | 1.24 | 3.69 | 5.34 | 6.87 | 12.35 | ▅▇▇▂▁ |
sec.agriculture | 684 | 0.12 | 20.27 | 10.38 | 1.32 | 13.54 | 19.24 | 27.48 | 46.50 | ▅▇▇▅▂ |
sec.energy | 684 | 0.12 | 5.19 | 4.04 | 1.60 | 2.70 | 3.67 | 6.08 | 21.36 | ▇▂▁▁▁ |
sec.industry | 684 | 0.12 | 23.92 | 9.28 | 9.56 | 17.81 | 23.14 | 27.48 | 46.22 | ▅▇▆▂▂ |
sec.construction | 684 | 0.12 | 7.21 | 1.36 | 4.34 | 6.24 | 7.13 | 8.18 | 11.28 | ▃▇▆▅▁ |
sec.services.venta | 684 | 0.12 | 36.49 | 7.26 | 26.23 | 31.25 | 34.75 | 39.19 | 58.21 | ▇▇▃▁▂ |
sec.services.nonventa | 684 | 0.12 | 6.93 | 1.98 | 3.43 | 5.50 | 6.68 | 7.93 | 13.11 | ▃▇▃▂▁ |
school.illit | 666 | 0.14 | 308.05 | 630.84 | 8.10 | 40.29 | 116.23 | 252.27 | 2863.28 | ▇▁▁▁▁ |
school.prim | 666 | 0.14 | 2118.52 | 4216.78 | 151.32 | 432.29 | 852.13 | 1763.31 | 19459.56 | ▇▁▁▁▁ |
school.med | 666 | 0.14 | 145.61 | 297.45 | 8.61 | 26.51 | 47.75 | 119.04 | 1696.15 | ▇▁▁▁▁ |
school.high | 666 | 0.14 | 45.94 | 92.11 | 3.06 | 9.13 | 16.70 | 38.76 | 474.94 | ▇▁▁▁▁ |
school.post.high | 666 | 0.14 | 25.46 | 51.58 | 1.66 | 4.41 | 7.71 | 19.06 | 252.25 | ▇▁▁▁▁ |
popdens | 756 | 0.02 | 105.77 | 101.52 | 22.38 | 44.77 | 80.38 | 122.57 | 442.45 | ▇▂▁▁▁ |
invest | 198 | 0.74 | 21.40 | 4.11 | 9.33 | 18.74 | 21.35 | 23.75 | 39.41 | ▁▇▇▁▁ |
In Abadie and Gardeazabal (2003), there are 16 control regions and the 13 predictor variables:
1964-1969 averages for gross total investment/GDP (invest).
1964-1969 averages for the share of education:
- 1964-1969 averages for the share of the working-age population that was illiterate (school.illit)
- 1964-1969 averages for the share of the working-age population with up to primary school education (school.prim)
- 1964-1969 averages for the share of the working-age population with some high school (school.med)
- 1964-1969 averages for the share of the working-age population wit high school (school.high)
1961-1969 averages for six industrial-sector shares as a percentage of total production (these variables are named with a sec. prefix). These variables are available on a biennial basis \((1961,1963, \ldots, 1969)\)
1960-1969 averages for real GDP per-capita (gdpcap) measured in thousands of 1986 USD.
1969 population density measured in persons per square kilometer (popdens).
Using dataprep()
At a minimum, synth()
requires as inputs the four data matrices: \(X_{1}, X_{0}, Z_{1}\), and \(Z_{0}\). In our example, these four data matrices are:
- \(X_{1}\) is the \((13 \times 1)\) vector of Basque region predictors
- \(X_{0}\) is the \((13 \times 16)\) matrix of values of the same variables for the 16 control regions. Note that all but one of these predictors is an average value over some range of the pre-treatment period, and the precise date-range varies across predictor variables.
- \(Z_{1}\) is a \((10 \times 1)\) vector which contain the values for the outcome variable for the Basque country
- \(Z_{0}\) is a \((10 \times 16)\) matrix which contain the values for the outcome variable for the control units for the 10 pre-intervention periods over which we want to minimize the MSPE.
It is strongly recommended to use dataprep()
to extract and package the inputs for synth()
in a single list object. This list object is also used by other functions such as synth.tables()
, path.plot()
, and gaps.plot()
to produce tables and figures that illustrate the results
- To obtain \(X_{1}\) and \(X_{0}\) the user must define the predictor variables, the operator (e.g., mean), and time-period (e.g., 1964:1969) applied to these variables.
- The user must specify the dependent variable (e.g., gdpcap), the variable(s) identifying unit names (e.g., regionname) and/or numbers (e.g., regionno), the variable identifying time-periods (e.g., year), the treated unit (e.g., region number 17 which is the Basque country), the control units (e.g., regions number \(c(2: 16,18))\), the time-period over which to optimize (e.g., the pre-treatment period 1960:1969). This refers to \(Z_{1}\) and \(Z_{0}\) accordingly.
- The user should also specify time-period over which outcome data should be plotted (usually before and after treatment, e.g., \(1955:1997\) )
How to use the function dataprep()
- Some of the predictor information is given by the arguments
predictors
,predictors.op
, andtime.predictors.prior
.
- The rest of the information for the other predictors is specified in the
special.predictors list
.
This separation allow for easy handling of several predictors with the same operator over the same pre-treatment period (in this case, the school and investment variables) as well as additional custom (or “special”) predictors with heterogeneous operators and time-periods. For example, the variables for the sector production shares (with the sec prefix) are only available on a biennial basis \((1961,1963, \ldots, 1969)\) extracted via seq \((1961,1969,2)\). Averaging over the available years is easily accommodated using the special.predictors list
. For more details and examples check the help file of dataprep().
# Prepare the data for analysis
<- dataprep(
dataprep.out foo = basque, # the dataset to be prepared
# the predictor variables to be used in the model
predictors = c("school.illit", "school.prim", "school.med",
"school.high", "school.post.high", "invest"),
predictors.op = "mean", # operation to be applied on the predictors
time.predictors.prior = 1964:1969, # time period for the predictors
# special predictors with their respective time periods and operations
special.predictors = list(
list("gdpcap", 1960:1969, "mean"),
list("sec.agriculture", seq(1961,1969,2), "mean"),
list("sec.energy", seq(1961,1969,2), "mean"),
list("sec.industry", seq(1961,1969,2), "mean"),
list("sec.construction", seq(1961,1969,2), "mean"),
list("sec.services.venta", seq(1961,1969,2), "mean"),
list("sec.services.nonventa", seq(1961,1969,2), "mean"),
list("popdens", 1969, "mean")
),
dependent = "gdpcap", # the dependent variable
unit.variable = "regionno", # the variable representing the unit of observation
unit.names.variable = "regionname", # the variable representing the names of the units
time.variable = "year", # the variable representing the time period
treatment.identifier = 17, # the identifier for the treatment group
controls.identifier = c(2:16,18), # the identifiers for the control groups
time.optimize.ssr = 1960:1969, # the time period over which the sum of squared residuals (SSR) is minimized to estimate the weights
time.plot = 1955:1997 # the time period for the plot
)
We can confirm the contents of the four data matrices:
- \(X_1\) contains the predictors of the treatment unit (region 17).
- Education-related predictors are averaged over the 1964-1969 period.
- GPDpc averaged over the 1960-1969 period is also included as a predictor.
- Sector-related predictors are averaged over the 1961-1969 period (on a biannual basis).
- Population density is only available for the year 1969.
$X1 dataprep.out
17
school.illit 39.888465
school.prim 1031.742299
school.med 90.358668
school.high 25.727525
school.post.high 13.479720
invest 24.647383
special.gdpcap.1960.1969 5.285468
special.sec.agriculture.1961.1969 6.844000
special.sec.energy.1961.1969 4.106000
special.sec.industry.1961.1969 45.082000
special.sec.construction.1961.1969 6.150000
special.sec.services.venta.1961.1969 33.754000
special.sec.services.nonventa.1961.1969 4.072000
special.popdens.1969 246.889999
Notice that dataprep.out
appends the associated date-range only to the names of the special.predictors
.
In \(X_0\): - Region 1 is not included because it is national value - Region 17 is not included because it is the treatment unit.
$X0 dataprep.out
2 3 4
school.illit 863.389160 73.121226 31.488423
school.prim 3062.424886 728.578929 670.909393
school.med 155.565318 44.215389 46.398482
school.high 57.266496 16.091676 14.799358
school.post.high 27.278924 8.684416 6.424505
invest 19.320031 21.577486 22.769643
special.gdpcap.1960.1969 2.560747 3.699907 3.733876
special.sec.agriculture.1961.1969 24.194000 21.726000 12.362000
special.sec.energy.1961.1969 2.774000 6.278000 18.648000
special.sec.industry.1961.1969 18.276000 22.780000 24.126000
special.sec.construction.1961.1969 8.130000 7.832000 9.006000
special.sec.services.venta.1961.1969 38.186000 34.289999 30.152000
special.sec.services.nonventa.1961.1969 8.444000 7.096000 5.708000
special.popdens.1969 68.510002 24.040001 98.739998
5 6 7
school.illit 47.903906 128.308287 9.394911
school.prim 300.813619 522.094955 289.571732
school.med 20.045204 45.447489 24.106414
school.high 5.921604 12.086849 7.304420
school.post.high 3.680154 5.844122 2.885214
invest 24.441712 25.954247 29.071211
special.gdpcap.1960.1969 5.215974 3.051014 3.871173
special.sec.agriculture.1961.1969 13.130000 19.944000 15.922000
special.sec.energy.1961.1969 2.076000 7.818000 2.894000
special.sec.industry.1961.1969 18.258000 9.816000 36.530000
special.sec.construction.1961.1969 8.294000 8.670000 5.976000
special.sec.services.venta.1961.1969 51.752000 45.278001 33.484000
special.sec.services.nonventa.1961.1969 6.494000 8.482000 5.198000
special.popdens.1969 104.169998 148.250000 87.389999
8 9 10
school.illit 105.508144 254.449707 277.935237
school.prim 1766.928691 1035.269368 2883.361735
school.med 102.299929 33.369543 215.165476
school.high 37.787317 16.539727 63.608315
school.post.high 19.173427 6.308165 32.374611
invest 19.652509 17.970690 22.520380
special.gdpcap.1960.1969 2.807062 2.240688 5.147008
special.sec.agriculture.1961.1969 29.718000 36.086001 6.936000
special.sec.energy.1961.1969 7.818000 5.180000 2.880000
special.sec.industry.1961.1969 16.474000 17.178000 40.056000
special.sec.construction.1961.1969 7.144000 5.810000 6.706000
special.sec.services.venta.1961.1969 30.844000 29.238000 39.068000
special.sec.services.nonventa.1961.1969 8.000000 6.506000 4.356000
special.popdens.1969 28.770000 22.379999 153.119995
11 12 13
school.illit 266.967601 177.727966 234.752001
school.prim 1695.117126 692.888550 1666.930420
school.med 103.959094 24.048450 77.464478
school.high 34.275772 11.271676 29.341941
school.post.high 16.822106 4.570566 13.543060
invest 23.702696 20.239484 20.606269
special.gdpcap.1960.1969 3.843245 1.926478 2.516288
special.sec.agriculture.1961.1969 18.582000 37.948000 28.862000
special.sec.energy.1961.1969 2.698000 2.646000 5.328000
special.sec.industry.1961.1969 28.046000 10.886000 17.882000
special.sec.construction.1961.1969 6.170000 8.770000 7.032000
special.sec.services.venta.1961.1969 39.064001 31.720000 33.714001
special.sec.services.nonventa.1961.1969 5.444000 8.038000 7.188000
special.popdens.1969 128.699997 28.969999 91.019997
14 15 16
school.illit 133.158962 105.877481 13.438050
school.prim 1856.074280 431.746806 280.002396
school.med 269.961647 27.907434 20.450936
school.high 62.458658 9.082540 6.550353
school.post.high 57.704985 4.428528 4.190368
invest 16.234543 20.706271 19.955159
special.gdpcap.1960.1969 5.976692 2.899557 3.996087
special.sec.agriculture.1961.1969 1.864000 19.352000 24.704000
special.sec.energy.1961.1969 2.074000 10.228000 2.740000
special.sec.industry.1961.1969 23.834000 19.644000 28.398000
special.sec.construction.1961.1969 8.358000 6.290000 6.982000
special.sec.services.venta.1961.1969 52.714000 36.177999 30.468000
special.sec.services.nonventa.1961.1969 11.162000 8.312000 6.710000
special.popdens.1969 442.450012 73.360001 44.169998
18
school.illit 9.151832
school.prim 152.269465
school.med 9.760035
school.high 3.377170
school.post.high 1.732763
invest 18.055932
special.gdpcap.1960.1969 3.809205
special.sec.agriculture.1961.1969 30.322001
special.sec.energy.1961.1969 2.888000
special.sec.industry.1961.1969 26.612000
special.sec.construction.1961.1969 5.238000
special.sec.services.venta.1961.1969 28.300000
special.sec.services.nonventa.1961.1969 6.644000
special.popdens.1969 46.580002
- \(Z_1\) contains the outcome values of the treatment unit BEFORE treatment (that is 1960-1969).
$Z1 dataprep.out
17
1960 4.285918
1961 4.574336
1962 4.898957
1963 5.197015
1964 5.338903
1965 5.465153
1966 5.545916
1967 5.614896
1968 5.852185
1969 6.081405
- \(Z_0\) contains the outcome values of the control units BEFORE treatment (that is 1960-1969).
$Z0 dataprep.out
2 3 4 5 6 7 8 9
1960 2.010140 2.881462 2.967295 4.058841 2.357684 3.137032 2.138817 1.667524
1961 2.129177 3.099543 3.143887 4.360254 2.445730 3.327621 2.239503 1.752428
1962 2.280348 3.359183 3.373536 4.646173 2.648243 3.555341 2.454227 1.920451
1963 2.431020 3.614182 3.597258 4.911525 2.844759 3.771423 2.672237 2.091902
1964 2.508855 3.680091 3.672594 5.050700 2.951157 3.839403 2.777778 2.182591
1965 2.584690 3.745287 3.743359 5.184662 3.054199 3.906098 2.882176 2.274707
1966 2.694444 3.883319 3.909383 5.466795 3.231791 4.032133 2.988075 2.378392
1967 2.802342 4.016138 4.073122 5.737646 3.403385 4.155955 3.094544 2.482362
1968 2.987361 4.243645 4.308626 6.161454 3.660312 4.375893 3.302271 2.709083
1969 3.179092 4.476221 4.549700 6.581691 3.912882 4.610826 3.520994 2.947444
10 11 12 13 14 15 16 18
1960 4.241788 3.219294 1.535847 1.983290 5.161097 2.118609 3.163525 2.969866
1961 4.575335 3.362468 1.596258 2.005784 5.632605 2.305484 3.335904 3.153171
1962 4.838046 3.569980 1.705584 2.185661 5.840831 2.521422 3.623393 3.404384
1963 5.081334 3.765210 1.817695 2.366395 6.024493 2.739074 3.894816 3.669238
1964 5.158098 3.823693 1.882819 2.458797 6.099329 2.851257 3.985147 3.803985
1965 5.223651 3.874179 1.948872 2.549700 6.152028 2.965938 4.072979 3.921808
1966 5.332477 3.978149 2.032633 2.669666 6.110469 3.099186 4.210011 4.032705
1967 5.429449 4.073408 2.117609 2.787846 6.057341 3.227292 4.352399 4.160311
1968 5.674379 4.279777 2.245501 2.978363 6.253142 3.461154 4.556984 4.373036
1969 5.915524 4.486290 2.381962 3.177378 6.435590 3.706155 4.765710 4.603542
There may be instances where it’s beneficial to adjust and alter \(X_{0}\) and \(X_{1}\) directly, without referring back to the original dataset. For instance, consider the five distinct education variables (school.illit, school.prim, school.med, school.high, school.post.high), which denote the count of individuals with different education levels, in thousands. In their 2003 study, Abadie and Gardeazabal combined the two highest categories (school.high and school.post.high) to represent all individuals with education beyond high school. They also used the percentage share for each predictor instead of the total count of individuals. The subsequent code demonstrates how to merge these variables in both \(X_{0}\) and \(X_{1}\), and how to convert the relevant values into percentage distributions:
# Combine 'school.high' and 'school.post.high' in X1 and X0 matrices
$X1["school.high",] <- dataprep.out$X1["school.high",] + dataprep.out$X1["school.post.high",]
dataprep.out$X0["school.high",] <- dataprep.out$X0["school.high",] + dataprep.out$X0["school.post.high",]
dataprep.out
# Remove 'school.post.high' from X1 and X0 matrices
<- which(rownames(dataprep.out$X1)=="school.post.high")
post_high_row $X1 <- as.matrix(dataprep.out$X1[-post_high_row,])
dataprep.out$X0 <- as.matrix(dataprep.out$X0[-post_high_row,])
dataprep.out
# Find the indices of 'school.illit' and 'school.high' in X0 matrix
<- which(rownames(dataprep.out$X0)=="school.illit")
illit_row_index <- which(rownames(dataprep.out$X0)=="school.high")
high_row_index
# Scale the values in the rows between 'school.illit' and 'school.high' in X1 and X0 matrices to percentages
$X1[illit_row_index:high_row_index,] <- (100*dataprep.out$X1[illit_row_index:high_row_index,]) / sum(dataprep.out$X1[illit_row_index:high_row_index,])
dataprep.out$X0[illit_row_index:high_row_index,] <- 100*scale(dataprep.out$X0[illit_row_index:high_row_index,], center=FALSE, scale=colSums(dataprep.out$X0[illit_row_index:high_row_index,])) dataprep.out
Running synth()
The function synth()
identifies the synthetic control for the Basque region by determining the weight vector \(W^{*}\), which is achieved by solving the nested optimization problem outlined in equations (1) and (2).
\[ \left\|X_{1}-X_{0} W\right\|_{V}=\sqrt{\left(X_{1}-X_{0} W\right)^{\prime} V\left(X_{1}-X_{0} W\right)} \hspace{1cm} (1) \]
\[ \arg \min _{V \in \mathcal{V}}\left(Z_{1}-Z_{0} W^{*}(V)\right)^{\prime}\left(Z_{1}-Z_{0} W^{*}(V)\right) \hspace{1cm} (2) \]
For any given \(V\), the function synth()
locates a \(W^{*}(V)\) by minimizing equation (1). This minimization is achieved using a constrained quadratic optimization function from the kernlab
package in R. The function synth()
then determines the diagonal matrix \(V^{*}\) that minimizes equation (2), thereby minimizing the Mean Squared Prediction Error (MSPE) for the pre-intervention period. To solve the optimization problem presented in equation (2), the authors utilize the general-purpose optimization function optim()
in R.
<- synth(data.prep.obj = dataprep.out,
synth.out method = "BFGS" # set optim() to use the BFGS quasi-Newton algorithm.
)
X1, X0, Z1, Z0 all come directly from dataprep object.
****************
searching for synthetic control unit
****************
****************
****************
MSPE (LOSS V): 0.008864629
solution.v:
0.01556809 0.001791073 0.04417159 0.03409436 8.45063e-05 0.2009836 0.09484592 0.007689227 0.1339499 0.008723866 0.009680737 0.1081257 0.3402913
solution.w:
4.92e-08 5.17e-08 1.352e-07 2.85e-08 5.32e-08 5.177e-07 5.24e-08 7.29e-08 0.8507986 2.274e-07 4.03e-08 9.51e-08 0.1491998 5.61e-08 9.02e-08 1.061e-07
Obtaining Results: Tables, Figures, and Estimates
The annual discrepancies in the GPD trend between the Basque region and its synthetic control
<- dataprep.out$Y1plot - (dataprep.out$Y0plot %*% synth.out$solution.w)
gaps 1:3, 1] gaps[
1955 1956 1957
0.15021896 0.09166384 0.03714761
gaps
17
1955 0.150218957
1956 0.091663836
1957 0.037147612
1958 -0.006003578
1959 -0.045912501
1960 -0.093028265
1961 -0.158741268
1962 -0.088701802
1963 -0.025035609
1964 0.040376216
1965 0.102991299
1966 0.097365414
1967 0.091767963
1968 0.091457295
1969 0.088290338
1970 0.032164161
1971 -0.010718484
1972 -0.065240285
1973 -0.122248917
1974 0.018114662
1975 0.149837948
1976 0.012249750
1977 -0.121311425
1978 -0.287951073
1979 -0.417501286
1980 -0.566472165
1981 -0.733650764
1982 -0.780284226
1983 -0.826525640
1984 -0.754859524
1985 -0.672982676
1986 -0.785770335
1987 -0.880817969
1988 -0.966277731
1989 -1.035845236
1990 -1.011476539
1991 -1.012419501
1992 -0.964326371
1993 -0.920837104
1994 -0.969747353
1995 -0.863014218
1996 -0.851919587
1997 -0.828080864
Tables are produced by using the synth.tab()
function:
<- synth.tab(dataprep.res = dataprep.out,
synth.tables synth.res = synth.out
)
This function produces four different types of tables:
names(synth.tables)
[1] "tab.pred" "tab.v" "tab.w" "tab.loss"
- tab.pred compares pre-treatment predictor values for the treated unit, the synthetic control unit, and all the units in the sample.
$tab.pred synth.tables
Treated Synthetic Sample Mean
school.illit 3.321 7.645 10.983
school.prim 85.893 82.285 80.911
school.med 7.522 6.965 5.427
school.high 3.264 3.105 2.679
invest 24.647 21.583 21.424
special.gdpcap.1960.1969 5.285 5.271 3.581
special.sec.agriculture.1961.1969 6.844 6.179 21.353
special.sec.energy.1961.1969 4.106 2.760 5.310
special.sec.industry.1961.1969 45.082 37.636 22.425
special.sec.construction.1961.1969 6.150 6.952 7.276
special.sec.services.venta.1961.1969 33.754 41.104 36.528
special.sec.services.nonventa.1961.1969 4.072 5.371 7.111
special.popdens.1969 246.890 196.288 99.414
The synthetic Basque country is fairly similar to the real Basque country. The sample means of the predictor variables over the 16 control regions are provided as a comparison.
- tab.v shows the weight corresponding to each predictor
$tab.v synth.tables
v.weights
school.illit 0.016
school.prim 0.002
school.med 0.044
school.high 0.034
invest 0
special.gdpcap.1960.1969 0.201
special.sec.agriculture.1961.1969 0.095
special.sec.energy.1961.1969 0.008
special.sec.industry.1961.1969 0.134
special.sec.construction.1961.1969 0.009
special.sec.services.venta.1961.1969 0.01
special.sec.services.nonventa.1961.1969 0.108
special.popdens.1969 0.34
- tab.v shows the weight corresponding to each potential control unit
$tab.w synth.tables
w.weights unit.names unit.numbers
2 0.000 Andalucia 2
3 0.000 Aragon 3
4 0.000 Principado De Asturias 4
5 0.000 Baleares (Islas) 5
6 0.000 Canarias 6
7 0.000 Cantabria 7
8 0.000 Castilla Y Leon 8
9 0.000 Castilla-La Mancha 9
10 0.851 Cataluna 10
11 0.000 Comunidad Valenciana 11
12 0.000 Extremadura 12
13 0.000 Galicia 13
14 0.149 Madrid (Comunidad De) 14
15 0.000 Murcia (Region de) 15
16 0.000 Navarra (Comunidad Foral De) 16
18 0.000 Rioja (La) 18
We see that Cataluna contributes 85 percent and Madrid contributes 15 percent to the synthetic Basque country; a zero weight is assigned to all the other control regions.
path.plot(synth.res = synth.out,
dataprep.res = dataprep.out,
Ylab = "Real GDPpc",
Xlab = "year",
Ylim = c(0,12),
Legend = c("Basque country","synthetic Basque country"),
Legend.position = "bottomright"
)
We know that for the Basque country, terrorism claimed its first victim in 1968, but large scale terrorist activity only began in the mid-70s.
- The gaps.plot() function shows how the difference between treated and synthetic control outcomes change over time.
gaps.plot(synth.res = synth.out,
dataprep.res = dataprep.out,
Ylab = "Gap in real GDPpc",
Xlab = "year",
Ylim = c(-1.5,1.5),
Main = NA
)
Placebo tests
<-
dataprep.out dataprep(foo = basque,
predictors = c("school.illit" , "school.prim" , "school.med" ,
"school.high" , "school.post.high" , "invest") ,
predictors.op = "mean" ,
time.predictors.prior = 1964:1969 ,
special.predictors = list(
list("gdpcap" , 1960:1969 , "mean"),
list("sec.agriculture" , seq(1961,1969,2), "mean"),
list("sec.energy" , seq(1961,1969,2), "mean"),
list("sec.industry" , seq(1961,1969,2), "mean"),
list("sec.construction" , seq(1961,1969,2), "mean"),
list("sec.services.venta" , seq(1961,1969,2), "mean"),
list("sec.services.nonventa" ,seq(1961,1969,2), "mean"),
list("popdens", 1969, "mean")
),dependent = "gdpcap",
unit.variable = "regionno",
unit.names.variable = "regionname",
time.variable = "year",
treatment.identifier = 10, # Change the ID to other unit that did NOT receive the treatment
controls.identifier = c(2:9,11:16,18),
time.optimize.ssr = 1960:1969,
time.plot = 1955:1997
)
# Combine 'school.high' and 'school.post.high' in X1
$X1["school.high",] <- dataprep.out$X1["school.high",] + dataprep.out$X1["school.post.high",]
dataprep.out
# Remove 'school.post.high' from X1
$X1 <- as.matrix(dataprep.out$X1[-which(rownames(dataprep.out$X1)=="school.post.high"),])
dataprep.out
# Combine 'school.high' and 'school.post.high' in X0
$X0["school.high",] <- dataprep.out$X0["school.high",] + dataprep.out$X0["school.post.high",]
dataprep.out
# Remove 'school.post.high' from X0
$X0 <- dataprep.out$X0[-which(rownames(dataprep.out$X0)=="school.post.high"),]
dataprep.out
# Find the row indices for 'school.illit' and 'school.high'
<- which(rownames(dataprep.out$X0)=="school.illit")
lowest <- which(rownames(dataprep.out$X0)=="school.high")
highest
# Convert the values in X1 from 'school.illit' to 'school.high' into percentage shares
$X1[lowest:highest,] <- (100*dataprep.out$X1[lowest:highest,]) / sum(dataprep.out$X1[lowest:highest,])
dataprep.out
# Scale the values in X0 from 'school.illit' to 'school.high' and convert into percentages
$X0[lowest:highest,] <- 100*scale(dataprep.out$X0[lowest:highest,], center=FALSE, scale=colSums(dataprep.out$X0[lowest:highest,])) dataprep.out
<- synth(data.prep.obj = dataprep.out,
synth.out method = "BFGS"
)
X1, X0, Z1, Z0 all come directly from dataprep object.
****************
searching for synthetic control unit
****************
****************
****************
MSPE (LOSS V): 0.0003093975
solution.v:
0.01405753 0.01342444 0.00503692 0.001792628 0.07939864 0.5166646 0.2816736 0.08745663 1.02215e-05 1.72355e-05 2.00017e-05 0.000439494 7.9985e-06
solution.w:
1.09249e-05 7.4848e-06 0.03590867 0.271566 1.48966e-05 0.2574606 3.0579e-06 2.6666e-06 2.53122e-05 2.2184e-06 4.9234e-06 0.4349691 1.81685e-05 3.5539e-06 2.4453e-06
path.plot(synth.res = synth.out,
dataprep.res = dataprep.out,
tr.intake = NA,
Ylab = "Real GDPpc",
Xlab = "year",
Ylim = c(0,12),
Legend = c("Catalonia country","synthetic Catalonia"),
Legend.position = "bottomright",
)
gaps.plot(synth.res = synth.out,
dataprep.res = dataprep.out,
Ylab = "Gap in real GDPpc",
Xlab = "year",
Ylim = c(-1.5,1.5),
Main = NA
)
- Synthetic control methods are beneficial because they allow for placebo tests. These tests reassign the intervention to units and periods where it didn’t occur to test the method’s effectiveness.
- Abadie and Gardeazabal (2003) demonstrated the placebo test for the synthetic control method using Catalonia as an example. They showed that there was no identifiable treatment effect when the synthetic control method was applied to Catalonia.
- There are several types of placebo tests that can be run with this package, such as placebos-in-time and placebos with outcome variables that should be unaffected by the treatment.
- Users can perform exact inferential techniques similar to permutation tests by applying the synthetic control method to every control unit in the sample and collecting information on the gaps.
- The user can plot these gaps to visually determine whether the line associated with the true synthetic control unit conspicuously differentiates itself from the rest.
- The approach is implemented by running a for loop to implement placebo tests across all control units in the sample and collecting information on the gaps.
- The results of this inferential technique can be seen in the figure below, where regions with a poor fit for the pre-treatment period are excluded.
- The figure demonstrates that when exposure to terrorism is reassigned to other regions, there is a very low probability of obtaining a gap as large as the one obtained for the Basque region.
- Poor-fitting regions are usually those that are very unusual in their pre-treatment characteristics, so no combination of other regions in the sample can reproduce the pre-treatment trends for these regions.
- An alternative to excluding regions based on MPSE is to compute the distribution of the ratio of post- to pre-treatment MPSE.
<- matrix(NA,length(1955:1997),17)
store colnames(store) <- unique(basque$regionname)[-1]
# run placebo test
for(iter in 2:18)
{<-
dataprep.out dataprep(foo = basque,
predictors = c("school.illit" , "school.prim" , "school.med" ,
"school.high" , "school.post.high" , "invest") ,
predictors.op = "mean" ,
time.predictors.prior = 1964:1969 ,
special.predictors = list(
list("gdpcap" , 1960:1969 , "mean"),
list("sec.agriculture" , seq(1961,1969,2), "mean"),
list("sec.energy" , seq(1961,1969,2), "mean"),
list("sec.industry" , seq(1961,1969,2), "mean"),
list("sec.construction" , seq(1961,1969,2), "mean"),
list("sec.services.venta" , seq(1961,1969,2), "mean"),
list("sec.services.nonventa" ,seq(1961,1969,2), "mean"),
list("popdens", 1969, "mean")
),dependent = "gdpcap",
unit.variable = "regionno",
unit.names.variable = "regionname",
time.variable = "year",
treatment.identifier = iter,
controls.identifier = c(2:18)[-iter+1],
time.optimize.ssr = 1960:1969,
time.plot = 1955:1997
)
$X1["school.high",] <-
dataprep.out$X1["school.high",] + dataprep.out$X1["school.post.high",]
dataprep.out$X1 <-
dataprep.outas.matrix(dataprep.out$X1[-which(rownames(dataprep.out$X1)=="school.post.high"),])
$X0["school.high",] <-
dataprep.out$X0["school.high",] + dataprep.out$X0["school.post.high",]
dataprep.out$X0 <-
dataprep.out$X0[-which(rownames(dataprep.out$X0)=="school.post.high"),]
dataprep.out
<- which(rownames(dataprep.out$X0)=="school.illit")
lowest <- which(rownames(dataprep.out$X0)=="school.high")
highest
$X1[lowest:highest,] <-
dataprep.out100*dataprep.out$X1[lowest:highest,]) /
(sum(dataprep.out$X1[lowest:highest,])
$X0[lowest:highest,] <-
dataprep.out100*scale(dataprep.out$X0[lowest:highest,],
center=FALSE,
scale=colSums(dataprep.out$X0[lowest:highest,])
)
# run synth
<- synth(
synth.out data.prep.obj = dataprep.out,
method = "BFGS"
)
# store gaps
-1] <- dataprep.out$Y1plot - (dataprep.out$Y0plot %*% synth.out$solution.w)
store[,iter }
X1, X0, Z1, Z0 all come directly from dataprep object.
****************
searching for synthetic control unit
****************
****************
****************
MSPE (LOSS V): 9.769789e-06
solution.v:
0.1623716 0.08905496 0.09211519 0.1037775 0.04294261 0.09945295 0.02249312 0.1194742 0.03049302 0.1169147 0.07851873 0.001934032 0.04045739
solution.w:
4.669e-07 3.74e-08 0.06871975 0.09930165 1.441e-07 2.547e-07 0.08792619 2.2712e-06 2.52271e-05 0.6027934 1.2071e-06 0.04270843 0.09852008 3.265e-07 7.13e-08 4.937e-07
X1, X0, Z1, Z0 all come directly from dataprep object.
****************
searching for synthetic control unit
****************
****************
****************
MSPE (LOSS V): 0.0007759196
solution.v:
0.07626995 0.06396751 0.04486642 0.005179403 0.1468072 0.1395639 0.04849465 0.02590254 0.3192316 0.1271959 0.001595317 8.09127e-05 0.0008446376
solution.w:
0.002847392 0.1942325 0.1397633 0.000914255 0.02097697 0.1304587 0.004124095 0.08064028 0.00442945 0.1654631 1.32207e-05 0.00022504 0.002175287 0.2469207 0.003732397 0.00308341
X1, X0, Z1, Z0 all come directly from dataprep object.
****************
searching for synthetic control unit
****************
****************
****************
MSPE (LOSS V): 0.0005444699
solution.v:
0.01231921 9.531e-07 0.1099582 0.08532242 0.04666882 0.0801413 0.1406802 0.06732613 0.1100789 0.1234673 0.007544381 0.09789709 0.1185951
solution.w:
1.53e-08 0.4234967 4.8e-09 0.3944335 9e-09 9.14e-08 2.78e-08 5.01e-08 2.25e-08 1.03e-08 3.59e-08 2.8e-09 3.1662e-06 0 0.1820663 1e-08
X1, X0, Z1, Z0 all come directly from dataprep object.
****************
searching for synthetic control unit
****************
****************
****************
MSPE (LOSS V): 0.1186101
solution.v:
3.33288e-05 0.02256084 7.02905e-05 0.00728127 0.01570196 0.5482453 0.000157174 0.1427547 1.75687e-05 0.008580386 0.0194893 0.2209856 0.01412218
solution.w:
0.0004961839 2.6461e-06 2.1e-09 0.08641808 0.0002997258 9.362e-07 2.4951e-06 0.6565079 0.0009040657 5.63022e-05 3.1999e-06 0.2552986 4.115e-07 6.7063e-06 1e-10 2.769e-06
X1, X0, Z1, Z0 all come directly from dataprep object.
****************
searching for synthetic control unit
****************
****************
****************
MSPE (LOSS V): 0.001323248
solution.v:
0.07393106 0.1211972 0.0001074404 0.0002131637 0.05120843 0.4328742 0.03867743 0.09524453 0.05366269 0.02123013 3.23276e-05 0.1113294 0.0002920094
solution.w:
1.22945e-05 9.7347e-06 2.8878e-05 0.1714208 6.9274e-06 7.4741e-06 1.2302e-06 1.1608e-06 3.3878e-06 0.2501338 7.9401e-06 2.5312e-06 0.5783608 1.8927e-06 2.639e-07 8.676e-07
X1, X0, Z1, Z0 all come directly from dataprep object.
****************
searching for synthetic control unit
****************
****************
****************
MSPE (LOSS V): 0.0001164363
solution.v:
0.0003310047 0.00213788 0.05690954 0.001834906 0.1001146 0.3631324 0.0704808 0.130847 0.0003822926 0.006252639 0.05395768 0.0008783568 0.212741
solution.w:
3.71e-08 0.08179105 4.7e-09 0.0007733517 0.2397046 1.116e-07 5.7e-08 2.653e-07 0.5196322 6.15e-08 4.81e-08 5.2e-09 6.3e-08 1.5131e-06 0.1580965 1.92e-07
X1, X0, Z1, Z0 all come directly from dataprep object.
****************
searching for synthetic control unit
****************
****************
****************
MSPE (LOSS V): 0.0007839866
solution.v:
0.02451637 0.1180743 3.00186e-05 0.002431598 0.05720764 0.2808197 0.08189894 1.56597e-05 0.2487468 0.04979899 0.01211177 0.07946989 0.04487829
solution.w:
8.4712e-06 0.004654015 0.05866539 2.86957e-05 2.36318e-05 2.2085e-06 1.6e-09 2.3678e-06 3.039e-06 0.001702417 0.7675253 6.07913e-05 6.5852e-06 0.0004907144 1.5026e-06 0.1668249
X1, X0, Z1, Z0 all come directly from dataprep object.
****************
searching for synthetic control unit
****************
****************
****************
MSPE (LOSS V): 0.003441637
solution.v:
0.2819977 3.77076e-05 0.007304756 0.1913337 0.001234845 0.1077445 0.0001339538 0.2506414 0.01139748 0.01475026 0.0008777025 0.0002952483 0.1322507
solution.w:
7.3e-09 7.62e-08 4e-09 4.4e-08 3e-10 2.636e-07 1.562e-07 1.301e-07 1.3078e-06 0.6608552 0.0127187 5e-10 0.3263682 8.03e-08 3.22e-08 5.57476e-05
X1, X0, Z1, Z0 all come directly from dataprep object.
****************
searching for synthetic control unit
****************
****************
****************
MSPE (LOSS V): 0.001731602
solution.v:
0.04854086 0.01447919 0.0005696063 4.6529e-06 0.4545145 0.1743639 0.1425628 0.006164227 0.109689 0.01319099 0.03525773 0.0003438238 0.000318642
solution.w:
0.09820231 2.5384e-06 1.3516e-06 0.0001096605 1.17957e-05 1.188e-06 9.979e-07 2.457e-07 6.69951e-05 1.6059e-06 1.6941e-06 0.1862123 1.95324e-05 9.556e-07 0.7153662 5.738e-07
X1, X0, Z1, Z0 all come directly from dataprep object.
****************
searching for synthetic control unit
****************
****************
****************
MSPE (LOSS V): 0.005472465
solution.v:
0.1142611 0.1418908 0.00463412 0.02252889 0.1341089 0.1908037 0.02291209 0.009698626 0.0640444 0.04711843 0.02584312 0.1773129 0.04484299
solution.w:
6.6007e-06 5.4245e-06 2.299e-06 0.004242197 0.1641151 0.1883559 3.3363e-06 0.2476481 0.3954618 4.6418e-06 1.09237e-05 3.2388e-06 2.77576e-05 5.0792e-06 9.71674e-05 1.0437e-05
X1, X0, Z1, Z0 all come directly from dataprep object.
****************
searching for synthetic control unit
****************
****************
****************
MSPE (LOSS V): 0.1146395
solution.v:
0.02427409 0.01026832 0.1489549 0.1368267 0.001135408 0.1606203 0.4393701 0.03087507 0.01231399 8.552e-07 0.0001460941 0.003204566 0.03200954
solution.w:
9.72e-08 4.31e-08 1.28e-08 1.83e-08 9.18e-08 4.29e-08 2.558e-07 0.9999989 1.22e-08 3.36e-08 1.428e-07 1.5e-09 3.53e-08 9.67e-08 1.17e-08 1.699e-07
X1, X0, Z1, Z0 all come directly from dataprep object.
****************
searching for synthetic control unit
****************
****************
****************
MSPE (LOSS V): 0.0004619251
solution.v:
0.01229623 0.05637697 0.1650633 0.2219118 0.0001000538 0.4147641 0.0001505474 0.06202615 5.54293e-05 0.01398409 3.1358e-06 0.05287225 0.0003960339
solution.w:
1.44024e-05 0.0001065384 0.009454045 0.0003189615 6.8995e-06 0.07434963 0.3488591 0.3598222 0.000879104 0.0002847956 0.2045495 2.347e-07 6.8642e-06 3.45546e-05 0.0001249696 0.00118827
X1, X0, Z1, Z0 all come directly from dataprep object.
****************
searching for synthetic control unit
****************
****************
****************
MSPE (LOSS V): 0.5308454
solution.v:
0.05175466 0.01526417 0.1909266 0.04978537 0.1337012 0.04370974 0.2253978 0.04765019 0.0003084564 0.04816708 0.04571619 0.00414232 0.1434762
solution.w:
1.19e-08 3.6e-09 5.1e-09 1.65e-08 1.36e-08 1e-10 3.7e-09 6e-10 5.54e-08 1.9e-09 1e-10 2e-09 6e-09 2.04e-08 0.9999999 7.7e-09
X1, X0, Z1, Z0 all come directly from dataprep object.
****************
searching for synthetic control unit
****************
****************
****************
MSPE (LOSS V): 0.001910045
solution.v:
0.08721003 0.1577794 0.07246097 0.08734601 0.001628194 0.07553655 0.09820718 0.1465568 0.001400442 0.03045952 0.05526499 0.135878 0.05027189
solution.w:
8.69e-08 1.163e-07 0.1163706 2.44e-08 0.600644 1.58e-08 2.181e-07 0.2829842 2.87e-08 5.59e-08 7.2e-09 7.85e-08 3.964e-07 2.84e-08 1.56e-08 7.08e-08
X1, X0, Z1, Z0 all come directly from dataprep object.
****************
searching for synthetic control unit
****************
****************
****************
MSPE (LOSS V): 0.0002241912
solution.v:
0.07026391 0.08293033 0.04596517 0.0001588333 0.05633258 0.05478577 2.3875e-05 0.08468431 0.2554776 0.112146 0.1737682 0.01777752 0.04568584
solution.w:
1.56678e-05 0.4373507 7.32e-08 8.2582e-06 4.0295e-06 9.5503e-06 0.000224769 3.3315e-06 1.43698e-05 5.4561e-06 9.89416e-05 1.07651e-05 0.00200689 2.4195e-06 0.1560405 0.4042042
X1, X0, Z1, Z0 all come directly from dataprep object.
****************
searching for synthetic control unit
****************
****************
****************
MSPE (LOSS V): 0.008864629
solution.v:
0.01556809 0.001791073 0.04417159 0.03409436 8.45063e-05 0.2009836 0.09484592 0.007689227 0.1339499 0.008723866 0.009680737 0.1081257 0.3402913
solution.w:
4.92e-08 5.17e-08 1.352e-07 2.85e-08 5.32e-08 5.177e-07 5.24e-08 7.29e-08 0.8507986 2.274e-07 4.03e-08 9.51e-08 0.1491998 5.61e-08 9.02e-08 1.061e-07
X1, X0, Z1, Z0 all come directly from dataprep object.
****************
searching for synthetic control unit
****************
****************
****************
MSPE (LOSS V): 0.0006965933
solution.v:
0.1115345 0.1004851 0.01632611 0.02227407 0.01271669 0.1144789 0.001231086 0.1444595 0.1168052 1.07937e-05 0.1738477 0.04247705 0.1433533
solution.w:
8.388e-06 1.02366e-05 0.0001290961 2.4275e-06 3.6204e-06 1.6069e-06 0.04234802 0.07727088 1.01873e-05 1.10019e-05 8.1453e-06 5.66883e-05 2.7544e-06 1.03189e-05 0.8800848 4.18468e-05
# now do figure
<- store
data rownames(data) <- 1955:1997
# Set bounds in gaps data
<- 1
gap.start <- nrow(data)
gap.end <- 1955:1997
years <- which(rownames(data)=="1969")
gap.end.pre
# MSPE Pre-Treatment
<- apply(data[ gap.start:gap.end.pre,]^2,2,mean)
mse <- as.numeric(mse[16])
basque.mse # Exclude states with 5 times higher MSPE than basque
<- data[,mse<5*basque.mse]
data <- .75
Cex.set
# Plot
plot(years,data[gap.start:gap.end,which(colnames(data)=="Basque Country (Pais Vasco)")],
ylim=c(-2,2),xlab="year",
xlim=c(1955,1997),ylab="Gap in real GDPpc",
type="l",lwd=2,col="black",
xaxs="i",yaxs="i")
# Add lines for control states
for (i in 1:ncol(data)) { lines(years,data[gap.start:gap.end,i],col="gray") }
## Add Basque Line
lines(years,data[gap.start:gap.end,which(colnames(data)=="Basque Country (Pais Vasco)")],lwd=2,col="black")
# Add grid
abline(v=1970,lty="dotted",lwd=2)
abline(h=0,lty="dashed",lwd=2)
legend("bottomright",legend=c("Basque country","control regions"),
lty=c(1,1),col=c("black","gray"),lwd=c(2,1),cex=.8)
arrows(1967,-1.5,1968.5,-1.5,col="black",length=.1)
text(1961.5,-1.5,"Terrorism Onset",cex=Cex.set)
abline(v=1955)
abline(v=1997)
abline(h=-2)
abline(h=2)
Concluding remarks
- Synthetic control methods can be used to estimate causal effects.
- These methods also enable precise inferential techniques when used with the Synth package in R.
- Several enhancements to Synth are currently under development.
- A regression-based method is being tested to populate the entire \(V\) matrix, not just its diagonal.
- A version of Synth is being refined to select the \(W\) weights that best fit multiple outcome variables at the same time.
- Work is in progress on a version of Synth that can handle multiple treatments introduced over a period of time.
References
Abadie, A., Diamond, A., & Hainmueller, J. (2011). Synth: An R Package for Synthetic Control Methods in Comparative Case Studies. Journal of Statistical Software, 42(13), 1–17. https://doi.org/10.18637/jss.v042.i13
Abadie A, Diamond A, Hainmueller J (2010). “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association, 105(490), 493-505.
Abadie A, Gardeazabal J (2003). “The Economic Costs of Conflict: A Case Study of the Basque Country.” American Economic Review, 93(1), 112-132.
Bertrand M, Duflo E, Mullainathan S (2004). “How Much Should We Trust Differences-inDifferences Estimates?” Quarterly Journal of Economics, 119(1), 249-275.
Karatzoglou A, Smola A, Hornik K, Zeileis A (2004). “kernlab - An S4 Package for Kernel Methods in R.” Journal of Statistical Software, 11(9), 1-20. URL http://www . jstatsoft. org/v11/i09/.
Lehmann EL (1997). Testing Statistical Hypotheses. 2nd edition. University of California Press, Berkeley.
Mebane, Jr WR, Sekhon JS (2011). “Genetic Optimization Using Derivatives: The rgenoud Package for R.” Journal of Statistical Software, 42(11), 1-26. URL http : //www . jstatsoft . org/v42/i11/.