StaggeredDifferenceInDifferences
- class causalpy.experiments.staggered_did.StaggeredDifferenceInDifferences
A class to analyse data from staggered adoption Difference-in-Differences settings.
This class implements the Borusyak, Jaravel, and Spiess (BJS, 2024) imputation estimator for staggered adoption settings. It fits a model on untreated observations only (pre-treatment periods for eventually-treated units plus all periods for never-treated units), then predicts counterfactual outcomes for all observations. Treatment effects are computed as the difference between observed and predicted outcomes for treated observations.
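The three steps above (fit on untreated observations, impute counterfactuals, take differences) can be illustrated with a minimal pure-Python sketch. This is not CausalPy's implementation: the helper `impute_effects`, the toy panel, and the alternating least-squares fit are all assumptions made for illustration, and the sketch assumes every unit and every period has at least one untreated observation.

```python
from collections import defaultdict


def impute_effects(rows, n_iter=500):
    """Toy imputation estimator for y ~ unit + time fixed effects.

    rows: list of dicts with keys "unit", "time", "y", "treated" (0/1).
    Hypothetical helper; assumes every unit and period has untreated rows.
    """
    untreated = [r for r in rows if r["treated"] == 0]
    alpha = defaultdict(float)  # unit fixed effects
    gamma = defaultdict(float)  # time fixed effects

    # Step 1: fit the fixed effects on untreated rows only, using
    # alternating least-squares updates as a stand-in for a regression.
    for _ in range(n_iter):
        sums, counts = defaultdict(float), defaultdict(int)
        for r in untreated:
            sums[r["unit"]] += r["y"] - gamma[r["time"]]
            counts[r["unit"]] += 1
        for unit in sums:
            alpha[unit] = sums[unit] / counts[unit]

        sums, counts = defaultdict(float), defaultdict(int)
        for r in untreated:
            sums[r["time"]] += r["y"] - alpha[r["unit"]]
            counts[r["time"]] += 1
        for time in sums:
            gamma[time] = sums[time] / counts[time]

    # Step 2: predict the untreated outcome for every row.
    # Step 3: treatment effect = observed minus predicted, treated rows only.
    out = []
    for r in rows:
        y_hat0 = alpha[r["unit"]] + gamma[r["time"]]
        tau_hat = r["y"] - y_hat0 if r["treated"] == 1 else None
        out.append({**r, "y_hat0": y_hat0, "tau_hat": tau_hat})
    return out


# Noise-free toy panel: units 0 and 1 adopt at t=2 and t=4,
# units 2 and 3 are never treated; the true effect is 2.0.
treatment_start = {0: 2, 1: 4, 2: None, 3: None}
rows = []
for unit, start in treatment_start.items():
    for time in range(6):
        treated = int(start is not None and time >= start)
        y = 1.0 * unit + 0.5 * time + 2.0 * treated
        rows.append({"unit": unit, "time": time, "y": y, "treated": treated})

result = impute_effects(rows)
effects = [r["tau_hat"] for r in result if r["treated"] == 1]
att = sum(effects) / len(effects)
print(att)
```

Because the toy outcomes contain no noise, the imputed effects match the true effect of 2.0 up to numerical tolerance. With real data, the class delegates the fit to the `model` argument rather than to a hand-rolled loop like this one.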
- Parameters:
data (DataFrame) – A pandas dataframe with panel data (unit x time observations).
formula (str) – A statistical model formula. Recommended: "y ~ 1 + C(unit) + C(time)" for unit and time fixed effects.
unit_variable_name (str) – Name of the column identifying units.
time_variable_name (str) – Name of the column identifying time periods.
treated_variable_name (str) – Name of the column indicating treatment status (0/1). Defaults to "treated".
treatment_time_variable_name (str | None) – Name of the column containing unit-level treatment time (G_i). If None, treatment time is inferred from the treated_variable_name column.
never_treated_value (Any) – Value indicating never-treated units in the treatment-time column. Defaults to np.inf.
model (PyMCModel | RegressorMixin | None) – A model for the untreated outcome. Defaults to LinearRegression.
event_window (tuple[int, int] | None) – Tuple (min_event_time, max_event_time) to restrict event-time aggregation. If None, uses all available event-times.
reference_event_time (int) – Event-time index associated with plots (reserved for future use). Defaults to -1.
**kwargs (Any) – Additional keyword arguments forwarded to BaseExperiment.
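When treatment_time_variable_name is None, the treatment time is inferred from the 0/1 treatment column. One plausible way to do this is sketched below: take the first period in which each unit is treated, and fall back to the never-treated value otherwise. The helper `infer_treatment_time` and the tuple layout are hypothetical, and the rule CausalPy actually applies may differ in detail.

```python
import math


def infer_treatment_time(rows, never_treated_value=math.inf):
    """Return {unit: first treated period}, or never_treated_value.

    rows: iterable of (unit, time, treated) tuples with treated in {0, 1}.
    Hypothetical helper for illustration; not part of CausalPy.
    """
    first_treated = {}
    for unit, time, treated in rows:
        first_treated.setdefault(unit, None)
        is_earlier = first_treated[unit] is None or time < first_treated[unit]
        if treated == 1 and is_earlier:
            first_treated[unit] = time
    return {
        unit: never_treated_value if g is None else g
        for unit, g in first_treated.items()
    }


rows = [
    ("a", 0, 0), ("a", 1, 1), ("a", 2, 1),  # unit "a" adopts at t=1
    ("b", 0, 0), ("b", 1, 0), ("b", 2, 0),  # unit "b" is never treated
]
treatment_time = infer_treatment_time(rows)
print(treatment_time)
```

Here unit "a" gets G = 1 and unit "b" gets the never-treated value, matching the np.inf default described above.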
- data_
Augmented data with G (treatment time), event_time, y_hat0 (counterfactual), and tau_hat (treatment effect) columns.
- Type:
pd.DataFrame
- att_group_time_
Group-time ATT estimates: ATT(g, t) for each cohort g and calendar time t. Includes an identified column; non-identified cells have NaN estimates.
- Type:
pd.DataFrame
- att_event_time_
Event-time ATT estimates: ATT(e) for each event-time e = t - G. Includes an identified column; non-identified cells have NaN estimates.
- Type:
pd.DataFrame
- non_identified_cohorts_
Treatment cohorts with at least one non-identified post-treatment ATT(g, t).
- Type:
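The group-time and event-time tables described above are aggregations of observation-level effects. The sketch below shows one way to form them: average tau_hat within each (cohort, calendar time) cell for ATT(g, t), and within each event-time e = t - G for ATT(e). The records, the helper `mean_by`, and the equal weighting of observations are assumptions made for illustration; the weighting CausalPy uses is not stated on this page.

```python
from collections import defaultdict


def mean_by(records, key):
    """Average tau_hat within groups defined by key(record)."""
    groups = defaultdict(list)
    for r in records:
        groups[key(r)].append(r["tau_hat"])
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


# Hypothetical treated observations: G is the cohort (treatment time).
records = [
    {"unit": "a", "G": 2, "time": 2, "tau_hat": 1.0},
    {"unit": "a", "G": 2, "time": 3, "tau_hat": 2.0},
    {"unit": "c", "G": 2, "time": 2, "tau_hat": 3.0},
    {"unit": "c", "G": 2, "time": 3, "tau_hat": 4.0},
    {"unit": "b", "G": 3, "time": 3, "tau_hat": 3.0},
    {"unit": "b", "G": 3, "time": 4, "tau_hat": 5.0},
]

# ATT(g, t): one cell per cohort and calendar period.
att_gt = mean_by(records, lambda r: (r["G"], r["time"]))

# ATT(e): pool cohorts by time since treatment, e = t - G.
att_e = mean_by(records, lambda r: r["time"] - r["G"])

print(att_gt)
print(att_e)
```

Restricting the second aggregation to an event_window would amount to filtering the records to min_event_time <= e <= max_event_time before averaging.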
Notes
This estimator requires the following identifying assumptions:
Absorbing treatment: Once a unit receives treatment, it must remain treated in all subsequent periods. Treatment cannot be reversed or temporarily suspended. This is validated at runtime.
Parallel trends: In the absence of treatment, treated and control units would have followed parallel outcome trajectories.
No anticipation: Units do not change their behavior in anticipation of future treatment.
Untreated support at each calendar period: The time fixed effect \(\gamma_t\) for calendar period \(t\) is identified only if at least one unit is untreated in that period. Without never-treated units, post-treatment effects for the last-treated cohort (and any calendar periods where every unit is already treated) are not identified. CausalPy warns when this condition fails and marks the affected ATT(g, t) and ATT(e) cells as non-identified in the output tables.
Panel Balance: This implementation supports both balanced and unbalanced panel data. While balanced panels (where each unit is observed in every time period) are common in staggered DiD applications, the imputation-based approach of Borusyak et al. (2024) can accommodate unbalanced panels. The key requirement is that treatment timing is well-defined for each unit, not that all units are observed in all periods. Unit and observation counts in the summary output are computed without assuming balanced panels.
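Two of the conditions in these notes can be checked mechanically from the panel itself: absorbing treatment and untreated support in each calendar period. The sketch below shows what such checks might look like. The helpers `is_absorbing` and `periods_without_untreated` are hypothetical and are not CausalPy functions; the library's own runtime validation may be organised differently.

```python
def is_absorbing(rows):
    """True if no unit switches from treated back to untreated.

    rows: iterable of (unit, time, treated) tuples with treated in {0, 1}.
    """
    seen_treated = set()
    for unit, _time, treated in sorted(rows, key=lambda r: (r[0], r[1])):
        if treated == 1:
            seen_treated.add(unit)
        elif unit in seen_treated:
            return False  # untreated row after an earlier treated row
    return True


def periods_without_untreated(rows):
    """Calendar periods in which every observed unit is already treated."""
    all_periods = {time for _unit, time, _treated in rows}
    supported = {time for _unit, time, treated in rows if treated == 0}
    return sorted(all_periods - supported)


# Toy panel without never-treated units: "a" adopts at t=1, "b" at t=2.
adoption = {"a": 1, "b": 2}
panel = [
    (unit, time, int(time >= start))
    for unit, start in adoption.items()
    for time in range(4)
]

absorbing_ok = is_absorbing(panel)
unsupported = periods_without_untreated(panel)

# A panel where treatment is switched off again violates the assumption.
reversed_panel = [("a", 0, 0), ("a", 1, 1), ("a", 2, 0)]
reversal_ok = is_absorbing(reversed_panel)

print(absorbing_ok, unsupported, reversal_ok)
```

In the toy panel, periods 2 and 3 have no untreated unit, so the time effects for those periods, and the ATT cells that depend on them, would be the ones flagged as non-identified.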
References
Borusyak, K., Jaravel, X., & Spiess, J. (2024). Revisiting Event Study Designs: Robust and Efficient Estimation. Review of Economic Studies.
Examples
>>> import causalpy as cp
>>> from causalpy.data.simulate_data import generate_staggered_did_data
>>> df = generate_staggered_did_data(n_units=30, n_time_periods=15, seed=42)
>>> result = cp.StaggeredDifferenceInDifferences(
...     df,
...     formula="y ~ 1 + C(unit) + C(time)",
...     unit_variable_name="unit",
...     time_variable_name="time",
...     treated_variable_name="treated",
...     treatment_time_variable_name="treatment_time",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "tune": 100,
...             "draws": 200,
...             "chains": 2,
...             "progressbar": False,
...         }
...     ),
... )
Methods
Run the experiment algorithm: fit model, predict counterfactuals, and aggregate effects.
Generate a decision-ready summary of causal effects for Staggered Difference-in-Differences.
StaggeredDifferenceInDifferences.fit(*args, ...) – Fit the underlying model.
Generate a self-contained HTML report for this experiment.
Recover the data of an experiment along with the prediction and causal impact information.
StaggeredDifferenceInDifferences.get_plot_data_bayesian([...]) – Get plotting data for Bayesian model.
Get plotting data for OLS model.
Validate the input data and parameters.
StaggeredDifferenceInDifferences.plot(*[, ...]) – Plot the staggered difference-in-differences event study.
Plot cohort-specific ATT(g, t) trajectories.
Ask the model to print its coefficients.
Set optional maketables rendering options for this experiment.
Print summary of main results.
Attributes
idata – Return the InferenceData object of the model.
supports_bayes
supports_ols
labels
data
- __init__(data, formula, unit_variable_name, time_variable_name, treated_variable_name='treated', treatment_time_variable_name=None, never_treated_value=inf, model=None, event_window=None, reference_event_time=-1, **kwargs)
- Parameters:
- Return type:
None
- classmethod __new__(*args, **kwargs)