diff --git a/README.md b/README.md index 36a3365e5..d9e9347b2 100644 --- a/README.md +++ b/README.md @@ -33,6 +33,7 @@ For information on use cases and background material on causal inference and het - [Interpretability](#interpretability) - [Causal Model Selection and Cross-Validation](#causal-model-selection-and-cross-validation) - [Inference](#inference) + - [Policy Learning](#policy-learning) - [For Developers](#for-developers) - [Running the tests](#running-the-tests) - [Generating the documentation](#generating-the-documentation) @@ -162,6 +163,25 @@ To install from source, see [For Developers](#for-developers) section below. +
+ Dynamic Double Machine Learning (click to expand) + + ```Python + from econml.dynamic.dml import DynamicDML + # Use defaults + est = DynamicDML() + # Or specify hyperparameters + est = DynamicDML(model_y=LassoCV(cv=3), + model_t=LassoCV(cv=3), + cv=3) + est.fit(Y, T, X=X, W=None, groups=groups, inference="auto") + # Effects + treatment_effects = est.effect(X_test) + # Confidence intervals + lb, ub = est.effect_interval(X_test, alpha=0.05) + ``` +
+
Causal Forests (click to expand) diff --git a/doc/reference.rst b/doc/reference.rst index c7f9f3ca7..76b528925 100644 --- a/doc/reference.rst +++ b/doc/reference.rst @@ -104,6 +104,21 @@ Sieve Methods econml.iv.sieve.HermiteFeatures econml.iv.sieve.DPolynomialFeatures +.. _dynamic_api: + +Estimators for Dynamic Treatment Regimes +---------------------------------------- + +.. _dynamicdml_api: + +Dynamic Double Machine Learning +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. autosummary:: + :toctree: _autosummary + + econml.dynamic.dml.DynamicDML + .. _policy_api: Policy Learning diff --git a/doc/spec/estimation/dml.rst b/doc/spec/estimation/dml.rst index 3032f72c2..0ffe0718a 100644 --- a/doc/spec/estimation/dml.rst +++ b/doc/spec/estimation/dml.rst @@ -50,6 +50,7 @@ characteristics :math:`X` of the treated samples, then one can use this method. .. testsetup:: + # DML import numpy as np X = np.random.choice(np.arange(5), size=(100,3)) Y = np.random.normal(size=(100,2)) @@ -71,8 +72,9 @@ Most of the methods provided make a parametric form assumption on the heterogene linear on some pre-defined; potentially high-dimensional; featurization). These methods include: :class:`.DML`, :class:`.LinearDML`, :class:`.SparseLinearDML`, :class:`.KernelDML`. -For fullly non-parametric heterogeneous treatment effect models, checkout the :class:`.NonParamDML` -and the :class:`.CausalForestDML`. For more options of non-parametric CATE estimators, +For fully non-parametric heterogeneous treatment effect models, check out the :class:`.NonParamDML` +and the :class:`.CausalForestDML`. +For more options of non-parametric CATE estimators, check out the :ref:`Forest Estimators User Guide ` and the :ref:`Meta Learners User Guide `. diff --git a/doc/spec/estimation/dynamic_dml.rst b/doc/spec/estimation/dynamic_dml.rst new file mode 100644 index 000000000..b92f319d8 --- /dev/null +++ b/doc/spec/estimation/dynamic_dml.rst @@ -0,0 +1,95 @@ +.. 
_dynamicdmluserguide: + +=============================== +Dynamic Double Machine Learning +=============================== + +What is it? +================================== + +Dynamic Double Machine Learning is a method for estimating (heterogeneous) treatment effects when +treatments are offered over time via an adaptive dynamic policy. It applies to the case when +all potential dynamic confounders/controls (factors that simultaneously had a direct effect on the adaptive treatment +decision in the collected data and the observed outcome) are observed, but are either too many (high-dimensional) for +classical statistical approaches to be applicable or their effect on +the treatment and outcome cannot be satisfactorily modeled by parametric functions (non-parametric). +Both of these latter problems can be addressed via machine learning techniques (see e.g. [Lewis2021]_). + + +What are the relevant estimator classes? +======================================== + +This section describes the methodology implemented in the class +:class:`.DynamicDML`. +Click on the link for detailed module documentation and the input parameters of the class. + + +When should you use it? +================================== + +Suppose you have observational (or experimental from an A/B test) historical data, where multiple treatment(s)/intervention(s)/action(s) +:math:`T` were offered over time to each of the units and some final outcome(s) :math:`Y` was observed and all the variables :math:`W` that could have +potentially gone into the choice of :math:`T`, and simultaneously could have had a direct effect on the outcome :math:`Y` (aka controls or confounders) are also recorded in the dataset. + +If your goal is to understand what was the effect of the treatment on the outcome as a function of a set of observable +characteristics :math:`X` of the treated samples, then one can use this method. For instance call: + +.. 
testsetup:: + + # DynamicDML + import numpy as np + groups = np.repeat(a=np.arange(100), repeats=3, axis=0) + W_dyn = np.random.normal(size=(300, 1)) + X_dyn = np.random.normal(size=(300, 1)) + T_dyn = np.random.normal(size=(300, 2)) + y_dyn = np.random.normal(size=(300, )) + +.. testcode:: + + from econml.dynamic.dml import DynamicDML + est = DynamicDML() + est.fit(y_dyn, T_dyn, X=X_dyn, W=W_dyn, groups=groups) + + +Class Hierarchy Structure +================================== + +In this library we implement variants of several of the approaches mentioned in the last section. The hierarchy +structure of the implemented CATE estimators is as follows. + + .. inheritance-diagram:: econml.dynamic.dml.DynamicDML + :parts: 1 + :private-bases: + :top-classes: econml._OrthoLearner, econml._cate_estimator.LinearModelFinalCateEstimatorMixin + +Below we give a brief description of each of these classes: + + * **DynamicDML.** The class :class:`.DynamicDML` is an extension of the Double ML approach for treatments assigned sequentially over time periods. + This estimator will adjust for treatments that can have causal effects on future outcomes. The data corresponds to a Markov decision process :math:`\{X_t, W_t, T_t, Y_t\}_{t=1}^m`, + where :math:`X_t, W_t` corresponds to the state at time :math:`t`, :math:`T_t` is the treatment at time :math:`t` and :math:`Y_t` is the observed outcome at time :math:`t`. + + The model makes the following structural equation assumptions on the data generating process: + + .. math:: + + XW_t =~& A \cdot T_{t-1} + B \cdot XW_{t-1} + \eta_t\\ + T_t =~& p(T_{t-1}, XW_t, \zeta_t) \\ + Y_t =~& \theta_0(X_0)'T_t + \mu'XW_t + \epsilon_t + + where :math:`XW` is the concatenation of the :math:`X` and :math:`W` variables. + For more details about this model and underlying assumptions, see [Lewis2021]_. + + To learn the treatment effects of treatments in the different periods on the last period outcome, one can simply call: + + .. 
testcode:: + + from econml.dynamic.dml import DynamicDML + est = DynamicDML() + est.fit(y_dyn, T_dyn, X=X_dyn, W=W_dyn, groups=groups) + + + +Usage FAQs +========== + +See our FAQ section in :ref:`DML User Guide ` diff --git a/doc/spec/estimation_dynamic.rst b/doc/spec/estimation_dynamic.rst new file mode 100644 index 000000000..6e7b47cc0 --- /dev/null +++ b/doc/spec/estimation_dynamic.rst @@ -0,0 +1,11 @@ +Estimation Methods for Dynamic Treatment Regimes +================================================ + +This section contains methods for estimating (heterogeneous) treatment effects, +even when treatments are offered over time and the treatments were chosen based on a dynamic +adaptive policy. This is referred to as the dynamic treatment regime (see e.g. [Hernan2010]_) + +.. toctree:: + :maxdepth: 2 + + estimation/dynamic_dml diff --git a/doc/spec/references.rst b/doc/spec/references.rst index 0692af351..5f0213ac9 100644 --- a/doc/spec/references.rst +++ b/doc/spec/references.rst @@ -113,4 +113,14 @@ References .. [Lundberg2017] Lundberg, S., Lee, S. (2017). A Unified Approach to Interpreting Model Predictions. - URL https://arxiv.org/abs/1705.07874 \ No newline at end of file + URL https://arxiv.org/abs/1705.07874 + +.. [Lewis2021] + Lewis, G., Syrgkanis, V. (2021). + Double/Debiased Machine Learning for Dynamic Treatment Effects. + URL https://arxiv.org/abs/2002.07285 + +.. [Hernan2010] + Hernán, Miguel A., and James M. Robins (2010). + Causal inference. 
+ URL https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/ diff --git a/doc/spec/spec.rst b/doc/spec/spec.rst index 693854193..649dba98c 100644 --- a/doc/spec/spec.rst +++ b/doc/spec/spec.rst @@ -19,6 +19,7 @@ The EconML Python SDK, developed by the ALICE team at MSR New England, incorpora comparison estimation estimation_iv + estimation_dynamic inference interpretability references diff --git a/econml/_cate_estimator.py b/econml/_cate_estimator.py index d1f43e2aa..d879608ae 100644 --- a/econml/_cate_estimator.py +++ b/econml/_cate_estimator.py @@ -563,7 +563,7 @@ def effect(self, X=None, *, T0, T1): """ Calculate the heterogeneous treatment effect :math:`\\tau(X, T0, T1)`. - The effect is calculatred between the two treatment points + The effect is calculated between the two treatment points conditional on a vector of features on a set of m test samples :math:`\\{T0_i, T1_i, X_i\\}`. Since this class assumes a linear effect, only the difference between T0ᵢ and T1ᵢ matters for this computation. 
diff --git a/econml/_ortho_learner.py b/econml/_ortho_learner.py index 9ccdd88b3..86e0404bf 100644 --- a/econml/_ortho_learner.py +++ b/econml/_ortho_learner.py @@ -685,7 +685,8 @@ def fit(self, Y, T, X=None, W=None, Z=None, *, sample_weight=None, freq_weight=N nuisances=nuisances, sample_weight=sample_weight, freq_weight=freq_weight, - sample_var=sample_var) + sample_var=sample_var, + groups=groups) return self @@ -770,18 +771,19 @@ def _fit_nuisances(self, Y, T, X=None, W=None, Z=None, sample_weight=None, group return nuisances, fitted_models, fitted_inds, scores def _fit_final(self, Y, T, X=None, W=None, Z=None, nuisances=None, sample_weight=None, - freq_weight=None, sample_var=None): + freq_weight=None, sample_var=None, groups=None): self._ortho_learner_model_final.fit(Y, T, **filter_none_kwargs(X=X, W=W, Z=Z, nuisances=nuisances, sample_weight=sample_weight, freq_weight=freq_weight, - sample_var=sample_var)) + sample_var=sample_var, + groups=groups)) self.score_ = None if hasattr(self._ortho_learner_model_final, 'score'): self.score_ = self._ortho_learner_model_final.score(Y, T, **filter_none_kwargs(X=X, W=W, Z=Z, nuisances=nuisances, - sample_weight=sample_weight) - ) + sample_weight=sample_weight, + groups=groups)) def const_marginal_effect(self, X=None): X, = check_input_arrays(X) @@ -816,7 +818,7 @@ def effect_inference(self, X=None, *, T0=0, T1=1): return super().effect_inference(X, T0=T0, T1=T1) effect_inference.__doc__ = LinearCateEstimator.effect_inference.__doc__ - def score(self, Y, T, X=None, W=None, Z=None, sample_weight=None): + def score(self, Y, T, X=None, W=None, Z=None, sample_weight=None, groups=None): """ Score the fitted CATE model on a new data set. Generates nuisance parameters for the new data set based on the fitted nuisance models created at fit time. 
@@ -840,6 +842,8 @@ def score(self, Y, T, X=None, W=None, Z=None, sample_weight=None): Instruments for each sample sample_weight: optional(n,) vector or None (Default=None) Weights for each samples + groups: (n,) vector, optional + All rows corresponding to the same group will be kept together during splitting. Returns ------- @@ -862,7 +866,7 @@ def score(self, Y, T, X=None, W=None, Z=None, sample_weight=None): for i, models_nuisances in enumerate(self._models_nuisance): # for each model under cross fit setting for j, mdl in enumerate(models_nuisances): - nuisance_temp = mdl.predict(Y, T, **filter_none_kwargs(X=X, W=W, Z=Z)) + nuisance_temp = mdl.predict(Y, T, **filter_none_kwargs(X=X, W=W, Z=Z, groups=groups)) if not isinstance(nuisance_temp, tuple): nuisance_temp = (nuisance_temp,) @@ -876,7 +880,8 @@ def score(self, Y, T, X=None, W=None, Z=None, sample_weight=None): nuisances[it] = np.mean(nuisances[it], axis=0) return self._ortho_learner_model_final.score(Y, T, nuisances=nuisances, - **filter_none_kwargs(X=X, W=W, Z=Z, sample_weight=sample_weight)) + **filter_none_kwargs(X=X, W=W, Z=Z, + sample_weight=sample_weight, groups=groups)) @property def ortho_learner_model_final_(self): diff --git a/econml/data/dynamic_panel_dgp.py b/econml/data/dynamic_panel_dgp.py new file mode 100644 index 000000000..82b842912 --- /dev/null +++ b/econml/data/dynamic_panel_dgp.py @@ -0,0 +1,460 @@ +import numpy as np +from econml.utilities import cross_product +from statsmodels.tools.tools import add_constant +import pandas as pd +import scipy as sp +from scipy.stats import expon +from sklearn.linear_model import LinearRegression +import matplotlib.pyplot as plt +import joblib +import os + + +dir = os.path.dirname(__file__) + +# covariance matrix + + +def new_cov_matrix(cov): + p = cov.shape[0] + # get eigen value and eigen vectors + e_val, e_vec = sp.linalg.eigh(cov) + start = [0, 35, 77, 86] + end = [35, 77, 86, p] + e_val_new = np.array([]) + for i, j in zip(start, end): + 
e_val_new = np.append(e_val_new, linear_approximation(i, j, e_val)) + # simulate eigen vectors + e_vec_new = np.zeros_like(e_vec) + for i in range(p): + w = np.zeros(p) # , np.random.normal(0.01, 0.01, size=p) + w[np.random.choice(p, 6)] += np.random.normal(0.01, 0.06, size=(6)) + e_vec_new[:, i] = w / np.linalg.norm(w) + # keep the top 4 eigen value and corresponding eigen vector + e_vec_new[:, -4:] = e_vec[:, -4:] + e_val_new[-4:] = e_val[-4:] + # replace the negative eigen values + e_val_new[np.where(e_val_new < 0)] = e_val[np.where(e_val_new < 0)] + # generate a new covariance matrix + cov_new = e_vec_new.dot(np.diag(e_val_new)).dot(e_vec_new.T) + return cov_new + +# get linear approximation of eigen values + + +def linear_approximation(start, end, e_val): + est = LinearRegression() + X = np.arange(start, end).reshape(-1, 1) + est.fit(X, e_val[start:end]) + pred = est.predict(X) + return pred + + +# coefs +def generate_coefs(index, columns): + simulated_coefs_df = pd.DataFrame(0, index=index, columns=columns) + # get the indices of each group of features + ind_demo = [columns.index(col) for col in columns if "demo" in col] + ind_proxy = [columns.index(col) for col in columns if "proxy" in col] + ind_investment = [columns.index(col) + for col in columns if "investment" in col] + + for i in range(7): + outcome_name = simulated_coefs_df.index[i] + if "proxy" in outcome_name: + ind_same_proxy = [ + ind for ind in ind_proxy if outcome_name in columns[ind]] + # print(ind_same_proxy) + random_proxy_name = np.random.choice( + [proxy for proxy in index[:4] if proxy != outcome_name] + ) + ind_random_other_proxy = [ + ind for ind in ind_proxy if random_proxy_name in columns[ind] + ] + # demo + simulated_coefs_df.iloc[ + i, np.random.choice(ind_demo, 2) + ] = np.random.uniform(0.004, 0.05) + # same proxy + simulated_coefs_df.iloc[i, ind_same_proxy] = sorted( + np.random.choice(expon.pdf(np.arange(10)) * + 5e-1, 6, replace=False) + ) + simulated_coefs_df.iloc[i, 
ind_random_other_proxy] = sorted( + np.random.choice(expon.pdf(np.arange(10)) * + 5e-2, 6, replace=False) + ) + elif "investment" in outcome_name: + ind_same_invest = [ + ind for ind in ind_investment if outcome_name in columns[ind] + ] + random_proxy_name = np.random.choice(index[:4]) + ind_random_other_proxy = [ + ind for ind in ind_proxy if random_proxy_name in columns[ind] + ] + simulated_coefs_df.iloc[ + i, np.random.choice(ind_demo, 2) + ] = np.random.uniform(0.001, 0.05) + simulated_coefs_df.iloc[i, ind_same_invest] = sorted( + np.random.choice(expon.pdf(np.arange(10)) * + 5e-1, 6, replace=False) + ) + simulated_coefs_df.iloc[i, ind_random_other_proxy] = sorted( + np.random.choice(expon.pdf(np.arange(10)) * + 1e-1, 6, replace=False) + ) + return simulated_coefs_df + + +# residuals + + +def simulate_residuals(ind): + n, n_pos, n_neg = joblib.load(os.path.join(dir, f"input_dynamicdgp/n_{ind}.jbl")) + # gmm + est = joblib.load(os.path.join(dir, f"input_dynamicdgp/gm_{ind}.jbl")) + x_new = est.sample(n - n_pos - n_neg)[0].flatten() + + # log normal on outliers + if n_pos > 0: + # positive outliers + s, loc, scale = joblib.load(os.path.join(dir, f"input_dynamicdgp/lognorm_pos_{ind}.jbl")) + fitted_pos_outliers = sp.stats.lognorm( + s, loc=loc, scale=scale).rvs(size=n_pos) + else: + fitted_pos_outliers = np.array([]) + # negative outliers + if n_neg > 0: + s, loc, scale = joblib.load(os.path.join(dir, f"input_dynamicdgp/lognorm_neg_{ind}.jbl")) + fitted_neg_outliers = - \ + sp.stats.lognorm(s, loc=loc, scale=scale).rvs(size=n_neg) + else: + fitted_neg_outliers = np.array([]) + x_new = np.concatenate((x_new, fitted_pos_outliers, fitted_neg_outliers)) + return x_new + + +def simulate_residuals_all(res_df): + res_df_new = res_df.copy(deep=True) + for i in range(res_df.shape[1]): + res_df_new.iloc[:, i] = simulate_residuals(i) + # demean the new residual again + res_df_new = res_df_new - res_df_new.mean(axis=0) + return res_df_new + +# generate data + + +def 
get_prediction(df, coef_matrix, residuals, thetas, n, intervention, columns, index, counterfactual): + data_matrix = df[columns].values + # sample residuals + sample_residuals = residuals + preds = np.matmul(data_matrix, coef_matrix.T) + + # get prediction for current investment + if counterfactual: + pred_inv = np.zeros(preds[:, 4:].shape) + else: + pred_inv = preds[:, 4:] + sample_residuals[:, 4:] + intervention + df[index[4:]] = pd.DataFrame(pred_inv, index=df.index) + + # get prediction for current proxy + pred_proxy = preds[:, :4] + sample_residuals[:, :4] + \ + np.matmul(pred_inv, thetas.T) + df[index[:4]] = pd.DataFrame(pred_proxy, index=df.index) + return df + + +def generate_dgp( + cov_matrix, + n_tpid, + t_period, + coef_matrix, + residual_matrix, + thetas, + intervention, + columns, + index, + counterfactual +): + df_all = pd.DataFrame() + # get first period prediction + m = cov_matrix.shape[0] + x = np.random.multivariate_normal(np.repeat(0, m), cov_matrix, size=n_tpid) + df = pd.DataFrame( + np.hstack( + (np.arange(n_tpid).reshape(-1, 1), + np.repeat(1, n_tpid).reshape(-1, 1), x) + ), + columns=["id", "datetime"] + columns, + ) + df = get_prediction(df, coef_matrix, residual_matrix[0], + thetas, n_tpid, intervention, columns, index, False) + df_all = pd.concat([df_all, df], axis=0) + + # iterate the step ahead contruction + for t in range(2, t_period + 1): + # prepare new x + new_df = df.copy(deep=True) + new_df["datetime"] = np.repeat(t, n_tpid) + for name in index: + for i in range(-6, -1): + new_df[f"{name}_{i}"] = df[f"{name}_{i+1}"] + new_df[f"{name}_-1"] = df[name] + df = get_prediction(new_df, coef_matrix, residual_matrix[t - 1], + thetas, n_tpid, [0, 0, 0], columns, index, counterfactual) + df_all = pd.concat([df_all, df]) + df_all = df_all.sort_values(["id", "datetime"]) + return df_all + + +class AbstracDynamicPanelDGP: + + def __init__(self, n_periods, n_treatments, n_x): + self.n_periods = n_periods + self.n_treatments = n_treatments + 
self.n_x = n_x + return + + def create_instance(self, *args, **kwargs): + pass + + def _gen_data_with_policy(self, n_units, policy_gen, random_seed=123): + pass + + def static_policy_data(self, n_units, tau, random_seed=123): + def policy_gen(Tpre, X, period, random_state=None): + return tau[period] + return self._gen_data_with_policy(n_units, policy_gen, random_seed=random_seed) + + def adaptive_policy_data(self, n_units, policy_gen, random_seed=123): + return self._gen_data_with_policy(n_units, policy_gen, random_seed=random_seed) + + def static_policy_effect(self, tau, mc_samples=1000): + Y_tau, _, _, _ = self.static_policy_data(mc_samples, tau) + Y_zero, _, _, _ = self.static_policy_data( + mc_samples, np.zeros((self.n_periods, self.n_treatments))) + return np.mean(Y_tau[np.arange(Y_tau.shape[0]) % self.n_periods == self.n_periods - 1]) - \ + np.mean(Y_zero[np.arange(Y_zero.shape[0]) % + self.n_periods == self.n_periods - 1]) + + def adaptive_policy_effect(self, policy_gen, mc_samples=1000): + Y_tau, _, _, _ = self.adaptive_policy_data(mc_samples, policy_gen) + Y_zero, _, _, _ = self.static_policy_data( + mc_samples, np.zeros((self.n_periods, self.n_treatments))) + return np.mean(Y_tau[np.arange(Y_tau.shape[0]) % self.n_periods == self.n_periods - 1]) - \ + np.mean(Y_zero[np.arange(Y_zero.shape[0]) % + self.n_periods == self.n_periods - 1]) + + +class DynamicPanelDGP(AbstracDynamicPanelDGP): + + def __init__(self, n_periods, n_treatments, n_x): + super().__init__(n_periods, n_treatments, n_x) + + def create_instance(self, s_x, sigma_x, sigma_y, conf_str, epsilon, Alpha_unnormalized, + hetero_strength=0, hetero_inds=None, + autoreg=.5, state_effect=.5, random_seed=123): + random_state = np.random.RandomState(random_seed) + self.s_x = s_x + self.conf_str = conf_str + self.sigma_x = sigma_x + self.sigma_y = sigma_y + self.hetero_inds = hetero_inds.astype( + int) if hetero_inds is not None else hetero_inds + self.hetero_strength = hetero_strength + self.autoreg = autoreg +
self.state_effect = state_effect + self.random_seed = random_seed + self.endo_inds = np.setdiff1d( + np.arange(self.n_x), hetero_inds).astype(int) + # The first s_x state variables are confounders. The final s_x variables are exogenous and can create + # heterogeneity + self.Alpha = Alpha_unnormalized + self.Alpha /= np.linalg.norm(self.Alpha, axis=1, ord=1, keepdims=True) + self.Alpha *= state_effect + if self.hetero_inds is not None: + self.Alpha[self.hetero_inds] = 0 + + self.Beta = np.zeros((self.n_x, self.n_x)) + for t in range(self.n_x): + self.Beta[t, :] = autoreg * np.roll(random_state.uniform(low=4.0**(-np.arange( + 0, self.n_x)), high=4.0**(-np.arange(1, self.n_x + 1))), t) + if self.hetero_inds is not None: + self.Beta[np.ix_(self.endo_inds, self.hetero_inds)] = 0 + self.Beta[np.ix_(self.hetero_inds, self.endo_inds)] = 0 + + self.epsilon = epsilon + self.zeta = np.zeros(self.n_x) + self.zeta[:self.s_x] = self.conf_str / self.s_x + + self.y_hetero_effect = np.zeros(self.n_x) + self.x_hetero_effect = np.zeros(self.n_x) + if self.hetero_inds is not None: + self.y_hetero_effect[self.hetero_inds] = random_state.uniform(.5 * hetero_strength, + 1.5 * hetero_strength) /\ + len(self.hetero_inds) + self.x_hetero_effect[self.hetero_inds] = random_state.uniform(.5 * hetero_strength, + 1.5 * hetero_strength) / \ + len(self.hetero_inds) + + self.true_effect = np.zeros((self.n_periods, self.n_treatments)) + self.true_effect[0] = self.epsilon + for t in np.arange(1, self.n_periods): + self.true_effect[t, :] = (self.zeta.reshape( + 1, -1) @ np.linalg.matrix_power(self.Beta, t - 1) @ self.Alpha) + + self.true_hetero_effect = np.zeros( + (self.n_periods, (self.n_x + 1) * self.n_treatments)) + self.true_hetero_effect[0, :] = cross_product(add_constant(self.y_hetero_effect.reshape(1, -1), + has_constant='add'), + self.epsilon.reshape(1, -1)) + for t in np.arange(1, self.n_periods): + self.true_hetero_effect[t, :] = cross_product(add_constant(self.x_hetero_effect.reshape(1, 
-1), + has_constant='add'), + self.zeta.reshape(1, -1) @ + np.linalg.matrix_power(self.Beta, t - 1) @ self.Alpha) + + return self + + def hetero_effect_fn(self, t, x): + if t == 0: + return (np.dot(self.y_hetero_effect, x.flatten()) + 1) * self.epsilon + else: + return (np.dot(self.x_hetero_effect, x.flatten()) + 1) *\ + (self.zeta.reshape(1, -1) @ np.linalg.matrix_power(self.Beta, t - 1) + @ self.Alpha).flatten() + + def _gen_data_with_policy(self, n_units, policy_gen, random_seed=123): + random_state = np.random.RandomState(random_seed) + Y = np.zeros(n_units * self.n_periods) + T = np.zeros((n_units * self.n_periods, self.n_treatments)) + X = np.zeros((n_units * self.n_periods, self.n_x)) + groups = np.zeros(n_units * self.n_periods) + for t in range(n_units * self.n_periods): + period = t % self.n_periods + if period == 0: + X[t] = random_state.normal(0, self.sigma_x, size=self.n_x) + T[t] = policy_gen(np.zeros(self.n_treatments), X[t], period, random_state) + else: + X[t] = (np.dot(self.x_hetero_effect, X[t - 1]) + 1) * np.dot(self.Alpha, T[t - 1]) + \ + np.dot(self.Beta, X[t - 1]) + \ + random_state.normal(0, self.sigma_x, size=self.n_x) + T[t] = policy_gen(T[t - 1], X[t], period, random_state) + Y[t] = (np.dot(self.y_hetero_effect, X[t]) + 1) * np.dot(self.epsilon, T[t]) + \ + np.dot(X[t], self.zeta) + \ + random_state.normal(0, self.sigma_y) + groups[t] = t // self.n_periods + + return Y, T, X, groups + + def observational_data(self, n_units, gamma, s_t, sigma_t, random_seed=123): + """ Generated observational data with some observational treatment policy parameters + + Parameters + ---------- + n_units : how many units to observe + gamma : what is the degree of auto-correlation of the treatments across periods + s_t : sparsity of treatment policy; how many states does it depend on + sigma_t : what is the std of the exploration/randomness in the treatment + """ + Delta = np.zeros((self.n_treatments, self.n_x)) + Delta[:, :s_t] = self.conf_str / s_t + + def 
policy_gen(Tpre, X, period, random_state): + return gamma * Tpre + (1 - gamma) * np.dot(Delta, X) + \ + random_state.normal(0, sigma_t, size=self.n_treatments) + return self._gen_data_with_policy(n_units, policy_gen, random_seed=random_seed) + + +class SemiSynthetic: + + def create_instance(self): + # get new covariance matrix + self.cov_new = joblib.load(os.path.join(dir, f"input_dynamicdgp/cov_new.jbl")) + + # get coefs + self.index = ["proxy1", "proxy2", "proxy3", "proxy4", + "investment1", "investment2", "investment3", ] + self.columns = [f"{ind}_{i}" for ind in self.index for i in range(-6, 0)] +\ + [f"demo_{i}" for i in range(47)] + + self.coef_df = generate_coefs(self.index, self.columns) + self.n_proxies = 4 + self.n_treatments = 3 + + # get residuals + res_df = pd.DataFrame(columns=self.index) + self.new_res_df = simulate_residuals_all(res_df) + + def gen_data(self, n, n_periods, thetas, random_seed): + random_state = np.random.RandomState(random_seed) + n_proxies = self.n_proxies + n_treatments = self.n_treatments + coef_matrix = self.coef_df.values + residual_matrix = self.new_res_df.values + n_x = len(self.columns) + # proxy 1 is the outcome + outcome = "proxy1" + + # make fixed residuals + all_residuals = [] + for t in range(n_periods): + sample_residuals = [] + for i in range(7): + sample_residuals.append( + random_state.choice(residual_matrix[:, i], n)) + sample_residuals = np.array(sample_residuals).T + all_residuals.append(sample_residuals) + all_residuals = np.array(all_residuals) + + fn_df_control = generate_dgp(self.cov_new, n, n_periods, + coef_matrix, all_residuals, thetas, + [0, 0, 0], self.columns, self.index, False) + + fn_df_cf_control = generate_dgp(self.cov_new, n, n_periods, + coef_matrix, all_residuals, thetas, + [0, 0, 0], self.columns, self.index, True) + true_effect = np.zeros((n_periods, n_treatments)) + for i in range(n_treatments): + intervention = [0, 0, 0] + intervention[i] = 1 + fn_df_treated = generate_dgp(self.cov_new, n, 
n_periods, + coef_matrix, all_residuals, thetas, + intervention, self.columns, self.index, True) + for t in range(n_periods): + ate_control = fn_df_cf_control.loc[ + fn_df_control["datetime"] == t + 1, outcome + ].mean() + ate_treated = fn_df_treated.loc[ + fn_df_treated["datetime"] == t + 1, outcome + ].mean() + true_effect[t, i] = ate_treated - ate_control + + new_index = ["proxy1", "proxy2", "proxy3", "proxy4"] + new_columns = [f"{ind}_{i}" for ind in new_index for i in range(-6, 0)] +\ + [f"demo_{i}" for i in range(47)] + panelX = fn_df_control[new_columns].values.reshape(-1, n_periods, len(new_columns)) + panelT = fn_df_control[self.index[n_proxies:] + ].values.reshape(-1, n_periods, n_treatments) + panelY = fn_df_control[outcome].values.reshape(-1, n_periods) + panelGroups = fn_df_control["id"].values.reshape(-1, n_periods) + return panelX, panelT, panelY, panelGroups, true_effect + + def plot_coefs(self): + coef_df = self.coef_df + plt.figure(figsize=(20, 20)) + for i in range(7): + outcome = coef_df.index[i] + plt.subplot(2, 4, i + 1) + coef_list = coef_df.iloc[i] + coef_list = coef_list[coef_list != 0] + plt.plot(coef_list) + plt.xticks(rotation=90) + plt.title(f"outcome:{outcome}") + plt.show() + + def plot_cov(self): + plt.imshow(self.cov_new) + plt.colorbar() + plt.show() diff --git a/econml/data/input_dynamicdgp/cov_new.jbl b/econml/data/input_dynamicdgp/cov_new.jbl new file mode 100644 index 000000000..8d0bb38d9 Binary files /dev/null and b/econml/data/input_dynamicdgp/cov_new.jbl differ diff --git a/econml/data/input_dynamicdgp/gm_0.jbl b/econml/data/input_dynamicdgp/gm_0.jbl new file mode 100644 index 000000000..e4e262d9d Binary files /dev/null and b/econml/data/input_dynamicdgp/gm_0.jbl differ diff --git a/econml/data/input_dynamicdgp/gm_1.jbl b/econml/data/input_dynamicdgp/gm_1.jbl new file mode 100644 index 000000000..02c721f90 Binary files /dev/null and b/econml/data/input_dynamicdgp/gm_1.jbl differ diff --git 
a/econml/data/input_dynamicdgp/gm_2.jbl b/econml/data/input_dynamicdgp/gm_2.jbl new file mode 100644 index 000000000..91737d308 Binary files /dev/null and b/econml/data/input_dynamicdgp/gm_2.jbl differ diff --git a/econml/data/input_dynamicdgp/gm_3.jbl b/econml/data/input_dynamicdgp/gm_3.jbl new file mode 100644 index 000000000..4c3ce9289 Binary files /dev/null and b/econml/data/input_dynamicdgp/gm_3.jbl differ diff --git a/econml/data/input_dynamicdgp/gm_4.jbl b/econml/data/input_dynamicdgp/gm_4.jbl new file mode 100644 index 000000000..77af014fa Binary files /dev/null and b/econml/data/input_dynamicdgp/gm_4.jbl differ diff --git a/econml/data/input_dynamicdgp/gm_5.jbl b/econml/data/input_dynamicdgp/gm_5.jbl new file mode 100644 index 000000000..68279106b Binary files /dev/null and b/econml/data/input_dynamicdgp/gm_5.jbl differ diff --git a/econml/data/input_dynamicdgp/gm_6.jbl b/econml/data/input_dynamicdgp/gm_6.jbl new file mode 100644 index 000000000..fbdfd1f70 Binary files /dev/null and b/econml/data/input_dynamicdgp/gm_6.jbl differ diff --git a/econml/data/input_dynamicdgp/lognorm_neg_0.jbl b/econml/data/input_dynamicdgp/lognorm_neg_0.jbl new file mode 100644 index 000000000..21e84be89 Binary files /dev/null and b/econml/data/input_dynamicdgp/lognorm_neg_0.jbl differ diff --git a/econml/data/input_dynamicdgp/lognorm_neg_1.jbl b/econml/data/input_dynamicdgp/lognorm_neg_1.jbl new file mode 100644 index 000000000..0ca173d59 Binary files /dev/null and b/econml/data/input_dynamicdgp/lognorm_neg_1.jbl differ diff --git a/econml/data/input_dynamicdgp/lognorm_neg_2.jbl b/econml/data/input_dynamicdgp/lognorm_neg_2.jbl new file mode 100644 index 000000000..f4f1cc696 Binary files /dev/null and b/econml/data/input_dynamicdgp/lognorm_neg_2.jbl differ diff --git a/econml/data/input_dynamicdgp/lognorm_neg_3.jbl b/econml/data/input_dynamicdgp/lognorm_neg_3.jbl new file mode 100644 index 000000000..a13e9d795 Binary files /dev/null and 
b/econml/data/input_dynamicdgp/lognorm_neg_3.jbl differ diff --git a/econml/data/input_dynamicdgp/lognorm_neg_4.jbl b/econml/data/input_dynamicdgp/lognorm_neg_4.jbl new file mode 100644 index 000000000..e159c5665 Binary files /dev/null and b/econml/data/input_dynamicdgp/lognorm_neg_4.jbl differ diff --git a/econml/data/input_dynamicdgp/lognorm_neg_5.jbl b/econml/data/input_dynamicdgp/lognorm_neg_5.jbl new file mode 100644 index 000000000..9ad1efe43 Binary files /dev/null and b/econml/data/input_dynamicdgp/lognorm_neg_5.jbl differ diff --git a/econml/data/input_dynamicdgp/lognorm_pos_0.jbl b/econml/data/input_dynamicdgp/lognorm_pos_0.jbl new file mode 100644 index 000000000..0ee019744 Binary files /dev/null and b/econml/data/input_dynamicdgp/lognorm_pos_0.jbl differ diff --git a/econml/data/input_dynamicdgp/lognorm_pos_1.jbl b/econml/data/input_dynamicdgp/lognorm_pos_1.jbl new file mode 100644 index 000000000..3e1fa8236 Binary files /dev/null and b/econml/data/input_dynamicdgp/lognorm_pos_1.jbl differ diff --git a/econml/data/input_dynamicdgp/lognorm_pos_2.jbl b/econml/data/input_dynamicdgp/lognorm_pos_2.jbl new file mode 100644 index 000000000..e94f429f7 Binary files /dev/null and b/econml/data/input_dynamicdgp/lognorm_pos_2.jbl differ diff --git a/econml/data/input_dynamicdgp/lognorm_pos_3.jbl b/econml/data/input_dynamicdgp/lognorm_pos_3.jbl new file mode 100644 index 000000000..4ff1b40dd Binary files /dev/null and b/econml/data/input_dynamicdgp/lognorm_pos_3.jbl differ diff --git a/econml/data/input_dynamicdgp/lognorm_pos_4.jbl b/econml/data/input_dynamicdgp/lognorm_pos_4.jbl new file mode 100644 index 000000000..472844028 Binary files /dev/null and b/econml/data/input_dynamicdgp/lognorm_pos_4.jbl differ diff --git a/econml/data/input_dynamicdgp/lognorm_pos_5.jbl b/econml/data/input_dynamicdgp/lognorm_pos_5.jbl new file mode 100644 index 000000000..67d145e51 Binary files /dev/null and b/econml/data/input_dynamicdgp/lognorm_pos_5.jbl differ diff --git 
a/econml/data/input_dynamicdgp/lognorm_pos_6.jbl b/econml/data/input_dynamicdgp/lognorm_pos_6.jbl new file mode 100644 index 000000000..ebed90e54 Binary files /dev/null and b/econml/data/input_dynamicdgp/lognorm_pos_6.jbl differ diff --git a/econml/data/input_dynamicdgp/n_0.jbl b/econml/data/input_dynamicdgp/n_0.jbl new file mode 100644 index 000000000..0d72cf99b Binary files /dev/null and b/econml/data/input_dynamicdgp/n_0.jbl differ diff --git a/econml/data/input_dynamicdgp/n_1.jbl b/econml/data/input_dynamicdgp/n_1.jbl new file mode 100644 index 000000000..a1690f6a6 Binary files /dev/null and b/econml/data/input_dynamicdgp/n_1.jbl differ diff --git a/econml/data/input_dynamicdgp/n_2.jbl b/econml/data/input_dynamicdgp/n_2.jbl new file mode 100644 index 000000000..d9a912dda Binary files /dev/null and b/econml/data/input_dynamicdgp/n_2.jbl differ diff --git a/econml/data/input_dynamicdgp/n_3.jbl b/econml/data/input_dynamicdgp/n_3.jbl new file mode 100644 index 000000000..e15e46d30 Binary files /dev/null and b/econml/data/input_dynamicdgp/n_3.jbl differ diff --git a/econml/data/input_dynamicdgp/n_4.jbl b/econml/data/input_dynamicdgp/n_4.jbl new file mode 100644 index 000000000..de5e2691a Binary files /dev/null and b/econml/data/input_dynamicdgp/n_4.jbl differ diff --git a/econml/data/input_dynamicdgp/n_5.jbl b/econml/data/input_dynamicdgp/n_5.jbl new file mode 100644 index 000000000..e60b88978 Binary files /dev/null and b/econml/data/input_dynamicdgp/n_5.jbl differ diff --git a/econml/data/input_dynamicdgp/n_6.jbl b/econml/data/input_dynamicdgp/n_6.jbl new file mode 100644 index 000000000..e9f7e305d Binary files /dev/null and b/econml/data/input_dynamicdgp/n_6.jbl differ diff --git a/econml/dml/__init__.py b/econml/dml/__init__.py index a88d3b693..a83428cf7 100644 --- a/econml/dml/__init__.py +++ b/econml/dml/__init__.py @@ -32,7 +32,6 @@ .. [ortholearner] Dylan Foster, Vasilis Syrgkanis (2019). Orthogonal Statistical Learning. ACM Conference on Learning Theory. 
``_ - """ from .dml import (DML, LinearDML, SparseLinearDML, @@ -45,4 +44,4 @@ "KernelDML", "NonParamDML", "ForestDML", - "CausalForestDML", ] + "CausalForestDML"] diff --git a/econml/dml/_rlearner.py b/econml/dml/_rlearner.py index beca5085f..ebdbe1fda 100644 --- a/econml/dml/_rlearner.py +++ b/econml/dml/_rlearner.py @@ -91,7 +91,8 @@ class _ModelFinal: def __init__(self, model_final): self._model_final = model_final - def fit(self, Y, T, X=None, W=None, Z=None, nuisances=None, sample_weight=None, freq_weight=None, sample_var=None): + def fit(self, Y, T, X=None, W=None, Z=None, nuisances=None, + sample_weight=None, freq_weight=None, sample_var=None, groups=None): Y_res, T_res = nuisances self._model_final.fit(X, T, T_res, Y_res, sample_weight=sample_weight, freq_weight=freq_weight, sample_var=sample_var) @@ -100,7 +101,7 @@ def fit(self, Y, T, X=None, W=None, Z=None, nuisances=None, sample_weight=None, def predict(self, X=None): return self._model_final.predict(X) - def score(self, Y, T, X=None, W=None, Z=None, nuisances=None, sample_weight=None): + def score(self, Y, T, X=None, W=None, Z=None, nuisances=None, sample_weight=None, groups=None): Y_res, T_res = nuisances if Y_res.ndim == 1: Y_res = Y_res.reshape((-1, 1)) diff --git a/econml/dml/causal_forest.py b/econml/dml/causal_forest.py index 4c08df103..efe040c4c 100644 --- a/econml/dml/causal_forest.py +++ b/econml/dml/causal_forest.py @@ -53,7 +53,7 @@ def _ate_and_stderr(self, drpreds, mask=None): stderr = (np.nanstd(drpreds, axis=0) / np.sqrt(nonnan)).reshape(self._d_y + self._d_t) return point, stderr - def fit(self, X, T, T_res, Y_res, sample_weight=None, freq_weight=None, sample_var=None): + def fit(self, X, T, T_res, Y_res, sample_weight=None, freq_weight=None, sample_var=None, groups=None): # Track training dimensions to see if Y or T is a vector instead of a 2-dimensional array self._d_t = shape(T_res)[1:] self._d_y = shape(Y_res)[1:] diff --git a/econml/dml/dml.py b/econml/dml/dml.py index 
ce6e95e0b..e87d956ca 100644 --- a/econml/dml/dml.py +++ b/econml/dml/dml.py @@ -134,7 +134,7 @@ def _combine(self, X, T, fitting=True): F = np.ones((T.shape[0], 1)) return cross_product(F, T) - def fit(self, X, T, T_res, Y_res, sample_weight=None, freq_weight=None, sample_var=None): + def fit(self, X, T, T_res, Y_res, sample_weight=None, freq_weight=None, sample_var=None, groups=None): # Track training dimensions to see if Y or T is a vector instead of a 2-dimensional array self._d_t = shape(T_res)[1:] self._d_y = shape(Y_res)[1:] diff --git a/econml/dr/_drlearner.py b/econml/dr/_drlearner.py index 2986439f9..41e206263 100644 --- a/econml/dr/_drlearner.py +++ b/econml/dr/_drlearner.py @@ -126,7 +126,8 @@ def __init__(self, model_final, featurizer, multitask_model_final): self._multitask_model_final = multitask_model_final return - def fit(self, Y, T, X=None, W=None, *, nuisances, sample_weight=None, freq_weight=None, sample_var=None): + def fit(self, Y, T, X=None, W=None, *, nuisances, + sample_weight=None, freq_weight=None, sample_var=None, groups=None): Y_pred, propensities = nuisances self.d_y = Y_pred.shape[1:-1] # track whether there's a Y dimension (must be a singleton) self.d_t = Y_pred.shape[-1] - 1 # track # of treatment (exclude baseline treatment) @@ -163,7 +164,7 @@ def predict(self, X=None): preds = np.array([mdl.predict(X).reshape((-1,) + self.d_y) for mdl in self.models_cate]) return np.moveaxis(preds, 0, -1) # move treatment dim to end - def score(self, Y, T, X=None, W=None, *, nuisances, sample_weight=None): + def score(self, Y, T, X=None, W=None, *, nuisances, sample_weight=None, groups=None): if (X is not None) and (self._featurizer is not None): X = self._featurizer.transform(X) Y_pred, _ = nuisances diff --git a/econml/dynamic/__init__.py b/econml/dynamic/__init__.py new file mode 100755 index 000000000..8e4ecd538 --- /dev/null +++ b/econml/dynamic/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. 
+# Licensed under the MIT License. + +__all__ = ["dml"] diff --git a/econml/dynamic/dml/__init__.py b/econml/dynamic/dml/__init__.py new file mode 100755 index 000000000..a95579da7 --- /dev/null +++ b/econml/dynamic/dml/__init__.py @@ -0,0 +1,20 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. + +"""Double Machine Learning for Dynamic Treatment Effects. + +A Double/Orthogonal machine learning approach to estimation of heterogeneous +treatment effect in the dynamic treatment regime. For the theoretical +foundations of these methods see: [dynamicdml]_. + +References +---------- + +.. [dynamicdml] Greg Lewis and Vasilis Syrgkanis. + Double/Debiased Machine Learning for Dynamic Treatment Effects. + ``_, 2021. +""" + +from ._dml import DynamicDML + +__all__ = ["DynamicDML"] diff --git a/econml/dynamic/dml/_dml.py b/econml/dynamic/dml/_dml.py new file mode 100644 index 000000000..9453b16e5 --- /dev/null +++ b/econml/dynamic/dml/_dml.py @@ -0,0 +1,795 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. 
+ +import abc +import numpy as np +from warnings import warn +from sklearn.base import clone +from sklearn.model_selection import GroupKFold +from scipy.stats import norm +from sklearn.linear_model import (ElasticNetCV, LassoCV, LogisticRegressionCV) +from ...sklearn_extensions.linear_model import (StatsModelsLinearRegression, WeightedLassoCVWrapper) +from ...sklearn_extensions.model_selection import WeightedStratifiedKFold +from ...dml.dml import _FirstStageWrapper, _FinalWrapper +from ..._cate_estimator import TreatmentExpansionMixin, LinearModelFinalCateEstimatorMixin +from ..._ortho_learner import _OrthoLearner +from ...utilities import (_deprecate_positional, add_intercept, + broadcast_unit_treatments, check_high_dimensional, + cross_product, deprecated, fit_with_groups, + hstack, inverse_onehot, ndim, reshape, + reshape_treatmentwise_effects, shape, transpose, + get_feature_names_or_default, check_input_arrays, + filter_none_kwargs) + + +def _get_groups_period_filter(groups, n_periods): + group_counts = {} + group_period_filter = {i: [] for i in range(n_periods)} + for i, g in enumerate(groups): + if g not in group_counts: + group_counts[g] = 0 + group_period_filter[group_counts[g]].append(i) + group_counts[g] += 1 + return group_period_filter + + +class _DynamicModelNuisance: + """ + Nuisance model fits the model_y and model_t at fit time and at predict time + calculates the residual Y and residual T based on the fitted models and returns + the residuals as two nuisance parameters. + """ + + def __init__(self, model_y, model_t, n_periods): + self._model_y = model_y + self._model_t = model_t + self.n_periods = n_periods + + def fit(self, Y, T, X=None, W=None, sample_weight=None, groups=None): + """Fit a series of nuisance models for each period or period pairs.""" + assert Y.shape[0] % self.n_periods == 0, \ + "Length of training data should be an integer multiple of time periods." 
+ period_filters = _get_groups_period_filter(groups, self.n_periods) + self._model_y_trained = {} + self._model_t_trained = {j: {} for j in np.arange(self.n_periods)} + for t in np.arange(self.n_periods): + self._model_y_trained[t] = clone(self._model_y, safe=False).fit( + self._index_or_None(X, period_filters[t]), + self._index_or_None( + W, period_filters[t]), + Y[period_filters[self.n_periods - 1]]) + for j in np.arange(t, self.n_periods): + self._model_t_trained[j][t] = clone(self._model_t, safe=False).fit( + self._index_or_None(X, period_filters[t]), + self._index_or_None(W, period_filters[t]), + T[period_filters[j]]) + return self + + def predict(self, Y, T, X=None, W=None, sample_weight=None, groups=None): + """Calculate nuisances for each period or period pairs. + + Returns + ------- + Y_res : (n, d_y) matrix or vector of length n + Y residuals for each period in panel format. + This shape is required for _OrthoLearner's crossfitting. + T_res : (n, d_t, n_periods) matrix + T residuals for pairs of periods (t, j), where the data is in panel format for t + and in index form for j. For example, the residuals for (t, j) can be retrieved via + T_res[np.arange(n) % n_periods == t, ..., j]. For t < j, the entries of this + matrix are np.nan. + This shape is required for _OrthoLearner's crossfitting. + """ + assert Y.shape[0] % self.n_periods == 0, \ + "Length of training data should be an integer multiple of time periods." 
+ period_filters = _get_groups_period_filter(groups, self.n_periods) + Y_res = np.full(Y.shape, np.nan) + T_res = np.full(T.shape + (self.n_periods, ), np.nan) + shape_formatter = self._get_shape_formatter(X, W) + for t in np.arange(self.n_periods): + Y_slice = Y[period_filters[self.n_periods - 1]] + Y_pred = self._model_y_trained[t].predict( + self._index_or_None(X, period_filters[t]), + self._index_or_None(W, period_filters[t])) + Y_res[period_filters[t]] = Y_slice\ + - shape_formatter(Y_slice, Y_pred) + for j in np.arange(t, self.n_periods): + T_slice = T[period_filters[j]] + T_pred = self._model_t_trained[j][t].predict( + self._index_or_None(X, period_filters[t]), + self._index_or_None(W, period_filters[t])) + T_res[period_filters[j], ..., t] = T_slice\ + - shape_formatter(T_slice, T_pred) + return Y_res, T_res + + def score(self, Y, T, X=None, W=None, sample_weight=None, groups=None): + assert Y.shape[0] % self.n_periods == 0, \ + "Length of training data should be an integer multiple of time periods." 
+ period_filters = _get_groups_period_filter(groups, self.n_periods) + if hasattr(self._model_y, 'score'): + Y_score = np.full((self.n_periods, ), np.nan) + for t in np.arange(self.n_periods): + Y_score[t] = self._model_y_trained[t].score( + self._index_or_None(X, period_filters[t]), + self._index_or_None(W, period_filters[t]), + Y[period_filters[self.n_periods - 1]]) + else: + Y_score = None + if hasattr(self._model_t, 'score'): + T_score = np.full((self.n_periods, self.n_periods), np.nan) + for t in np.arange(self.n_periods): + for j in np.arange(t, self.n_periods): + T_score[j][t] = self._model_t_trained[j][t].score( + self._index_or_None(X, period_filters[t]), + self._index_or_None(W, period_filters[t]), + T[period_filters[j]]) + else: + T_score = None + return Y_score, T_score + + def _get_shape_formatter(self, X, W): + if (X is None) and (W is None): + return lambda x, x_pred: np.tile(x_pred.reshape(1, -1), (x.shape[0], 1)).reshape(x.shape) + return lambda x, x_pred: x_pred.reshape(x.shape) + + def _index_or_None(self, X, filter_idx): + return None if X is None else X[filter_idx] + + +class _DynamicModelFinal: + """ + Final model at fit time, fits a residual on residual regression with a heterogeneous coefficient + that depends on X, i.e. + + .. math :: + Y - E[Y | X, W] = \\theta(X) \\cdot (T - E[T | X, W]) + \\epsilon + + and at predict time returns :math:`\\theta(X)`. The score method returns the MSE of this final + residual on residual regression. + Assumes model final is parametric with no intercept. 
+ """ + # TODO: update docs + + def __init__(self, model_final, n_periods): + self._model_final = model_final + self.n_periods = n_periods + self._model_final_trained = {k: clone(self._model_final, safe=False) for k in np.arange(n_periods)} + + def fit(self, Y, T, X=None, W=None, Z=None, nuisances=None, sample_weight=None, sample_var=None, groups=None): + # NOTE: sample weight, sample var are not passed in + period_filters = _get_groups_period_filter(groups, self.n_periods) + Y_res, T_res = nuisances + self._d_y = Y.shape[1:] + for t in np.arange(self.n_periods - 1, -1, -1): + Y_adj = Y_res[period_filters[t]].copy() + if t < self.n_periods - 1: + Y_adj -= np.sum( + [self._model_final_trained[j].predict_with_res( + X[period_filters[0]] if X is not None else None, + T_res[period_filters[j], ..., t] + ) for j in np.arange(t + 1, self.n_periods)], axis=0) + self._model_final_trained[t].fit( + X[period_filters[0]] if X is not None else None, T[period_filters[t]], + T_res[period_filters[t], ..., t], Y_adj) + + return self + + def predict(self, X=None): + """ + Return shape: m x dy x (p*dt) + """ + d_t_tuple = self._model_final_trained[0]._d_t + d_t = d_t_tuple[0] if d_t_tuple else 1 + x_dy_shape = (X.shape[0] if X is not None else 1, ) + \ + self._model_final_trained[0]._d_y + preds = np.zeros( + x_dy_shape + + (self.n_periods * d_t, ) + ) + for t in range(self.n_periods): + preds[..., t * d_t: (t + 1) * d_t] = \ + self._model_final_trained[t].predict(X).reshape( + x_dy_shape + (d_t, ) + ) + return preds + + def score(self, Y, T, X=None, W=None, Z=None, nuisances=None, sample_weight=None, sample_var=None, groups=None): + assert Y.shape[0] % self.n_periods == 0, \ + "Length of training data should be an integer multiple of time periods." 
+ Y_res, T_res = nuisances + scores = np.full((self.n_periods, ), np.nan) + period_filters = _get_groups_period_filter(groups, self.n_periods) + for t in np.arange(self.n_periods - 1, -1, -1): + Y_adj = Y_res[period_filters[t]].copy() + if t < self.n_periods - 1: + Y_adj -= np.sum( + [self._model_final_trained[j].predict_with_res( + X[period_filters[0]] if X is not None else None, + T_res[period_filters[j], ..., t] + ) for j in np.arange(t + 1, self.n_periods)], axis=0) + Y_adj_pred = self._model_final_trained[t].predict_with_res( + X[period_filters[0]] if X is not None else None, + T_res[period_filters[t], ..., t]) + if sample_weight is not None: + scores[t] = np.mean(np.average((Y_adj - Y_adj_pred)**2, weights=sample_weight, axis=0)) + else: + scores[t] = np.mean((Y_adj - Y_adj_pred) ** 2) + return scores + + +class _LinearDynamicModelFinal(_DynamicModelFinal): + """Wrapper for the DynamicModelFinal with StatsModelsLinearRegression final model. + + The final model is a linear model with (d_t*n_periods) coefficients. + This model is defined after the coefficients and covariance are calculated. 
+ """ + + def __init__(self, model_final, n_periods): + super().__init__(model_final, n_periods) + self.model_final_ = StatsModelsLinearRegression(fit_intercept=False) + + def fit(self, Y, T, X=None, W=None, Z=None, nuisances=None, sample_weight=None, sample_var=None, groups=None): + super().fit(Y, T, X=X, W=W, Z=Z, nuisances=nuisances, + sample_weight=sample_weight, sample_var=sample_var, groups=groups) + # Compose final model + cov = self._get_cov(nuisances, X, groups) + coef = self._get_coef_() + self.model_final_._n_out = self._d_y[0] if self._d_y else 0 + self.model_final_._param_var = cov / (Y.shape[0] / self.n_periods) + self.model_final_._param = coef.T if self.model_final_._n_out else coef + + def _get_coef_(self): + period_coefs = np.array([self._model_final_trained[t]._model.coef_ for t in range(self.n_periods)]) + if self._d_y: + return np.array([ + np.array([period_coefs[k, i, :] for k in range(self.n_periods)]).flatten() + for i in range(self._d_y[0]) + ]) + return period_coefs.flatten() + + def _get_cov(self, nuisances, X, groups): + if self._d_y: + return np.array( + [self._fit_single_output_cov((nuisances[0][:, i], nuisances[1]), X, i, groups) + for i in range(self._d_y[0])] + ) + return self._fit_single_output_cov(nuisances, X, -1, groups) + + def _fit_single_output_cov(self, nuisances, X, y_index, groups): + """ Calculates the covariance (n_periods*n_treatments) + x (n_periods*n_treatments) matrix for a single outcome. 
+ """ + Y_res, T_res = nuisances + # Calculate auxiliary quantities + period_filters = _get_groups_period_filter(groups, self.n_periods) + # X ⨂ T_res + XT_res = np.array([ + [ + self._model_final_trained[0]._combine( + X[period_filters[0]] if X is not None else None, + T_res[period_filters[t], ..., j], + fitting=False + ) + for j in range(self.n_periods) + ] + for t in range(self.n_periods) + ]) + d_xt = XT_res.shape[-1] + # sum(model_final.predict(X, T_res)) + Y_diff = np.array([ + np.sum([ + self._model_final_trained[j].predict_with_res( + X[period_filters[0]] if X is not None else None, + T_res[period_filters[j], ..., t] + ) for j in np.arange(t, self.n_periods)], + axis=0 + ) + for t in np.arange(self.n_periods) + ]) + J = np.zeros((self.n_periods * d_xt, + self.n_periods * d_xt)) + Sigma = np.zeros((self.n_periods * d_xt, + self.n_periods * d_xt)) + for t in np.arange(self.n_periods): + res_epsilon_t = (Y_res[period_filters[t]] - + (Y_diff[t][:, y_index] if y_index >= 0 else Y_diff[t]) + ).reshape(-1, 1, 1) + resT_t = XT_res[t][t] + for j in np.arange(self.n_periods): + # Calculating the (t, j) block entry (of size n_treatments x n_treatments) of matrix Sigma + res_epsilon_j = (Y_res[period_filters[j]] - + (Y_diff[j][:, y_index] if y_index >= 0 else Y_diff[j]) + ).reshape(-1, 1, 1) + resT_j = XT_res[j][j] + cov_resT_tj = resT_t.reshape(-1, d_xt, 1) @ resT_j.reshape(-1, 1, d_xt) + sigma_tj = np.mean((res_epsilon_t * res_epsilon_j) * cov_resT_tj, axis=0) + Sigma[t * d_xt:(t + 1) * d_xt, + j * d_xt:(j + 1) * d_xt] = sigma_tj + if j >= t: + # Calculating the (t, j) block entry (of size n_treatments x n_treatments) of matrix J + m_tj = np.mean( + XT_res[j][t].reshape(-1, d_xt, 1) @ resT_t.reshape(-1, 1, d_xt), + axis=0) + J[t * d_xt:(t + 1) * d_xt, + j * d_xt:(j + 1) * d_xt] = m_tj + return np.linalg.inv(J) @ Sigma @ np.linalg.inv(J).T + + +class _DynamicFinalWrapper(_FinalWrapper): + + def predict_with_res(self, X, T_res): + fts = self._combine(X, T_res, 
fitting=False) + prediction = self._model.predict(fts) + if self._intercept is not None: + prediction -= self._intercept + return reshape(prediction, (prediction.shape[0],) + self._d_y) + + +class DynamicDML(LinearModelFinalCateEstimatorMixin, _OrthoLearner): + """CATE estimator for dynamic treatment effect estimation. + + This estimator is an extension of the Double ML approach for treatments assigned sequentially + over time periods. + + The estimator is a special case of an :class:`_OrthoLearner` estimator, so it follows the two + stage process, where a set of nuisance functions are estimated in the first stage in a crossfitting + manner and a final stage estimates the CATE model. See the documentation of + :class:`._OrthoLearner` for a description of this two stage process. + + Parameters + ---------- + model_y: estimator or 'auto', optional (default is 'auto') + The estimator for fitting the response to the features. Must implement + `fit` and `predict` methods. + If 'auto' :class:`.WeightedLassoCV`/:class:`.WeightedMultiTaskLassoCV` will be chosen. + + model_t: estimator or 'auto', optional (default is 'auto') + The estimator for fitting the treatment to the features. + If estimator, it must implement `fit` and `predict` methods; + If 'auto', :class:`~sklearn.linear_model.LogisticRegressionCV` will be applied for discrete treatment, + and :class:`.WeightedLassoCV`/:class:`.WeightedMultiTaskLassoCV` + will be applied for continuous treatment. + + featurizer : :term:`transformer`, optional, default None + Must support fit_transform and transform. Used to create composite features in the final CATE regression. + It is ignored if X is None. The final CATE will be trained on the outcome of featurizer.fit_transform(X). + If featurizer=None, then CATE is trained on X. + + fit_cate_intercept : bool, optional, default True + Whether the linear CATE model should have a constant term. 
+ + linear_first_stages: bool + Whether the first stage models are linear (in which case we will expand the features passed to + `model_y` accordingly) + + discrete_treatment: bool, optional (default is ``False``) + Whether the treatment values should be treated as categorical, rather than continuous, quantities + + categories: 'auto' or list, default 'auto' + The categories to use when encoding discrete treatments (or 'auto' to use the unique sorted values). + The first category will be treated as the control treatment. + + cv: int, cross-validation generator or an iterable, optional (Default=2) + Determines the cross-validation splitting strategy. + Possible inputs for cv are: + + - None, to use the default 3-fold cross-validation, + - integer, to specify the number of folds. + - :term:`CV splitter` + - An iterable yielding (train, test) splits as arrays of indices. + Iterables should make sure a group belongs to a single split. + + For integer/None inputs, :class:`~sklearn.model_selection.GroupKFold` is used + + Unless an iterable is used, we call `split(X, T, groups)` to generate the splits. + + mc_iters: int, optional (default=None) + The number of times to rerun the first stage models to reduce the variance of the nuisances. + + mc_agg: {'mean', 'median'}, optional (default='mean') + How to aggregate the nuisance value for each sample across the `mc_iters` monte carlo iterations of + cross-fitting. + + random_state: int, :class:`~numpy.random.mtrand.RandomState` instance or None, optional (default=None) + If int, random_state is the seed used by the random number generator; + If :class:`~numpy.random.mtrand.RandomState` instance, random_state is the random number generator; + If None, the random number generator is the :class:`~numpy.random.mtrand.RandomState` instance used + by :mod:`np.random`. + + Examples + -------- + A simple example with default models: + + .. testcode:: + :hide: + + import numpy as np + np.set_printoptions(suppress=True) + + .. 
testcode:: + + from econml.dynamic.dml import DynamicDML + + np.random.seed(123) + + n_panels = 100 # number of panels + n_periods = 3 # number of time periods per panel + n = n_panels * n_periods + groups = np.repeat(a=np.arange(n_panels), repeats=n_periods, axis=0) + X = np.random.normal(size=(n, 1)) + T = np.random.normal(size=(n, 2)) + y = np.random.normal(size=(n, )) + est = DynamicDML() + est.fit(y, T, X=X, W=None, groups=groups, inference="auto") + + >>> est.const_marginal_effect(X[:2]) + array([[-0.336..., -0.048..., -0.061..., 0.042..., -0.204..., + 0.00667271], + [-0.101..., 0.433..., 0.054..., -0.217..., -0.101..., + -0.159...]]) + >>> est.effect(X[:2], T0=0, T1=1) + array([-0.601..., -0.091...]) + >>> est.effect(X[:2], T0=np.zeros((2, n_periods*T.shape[1])), T1=np.ones((2, n_periods*T.shape[1]))) + array([-0.601..., -0.091...]) + >>> est.coef_ + array([[ 0.112...], + [ 0.231...], + [ 0.055...], + [-0.125...], + [ 0.049...], + [-0.079...]]) + >>> est.coef__interval() + (array([[-0.035...], + [ 0.029...], + [-0.087... ], + [-0.366... ], + [-0.090...], + [-0.233...]]), + array([[0.260...], + [0.433... 
], + [0.198...], + [0.116...], + [0.189...], + [0.074...]])) + """ + + def __init__(self, *, + model_y='auto', model_t='auto', + featurizer=None, + fit_cate_intercept=True, + linear_first_stages=False, + discrete_treatment=False, + categories='auto', + cv=2, + mc_iters=None, + mc_agg='mean', + random_state=None): + self.fit_cate_intercept = fit_cate_intercept + self.linear_first_stages = linear_first_stages + self.featurizer = clone(featurizer, safe=False) + self.model_y = clone(model_y, safe=False) + self.model_t = clone(model_t, safe=False) + super().__init__(discrete_treatment=discrete_treatment, + discrete_instrument=False, + categories=categories, + cv=GroupKFold(cv) if isinstance(cv, int) else cv, + mc_iters=mc_iters, + mc_agg=mc_agg, + random_state=random_state) + + def _gen_featurizer(self): + return clone(self.featurizer, safe=False) + + def _gen_model_y(self): + if self.model_y == 'auto': + model_y = WeightedLassoCVWrapper(random_state=self.random_state) + else: + model_y = clone(self.model_y, safe=False) + return _FirstStageWrapper(model_y, True, self._gen_featurizer(), + self.linear_first_stages, self.discrete_treatment) + + def _gen_model_t(self): + if self.model_t == 'auto': + if self.discrete_treatment: + model_t = LogisticRegressionCV(cv=WeightedStratifiedKFold(random_state=self.random_state), + random_state=self.random_state) + else: + model_t = WeightedLassoCVWrapper(random_state=self.random_state) + else: + model_t = clone(self.model_t, safe=False) + return _FirstStageWrapper(model_t, False, self._gen_featurizer(), + self.linear_first_stages, self.discrete_treatment) + + def _gen_model_final(self): + return StatsModelsLinearRegression(fit_intercept=False) + + def _gen_ortho_learner_model_nuisance(self, n_periods): + return _DynamicModelNuisance( + model_t=self._gen_model_t(), + model_y=self._gen_model_y(), + n_periods=n_periods) + + def _gen_ortho_learner_model_final(self, n_periods): + wrapped_final_model = _DynamicFinalWrapper( + 
StatsModelsLinearRegression(fit_intercept=False), + fit_cate_intercept=self.fit_cate_intercept, + featurizer=self.featurizer, + use_weight_trick=False) + return _LinearDynamicModelFinal(wrapped_final_model, n_periods=n_periods) + + def _prefit(self, Y, T, *args, groups=None, only_final=False, **kwargs): + u_periods = np.unique(np.unique(groups, return_counts=True)[1]) + if len(u_periods) > 1: + raise AttributeError( + "Imbalanced panel. Method currently expects only panels with equal number of periods. Pad your data") + self._n_periods = u_periods[0] + # generate an instance of the final model + self._ortho_learner_model_final = self._gen_ortho_learner_model_final(self._n_periods) + if not only_final: + # generate an instance of the nuisance model + self._ortho_learner_model_nuisance = self._gen_ortho_learner_model_nuisance(self._n_periods) + TreatmentExpansionMixin._prefit(self, Y, T, *args, **kwargs) + + def _postfit(self, Y, T, *args, **kwargs): + super()._postfit(Y, T, *args, **kwargs) + # Set _d_t to effective number of treatments + self._d_t = (self._n_periods * self._d_t[0], ) if self._d_t else (self._n_periods, ) + + def _strata(self, Y, T, X=None, W=None, Z=None, + sample_weight=None, sample_var=None, groups=None, + cache_values=False, only_final=False, check_input=True): + # Required for bootstrap inference + return groups + + @_deprecate_positional("X and W should be passed by keyword only. In a future release " + "we will disallow passing X and W by position.", ['X', 'W']) + def fit(self, Y, T, X=None, W=None, *, sample_weight=None, sample_var=None, groups, + cache_values=False, inference='auto'): + """Estimate the counterfactual model from data, i.e. estimates function :math:`\\theta(\\cdot)`. + + The input data must contain groups with the same size corresponding to the number + of time periods the treatments were assigned over. + + The data should preferably be in panel format, with groups clustered together. 
+ If group members do not appear together, the following is assumed: + + * the first instance of a group in the dataset is assumed to correspond to the first period of that group + * the second instance of a group in the dataset is assumed to correspond to the + second period of that group + + ...etc. + + Only the values of the features X at the first period of each unit are used for + heterogeneity. The value of X in subsequent periods is used as a time-varying control + but not for heterogeneity. + + Parameters + ---------- + Y: (n, d_y) matrix or vector of length n + Outcomes for each sample (required: n = n_groups * n_periods) + T: (n, d_t) matrix or vector of length n + Treatments for each sample (required: n = n_groups * n_periods) + X: optional(n, d_x) matrix or None (Default=None) + Features for each sample (Required: n = n_groups * n_periods). Only first + period features from each unit are used for heterogeneity, the rest are + used as time-varying controls together with W + W: optional(n, d_w) matrix or None (Default=None) + Controls for each sample (Required: n = n_groups * n_periods) + sample_weight: optional(n,) vector or None (Default=None) + Weights for each sample + sample_var: optional(n,) vector or None (Default=None) + Sample variance for each sample + groups: (n,) vector, required + All rows corresponding to the same group will be kept together during splitting. + If groups is not None, the `cv` argument passed to this class's initializer + must support a 'groups' argument to its split method. + cache_values: bool, default False + Whether to cache inputs and first stage results, which will allow refitting a different final model + inference: string,:class:`.Inference` instance, or None + Method for performing inference. This estimator supports 'bootstrap' + (or an instance of :class:`.BootstrapInference`) and 'auto' + (or an instance of :class:`.LinearModelFinalInference`). 
+ + Returns + ------- + self: DynamicDML instance + """ + if sample_weight is not None or sample_var is not None: + warn("This CATE estimator does not yet support sample weights and sample variance. " + "These inputs will be ignored during fitting.", + UserWarning) + return super().fit(Y, T, X=X, W=W, + sample_weight=None, sample_var=None, groups=groups, + cache_values=cache_values, + inference=inference) + + def score(self, Y, T, X=None, W=None, sample_weight=None, *, groups): + """ + Score the fitted CATE model on a new data set. Generates nuisance parameters + for the new data set based on the fitted residual nuisance models created at fit time. + It uses the mean prediction of the models fitted by the different crossfit folds. + Then calculates the MSE of the final residual Y on residual T regression. + + If model_final does not have a score method, then it raises an :exc:`.AttributeError` + + Parameters + ---------- + Y: (n, d_y) matrix or vector of length n + Outcomes for each sample (required: n = n_groups * n_periods) + T: (n, d_t) matrix or vector of length n + Treatments for each sample (required: n = n_groups * n_periods) + X: optional(n, d_x) matrix or None (Default=None) + Features for each sample (Required: n = n_groups * n_periods) + W: optional(n, d_w) matrix or None (Default=None) + Controls for each sample (Required: n = n_groups * n_periods) + groups: (n,) vector, required + All rows corresponding to the same group will be kept together during splitting. + + Returns + ------- + score: float + The MSE of the final CATE model on the new data. 
+ """ + if not hasattr(self._ortho_learner_model_final, 'score'): + raise AttributeError("Final model does not have a score method!") + Y, T, X, W, groups = check_input_arrays(Y, T, X, W, groups) + self._check_fitted_dims(X) + X, T = super()._expand_treatments(X, T) + n_iters = len(self._models_nuisance) + n_splits = len(self._models_nuisance[0]) + + # for each mc iteration + for i, models_nuisances in enumerate(self._models_nuisance): + # for each model under cross fit setting + for j, mdl in enumerate(models_nuisances): + nuisance_temp = mdl.predict(Y, T, **filter_none_kwargs(X=X, W=W, groups=groups)) + if not isinstance(nuisance_temp, tuple): + nuisance_temp = (nuisance_temp,) + + if i == 0 and j == 0: + nuisances = [np.zeros((n_iters * n_splits,) + nuis.shape) for nuis in nuisance_temp] + + for it, nuis in enumerate(nuisance_temp): + nuisances[it][i * n_splits + j] = nuis + + for it in range(len(nuisances)): + nuisances[it] = np.mean(nuisances[it], axis=0) + return self._ortho_learner_model_final.score(Y, T, nuisances=nuisances, + **filter_none_kwargs(X=X, W=W, + sample_weight=sample_weight, groups=groups)) + + def cate_treatment_names(self, treatment_names=None): + """ + Get treatment names for each time period. + + If the treatment is discrete, it will return expanded treatment names. + + Parameters + ---------- + treatment_names: list of strings of length T.shape[1] or None + The names of the treatments. If None and the T passed to fit was a dataframe, + it defaults to the column names from the dataframe. + + Returns + ------- + out_treatment_names: list of strings + Returns (possibly expanded) treatment names. + """ + slice_treatment_names = super().cate_treatment_names(treatment_names) + treatment_names_out = [] + for k in range(self._n_periods): + treatment_names_out += [f"({t})$_{k}$" for t in slice_treatment_names] + return treatment_names_out + + def cate_feature_names(self, feature_names=None): + """ + Get the output feature names. 
+ + Parameters + ---------- + feature_names: list of strings of length X.shape[1] or None + The names of the input features. If None and X is a dataframe, it defaults to the column names + from the dataframe. + + Returns + ------- + out_feature_names: list of strings or None + The names of the output features :math:`\\phi(X)`, i.e. the features with respect to which the + final constant marginal CATE model is linear. It is the names of the features that are associated + with each entry of the :meth:`coef_` parameter. Not available when the featurizer is not None and + does not have a method: `get_feature_names(feature_names)`. Otherwise None is returned. + """ + if self._d_x is None: + # Handles the corner case when X=None but featurizer might be not None + return None + if feature_names is None: + feature_names = self._input_names["feature_names"] + if self.original_featurizer is None: + return feature_names + return get_feature_names_or_default(self.original_featurizer, feature_names) + + def _expand_treatments(self, X, *Ts): + # Expand treatments for each time period + outTs = [] + base_expand_treatments = super()._expand_treatments + for T in Ts: + if ndim(T) == 0: + one_T = base_expand_treatments(X, T)[1] + one_T = one_T.reshape(-1, 1) if ndim(one_T) == 1 else one_T + T = np.tile(one_T, (1, self._n_periods, )) + else: + assert (T.shape[1] == self._n_periods if self.transformer else T.shape[1] == self._d_t[0]), \ + f"Expected a list of time period * d_t, instead got a treatment array of shape {T.shape}." 
+ if self.transformer: + T = np.hstack([ + base_expand_treatments( + X, T[:, [t]])[1] for t in range(self._n_periods) + ]) + outTs.append(T) + return (X,) + tuple(outTs) + + @property + def bias_part_of_coef(self): + return self.ortho_learner_model_final_._model_final._fit_cate_intercept + + @property + def fit_cate_intercept_(self): + return self.ortho_learner_model_final_._model_final._fit_cate_intercept + + @property + def original_featurizer(self): + # NOTE: important to use the _ortho_learner_model_final_ attribute instead of the + # attribute so that the trained featurizer will be passed through + return self.ortho_learner_model_final_._model_final_trained[0]._original_featurizer + + @property + def featurizer_(self): + # NOTE This is used by the inference methods and has to be the overall featurizer. intended + # for internal use by the library + return self.ortho_learner_model_final_._model_final_trained[0]._featurizer + + @property + def model_final_(self): + # NOTE This is used by the inference methods and is more for internal use to the library + # We need to use the _ortho_learner's copy to retain the information from fitting + return self.ortho_learner_model_final_.model_final_ + + @property + def model_final(self): + return self._gen_model_final() + + @model_final.setter + def model_final(self, model): + if model is not None: + raise ValueError("Parameter `model_final` cannot be altered for this estimator!") + + @property + def models_y(self): + return [[mdl._model_y for mdl in mdls] for mdls in super().models_nuisance_] + + @property + def models_t(self): + return [[mdl._model_t for mdl in mdls] for mdls in super().models_nuisance_] + + @property + def nuisance_scores_y(self): + return self.nuisance_scores_[0] + + @property + def nuisance_scores_t(self): + return self.nuisance_scores_[1] + + @property + def residuals_(self): + """ + A tuple (y_res, T_res, X, W), of the residuals from the first stage estimation + along with the associated X and W. 
Samples are not guaranteed to be in the same + order as the input order. + """ + if not hasattr(self, '_cached_values'): + raise AttributeError("Estimator is not fitted yet!") + if self._cached_values is None: + raise AttributeError("`fit` was called with `cache_values=False`. " + "Set to `True` to enable residual storage.") + Y_res, T_res = self._cached_values.nuisances + return Y_res, T_res, self._cached_values.X, self._cached_values.W diff --git a/econml/iv/dml/_dml.py b/econml/iv/dml/_dml.py index 8599b8fad..7ad1886d0 100644 --- a/econml/iv/dml/_dml.py +++ b/econml/iv/dml/_dml.py @@ -33,7 +33,8 @@ def __init__(self): self._model_final = _FinalWrapper(LinearRegression(fit_intercept=False), fit_cate_intercept=True, featurizer=None, use_weight_trick=False) - def fit(self, Y, T, X=None, W=None, Z=None, nuisances=None, sample_weight=None, freq_weight=None, sample_var=None): + def fit(self, Y, T, X=None, W=None, Z=None, nuisances=None, + sample_weight=None, freq_weight=None, sample_var=None, groups=None): Y_res, T_res, Z_res = nuisances if Z_res.ndim == 1: Z_res = Z_res.reshape(-1, 1) @@ -49,7 +50,7 @@ def predict(self, X=None): # TODO: allow the final model to actually use X? 
return self._model_final.predict(X=None) - def score(self, Y, T, X=None, W=None, Z=None, nuisances=None, sample_weight=None): + def score(self, Y, T, X=None, W=None, Z=None, nuisances=None, sample_weight=None, groups=None): Y_res, T_res, Z_res = nuisances if Y_res.ndim == 1: Y_res = Y_res.reshape((-1, 1)) @@ -384,7 +385,8 @@ class _BaseDMLIVModelFinal: def __init__(self, model_final): self._model_final = clone(model_final, safe=False) - def fit(self, Y, T, X=None, W=None, Z=None, nuisances=None, sample_weight=None, freq_weight=None, sample_var=None): + def fit(self, Y, T, X=None, W=None, Z=None, nuisances=None, + sample_weight=None, freq_weight=None, sample_var=None, groups=None): Y_res, T_res = nuisances self._model_final.fit(X, T, T_res, Y_res, sample_weight=sample_weight, freq_weight=freq_weight, sample_var=sample_var) @@ -393,7 +395,7 @@ def fit(self, Y, T, X=None, W=None, Z=None, nuisances=None, sample_weight=None, def predict(self, X=None): return self._model_final.predict(X) - def score(self, Y, T, X=None, W=None, Z=None, nuisances=None, sample_weight=None): + def score(self, Y, T, X=None, W=None, Z=None, nuisances=None, sample_weight=None, groups=None): Y_res, T_res = nuisances if Y_res.ndim == 1: Y_res = Y_res.reshape((-1, 1)) diff --git a/econml/iv/dr/_dr.py b/econml/iv/dr/_dr.py index 7ecd69fd0..48a68627b 100644 --- a/econml/iv/dr/_dr.py +++ b/econml/iv/dr/_dr.py @@ -75,7 +75,8 @@ def _effect_estimate(self, nuisances): self._cov_clip, np.inf) return prel_theta + (res_y - prel_theta * res_t) * res_z / clipped_cov, clipped_cov - def fit(self, Y, T, X=None, W=None, Z=None, nuisances=None, sample_weight=None, freq_weight=None, sample_var=None): + def fit(self, Y, T, X=None, W=None, Z=None, nuisances=None, + sample_weight=None, freq_weight=None, sample_var=None, groups=None): self.d_y = Y.shape[1:] self.d_t = nuisances[1].shape[1:] self.d_z = nuisances[3].shape[1:] @@ -115,7 +116,7 @@ def predict(self, X=None): X = self._featurizer.transform(X) return 
self._model_final.predict(X).reshape((-1,) + self.d_y + self.d_t) - def score(self, Y, T, X=None, W=None, Z=None, nuisances=None, sample_weight=None): + def score(self, Y, T, X=None, W=None, Z=None, nuisances=None, sample_weight=None, groups=None): theta_dr, clipped_cov = self._effect_estimate(nuisances) if (X is not None) and (self._featurizer is not None): diff --git a/econml/policy/_drlearner.py b/econml/policy/_drlearner.py index d7a41833a..7280be32c 100644 --- a/econml/policy/_drlearner.py +++ b/econml/policy/_drlearner.py @@ -14,7 +14,8 @@ class _PolicyModelFinal(_ModelFinal): - def fit(self, Y, T, X=None, W=None, *, nuisances, sample_weight=None, freq_weight=None, sample_var=None): + def fit(self, Y, T, X=None, W=None, *, nuisances, + sample_weight=None, freq_weight=None, sample_var=None, groups=None): if sample_var is not None: warn('Parameter `sample_var` is ignored by the final estimator') sample_var = None @@ -38,7 +39,7 @@ def predict(self, X=None): return pred[:, np.newaxis, :] return pred - def score(self, Y, T, X=None, W=None, *, nuisances, sample_weight=None): + def score(self, Y, T, X=None, W=None, *, nuisances, sample_weight=None, groups=None): return 0 diff --git a/econml/tests/dgp.py b/econml/tests/dgp.py new file mode 100644 index 000000000..403783447 --- /dev/null +++ b/econml/tests/dgp.py @@ -0,0 +1,219 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. 
+import abc +import numpy as np +from econml.utilities import cross_product +from statsmodels.tools.tools import add_constant +try: + import matplotlib + import matplotlib.pyplot as plt +except ImportError as exn: + from econml.utilities import MissingModule + + # make any access to matplotlib or plt throw an exception + matplotlib = plt = MissingModule("matplotlib is no longer a dependency of the main econml package; " + "install econml[plt] or econml[all] to require it, or install matplotlib " + "separately, to use the tree interpreters", exn) + + +class _BaseDynamicPanelDGP: + + def __init__(self, n_periods, n_treatments, n_x): + self.n_periods = n_periods + self.n_treatments = n_treatments + self.n_x = n_x + return + + @abc.abstractmethod + def create_instance(self, *args, **kwargs): + pass + + @abc.abstractmethod + def _gen_data_with_policy(self, n_units, policy_gen, random_seed=123): + pass + + def static_policy_data(self, n_units, tau, random_seed=123): + def policy_gen(Tpre, X, period): + return tau[period] + return self._gen_data_with_policy(n_units, policy_gen, random_seed=random_seed) + + def adaptive_policy_data(self, n_units, policy_gen, random_seed=123): + return self._gen_data_with_policy(n_units, policy_gen, random_seed=random_seed) + + def static_policy_effect(self, tau, mc_samples=1000): + Y_tau, _, _, _, _ = self.static_policy_data(mc_samples, tau) + Y_zero, _, _, _, _ = self.static_policy_data( + mc_samples, np.zeros((self.n_periods, self.n_treatments))) + return np.mean(Y_tau[np.arange(Y_tau.shape[0]) % self.n_periods == self.n_periods - 1]) - \ + np.mean(Y_zero[np.arange(Y_zero.shape[0]) % + self.n_periods == self.n_periods - 1]) + + def adaptive_policy_effect(self, policy_gen, mc_samples=1000): + Y_tau, _, _, _, _ = self.adaptive_policy_data(mc_samples, policy_gen) + Y_zero, _, _, _, _ = self.static_policy_data( + mc_samples, np.zeros((self.n_periods, self.n_treatments))) + return np.mean(Y_tau[np.arange(Y_tau.shape[0]) % self.n_periods == self.n_periods - 1]) 
- \ + np.mean(Y_zero[np.arange(Y_zero.shape[0]) % + self.n_periods == self.n_periods - 1]) + + +class DynamicPanelDGP(_BaseDynamicPanelDGP): + + def __init__(self, n_periods, n_treatments, n_x): + super().__init__(n_periods, n_treatments, n_x) + + def create_instance(self, s_x, sigma_x=.8, sigma_y=.1, conf_str=5, hetero_strength=.5, hetero_inds=None, + autoreg=.25, state_effect=.25, random_seed=123): + np.random.seed(random_seed) + self.s_x = s_x + self.conf_str = conf_str + self.sigma_x = sigma_x + self.sigma_y = sigma_y + self.hetero_inds = hetero_inds.astype( + int) if hetero_inds is not None else hetero_inds + self.endo_inds = np.setdiff1d( + np.arange(self.n_x), hetero_inds).astype(int) + # The first s_x state variables are confounders. The final s_x variables are exogenous and can create + # heterogeneity + self.Alpha = np.random.uniform(-1, 1, + size=(self.n_x, self.n_treatments)) + self.Alpha /= np.linalg.norm(self.Alpha, axis=1, ord=1, keepdims=True) + self.Alpha *= state_effect + if self.hetero_inds is not None: + self.Alpha[self.hetero_inds] = 0 + + self.Beta = np.zeros((self.n_x, self.n_x)) + for t in range(self.n_x): + self.Beta[t, :] = autoreg * np.roll(np.random.uniform(low=4.0**(-np.arange( + 0, self.n_x)), high=4.0**(-np.arange(1, self.n_x + 1))), t) + if self.hetero_inds is not None: + self.Beta[np.ix_(self.endo_inds, self.hetero_inds)] = 0 + self.Beta[np.ix_(self.hetero_inds, self.endo_inds)] = 0 + + self.epsilon = np.random.uniform(-1, 1, size=self.n_treatments) + self.zeta = np.zeros(self.n_x) + self.zeta[:self.s_x] = self.conf_str / self.s_x + + self.y_hetero_effect = np.zeros(self.n_x) + self.x_hetero_effect = np.zeros(self.n_x) + if self.hetero_inds is not None: + self.y_hetero_effect[self.hetero_inds] = np.random.uniform(.5 * hetero_strength, 1.5 * hetero_strength) / \ + len(self.hetero_inds) + self.x_hetero_effect[self.hetero_inds] = np.random.uniform(.5 * hetero_strength, 1.5 * hetero_strength) / \ + len(self.hetero_inds) + + 
self.true_effect = np.zeros((self.n_periods, self.n_treatments)) + # Invert indices to match latest API + self.true_effect[self.n_periods - 1] = self.epsilon + for t in np.arange(self.n_periods - 2, -1, -1): + self.true_effect[t, :] = (self.zeta.reshape( + 1, -1) @ np.linalg.matrix_power(self.Beta, (self.n_periods - 1 - t) - 1) @ self.Alpha) + + self.true_hetero_effect = np.zeros( + (self.n_periods, (self.n_x + 1) * self.n_treatments)) + self.true_hetero_effect[self.n_periods - 1, :] = cross_product( + add_constant(self.y_hetero_effect.reshape(1, -1), has_constant='add'), + self.epsilon.reshape(1, -1)) + for t in np.arange(self.n_periods - 2, -1, -1): + # Invert indices to match latest API + self.true_hetero_effect[t, :] = cross_product( + add_constant(self.x_hetero_effect.reshape(1, -1), has_constant='add'), + self.zeta.reshape(1, -1) @ np.linalg.matrix_power( + self.Beta, (self.n_periods - 1 - t) - 1) @ self.Alpha) + return self + + def hetero_effect_fn(self, t, x): + if t == self.n_periods - 1: + return (np.dot(self.y_hetero_effect, x.flatten()) + 1) * self.epsilon + else: + return (np.dot(self.x_hetero_effect, x.flatten()) + 1) *\ + (self.zeta.reshape(1, -1) @ np.linalg.matrix_power(self.Beta, (self.n_periods - 1 - t) - 1) + @ self.Alpha).flatten() + + def _gen_data_with_policy(self, n_units, policy_gen, random_seed=123): + np.random.seed(random_seed) + Y = np.zeros(n_units * self.n_periods) + T = np.zeros((n_units * self.n_periods, self.n_treatments)) + X = np.zeros((n_units * self.n_periods, self.n_x)) + groups = np.zeros(n_units * self.n_periods) + for t in range(n_units * self.n_periods): + period = t % self.n_periods + if period == 0: + X[t] = np.random.normal(0, self.sigma_x, size=self.n_x) + const_x0 = X[t][self.hetero_inds] + T[t] = policy_gen(np.zeros(self.n_treatments), X[t], period) + else: + X[t] = (np.dot(self.x_hetero_effect, X[t - 1]) + 1) * np.dot(self.Alpha, T[t - 1]) + \ + np.dot(self.Beta, X[t - 1]) + \ + np.random.normal(0, self.sigma_x, 
size=self.n_x) + # The feature for heterogeneity stays constant + X_t = X[t].copy() + X_t[self.hetero_inds] = const_x0 + T[t] = policy_gen(T[t - 1], X[t], period) + Y[t] = (np.dot(self.y_hetero_effect, X_t if period != 0 else X[t]) + 1) * np.dot(self.epsilon, T[t]) + \ + np.dot(X[t], self.zeta) + \ + np.random.normal(0, self.sigma_y) + groups[t] = t // self.n_periods + + return Y, T, X[:, self.hetero_inds] if (self.hetero_inds is not None) else None, X[:, self.endo_inds], groups + + def observational_data(self, n_units, gamma=0, s_t=1, sigma_t=0.5, random_seed=123): + """Generate observational data with some observational treatment policy parameters. + + Parameters + ---------- + n_units : how many units to observe + gamma : what is the degree of auto-correlation of the treatments across periods + s_t : sparsity of treatment policy; how many states does it depend on + sigma_t : what is the std of the exploration/randomness in the treatment + """ + Delta = np.zeros((self.n_treatments, self.n_x)) + Delta[:, :s_t] = self.conf_str / s_t + + def policy_gen(Tpre, X, period): + return gamma * Tpre + (1 - gamma) * np.dot(Delta, X) + \ + np.random.normal(0, sigma_t, size=self.n_treatments) + return self._gen_data_with_policy(n_units, policy_gen, random_seed=random_seed) + + +# Auxiliary function for adding xticks and vertical lines when plotting results +# for dynamic dml vs ground truth parameters. 
+def add_vlines(n_periods, n_treatments, hetero_inds): + locs, labels = plt.xticks([], []) + locs += [- .501 + (len(hetero_inds) + 1) / 2] + labels += ["\n\n$\\tau_{{{}}}$".format(0)] + locs += [qx for qx in np.arange(len(hetero_inds) + 1)] + labels += ["$1$"] + ["$x_{{{}}}$".format(qx) for qx in hetero_inds] + for q in np.arange(1, n_treatments): + plt.axvline(x=q * (len(hetero_inds) + 1) - .5, + linestyle='--', color='red', alpha=.2) + locs += [q * (len(hetero_inds) + 1) - .501 + (len(hetero_inds) + 1) / 2] + labels += ["\n\n$\\tau_{{{}}}$".format(q)] + locs += [(q * (len(hetero_inds) + 1) + qx) + for qx in np.arange(len(hetero_inds) + 1)] + labels += ["$1$"] + ["$x_{{{}}}$".format(qx) for qx in hetero_inds] + locs += [- .501 + (len(hetero_inds) + 1) * n_treatments / 2] + labels += ["\n\n\n\n$\\theta_{{{}}}$".format(0)] + for t in np.arange(1, n_periods): + plt.axvline(x=t * (len(hetero_inds) + 1) * + n_treatments - .5, linestyle='-', alpha=.6) + locs += [t * (len(hetero_inds) + 1) * n_treatments - .501 + + (len(hetero_inds) + 1) * n_treatments / 2] + labels += ["\n\n\n\n$\\theta_{{{}}}$".format(t)] + locs += [t * (len(hetero_inds) + 1) * + n_treatments - .501 + (len(hetero_inds) + 1) / 2] + labels += ["\n\n$\\tau_{{{}}}$".format(0)] + locs += [t * (len(hetero_inds) + 1) * n_treatments + + qx for qx in np.arange(len(hetero_inds) + 1)] + labels += ["$1$"] + ["$x_{{{}}}$".format(qx) for qx in hetero_inds] + for q in np.arange(1, n_treatments): + plt.axvline(x=t * (len(hetero_inds) + 1) * n_treatments + q * (len(hetero_inds) + 1) - .5, + linestyle='--', color='red', alpha=.2) + locs += [t * (len(hetero_inds) + 1) * n_treatments + q * + (len(hetero_inds) + 1) - .501 + (len(hetero_inds) + 1) / 2] + labels += ["\n\n$\\tau_{{{}}}$".format(q)] + locs += [t * (len(hetero_inds) + 1) * n_treatments + (q * (len(hetero_inds) + 1) + qx) + for qx in np.arange(len(hetero_inds) + 1)] + labels += ["$1$"] + ["$x_{{{}}}$".format(qx) for qx in hetero_inds] + plt.xticks(locs, labels) 
+ plt.tight_layout() diff --git a/econml/tests/test_dynamic_dml.py b/econml/tests/test_dynamic_dml.py new file mode 100644 index 000000000..7539c18f9 --- /dev/null +++ b/econml/tests/test_dynamic_dml.py @@ -0,0 +1,298 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. +import unittest +import pytest +import pickle +import numpy as np +from contextlib import ExitStack +from sklearn.preprocessing import OneHotEncoder, FunctionTransformer, PolynomialFeatures +from sklearn.linear_model import (LinearRegression, LassoCV, Lasso, MultiTaskLasso, + MultiTaskLassoCV, LogisticRegression) +from econml.dynamic.dml import DynamicDML +from econml.dynamic.dml._dml import _get_groups_period_filter +from econml.inference import BootstrapInference, EmpiricalInferenceResults, NormalInferenceResults +from econml.utilities import shape, hstack, vstack, reshape, cross_product +import econml.tests.utilities # bugfix for assertWarns +from econml.tests.dgp import DynamicPanelDGP + + +@pytest.mark.dml +class TestDynamicDML(unittest.TestCase): + + def test_cate_api(self): + """Test that we correctly implement the CATE API.""" + n_panels = 100 # number of panels + n_periods = 3 # number of time periods per panel + n = n_panels * n_periods + groups = np.repeat(a=np.arange(n_panels), repeats=n_periods, axis=0) + + def make_random(n, is_discrete, d): + if d is None: + return None + sz = (n, d) if d >= 0 else (n,) + if is_discrete: + return np.random.choice(['a', 'b', 'c'], size=sz) + else: + return np.random.normal(size=sz) + + for d_t in [2, 1, -1]: + for is_discrete in [True, False] if d_t <= 1 else [False]: + # for is_discrete in [False]: + for d_y in [3, 1, -1]: + for d_x in [2, None]: + for d_w in [2, None]: + W, X, Y, T = [make_random(n, is_discrete, d) + for is_discrete, d in [(False, d_w), + (False, d_x), + (False, d_y), + (is_discrete, d_t)]] + T_test = np.hstack([(T.reshape(-1, 1) if d_t == -1 else T) for i in range(n_periods)]) + for 
featurizer, fit_cate_intercept in\ + [(None, True), + (PolynomialFeatures(degree=2, include_bias=False), True), + (PolynomialFeatures(degree=2, include_bias=True), False)]: + + d_t_final = (2 if is_discrete else max(d_t, 1)) * n_periods + + effect_shape = (n,) + ((d_y,) if d_y > 0 else ()) + effect_summaryframe_shape = (n * (d_y if d_y > 0 else 1), 6) + marginal_effect_shape = ((n,) + + ((d_y,) if d_y > 0 else ()) + + ((d_t_final,) if d_t_final > 0 else ())) + marginal_effect_summaryframe_shape = (n * (d_y if d_y > 0 else 1) * + (d_t_final if d_t_final > 0 else 1), 6) + + # since T isn't passed to const_marginal_effect, defaults to one row if X is None + const_marginal_effect_shape = ((n if d_x else 1,) + + ((d_y,) if d_y > 0 else ()) + + ((d_t_final,) if d_t_final > 0 else())) + const_marginal_effect_summaryframe_shape = ( + (n if d_x else 1) * (d_y if d_y > 0 else 1) * + (d_t_final if d_t_final > 0 else 1), 6) + + fd_x = featurizer.fit_transform(X).shape[1:] if featurizer and d_x\ + else ((d_x,) if d_x else (0,)) + coef_shape = Y.shape[1:] + (d_t_final, ) + fd_x + + coef_summaryframe_shape = ( + (d_y if d_y > 0 else 1) * (fd_x[0] if fd_x[0] > + 0 else 1) * (d_t_final), 6) + intercept_shape = Y.shape[1:] + (d_t_final, ) + intercept_summaryframe_shape = ( + (d_y if d_y > 0 else 1) * (d_t_final if d_t_final > 0 else 1), 6) + + all_infs = [None, 'auto', BootstrapInference(2)] + est = DynamicDML(model_y=Lasso() if d_y < 1 else MultiTaskLasso(), + model_t=LogisticRegression() if is_discrete else + (Lasso() if d_t < 1 else MultiTaskLasso()), + featurizer=featurizer, + fit_cate_intercept=fit_cate_intercept, + discrete_treatment=is_discrete) + + # ensure we can serialize the unfit estimator + pickle.dumps(est) + + for inf in all_infs: + with self.subTest(d_w=d_w, d_x=d_x, d_y=d_y, d_t=d_t, + is_discrete=is_discrete, est=est, inf=inf): + + if X is None and (not fit_cate_intercept): + with pytest.raises(AttributeError): + est.fit(Y, T, X=X, W=W, groups=groups, 
inference=inf) + continue + + est.fit(Y, T, X=X, W=W, groups=groups, inference=inf) + + # ensure we can pickle the fit estimator + pickle.dumps(est) + + # make sure we can call the marginal_effect and effect methods + const_marg_eff = est.const_marginal_effect(X) + marg_eff = est.marginal_effect(T_test, X) + self.assertEqual(shape(marg_eff), marginal_effect_shape) + self.assertEqual(shape(const_marg_eff), const_marginal_effect_shape) + + np.testing.assert_allclose( + marg_eff if d_x else marg_eff[0:1], const_marg_eff) + + assert len(est.score_) == n_periods + for score in est.nuisance_scores_y[0]: + assert score.shape == (n_periods, ) + for score in est.nuisance_scores_t[0]: + assert score.shape == (n_periods, n_periods) + + T0 = np.full_like(T_test, 'a') if is_discrete else np.zeros_like(T_test) + eff = est.effect(X, T0=T0, T1=T_test) + self.assertEqual(shape(eff), effect_shape) + + self.assertEqual(shape(est.coef_), coef_shape) + if fit_cate_intercept: + self.assertEqual(shape(est.intercept_), intercept_shape) + else: + with pytest.raises(AttributeError): + self.assertEqual(shape(est.intercept_), intercept_shape) + + if inf is not None: + const_marg_eff_int = est.const_marginal_effect_interval(X) + marg_eff_int = est.marginal_effect_interval(T_test, X) + self.assertEqual(shape(marg_eff_int), + (2,) + marginal_effect_shape) + self.assertEqual(shape(const_marg_eff_int), + (2,) + const_marginal_effect_shape) + self.assertEqual(shape(est.effect_interval(X, T0=T0, T1=T_test)), + (2,) + effect_shape) + self.assertEqual(shape(est.coef__interval()), + (2,) + coef_shape) + if fit_cate_intercept: + self.assertEqual(shape(est.intercept__interval()), + (2,) + intercept_shape) + else: + with pytest.raises(AttributeError): + self.assertEqual(shape(est.intercept__interval()), + (2,) + intercept_shape) + + const_marg_effect_inf = est.const_marginal_effect_inference(X) + T1 = np.full_like(T_test, 'b') if is_discrete else T_test + effect_inf = est.effect_inference(X, T0=T0, T1=T1) 
+ marg_effect_inf = est.marginal_effect_inference(T_test, X) + # test const marginal inference + self.assertEqual(shape(const_marg_effect_inf.summary_frame()), + const_marginal_effect_summaryframe_shape) + self.assertEqual(shape(const_marg_effect_inf.point_estimate), + const_marginal_effect_shape) + self.assertEqual(shape(const_marg_effect_inf.stderr), + const_marginal_effect_shape) + self.assertEqual(shape(const_marg_effect_inf.var), + const_marginal_effect_shape) + self.assertEqual(shape(const_marg_effect_inf.pvalue()), + const_marginal_effect_shape) + self.assertEqual(shape(const_marg_effect_inf.zstat()), + const_marginal_effect_shape) + self.assertEqual(shape(const_marg_effect_inf.conf_int()), + (2,) + const_marginal_effect_shape) + np.testing.assert_array_almost_equal( + const_marg_effect_inf.conf_int()[0], + const_marg_eff_int[0], decimal=5) + const_marg_effect_inf.population_summary()._repr_html_() + + # test effect inference + self.assertEqual(shape(effect_inf.summary_frame()), + effect_summaryframe_shape) + self.assertEqual(shape(effect_inf.point_estimate), + effect_shape) + self.assertEqual(shape(effect_inf.stderr), + effect_shape) + self.assertEqual(shape(effect_inf.var), + effect_shape) + self.assertEqual(shape(effect_inf.pvalue()), + effect_shape) + self.assertEqual(shape(effect_inf.zstat()), + effect_shape) + self.assertEqual(shape(effect_inf.conf_int()), + (2,) + effect_shape) + np.testing.assert_array_almost_equal( + effect_inf.conf_int()[0], + est.effect_interval(X, T0=T0, T1=T1)[0], decimal=5) + effect_inf.population_summary()._repr_html_() + + # test marginal effect inference + self.assertEqual(shape(marg_effect_inf.summary_frame()), + marginal_effect_summaryframe_shape) + self.assertEqual(shape(marg_effect_inf.point_estimate), + marginal_effect_shape) + self.assertEqual(shape(marg_effect_inf.stderr), + marginal_effect_shape) + self.assertEqual(shape(marg_effect_inf.var), + marginal_effect_shape) + 
self.assertEqual(shape(marg_effect_inf.pvalue()), + marginal_effect_shape) + self.assertEqual(shape(marg_effect_inf.zstat()), + marginal_effect_shape) + self.assertEqual(shape(marg_effect_inf.conf_int()), + (2,) + marginal_effect_shape) + np.testing.assert_array_almost_equal( + marg_effect_inf.conf_int()[0], marg_eff_int[0], decimal=5) + marg_effect_inf.population_summary()._repr_html_() + + # test coef__inference and intercept__inference + if X is not None: + self.assertEqual( + shape(est.coef__inference().summary_frame()), + coef_summaryframe_shape) + np.testing.assert_array_almost_equal( + est.coef__inference().conf_int() + [0], est.coef__interval()[0], decimal=5) + + if fit_cate_intercept: + cm = ExitStack() + # ExitStack can be used as a "do nothing" ContextManager + else: + cm = pytest.raises(AttributeError) + with cm: + self.assertEqual(shape(est.intercept__inference(). + summary_frame()), + intercept_summaryframe_shape) + np.testing.assert_array_almost_equal( + est.intercept__inference().conf_int() + [0], est.intercept__interval()[0], decimal=5) + + est.summary() + est.score(Y, T, X, W, groups=groups) + # make sure we can call effect with implied scalar treatments, + # no matter the dimensions of T, and also that we warn when there + # are multiple treatments + if d_t > 1: + cm = self.assertWarns(Warning) + else: + # ExitStack can be used as a "do nothing" ContextManager + cm = ExitStack() + with cm: + effect_shape2 = (n if d_x else 1,) + ((d_y,) if d_y > 0 else()) + eff = est.effect(X) if not is_discrete else est.effect( + X, T0='a', T1='b') + self.assertEqual(shape(eff), effect_shape2) + + def test_perf(self): + np.random.seed(123) + n_units = 1000 + n_periods = 3 + n_treatments = 1 + n_x = 100 + s_x = 10 + s_t = 10 + hetero_strength = .5 + hetero_inds = np.arange(n_x - n_treatments, n_x) + + def lasso_model(): + return LassoCV(cv=3) + + # No heterogeneity + dgp = DynamicPanelDGP(n_periods, n_treatments, n_x).create_instance( + s_x, random_seed=12345) + 
Y, T, X, W, groups = dgp.observational_data(n_units, s_t=s_t, random_seed=12345) + est = DynamicDML(model_y=lasso_model(), model_t=lasso_model(), cv=3) + # Define indices to test + groups_filter = _get_groups_period_filter(groups, 3) + shuffled_idx = np.array([groups_filter[i] for i in range(n_periods)]).flatten() + test_indices = [np.arange(n_units * n_periods), shuffled_idx] + for test_idx in test_indices: + est.fit(Y[test_idx], T[test_idx], X=X[test_idx] if X is not None else None, W=W[test_idx], + groups=groups[test_idx], inference="auto") + np.testing.assert_allclose(est.intercept_, dgp.true_effect.flatten(), atol=0.2) + np.testing.assert_array_less(est.intercept__interval()[0], dgp.true_effect.flatten()) + np.testing.assert_array_less(dgp.true_effect.flatten(), est.intercept__interval()[1]) + + # Heterogeneous effects + dgp = DynamicPanelDGP(n_periods, n_treatments, n_x).create_instance( + s_x, hetero_strength=hetero_strength, hetero_inds=hetero_inds, random_seed=12) + Y, T, X, W, groups = dgp.observational_data(n_units, s_t=s_t, random_seed=1) + hetero_strength = .5 + hetero_inds = np.arange(n_x - n_treatments, n_x) + for test_idx in test_indices: + est.fit(Y[test_idx], T[test_idx], X=X[test_idx], W=W[test_idx], groups=groups[test_idx], inference="auto") + np.testing.assert_allclose(est.intercept_, dgp.true_effect.flatten(), atol=0.2) + np.testing.assert_allclose(est.coef_, dgp.true_hetero_effect[:, hetero_inds + 1], atol=0.2) + np.testing.assert_array_less(est.intercept__interval()[0], dgp.true_effect.flatten()) + np.testing.assert_array_less(dgp.true_effect.flatten(), est.intercept__interval()[1]) + np.testing.assert_array_less(est.coef__interval()[0] - .05, dgp.true_hetero_effect[:, hetero_inds + 1]) + np.testing.assert_array_less(dgp.true_hetero_effect[:, hetero_inds + 1] - .05, est.coef__interval()[1]) diff --git a/notebooks/CustomerScenarios/Case Study - Long-Term Return-on-Investment at Microsoft via Short-Term Proxies.ipynb 
b/notebooks/CustomerScenarios/Case Study - Long-Term Return-on-Investment at Microsoft via Short-Term Proxies.ipynb new file mode 100644 index 000000000..4c6ca7658 --- /dev/null +++ b/notebooks/CustomerScenarios/Case Study - Long-Term Return-on-Investment at Microsoft via Short-Term Proxies.ipynb @@ -0,0 +1,1090 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "# Long-Term Return-on-Investment at Microsoft via Short-Term Proxies\n", + "\n", + "\n", + "Policy makers typically want to estimate the treatment effect of new incentives on long-run downstream outcomes of interest. However, we only have historical data on older treatment options, and the long-run outcomes of the new ones have not played out yet. We assume access to a long-term dataset where only past treatments were administered and a short-term dataset where novel treatments have been administered. We propose a surrogate-based approach where we assume that the long-term effect is channeled through a multitude of available short-term proxies. Our work combines three major recent techniques in the causal machine learning literature: **surrogate indices**, **dynamic treatment effect estimation** and **double machine learning**, in a unified\n", + "pipeline. For more details, see [this paper](https://arxiv.org/pdf/2103.08390.pdf).\n", + "\n", + "In this case study, we will show you how to apply this unified pipeline to an ROI estimation problem at Microsoft. These methodologies have already been implemented in our [EconML](https://aka.ms/econml) library, so you can apply them with only a few lines of code." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Summary\n", + "\n", + "1. [Background](#Background)\n", + "2. [Data](#Data)\n", + "3. [Do Dynamic Adjustment with EconML](#Do-Dynamic-Adjustment-with-EconML)\n", + "4. [Train Surrogate Index](#Train-Surrogate-Index)\n", + "5. 
[Run DML to Learn ROI with EconML](#Run-DML-to-Learn-ROI-with-EconML)\n", + "6. [Model Evaluation](#Model-Evaluation)\n", + "7. [Extensions -- Including Heterogeneity in Effect](#Extensions----Including-Heterogeneity-in-Effect)\n", + "8. [Conclusions](#Conclusions)" + ] + }, + { + "attachments": { + "causal_graph.PNG": { + "image/png": "" + }, + "pipeline.PNG": { + "image/png": "" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Background\n", + "\n", + "Microsoft provides multiple monetary and resource investments to enterprise customers in support of product adoption. The sales manager would like to know which of these programs (\"investments\") are more successful than others. Specifically, we are interested in identifying the average treatment effect of each investment at some period $t$ on the cumulative outcome in the subsequent $m$ months. \n", + "\n", + "There are a few challenges in answering this question. First of all, we haven't fully observed the long-term revenue yet and we don't want to wait that long to evaluate a program. In addition, careful causal modeling is required to correctly attribute the long-term ROI of multiple programs in a holistic manner, avoiding biased estimates that come from confounding effects or double-counting issues. \n", + "\n", + "The causal graph below shows how to frame this problem:\n", + "\n", + "![causal_graph.PNG](attachment:causal_graph.PNG)\n", + "\n", + "**Methodology:** Our proposed adjusted surrogate index approach addresses all the challenges above. It assumes the long-term effect is channeled through some short-term observed surrogates, applies a dynamic adjustment step (`DynamicDML`) to the surrogate model to remove the effect of future investments, and finally applies double machine learning (`DML`) techniques to estimate the ROI. 
\n", + "\n", + "The pipeline below tells you how to solve this problem step by step:\n", + "![pipeline.PNG](attachment:pipeline.PNG)" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "application/javascript": [ + "\n", + " setTimeout(function() {\n", + " var nbb_cell_id = 2;\n", + " var nbb_formatted_code = \"# imports\\nfrom econml.data.dynamic_panel_dgp import SemiSynthetic\\nfrom sklearn.linear_model import LassoCV, MultiTaskLassoCV\\nimport numpy as np\\nimport matplotlib.pyplot as plt\";\n", + " var nbb_cells = Jupyter.notebook.get_cells();\n", + " for (var i = 0; i < nbb_cells.length; ++i) {\n", + " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", + " nbb_cells[i].set_text(nbb_formatted_code);\n", + " break;\n", + " }\n", + " }\n", + " }, 500);\n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# imports\n", + "from econml.data.dynamic_panel_dgp import SemiSynthetic\n", + "from sklearn.linear_model import LassoCV, MultiTaskLassoCV\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Data\n", + "\n", + "The **semi-synthetic data*** is comprised of 4 components:\n", + " * **Surrogates:** short-term metrics that could represent long-term revenue\n", + " * **Treatments:** different types of monetary investments to the end customers\n", + " * **Outcomes:** cumulative long-term revenue\n", + " * **Controls:** lagged surrogates and treatments, other time-invariant controls (e.g. 
demographics)\n", + "\n", + "To build the semi-synthetic data, we estimate a series of moments from a real-world dataset: a full covariance matrix of\n", + "all surrogates, treatments, and controls in one period and a series of linear prediction models (`LassoCV`) of each surrogate and\n", + "treatment on a set of 6 lags of each treatment, 6 lags of each surrogate, and time-invariant controls. Using these values, we draw new parameters from distributions matching the key characteristics of each family of parameters. Finally, we use these new\n", + "parameters to simulate surrogates, treatments, and controls by drawing a set of initial values from the covariance matrix and\n", + "forward simulating to match intertemporal relationships from the transformed prediction models. We use one surrogate as the outcome of interest. Then we consider the effect of each treatment in period $t$ on the cumulative sum of the outcome over the following 4 periods. We can calculate the true treatment effects in the semi-synthetic data as a function of parameters from the linear prediction models.\n", + "\n", + "The input data is in a **panel format**. Each panel corresponds to one company, and the different rows in a panel correspond to different time periods. 
\n", + "\n", + "Example:\n", + "\n", + "||Company|Year|Features|Controls/Surrogates|T1|T2|T3|AdjRev|\n", + "|---|---|---|---|---|---|---|---|---|\n", + "|1|A|2018|...|...|\\$1,000|...|...|\\$10,000|\n", + "|2|A|2019|...|...|\\$2,000|...|...|\\$12,000|\n", + "|3|A|2020|...|...|\\$3,000|...|...|\\$15,000|\n", + "|4|A|2021|...|...|\\$3,000|...|...|\\$18,000|\n", + "|5|B|2018|...|...|\\$0|...|...|\\$5,000|\n", + "|6|B|2019|...|...|\\$1,000|...|...|\\$10,000|\n", + "|7|B|2020|...|...|\\$0|...|...|\\$7,000|\n", + "|8|B|2021|...|...|\\$1,200|...|...|\\$12,000|\n", + "|9|C|2018|...|...|\\$1,000|...|...|\\$20,000|\n", + "|10|C|2019|...|...|\\$1,500|...|...|\\$25,000|\n", + "|11|C|2020|...|...|\\$500|...|...|\\$18,000|\n", + "|12|C|2021|...|...|\\$500|...|...|\\$20,000|\n", + " \n", + " **For confidentiality reasons, the data used in this case study is synthetically generated and the feature distributions don't exactly correspond to real distributions.*" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "application/javascript": [ + "\n", + " setTimeout(function() {\n", + " var nbb_cell_id = 3;\n", + " var nbb_formatted_code = \"# generate historical dataset (training purpose)\\nnp.random.seed(43)\\ndgp = SemiSynthetic()\\ndgp.create_instance()\\nn_periods = 4\\nn_units = 5000\\nn_treatments = dgp.n_treatments\\nrandom_seed = 43\\nthetas = np.random.uniform(0, 2, size=(dgp.n_proxies, n_treatments))\\n\\npanelX, panelT, panelY, panelGroups, true_effect = dgp.gen_data(\\n n_units, n_periods, thetas, random_seed\\n)\";\n", + " var nbb_cells = Jupyter.notebook.get_cells();\n", + " for (var i = 0; i < nbb_cells.length; ++i) {\n", + " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", + " nbb_cells[i].set_text(nbb_formatted_code);\n", + " break;\n", + " }\n", + " }\n", + " }, 500);\n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# generate 
historical dataset (training purpose)\n", + "np.random.seed(43)\n", + "dgp = SemiSynthetic()\n", + "dgp.create_instance()\n", + "n_periods = 4\n", + "n_units = 5000\n", + "n_treatments = dgp.n_treatments\n", + "random_seed = 43\n", + "thetas = np.random.uniform(0, 2, size=(dgp.n_proxies, n_treatments))\n", + "\n", + "panelX, panelT, panelY, panelGroups, true_effect = dgp.gen_data(\n", + " n_units, n_periods, thetas, random_seed\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Outcome shape: (5000, 4)\n", + "Treatment shape: (5000, 4, 3)\n", + "Controls shape: (5000, 4, 71)\n" + ] + }, + { + "data": { + "application/javascript": [ + "\n", + " setTimeout(function() {\n", + " var nbb_cell_id = 4;\n", + " var nbb_formatted_code = \"# print panel data shape\\nprint(\\\"Outcome shape: \\\", panelY.shape)\\nprint(\\\"Treatment shape: \\\", panelT.shape)\\nprint(\\\"Controls shape: \\\", panelX.shape)\";\n", + " var nbb_cells = Jupyter.notebook.get_cells();\n", + " for (var i = 0; i < nbb_cells.length; ++i) {\n", + " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", + " nbb_cells[i].set_text(nbb_formatted_code);\n", + " break;\n", + " }\n", + " }\n", + " }, 500);\n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# print panel data shape\n", + "print(\"Outcome shape: \", panelY.shape)\n", + "print(\"Treatment shape: \", panelT.shape)\n", + "print(\"Controls shape: \", panelX.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "application/javascript": [ + "\n", + " setTimeout(function() {\n", + " var nbb_cell_id = 5;\n", + " var nbb_formatted_code = \"# generate new dataset (testing purpose)\\nthetas_new = np.random.uniform(0, 2, size=(dgp.n_proxies, n_treatments))\\npanelXnew, panelTnew, panelYnew, 
panelGroupsnew, true_effect_new = dgp.gen_data(\\n n_units, n_periods, thetas_new, random_seed\\n)\";\n", + " var nbb_cells = Jupyter.notebook.get_cells();\n", + " for (var i = 0; i < nbb_cells.length; ++i) {\n", + " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", + " nbb_cells[i].set_text(nbb_formatted_code);\n", + " break;\n", + " }\n", + " }\n", + " }, 500);\n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# generate new dataset (testing purpose)\n", + "thetas_new = np.random.uniform(0, 2, size=(dgp.n_proxies, n_treatments))\n", + "panelXnew, panelTnew, panelYnew, panelGroupsnew, true_effect_new = dgp.gen_data(\n", + " n_units, n_periods, thetas_new, random_seed\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "True Long-term Effect for each investment: [0.90994672 0.709811 2.45310877]\n" + ] + }, + { + "data": { + "application/javascript": [ + "\n", + " setTimeout(function() {\n", + " var nbb_cell_id = 6;\n", + " var nbb_formatted_code = \"# print true long term effect\\ntrue_longterm_effect = np.sum(true_effect_new, axis=0)\\nprint(\\\"True Long-term Effect for each investment: \\\", true_longterm_effect)\";\n", + " var nbb_cells = Jupyter.notebook.get_cells();\n", + " for (var i = 0; i < nbb_cells.length; ++i) {\n", + " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", + " nbb_cells[i].set_text(nbb_formatted_code);\n", + " break;\n", + " }\n", + " }\n", + " }, 500);\n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# print true long term effect\n", + "true_longterm_effect = np.sum(true_effect_new, axis=0)\n", + "print(\"True Long-term Effect for each investment: \", true_longterm_effect)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Do Dynamic 
Adjustment with EconML\n", + "From the causal graph above, we can see that we first want to remove the effects of future incentives from the historical outcomes, creating an **adjusted long-term revenue** as if those future incentives had never happened.\n", + "\n", + "EconML's `DynamicDML` estimator extends the Double Machine Learning approach to **dynamically estimate the period effects of treatments assigned sequentially over time**. In this scenario, it helps us adjust the cumulative revenue by subtracting the period effects of all investments made after the target investment.\n", + "\n", + "For more details about `DynamicDML`, please read this [paper](https://arxiv.org/pdf/2002.07285.pdf). " + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "application/javascript": [ + "\n", + " setTimeout(function() {\n", + " var nbb_cell_id = 7;\n", + " var nbb_formatted_code = \"# Helper function to reshape the panel data\\ndef long(x): # reshape the panel data to (n_units * n_periods, -1)\\n n_units = x.shape[0]\\n n_periods = x.shape[1]\\n return (\\n x.reshape(n_units * n_periods)\\n if np.ndim(x) == 2\\n else x.reshape(n_units * n_periods, -1)\\n )\\n\\n\\ndef wide(x): # reshape the panel data to (n_units, n_periods * d_x)\\n n_units = x.shape[0]\\n return x.reshape(n_units, -1)\";\n", + " var nbb_cells = Jupyter.notebook.get_cells();\n", + " for (var i = 0; i < nbb_cells.length; ++i) {\n", + " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", + " nbb_cells[i].set_text(nbb_formatted_code);\n", + " break;\n", + " }\n", + " }\n", + " }, 500);\n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Helper function to reshape the panel data\n", + "def long(x): # reshape the panel data to (n_units * n_periods, -1)\n", + " n_units = x.shape[0]\n", + " n_periods = x.shape[1]\n", + " return (\n", + " x.reshape(n_units * 
n_periods)\n", + " if np.ndim(x) == 2\n", + " else x.reshape(n_units * n_periods, -1)\n", + " )\n", + "\n", + "\n", + "def wide(x): # reshape the panel data to (n_units, n_periods * d_x)\n", + " n_units = x.shape[0]\n", + " return x.reshape(n_units, -1)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "application/javascript": [ + "\n", + " setTimeout(function() {\n", + " var nbb_cell_id = 8;\n", + " var nbb_formatted_code = \"# on historical data construct adjusted outcomes\\nfrom econml.dynamic.dml import DynamicDML\\n\\npanelYadj = panelY.copy()\\n\\nest = DynamicDML(\\n model_y=LassoCV(max_iter=2000), model_t=MultiTaskLassoCV(max_iter=2000), cv=2\\n)\\nfor t in range(1, n_periods): # for each target period 1...m\\n # learn period effect for each period treatment on target period t\\n est.fit(\\n long(panelY[:, 1 : t + 1]),\\n long(panelT[:, 1 : t + 1, :]), # reshape data to long format\\n X=None,\\n W=long(panelX[:, 1 : t + 1, :]),\\n groups=long(panelGroups[:, 1 : t + 1]),\\n )\\n # remove effect of observed treatments\\n T1 = wide(panelT[:, 1 : t + 1, :])\\n panelYadj[:, t] = panelY[:, t] - est.effect(\\n T0=np.zeros_like(T1), T1=T1\\n ) # reshape data to wide format\";\n", + " var nbb_cells = Jupyter.notebook.get_cells();\n", + " for (var i = 0; i < nbb_cells.length; ++i) {\n", + " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", + " nbb_cells[i].set_text(nbb_formatted_code);\n", + " break;\n", + " }\n", + " }\n", + " }, 500);\n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# on historical data construct adjusted outcomes\n", + "from econml.dynamic.dml import DynamicDML\n", + "\n", + "panelYadj = panelY.copy()\n", + "\n", + "est = DynamicDML(\n", + " model_y=LassoCV(max_iter=2000), model_t=MultiTaskLassoCV(max_iter=2000), cv=2\n", + ")\n", + "for t in range(1, n_periods): # for each target period 1...m\n", + " # 
learn period effect for each period treatment on target period t\n", + " est.fit(\n", + " long(panelY[:, 1 : t + 1]),\n", + " long(panelT[:, 1 : t + 1, :]), # reshape data to long format\n", + " X=None,\n", + " W=long(panelX[:, 1 : t + 1, :]),\n", + " groups=long(panelGroups[:, 1 : t + 1]),\n", + " )\n", + " # remove effect of observed treatments\n", + " T1 = wide(panelT[:, 1 : t + 1, :])\n", + " panelYadj[:, t] = panelY[:, t] - est.effect(\n", + " T0=np.zeros_like(T1), T1=T1\n", + " ) # reshape data to wide format" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Train Surrogate Index\n", + "Once we have the adjusted outcome, we can train any ML model to learn the relationship between short-term surrogates and long-term revenue from the historical dataset. This relies on two assumptions: the treatment effect of investments on long-term revenue goes **only** through the short-term surrogates, and the **relationship stays the same** between the historical dataset and the new dataset." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "application/javascript": [ + "\n", + " setTimeout(function() {\n", + " var nbb_cell_id = 9;\n", + " var nbb_formatted_code = \"# train surrogate index on historical dataset\\nXS = np.hstack(\\n [panelX[:, 1], panelYadj[:, :1]]\\n) # concatenate controls and surrogates from historical dataset\\nTotalYadj = np.sum(panelYadj, axis=1) # total revenue from historical dataset\\nadjusted_proxy_model = LassoCV().fit(\\n XS, TotalYadj\\n) # train proxy model from historical dataset\";\n", + " var nbb_cells = Jupyter.notebook.get_cells();\n", + " for (var i = 0; i < nbb_cells.length; ++i) {\n", + " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", + " nbb_cells[i].set_text(nbb_formatted_code);\n", + " break;\n", + " }\n", + " }\n", + " }, 500);\n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# train surrogate index on historical dataset\n", + "XS = np.hstack(\n", + " [panelX[:, 1], panelYadj[:, :1]]\n", + ") # concatenate controls and surrogates from historical dataset\n", + "TotalYadj = np.sum(panelYadj, axis=1) # total revenue from historical dataset\n", + "adjusted_proxy_model = LassoCV().fit(\n", + " XS, TotalYadj\n", + ") # train proxy model from historical dataset" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "application/javascript": [ + "\n", + " setTimeout(function() {\n", + " var nbb_cell_id = 10;\n", + " var nbb_formatted_code = \"# predict new long term revenue\\nXSnew = np.hstack(\\n [panelXnew[:, 1], panelYnew[:, :1]]\\n) # concatenate controls and surrogates from new dataset\\nsindex_adj = adjusted_proxy_model.predict(XSnew)\";\n", + " var nbb_cells = Jupyter.notebook.get_cells();\n", + " for (var i = 0; i < nbb_cells.length; ++i) {\n", + " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", + " 
nbb_cells[i].set_text(nbb_formatted_code);\n", + " break;\n", + " }\n", + " }\n", + " }, 500);\n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# predict new long term revenue\n", + "XSnew = np.hstack(\n", + " [panelXnew[:, 1], panelYnew[:, :1]]\n", + ") # concatenate controls and surrogates from new dataset\n", + "sindex_adj = adjusted_proxy_model.predict(XSnew)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Run DML to Learn ROI with EconML\n", + "Finally, we call the `LinearDML` estimator from EconML to learn the treatment effect of multiple investments on the adjusted surrogate index in the new dataset. `LinearDML` is a two-stage machine learning model for estimating **(heterogeneous) treatment effects** when all potential confounders are observed. It leverages machine learning to handle **high-dimensional datasets** while still being able to construct **confidence intervals**. \n", + "\n", + "For more details, please read this [paper](https://arxiv.org/pdf/1608.00060.pdf). " + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "True Long-term Effect for each investment: [0.90994672 0.709811 2.45310877]\n", + "Coefficient Results: X is None, please call intercept_inference to learn the constant!\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "
CATE Intercept Results
point_estimate stderr zstat pvalue ci_lower ci_upper
cate_intercept|T0 0.83 0.015 57.214 0.0 0.802 0.858
cate_intercept|T1 0.677 0.028 23.767 0.0 0.621 0.733
cate_intercept|T2 2.438 0.035 69.711 0.0 2.369 2.507


A linear parametric conditional average treatment effect (CATE) model was fitted:
$Y = \\Theta(X)\\cdot T + g(X, W) + \\epsilon$
where for every outcome $i$ and treatment $j$ the CATE $\\Theta_{ij}(X)$ has the form:
$\\Theta_{ij}(X) = \\phi(X)' coef_{ij} + cate\\_intercept_{ij}$
where $\\phi(X)$ is the output of the `featurizer` or $X$ if `featurizer`=None. Coefficient Results table portrays the $coef_{ij}$ parameter vector for each outcome $i$ and treatment $j$. Intercept Results table portrays the $cate\\_intercept_{ij}$ parameter.
" + ], + "text/plain": [ + "\n", + "\"\"\"\n", + " CATE Intercept Results \n", + "=======================================================================\n", + " point_estimate stderr zstat pvalue ci_lower ci_upper\n", + "-----------------------------------------------------------------------\n", + "cate_intercept|T0 0.83 0.015 57.214 0.0 0.802 0.858\n", + "cate_intercept|T1 0.677 0.028 23.767 0.0 0.621 0.733\n", + "cate_intercept|T2 2.438 0.035 69.711 0.0 2.369 2.507\n", + "-----------------------------------------------------------------------\n", + "\n", + "A linear parametric conditional average treatment effect (CATE) model was fitted:\n", + "$Y = \\Theta(X)\\cdot T + g(X, W) + \\epsilon$\n", + "where for every outcome $i$ and treatment $j$ the CATE $\\Theta_{ij}(X)$ has the form:\n", + "$\\Theta_{ij}(X) = \\phi(X)' coef_{ij} + cate\\_intercept_{ij}$\n", + "where $\\phi(X)$ is the output of the `featurizer` or $X$ if `featurizer`=None. Coefficient Results table portrays the $coef_{ij}$ parameter vector for each outcome $i$ and treatment $j$. 
Intercept Results table portrays the $cate\\_intercept_{ij}$ parameter.\n", + "\"\"\"" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "application/javascript": [ + "\n", + " setTimeout(function() {\n", + " var nbb_cell_id = 11;\n", + " var nbb_formatted_code = \"# learn treatment effect on surrogate index on new dataset\\nfrom econml.dml import LinearDML\\n\\nadjsurr_est = LinearDML(\\n model_y=LassoCV(max_iter=2000), model_t=MultiTaskLassoCV(max_iter=2000), cv=3\\n)\\n# fit treatment_0 on total revenue from new dataset\\nadjsurr_est.fit(sindex_adj, panelTnew[:, 0], X=None, W=panelXnew[:, 0])\\n# print treatment effect summary\\nprint(\\\"True Long-term Effect for each investment: \\\", true_longterm_effect)\\nadjsurr_est.summary(alpha=0.05)\";\n", + " var nbb_cells = Jupyter.notebook.get_cells();\n", + " for (var i = 0; i < nbb_cells.length; ++i) {\n", + " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", + " nbb_cells[i].set_text(nbb_formatted_code);\n", + " break;\n", + " }\n", + " }\n", + " }, 500);\n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# learn treatment effect on surrogate index on new dataset\n", + "from econml.dml import LinearDML\n", + "\n", + "adjsurr_est = LinearDML(\n", + " model_y=LassoCV(max_iter=2000), model_t=MultiTaskLassoCV(max_iter=2000), cv=3\n", + ")\n", + "# fit treatment_0 on total revenue from new dataset\n", + "adjsurr_est.fit(sindex_adj, panelTnew[:, 0], X=None, W=panelXnew[:, 0])\n", + "# print treatment effect summary\n", + "print(\"True Long-term Effect for each investment: \", true_longterm_effect)\n", + "adjsurr_est.summary(alpha=0.05)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "application/javascript": [ + "\n", + " setTimeout(function() {\n", + " var nbb_cell_id = 12;\n", + " var nbb_formatted_code = \"# save 
the treatment effect and confidence interval\\nadjsurr_point_est = adjsurr_est.intercept_\\nadjsurr_conf_int_lb, adjsurr_conf_int_ub = adjsurr_est.intercept__interval(alpha=0.05)\";\n", + " var nbb_cells = Jupyter.notebook.get_cells();\n", + " for (var i = 0; i < nbb_cells.length; ++i) {\n", + " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", + " nbb_cells[i].set_text(nbb_formatted_code);\n", + " break;\n", + " }\n", + " }\n", + " }, 500);\n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# save the treatment effect and confidence interval\n", + "adjsurr_point_est = adjsurr_est.intercept_\n", + "adjsurr_conf_int_lb, adjsurr_conf_int_ub = adjsurr_est.intercept__interval(alpha=0.05)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Model Evaluation\n", + "Now we compare the proposed **adjusted surrogate index** approach with an estimate based on the realized long-term outcome. Below we train another `LinearDML` model directly on the realized cumulative revenue, without any adjustment. We then visualize the output of the two models against the ground truth." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "True Long-term Effect for each investment: [0.90994672 0.709811 2.45310877]\n", + "Coefficient Results: X is None, please call intercept_inference to learn the constant!\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "
CATE Intercept Results
point_estimate stderr zstat pvalue ci_lower ci_upper
cate_intercept|T0 2.227 0.039 56.865 0.0 2.15 2.304
cate_intercept|T1 1.561 0.226 6.911 0.0 1.118 2.004
cate_intercept|T2 4.335 0.209 20.748 0.0 3.926 4.745


A linear parametric conditional average treatment effect (CATE) model was fitted:
$Y = \\Theta(X)\\cdot T + g(X, W) + \\epsilon$
where for every outcome $i$ and treatment $j$ the CATE $\\Theta_{ij}(X)$ has the form:
$\\Theta_{ij}(X) = \\phi(X)' coef_{ij} + cate\\_intercept_{ij}$
where $\\phi(X)$ is the output of the `featurizer` or $X$ if `featurizer`=None. Coefficient Results table portrays the $coef_{ij}$ parameter vector for each outcome $i$ and treatment $j$. Intercept Results table portrays the $cate\\_intercept_{ij}$ parameter.
" + ], + "text/plain": [ + "\n", + "\"\"\"\n", + " CATE Intercept Results \n", + "=======================================================================\n", + " point_estimate stderr zstat pvalue ci_lower ci_upper\n", + "-----------------------------------------------------------------------\n", + "cate_intercept|T0 2.227 0.039 56.865 0.0 2.15 2.304\n", + "cate_intercept|T1 1.561 0.226 6.911 0.0 1.118 2.004\n", + "cate_intercept|T2 4.335 0.209 20.748 0.0 3.926 4.745\n", + "-----------------------------------------------------------------------\n", + "\n", + "A linear parametric conditional average treatment effect (CATE) model was fitted:\n", + "$Y = \\Theta(X)\\cdot T + g(X, W) + \\epsilon$\n", + "where for every outcome $i$ and treatment $j$ the CATE $\\Theta_{ij}(X)$ has the form:\n", + "$\\Theta_{ij}(X) = \\phi(X)' coef_{ij} + cate\\_intercept_{ij}$\n", + "where $\\phi(X)$ is the output of the `featurizer` or $X$ if `featurizer`=None. Coefficient Results table portrays the $coef_{ij}$ parameter vector for each outcome $i$ and treatment $j$. 
Intercept Results table portrays the $cate\\_intercept_{ij}$ parameter.\n", + "\"\"\"" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "application/javascript": [ + "\n", + " setTimeout(function() {\n", + " var nbb_cell_id = 13;\n", + " var nbb_formatted_code = \"# learn treatment effect on direct outcome\\nfrom econml.dml import LinearDML\\n\\ndirect_est = LinearDML(\\n model_y=LassoCV(max_iter=2000), model_t=MultiTaskLassoCV(max_iter=2000), cv=3\\n)\\n# fit treatment_0 on total revenue from new dataset\\ndirect_est.fit(np.sum(panelYnew, axis=1), panelTnew[:, 0], X=None, W=panelXnew[:, 0])\\n# print treatment effect summary\\nprint(\\\"True Long-term Effect for each investment: \\\", true_longterm_effect)\\ndirect_est.summary(alpha=0.05)\";\n", + " var nbb_cells = Jupyter.notebook.get_cells();\n", + " for (var i = 0; i < nbb_cells.length; ++i) {\n", + " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", + " nbb_cells[i].set_text(nbb_formatted_code);\n", + " break;\n", + " }\n", + " }\n", + " }, 500);\n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# learn treatment effect on direct outcome\n", + "from econml.dml import LinearDML\n", + "\n", + "direct_est = LinearDML(\n", + " model_y=LassoCV(max_iter=2000), model_t=MultiTaskLassoCV(max_iter=2000), cv=3\n", + ")\n", + "# fit treatment_0 on total revenue from new dataset\n", + "direct_est.fit(np.sum(panelYnew, axis=1), panelTnew[:, 0], X=None, W=panelXnew[:, 0])\n", + "# print treatment effect summary\n", + "print(\"True Long-term Effect for each investment: \", true_longterm_effect)\n", + "direct_est.summary(alpha=0.05)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "application/javascript": [ + "\n", + " setTimeout(function() {\n", + " var nbb_cell_id = 14;\n", + " var nbb_formatted_code = \"# save the 
treatment effect and confidence interval\\ndirect_point_est = direct_est.intercept_\\ndirect_conf_int_lb, direct_conf_int_ub = direct_est.intercept__interval(alpha=0.05)\";\n", + " var nbb_cells = Jupyter.notebook.get_cells();\n", + " for (var i = 0; i < nbb_cells.length; ++i) {\n", + " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", + " nbb_cells[i].set_text(nbb_formatted_code);\n", + " break;\n", + " }\n", + " }\n", + " }, 500);\n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# save the treatment effect and confidence interval\n", + "direct_point_est = direct_est.intercept_\n", + "direct_conf_int_lb, direct_conf_int_ub = direct_est.intercept__interval(alpha=0.05)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Text(0.5, 0.98, 'Error bar plot of treatment effect from different models')" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + }, + { + "data": { + "application/javascript": [ + "\n", + " setTimeout(function() {\n", + " var nbb_cell_id = 15;\n", + " var nbb_formatted_code = \"# plot the error bar plot of different models\\nplt.figure(figsize=(18, 6))\\nplt.subplot(1, 2, 1)\\n\\nplt.errorbar(\\n np.arange(n_treatments) - 0.04,\\n true_longterm_effect,\\n fmt=\\\"o\\\",\\n alpha=0.6,\\n label=\\\"Ground truth\\\",\\n)\\nplt.errorbar(\\n np.arange(n_treatments),\\n adjsurr_point_est,\\n yerr=(\\n adjsurr_point_est - adjsurr_conf_int_lb,\\n adjsurr_conf_int_ub - adjsurr_point_est,\\n ),\\n fmt=\\\"o\\\",\\n label=\\\"Adjusted Surrogate Index\\\",\\n)\\nplt.xticks(np.arange(n_treatments), [\\\"T0\\\", \\\"T1\\\", \\\"T2\\\"])\\nplt.ylabel(\\\"Effect\\\")\\nplt.legend()\\n\\nplt.subplot(1, 2, 2)\\nplt.errorbar(\\n np.arange(n_treatments) - 0.04,\\n true_longterm_effect,\\n fmt=\\\"o\\\",\\n alpha=0.6,\\n label=\\\"Ground truth\\\",\\n)\\nplt.errorbar(\\n np.arange(n_treatments),\\n adjsurr_point_est,\\n yerr=(\\n adjsurr_point_est - adjsurr_conf_int_lb,\\n adjsurr_conf_int_ub - adjsurr_point_est,\\n ),\\n fmt=\\\"o\\\",\\n label=\\\"Adjusted Surrogate Index\\\",\\n)\\nplt.errorbar(\\n np.arange(n_treatments) + 0.04,\\n direct_point_est,\\n yerr=(direct_point_est - direct_conf_int_lb, direct_conf_int_ub - direct_point_est),\\n fmt=\\\"o\\\",\\n label=\\\"Direct Model\\\",\\n)\\nplt.xticks(np.arange(n_treatments), [\\\"T0\\\", \\\"T1\\\", \\\"T2\\\"])\\nplt.ylabel(\\\"Effect\\\")\\nplt.legend()\\nplt.suptitle(\\\"Error bar plot of treatment effect from different models\\\")\";\n", + " var nbb_cells = Jupyter.notebook.get_cells();\n", + " for (var i = 0; i < nbb_cells.length; ++i) {\n", + " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", + " nbb_cells[i].set_text(nbb_formatted_code);\n", + " break;\n", + " }\n", + " }\n", + " }, 500);\n", + " " + ], + "text/plain": [ + "" + ] + }, + 
"metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# plot the error bar plot of different models\n", + "plt.figure(figsize=(18, 6))\n", + "plt.subplot(1, 2, 1)\n", + "\n", + "plt.errorbar(\n", + " np.arange(n_treatments) - 0.04,\n", + " true_longterm_effect,\n", + " fmt=\"o\",\n", + " alpha=0.6,\n", + " label=\"Ground truth\",\n", + ")\n", + "plt.errorbar(\n", + " np.arange(n_treatments),\n", + " adjsurr_point_est,\n", + " yerr=(\n", + " adjsurr_point_est - adjsurr_conf_int_lb,\n", + " adjsurr_conf_int_ub - adjsurr_point_est,\n", + " ),\n", + " fmt=\"o\",\n", + " label=\"Adjusted Surrogate Index\",\n", + ")\n", + "plt.xticks(np.arange(n_treatments), [\"T0\", \"T1\", \"T2\"])\n", + "plt.ylabel(\"Effect\")\n", + "plt.legend()\n", + "\n", + "plt.subplot(1, 2, 2)\n", + "plt.errorbar(\n", + " np.arange(n_treatments) - 0.04,\n", + " true_longterm_effect,\n", + " fmt=\"o\",\n", + " alpha=0.6,\n", + " label=\"Ground truth\",\n", + ")\n", + "plt.errorbar(\n", + " np.arange(n_treatments),\n", + " adjsurr_point_est,\n", + " yerr=(\n", + " adjsurr_point_est - adjsurr_conf_int_lb,\n", + " adjsurr_conf_int_ub - adjsurr_point_est,\n", + " ),\n", + " fmt=\"o\",\n", + " label=\"Adjusted Surrogate Index\",\n", + ")\n", + "plt.errorbar(\n", + " np.arange(n_treatments) + 0.04,\n", + " direct_point_est,\n", + " yerr=(direct_point_est - direct_conf_int_lb, direct_conf_int_ub - direct_point_est),\n", + " fmt=\"o\",\n", + " label=\"Direct Model\",\n", + ")\n", + "plt.xticks(np.arange(n_treatments), [\"T0\", \"T1\", \"T2\"])\n", + "plt.ylabel(\"Effect\")\n", + "plt.legend()\n", + "plt.suptitle(\"Error bar plot of treatment effect from different models\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can see that the **adjusted surrogate index** approach does a good job: its estimates are close to the ground truth, whereas the direct model overestimates all three effects. It overcomes a common data limitation when considering long-term effects of novel treatments and expands the surrogate approach to handle a common, and previously\n", + 
"problematic, pattern of serially correlated treatments." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Extensions -- Including Heterogeneity in Effect\n", + "\n", + "Finally, we show that EconML's `DynamicDML` and `LinearDML` estimators can learn not only the Average Treatment Effect (ATE) but also the **Heterogeneous Treatment Effect (CATE)**, which gives the treatment effect as a function of characteristics of interest. In the example below, we use the first control variable as a feature to learn effect heterogeneity and retrain the final `LinearDML` model. Similarly, you could train `DynamicDML` with features $X$ as well." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "True Long-term Effect for each investment: [0.90994672 0.709811 2.45310877]\n", + "Average treatment effect for each investment: [0.82738185 0.71610965 2.56087599]\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "
Coefficient Results
point_estimate stderr zstat pvalue ci_lower ci_upper
X0|T0 0.009 0.011 0.76 0.447 -0.014 0.031
X0|T1 0.037 0.031 1.218 0.223 -0.023 0.098
X0|T2 -0.072 0.151 -0.478 0.633 -0.369 0.224
\n", + "\n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "
CATE Intercept Results
point_estimate stderr zstat pvalue ci_lower ci_upper
cate_intercept|T0 0.827 0.015 56.625 0.0 0.799 0.856
cate_intercept|T1 0.716 0.032 22.466 0.0 0.654 0.779
cate_intercept|T2 2.56 0.237 10.82 0.0 2.096 3.024


A linear parametric conditional average treatment effect (CATE) model was fitted:
$Y = \\Theta(X)\\cdot T + g(X, W) + \\epsilon$
where for every outcome $i$ and treatment $j$ the CATE $\\Theta_{ij}(X)$ has the form:
$\\Theta_{ij}(X) = \\phi(X)' coef_{ij} + cate\\_intercept_{ij}$
where $\\phi(X)$ is the output of the `featurizer` or $X$ if `featurizer`=None. Coefficient Results table portrays the $coef_{ij}$ parameter vector for each outcome $i$ and treatment $j$. Intercept Results table portrays the $cate\\_intercept_{ij}$ parameter.
" + ], + "text/plain": [ + "\n", + "\"\"\"\n", + " Coefficient Results \n", + "===========================================================\n", + " point_estimate stderr zstat pvalue ci_lower ci_upper\n", + "-----------------------------------------------------------\n", + "X0|T0 0.009 0.011 0.76 0.447 -0.014 0.031\n", + "X0|T1 0.037 0.031 1.218 0.223 -0.023 0.098\n", + "X0|T2 -0.072 0.151 -0.478 0.633 -0.369 0.224\n", + " CATE Intercept Results \n", + "=======================================================================\n", + " point_estimate stderr zstat pvalue ci_lower ci_upper\n", + "-----------------------------------------------------------------------\n", + "cate_intercept|T0 0.827 0.015 56.625 0.0 0.799 0.856\n", + "cate_intercept|T1 0.716 0.032 22.466 0.0 0.654 0.779\n", + "cate_intercept|T2 2.56 0.237 10.82 0.0 2.096 3.024\n", + "-----------------------------------------------------------------------\n", + "\n", + "A linear parametric conditional average treatment effect (CATE) model was fitted:\n", + "$Y = \\Theta(X)\\cdot T + g(X, W) + \\epsilon$\n", + "where for every outcome $i$ and treatment $j$ the CATE $\\Theta_{ij}(X)$ has the form:\n", + "$\\Theta_{ij}(X) = \\phi(X)' coef_{ij} + cate\\_intercept_{ij}$\n", + "where $\\phi(X)$ is the output of the `featurizer` or $X$ if `featurizer`=None. Coefficient Results table portrays the $coef_{ij}$ parameter vector for each outcome $i$ and treatment $j$. 
Intercept Results table portrays the $cate\\_intercept_{ij}$ parameter.\n", + "\"\"\"" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "application/javascript": [ + "\n", + " setTimeout(function() {\n", + " var nbb_cell_id = 16;\n", + " var nbb_formatted_code = \"# learn treatment effect on surrogate index on new dataset\\nfrom econml.dml import LinearDML\\n\\nadjsurr_est = LinearDML(\\n model_y=LassoCV(max_iter=2000), model_t=MultiTaskLassoCV(max_iter=2000), cv=3\\n)\\n# fit treatment_0 on total revenue from new dataset\\nadjsurr_est.fit(\\n sindex_adj, panelTnew[:, 0], X=panelXnew[:, 0, :1], W=panelXnew[:, 0, 1:]\\n)\\n# print treatment effect summary\\nprint(\\\"True Long-term Effect for each investment: \\\", true_longterm_effect)\\nprint(\\n \\\"Average treatment effect for each investment: \\\",\\n adjsurr_est.const_marginal_ate(panelXnew[:, 0, :1]),\\n)\\nadjsurr_est.summary(alpha=0.05)\";\n", + " var nbb_cells = Jupyter.notebook.get_cells();\n", + " for (var i = 0; i < nbb_cells.length; ++i) {\n", + " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", + " nbb_cells[i].set_text(nbb_formatted_code);\n", + " break;\n", + " }\n", + " }\n", + " }, 500);\n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# learn treatment effect on surrogate index on new dataset\n", + "from econml.dml import LinearDML\n", + "\n", + "adjsurr_est = LinearDML(\n", + " model_y=LassoCV(max_iter=2000), model_t=MultiTaskLassoCV(max_iter=2000), cv=3\n", + ")\n", + "# fit treatment_0 on total revenue from new dataset\n", + "adjsurr_est.fit(\n", + " sindex_adj, panelTnew[:, 0], X=panelXnew[:, 0, :1], W=panelXnew[:, 0, 1:]\n", + ")\n", + "# print treatment effect summary\n", + "print(\"True Long-term Effect for each investment: \", true_longterm_effect)\n", + "print(\n", + " \"Average treatment effect for each investment: \",\n", + " 
adjsurr_est.const_marginal_ate(panelXnew[:, 0, :1]),\n", + ")\n", + "adjsurr_est.summary(alpha=0.05)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "From the summary table above, none of the coefficients for feature $X0$ is statistically significant at the 5% level, so no effect heterogeneity is identified, which is consistent with the data generating process." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Conclusions\n", + "\n", + "In this notebook, we have demonstrated the power of using EconML to:\n", + "\n", + "* estimate treatment effects in settings where multiple treatments are assigned over time and treatments can have a causal effect on future outcomes\n", + "* correct the bias coming from auto-correlation of the historical treatment policy\n", + "* use Machine Learning to enable estimation with high-dimensional surrogates and controls\n", + "* solve a complex problem using a unified pipeline with only a few lines of code\n", + "\n", + "To learn more about what EconML can do for you, visit our [website](https://aka.ms/econml), our [GitHub page](https://github.com/microsoft/EconML) or our [documentation](https://econml.azurewebsites.net/). 
" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.4" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/Dynamic Double Machine Learning Examples.ipynb b/notebooks/Dynamic Double Machine Learning Examples.ipynb new file mode 100755 index 000000000..b5d48e3a2 --- /dev/null +++ b/notebooks/Dynamic Double Machine Learning Examples.ipynb @@ -0,0 +1,778 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Dynamic Double Machine Learning: Use Cases and Examples\n", + "\n", + "Dynamic DoubleML is an extension of the Double ML approach for treatments assigned sequentially over time periods. This estimator will account for treatments that can have causal effects on future outcomes. For more details, see [this paper](https://arxiv.org/abs/2002.07285) or the [EconML docummentation](https://econml.azurewebsites.net/).\n", + "\n", + "For example, the Dynamic DoubleML could be useful in estimating the following causal effects:\n", + "* the effect of investments on revenue at companies that receive investments at regular intervals ([see more](https://arxiv.org/abs/2103.08390))\n", + "* the effect of prices on demand in stores where prices of goods change over time\n", + "* the effect of income on health outcomes in people who receive yearly income\n", + "\n", + "The preferred data format is balanced panel data. Each panel corresponds to one entity (e.g. company, store or person) and the different rows in a panel correspond to different time points. Example:\n", + "\n", + "||Company|Year|Features|Investment|Revenue|\n", + "|---|---|---|---|---|---|\n", + "|1|A|2018|...|\\$1,000|\\$10,000|\n", + "|2|A|2019|...|\\$2,000|\\$12,000|\n", + "|3|A|2020|...|\\$3,000|\\$15,000|\n", + "|4|B|2018|...|\\$0|\\$5,000|\n", + "|5|B|2019|...|\\$100|\\$10,000|\n", + "|6|B|2020|...|\\$1,200|\\$7,000|\n", + "|7|C|2018|...|\\$1,000|\\$20,000|\n", + "|8|C|2019|...|\\$1,500|\\$25,000|\n", + "|9|C|2020|...|\\$500|\\$15,000|\n", + "\n", + "(Note: when passing the data to the DynamicDML estimator, the \"Company\" column above corresponds to the `groups` argument at fit time. 
The \"Year\" column above should not be passed in, as the time period is inferred from the order in which each company's rows appear.)\n", + "\n", + "If group members do not appear together, it is assumed that the first instance of a group in the dataset corresponds to the first period of that group, the second instance of the group corresponds to the second period, etc. Example:\n", + "\n", + "||Company|Features|Investment|Revenue|\n", + "|---|---|---|---|---|\n", + "|1|A|...|\\$1,000|\\$10,000|\n", + "|2|B|...|\\$0|\\$5,000|\n", + "|3|C|...|\\$1,000|\\$20,000|\n", + "|4|A|...|\\$2,000|\\$12,000|\n", + "|5|B|...|\\$100|\\$10,000|\n", + "|6|C|...|\\$1,500|\\$25,000|\n", + "|7|A|...|\\$3,000|\\$15,000|\n", + "|8|B|...|\\$1,200|\\$7,000|\n", + "|9|C|...|\\$500|\\$15,000|\n", + "\n", + "In this dataset, the 1st row corresponds to the first period of group `A`, the 4th row corresponds to the second period of group `A`, etc.\n", + "\n", + "In this notebook, we show the performance of the DynamicDML on synthetic and observational data. \n", + "\n", + "## Notebook Contents\n", + "\n", + "1. [Example Usage with Average Treatment Effects](#1.-Example-Usage-with-Average-Treatment-Effects)\n", + "2. 
[Example Usage with Heterogeneous Treatment Effects on Time-Invariant Unit Characteristics](#2.-Example-Usage-with-Heterogeneous-Treatment-Effects-on-Time-Invariant-Unit-Characteristics)" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "%load_ext autoreload\n", + "%autoreload 2" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import econml" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "# Main imports\n", + "from econml.dynamic.dml import DynamicDML\n", + "from econml.tests.dgp import DynamicPanelDGP, add_vlines\n", + "\n", + "# Helper imports\n", + "import numpy as np\n", + "from sklearn.linear_model import Lasso, LassoCV, LogisticRegression, LogisticRegressionCV, MultiTaskLassoCV\n", + "import matplotlib.pyplot as plt\n", + "\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 1. Example Usage with Average Treatment Effects" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1.1 DGP\n", + "\n", + "We consider a data generating process from a Markovian treatment model. \n", + "\n", + "In the example below, $T_t$ denotes the treatment(s) at time $t$, $Y_t$ the outcome at time $t$, and $X_t$ the features and controls at time $t$ (the coefficients $e, f$ will pick the features and the controls).\n", + "\\begin{align}\n", + " X_t =& (\\pi'X_{t-1} + 1) \\cdot A\\, T_{t-1} + B X_{t-1} + \\epsilon_t\\\\\n", + " T_t =& \\gamma\\, T_{t-1} + (1-\\gamma) \\cdot D X_t + \\zeta_t\\\\\n", + " Y_t =& (\\sigma' X_{t} + 1) \\cdot e\\, T_{t} + f X_t + \\eta_t\n", + "\\end{align}\n", + "\n", + "with $X_0, T_0 = 0$ and $\\epsilon_t, \\zeta_t, \\eta_t \\sim N(0, \\sigma^2)$. Moreover, $X_t \\in R^{n_x}$, $B[:, 0:s_x] \\neq 0$ and $B[:, s_x:-1] = 0$, $\\gamma\\in [0, 1]$, $D[:, 0:s_x] \\neq 0$, $D[:, s_x:-1]=0$, $f[0:s_x]\\neq 0$, $f[s_x:-1]=0$. 
We draw a single time series of samples of length $n\\_panels \\cdot n\\_periods$." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "# Define DGP parameters\n", + "np.random.seed(123)\n", + "n_panels = 5000 # number of panels\n", + "n_periods = 3 # number of time periods in each panel\n", + "n_treatments = 2 # number of treatments in each period\n", + "n_x = 100 # number of features + controls\n", + "s_x = 10 # number of controls (endogenous variables)\n", + "s_t = 10 # treatment support size" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "# Generate data\n", + "dgp = DynamicPanelDGP(n_periods, n_treatments, n_x).create_instance(\n", + " s_x, random_seed=12345)\n", + "Y, T, X, W, groups = dgp.observational_data(n_panels, s_t=s_t, random_seed=12345)\n", + "true_effect = dgp.true_effect" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1.2 Train Estimator" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "est = DynamicDML(\n", + " model_y=LassoCV(cv=3, max_iter=1000), \n", + " model_t=MultiTaskLassoCV(cv=3, max_iter=1000), \n", + " cv=3)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "est.fit(Y, T, X=None, W=W, groups=groups)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Average effect of default policy: 1.40\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "A scalar was specified but there are multiple treatments; the same value will be used for each treatment. 
Consider specifyingall treatments, or using the const_marginal_effect method.\n" + ] + } + ], + "source": [ + "# Average treatment effect of all periods on last period for unit treatments\n", + "print(f\"Average effect of default policy: {est.ate():0.2f}\")" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Effect of target policy over baseline policy: 1.40\n" + ] + } + ], + "source": [ + "# Effect of target policy over baseline policy\n", + "# Must specify a treatment for each period\n", + "baseline_policy = np.zeros((1, n_periods * n_treatments))\n", + "target_policy = np.ones((1, n_periods * n_treatments))\n", + "eff = est.effect(T0=baseline_policy, T1=target_policy)\n", + "print(f\"Effect of target policy over baseline policy: {eff[0]:0.2f}\")" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Marginal effect of a treatments in period 1 on period 3 outcome: [0.04000235 0.0701606 ]\n", + "Marginal effect of a treatments in period 2 on period 3 outcome: [0.31611764 0.23714736]\n", + "Marginal effect of a treatments in period 3 on period 3 outcome: [0.13108411 0.60656886]\n" + ] + } + ], + "source": [ + "# Period treatment effects + interpretation\n", + "for i, theta in enumerate(est.intercept_.reshape(-1, n_treatments)):\n", + " print(f\"Marginal effect of a treatments in period {i+1} on period {n_periods} outcome: {theta}\")" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Coefficient Results: X is None, please call intercept_inference to learn the constant!\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + 
"\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "
CATE Intercept Results
point_estimate stderr zstat pvalue ci_lower ci_upper
cate_intercept|(T0)$_0$ 0.04 0.041 0.977 0.328 -0.027 0.107
cate_intercept|(T1)$_0$ 0.07 0.04 1.74 0.082 0.004 0.136
cate_intercept|(T0)$_1$ 0.316 0.036 8.848 0.0 0.257 0.375
cate_intercept|(T1)$_1$ 0.237 0.036 6.608 0.0 0.178 0.296
cate_intercept|(T0)$_2$ 0.131 0.003 45.665 0.0 0.126 0.136
cate_intercept|(T1)$_2$ 0.607 0.003 210.244 0.0 0.602 0.611


A linear parametric conditional average treatment effect (CATE) model was fitted:
$Y = \\Theta(X)\\cdot T + g(X, W) + \\epsilon$
where for every outcome $i$ and treatment $j$ the CATE $\\Theta_{ij}(X)$ has the form:
$\\Theta_{ij}(X) = \\phi(X)' coef_{ij} + cate\\_intercept_{ij}$
where $\\phi(X)$ is the output of the `featurizer` or $X$ if `featurizer`=None. Coefficient Results table portrays the $coef_{ij}$ parameter vector for each outcome $i$ and treatment $j$. Intercept Results table portrays the $cate\\_intercept_{ij}$ parameter.
" + ], + "text/plain": [ + "\n", + "\"\"\"\n", + " CATE Intercept Results \n", + "==============================================================================\n", + " point_estimate stderr zstat pvalue ci_lower ci_upper\n", + "------------------------------------------------------------------------------\n", + "cate_intercept|(T0)$_0$ 0.04 0.041 0.977 0.328 -0.027 0.107\n", + "cate_intercept|(T1)$_0$ 0.07 0.04 1.74 0.082 0.004 0.136\n", + "cate_intercept|(T0)$_1$ 0.316 0.036 8.848 0.0 0.257 0.375\n", + "cate_intercept|(T1)$_1$ 0.237 0.036 6.608 0.0 0.178 0.296\n", + "cate_intercept|(T0)$_2$ 0.131 0.003 45.665 0.0 0.126 0.136\n", + "cate_intercept|(T1)$_2$ 0.607 0.003 210.244 0.0 0.602 0.611\n", + "------------------------------------------------------------------------------\n", + "\n", + "A linear parametric conditional average treatment effect (CATE) model was fitted:\n", + "$Y = \\Theta(X)\\cdot T + g(X, W) + \\epsilon$\n", + "where for every outcome $i$ and treatment $j$ the CATE $\\Theta_{ij}(X)$ has the form:\n", + "$\\Theta_{ij}(X) = \\phi(X)' coef_{ij} + cate\\_intercept_{ij}$\n", + "where $\\phi(X)$ is the output of the `featurizer` or $X$ if `featurizer`=None. Coefficient Results table portrays the $coef_{ij}$ parameter vector for each outcome $i$ and treatment $j$. Intercept Results table portrays the $cate\\_intercept_{ij}$ parameter.\n", + "\"\"\"" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Period treatment effects with confidence intervals\n", + "est.summary()" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "conf_ints = est.intercept__interval(alpha=0.05)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1.3 Performance Visualization" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Some plotting boilerplate code\n", + "plt.figure(figsize=(15, 5))\n", + "plt.errorbar(np.arange(n_periods*n_treatments)-.04, est.intercept_, yerr=(conf_ints[1] - est.intercept_,\n", + " est.intercept_ - conf_ints[0]), fmt='o', label='DynamicDML')\n", + "plt.errorbar(np.arange(n_periods*n_treatments), true_effect.flatten(), fmt='o', alpha=.6, label='Ground truth')\n", + "for t in np.arange(1, n_periods):\n", + " plt.axvline(x=t * n_treatments - .5, linestyle='--', alpha=.4)\n", + "plt.xticks([t * n_treatments - .5 + n_treatments/2 for t in range(n_periods)],\n", + " [\"$\\\\theta_{}$\".format(t) for t in range(n_periods)])\n", + "plt.gca().set_xlim([-.5, n_periods*n_treatments - .5])\n", + "plt.ylabel(\"Effect\")\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 2. Example Usage with Heterogeneous Treatment Effects on Time-Invariant Unit Characteristics\n", + "\n", + "We can also estimate treatment effect heterogeneity with respect to the value of some subset of features $X$ in the initial period. Heterogeneity is currently only supported with respect to such initial state features. This for instance can support heterogeneity with respect to time-invariant unit characteristics. In that case you can simply pass as $X$ a repetition of some unit features that stay constant in all periods. You can also pass time-varying features, and their time varying component will be used as a time-varying control. However, heterogeneity will only be estimated with respect to the initial state." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2.1 DGP" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "# Define additional DGP parameters\n", + "het_strength = .5\n", + "het_inds = np.arange(n_x - n_treatments, n_x)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "# Generate data\n", + "dgp = DynamicPanelDGP(n_periods, n_treatments, n_x).create_instance(\n", + " s_x, hetero_strength=het_strength, hetero_inds=het_inds, random_seed=12)\n", + "Y, T, X, W, groups = dgp.observational_data(n_panels, s_t=s_t, random_seed=1)\n", + "ate_effect = dgp.true_effect\n", + "het_effect = dgp.true_hetero_effect[:, het_inds + 1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2.2 Train Estimator" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "est = DynamicDML(\n", + " model_y=LassoCV(cv=3), \n", + " model_t=MultiTaskLassoCV(cv=3), \n", + " cv=3)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "est.fit(Y, T, X=X, W=W, groups=groups, inference=\"auto\")" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "
Coefficient Results
point_estimate stderr zstat pvalue ci_lower ci_upper
X0|(T0)$_0$ 0.009 0.045 0.203 0.839 -0.065 0.083
X0|(T1)$_0$ 0.017 0.042 0.416 0.677 -0.051 0.086
X0|(T0)$_1$ -0.001 0.041 -0.035 0.972 -0.069 0.067
X0|(T1)$_1$ -0.031 0.041 -0.76 0.447 -0.099 0.036
X0|(T0)$_2$ -0.306 0.008 -36.667 0.0 -0.32 -0.292
X0|(T1)$_2$ 0.158 0.008 19.656 0.0 0.145 0.171
X1|(T0)$_0$ 0.017 0.044 0.378 0.706 -0.056 0.09
X1|(T1)$_0$ -0.007 0.045 -0.164 0.87 -0.082 0.067
X1|(T0)$_1$ -0.034 0.042 -0.821 0.412 -0.103 0.034
X1|(T1)$_1$ -0.025 0.042 -0.6 0.549 -0.095 0.044
X1|(T0)$_2$ -0.302 0.008 -35.72 0.0 -0.316 -0.288
X1|(T1)$_2$ 0.156 0.008 18.801 0.0 0.142 0.169
\n", + "\n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "\n", + " \n", + "\n", + "
CATE Intercept Results
point_estimate stderr zstat pvalue ci_lower ci_upper
cate_intercept|(T0)$_0$ 0.024 0.036 0.653 0.513 -0.036 0.084
cate_intercept|(T1)$_0$ -0.033 0.036 -0.929 0.353 -0.092 0.025
cate_intercept|(T0)$_1$ -0.105 0.034 -3.067 0.002 -0.162 -0.049
cate_intercept|(T1)$_1$ 0.037 0.034 1.079 0.281 -0.019 0.093
cate_intercept|(T0)$_2$ -0.743 0.005 -140.503 0.0 -0.752 -0.734
cate_intercept|(T1)$_2$ 0.48 0.005 91.061 0.0 0.472 0.489


A linear parametric conditional average treatment effect (CATE) model was fitted:
$Y = \\Theta(X)\\cdot T + g(X, W) + \\epsilon$
where for every outcome $i$ and treatment $j$ the CATE $\\Theta_{ij}(X)$ has the form:
$\\Theta_{ij}(X) = \\phi(X)' coef_{ij} + cate\\_intercept_{ij}$
where $\\phi(X)$ is the output of the `featurizer` or $X$ if `featurizer`=None. Coefficient Results table portrays the $coef_{ij}$ parameter vector for each outcome $i$ and treatment $j$. Intercept Results table portrays the $cate\\_intercept_{ij}$ parameter.
" + ], + "text/plain": [ + "\n", + "\"\"\"\n", + " Coefficient Results \n", + "==================================================================\n", + " point_estimate stderr zstat pvalue ci_lower ci_upper\n", + "------------------------------------------------------------------\n", + "X0|(T0)$_0$ 0.009 0.045 0.203 0.839 -0.065 0.083\n", + "X0|(T1)$_0$ 0.017 0.042 0.416 0.677 -0.051 0.086\n", + "X0|(T0)$_1$ -0.001 0.041 -0.035 0.972 -0.069 0.067\n", + "X0|(T1)$_1$ -0.031 0.041 -0.76 0.447 -0.099 0.036\n", + "X0|(T0)$_2$ -0.306 0.008 -36.667 0.0 -0.32 -0.292\n", + "X0|(T1)$_2$ 0.158 0.008 19.656 0.0 0.145 0.171\n", + "X1|(T0)$_0$ 0.017 0.044 0.378 0.706 -0.056 0.09\n", + "X1|(T1)$_0$ -0.007 0.045 -0.164 0.87 -0.082 0.067\n", + "X1|(T0)$_1$ -0.034 0.042 -0.821 0.412 -0.103 0.034\n", + "X1|(T1)$_1$ -0.025 0.042 -0.6 0.549 -0.095 0.044\n", + "X1|(T0)$_2$ -0.302 0.008 -35.72 0.0 -0.316 -0.288\n", + "X1|(T1)$_2$ 0.156 0.008 18.801 0.0 0.142 0.169\n", + " CATE Intercept Results \n", + "===============================================================================\n", + " point_estimate stderr zstat pvalue ci_lower ci_upper\n", + "-------------------------------------------------------------------------------\n", + "cate_intercept|(T0)$_0$ 0.024 0.036 0.653 0.513 -0.036 0.084\n", + "cate_intercept|(T1)$_0$ -0.033 0.036 -0.929 0.353 -0.092 0.025\n", + "cate_intercept|(T0)$_1$ -0.105 0.034 -3.067 0.002 -0.162 -0.049\n", + "cate_intercept|(T1)$_1$ 0.037 0.034 1.079 0.281 -0.019 0.093\n", + "cate_intercept|(T0)$_2$ -0.743 0.005 -140.503 0.0 -0.752 -0.734\n", + "cate_intercept|(T1)$_2$ 0.48 0.005 91.061 0.0 0.472 0.489\n", + "-------------------------------------------------------------------------------\n", + "\n", + "A linear parametric conditional average treatment effect (CATE) model was fitted:\n", + "$Y = \\Theta(X)\\cdot T + g(X, W) + \\epsilon$\n", + "where for every outcome $i$ and treatment $j$ the CATE $\\Theta_{ij}(X)$ has the form:\n", + "$\\Theta_{ij}(X) = 
\\phi(X)' coef_{ij} + cate\\_intercept_{ij}$\n", + "where $\\phi(X)$ is the output of the `featurizer` or $X$ if `featurizer`=None. Coefficient Results table portrays the $coef_{ij}$ parameter vector for each outcome $i$ and treatment $j$. Intercept Results table portrays the $cate\\_intercept_{ij}$ parameter.\n", + "\"\"\"" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "est.summary()" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Average effect of default policy:-0.42\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "A scalar was specified but there are multiple treatments; the same value will be used for each treatment. Consider specifyingall treatments, or using the const_marginal_effect method.\n", + "A scalar was specified but there are multiple treatments; the same value will be used for each treatment. 
Consider specifyingall treatments, or using the const_marginal_effect method.\n" + ] + } + ], + "source": [ + "# Average treatment effect for test points\n", + "X_test = X[np.arange(0, 25, 3)]\n", + "print(f\"Average effect of default policy:{est.ate(X=X_test):0.2f}\")" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Effect of target policy over baseline policy for test set:\n", + " [-0.37368525 -0.30896804 -0.43030363 -0.52252401 -0.42849622 -0.48790877\n", + " -0.34417987 -0.51804937 -0.36806744]\n" + ] + } + ], + "source": [ + "# Effect of target policy over baseline policy\n", + "# Must specify a treatment for each period\n", + "baseline_policy = np.zeros((1, n_periods * n_treatments))\n", + "target_policy = np.ones((1, n_periods * n_treatments))\n", + "eff = est.effect(X=X_test, T0=baseline_policy, T1=target_policy)\n", + "print(\"Effect of target policy over baseline policy for test set:\\n\", eff)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(array([ 0.02374269, -0.03302781, -0.10526464, 0.03675719, -0.74294675,\n", + " 0.48025068]),\n", + " array([[ 0.00914226, 0.01675409],\n", + " [ 0.01732804, -0.00741467],\n", + " [-0.00143705, -0.03431712],\n", + " [-0.03136295, -0.02536834],\n", + " [-0.30581311, -0.30189654],\n", + " [ 0.15773252, 0.15564665]]))" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Coefficients: intercept is of shape n_treatments*n_periods\n", + "# coef_ is of shape (n_treatments*n_periods, n_hetero_inds).\n", + "# first n_treatment rows are from first period, next n_treatment\n", + "# from second period, etc.\n", + "est.intercept_, est.coef_" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "# Confidence 
intervals\n", + "conf_ints_intercept = est.intercept__interval(alpha=0.05)\n", + "conf_ints_coef = est.coef__interval(alpha=0.05)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2.3 Performance Visualization" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "# parse true parameters in array of shape (n_treatments*n_periods, 1 + n_hetero_inds)\n", + "# first column is the intercept\n", + "true_effect_inds = []\n", + "for t in range(n_treatments):\n", + " true_effect_inds += [t * (1 + n_x)] + (list(t * (1 + n_x) + 1 + het_inds) if len(het_inds)>0 else [])\n", + "true_effect_params = dgp.true_hetero_effect[:, true_effect_inds]\n", + "true_effect_params = true_effect_params.reshape((n_treatments*n_periods, 1 + het_inds.shape[0]))" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "# concatenating intercept and coef_\n", + "param_hat = np.hstack([est.intercept_.reshape(-1, 1), est.coef_])\n", + "lower = np.hstack([conf_ints_intercept[0].reshape(-1, 1), conf_ints_coef[0]])\n", + "upper = np.hstack([conf_ints_intercept[1].reshape(-1, 1), conf_ints_coef[1]])" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plt.figure(figsize=(15, 5))\n", + "plt.errorbar(np.arange(n_periods * (len(het_inds) + 1) * n_treatments),\n", + " true_effect_params.flatten(), fmt='*', label='Ground Truth')\n", + "plt.errorbar(np.arange(n_periods * (len(het_inds) + 1) * n_treatments),\n", + " param_hat.flatten(), yerr=((upper - param_hat).flatten(),\n", + " (param_hat - lower).flatten()), fmt='o', label='DynamicDML')\n", + "add_vlines(n_periods, n_treatments, het_inds)\n", + "plt.legend()\n", + "plt.show()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/setup.cfg b/setup.cfg index 266aa9d16..0d9f2db65 100644 --- a/setup.cfg +++ b/setup.cfg @@ -82,6 +82,7 @@ exclude = [options.package_data] ; include all CSV files as data * = *.csv + *.jbl ; coverage configuration [coverage:run]