
Added validation methods #290

Closed · Thomas9292 opened this issue on Jan 26, 2021 · 5 comments
Labels: enhancement (New feature or request)
@Thomas9292

Even though the documentation describes several ways of validating the results of metalearners, it is still hard to estimate how trustworthy those results are.

My question is: would it be possible to perform some sort of hold-out validation on the outcome variable? Intuitively, metalearners arrive at a treatment effect estimate by predicting the outcome under both treatment options, so it should be possible to predict the outcome for a hold-out set. In my understanding, traditional accuracy metrics could then be applied to evaluate the model.

This will of course not give any insight into the accuracy of the predictions for unobserved (counterfactual) outcomes, but it would allow for more confidence in the model if the predictions for the observed outcomes are at least somewhat accurate.

What do you think of this method? Would it be worth implementing somehow?

@Thomas9292 added the enhancement label on Jan 26, 2021
@ppstacy (Collaborator) commented on Jan 26, 2021

Hi @Thomas9292, thanks for using CausalML. I think it definitely makes sense to do this validation. To serve this purpose, in our example notebook for meta-learners here (Part B) we did the validation and model-performance comparison on a 20% hold-out dataset with different evaluation metrics (e.g., MSE, KL divergence, AUUC). Please take a look and let me know if you have any questions.
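
For readers who want the shape of such a check outside the notebook's synthetic setup, here is a minimal sketch (not the notebook's code) of scoring a meta-learner on a 20% hold-out split with AUUC and a cumulative-gain plot from `causalml.metrics`. `X`, `treatment`, and `y` are placeholder names for your own feature matrix, 0/1 treatment flag, and outcome.

```python
# Not the notebook's code: a minimal hold-out AUUC/gain sketch.
# `X`, `treatment`, and `y` are placeholders for your own data.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from causalml.inference.meta import XGBTRegressor
from causalml.metrics import auuc_score, plot_gain

X_train, X_test, w_train, w_test, y_train, y_test = train_test_split(
    X, treatment, y, test_size=0.2, random_state=42)

learner = XGBTRegressor()
learner.fit(X=X_train, treatment=w_train, y=y_train)
tau_hat = learner.predict(X_test)  # estimated CATE on the hold-out set

# Every column other than the outcome/treatment columns is treated as a model
# score column, so several learners can be compared in the same frame.
eval_df = pd.DataFrame({
    'xgb_t_learner': tau_hat.flatten(),
    'w': np.asarray(w_test),
    'y': np.asarray(y_test),
})
print(auuc_score(eval_df, outcome_col='y', treatment_col='w', normalize=True))
plot_gain(eval_df, outcome_col='y', treatment_col='w')
```

As far as I can tell, when no true treatment-effect column is available the cumulative gain is computed from the observed outcomes of treated vs. control units ranked by the model score, so this kind of check also works on non-synthetic data.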

@baendigreydo

Hi all, thank you for sharing the CausalML package!
I am a complete beginner at coding and am trying to finish my thesis, so apologies in advance for a newbie question:
Regarding Part B of the example notebook: how can I calculate these summary metrics (Abs % Error of ATE | MSE | KL Divergence) for non-synthetic data (e.g., the Hillstrom data)? I got lost at this point, unfortunately.
Also, is there any way to calculate Qini values or plot gain/lift/Qini curves from UpliftTrees? An answer to that would help me tremendously!

@ppstacy (Collaborator) commented on Feb 10, 2021

Hi @baendigreydo, unfortunately we don't have functions right now that let you calculate those metrics for non-synthetic data directly, but you can reference the code here and adapt it to generate them yourself.

As shown in this notebook, you can calculate and plot gain/lift/Qini curves. Please let us know if you have any questions.
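
For the UpliftTree part of the question, a rough sketch of the plotting/scoring step (placeholder names, not the notebook's exact code; `uplift_hat` is assumed to be a 1-d array of uplift scores from an already-fitted uplift tree or forest, and `df_test` a hold-out frame with a 0/1 treatment flag and binary outcome):

```python
# Rough sketch. Assumes `uplift_hat` (1-d array of predicted uplift scores on
# the hold-out set) and `df_test` with columns 'is_treated' (0/1) and
# 'conversion' (observed outcome) already exist.
import pandas as pd
from causalml.metrics import plot_gain, plot_lift, plot_qini, qini_score, auuc_score

eval_df = pd.DataFrame({
    'uplift_tree': uplift_hat,                    # model score column
    'is_treated': df_test['is_treated'].values,   # 1 = treated, 0 = control
    'conversion': df_test['conversion'].values,   # observed outcome
})

plot_gain(eval_df, outcome_col='conversion', treatment_col='is_treated')
plot_lift(eval_df, outcome_col='conversion', treatment_col='is_treated')
plot_qini(eval_df, outcome_col='conversion', treatment_col='is_treated')

# Single-number summaries for model selection
print(qini_score(eval_df, outcome_col='conversion', treatment_col='is_treated'))
print(auuc_score(eval_df, outcome_col='conversion', treatment_col='is_treated'))
```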

@Thomas9292 (Author) commented on Feb 11, 2021

Hi @baendigreydo, let me illustrate what I did, for reference. I'm also really curious to hear from @ppstacy whether this is how you think it could be implemented/should be done. The problem is that for non-synthetic data only the observed treatment is known, so I masked the predictions to compare only the predicted outcome under the treatment that was actually observed.

```python
# Imports (scikit-learn and causalml)
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from causalml.inference.meta import XGBTRegressor

# Create holdout set
X_train, X_test, t_train, t_test, y_train, y_test_actual = train_test_split(
    df_confounder, df_treatment, target, test_size=0.2)

# Fit learner on training set
learner = XGBTRegressor()
learner.fit(X=X_train, treatment=t_train, y=y_train)

# Predict the TE for test, and request the components (predictions for t=1 and t=0)
te_test_preds, yhat_c, yhat_t = learner.predict(X_test, t_test, return_components=True)

# Mask the yhats to correspond with the observed treatment (we can only test accuracy for those)
yhat_c = yhat_c[1] * (1 - t_test)
yhat_t = yhat_t[1] * t_test
yhat_test = yhat_t + yhat_c

# Model prediction error
MSE = mean_squared_error(y_test_actual, yhat_test)
print(f"{'Model MSE:':25}{MSE}")

# Also plotted actuals vs. predictions in here, will spare you the code
```
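
The actuals-vs-predictions plot itself was omitted above; purely as an illustration (not Thomas9292's actual plotting code), such a plot could be as simple as:

```python
# Illustrative only, not the original poster's code. Uses the masked hold-out
# predictions `yhat_test` and observed outcomes `y_test_actual` from above.
import numpy as np
import matplotlib.pyplot as plt

y_true = np.asarray(y_test_actual)
y_pred = np.asarray(yhat_test)

plt.figure(figsize=(6, 6))
plt.scatter(y_true, y_pred, alpha=0.3)
lims = [min(y_true.min(), y_pred.min()), max(y_true.max(), y_pred.max())]
plt.plot(lims, lims, linestyle='--')  # 45-degree reference line
plt.xlabel('Observed outcome (hold-out)')
plt.ylabel('Predicted outcome under observed treatment')
plt.title('Actuals vs. predictions on the hold-out set')
plt.show()
```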

@baendigreydo

Thanks for your answers.
I was able to generate the plots for the meta-learners and trees easily.

I also looked into the solution proposed by you, @Thomas9292.
I think it is a usable workaround, but I see a logical problem when validating meta-learner accuracy this way: to calculate summary tables like the one shown in Part B of this notebook, the function `get_synthetic_preds_holdout` is used within `get_synthetic_summary_holdout`. For these functions to work, the actual treatment effects `tau` are needed, which can be generated by `synthetic_data`. Since `tau` is not known in real-world data, one has to estimate it, just like you did, @Thomas9292 (named `te_test_preds` for the test set and `te_train_preds` for the train set, to be inserted here). Based on these new `preds_dict_train[KEY_ACTUAL] = te_train_preds` and `preds_dict_valid[KEY_ACTUAL] = te_test_preds`, the summary table can be calculated. The problem, however, is that these `tau`s are then assumed to be the ground truth, so all further models are compared against the model used to generate them, which is effectively a second-order comparison.

I would be happy if someone could confirm or, even better, refute this, as I am quite puzzled right now.

@Thomas9292, was MSE the only metric you used for model selection? What about gain/Qini? I would also love to see your code for the plots, as I still have a lot to learn.

@ppstacy, when I did the above calculation I noticed the following: I suspect that the function `regression_metrics` is not yet complete, since no return is specified. I think the part that calculates the regression metrics should look like this:
```python
reg_metrics = []

for name, func in metrics.items():
    if w is not None:
        assert y.shape[0] == w.shape[0]
        if w.dtype != bool:
            w = w == 1
        logger.info('{:>8s}   (Control): {:10.4f}'.format(name, func(y[~w], p[~w])))
        logger.info('{:>8s} (Treatment): {:10.4f}'.format(name, func(y[w], p[w])))
    else:
        logger.info('{:>8s}: {:10.4f}'.format(name, func(y, p)))
    reg_metrics.append({name: func(y, p)})

return np.array(reg_metrics)
```
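
If that return is added, a call could capture the values directly. A hypothetical usage, assuming `regression_metrics` is importable from `causalml.metrics` and reusing `y_test_actual`, `yhat_test`, and `t_test` from the earlier example:

```python
# Hypothetical usage of the patched function; this only works once the `return`
# above is added, since the current version logs the metrics but returns nothing.
from causalml.metrics import regression_metrics

reg_metrics = regression_metrics(y=y_test_actual, p=yhat_test, w=t_test)
print(reg_metrics)
```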
