subSAGE

subSAGE is a Shapley-value-based framework for inferring feature importance in high-dimensional data. It builds on SAGE (Shapley Additive Global importancE), adjusted for the high-dimensional setting. We also demonstrate how to use paired bootstrapping to estimate confidence intervals. In particular, we investigate subSAGE applied to tree ensemble models. We emphasize the importance of computing subSAGE on independent test data that was not used to train the model.

Preprint

Preprint is available here.

Usage

Given an xgboost model, test data, and a particular feature, the subSAGE estimate can be computed in R as:

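library(xgboost)     # provides xgb.model.dt.tree
library(data.table)  # provides as.data.table
source("~/subSAGE/subSAGE.R")  # adjust the path to your local copy of the repository
t = xgb.model.dt.tree(model = model)  # tabular representation of the fitted trees
trees = as.data.table(xgboost.trees(xgb_model = model, data = data, recalculate = FALSE))
# data: independent test data; feature: the feature to evaluate
estimate = subSage_cpp(data,trees,feature,loss = "RMSE")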
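
The description mentions paired bootstrapping for confidence intervals. The snippet below is a minimal sketch of that idea, not the repository's implementation. It assumes "paired" means resampling whole test rows, so that features and response stay together, and it uses a percentile interval. The names `paired_bootstrap_ci` and `rmse_drop` are hypothetical, and `rmse_drop` is a toy stand-in statistic rather than `subSage_cpp`.

```r
# Percentile confidence interval from a paired (case-resampling) bootstrap.
# `statistic` is any function of (X, y) returning a single number; it is a
# hypothetical stand-in here, not the repository's subSage_cpp.
paired_bootstrap_ci <- function(X, y, statistic, B = 1000, level = 0.95, seed = 1) {
  stopifnot(nrow(X) == length(y))
  set.seed(seed)
  n <- length(y)
  estimates <- numeric(B)
  for (b in seq_len(B)) {
    # Draw row indices once and apply them to both X and y (the "pairing").
    idx <- sample.int(n, size = n, replace = TRUE)
    estimates[b] <- statistic(X[idx, , drop = FALSE], y[idx])
  }
  alpha <- 1 - level
  ci <- quantile(estimates, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  list(estimate = statistic(X, y), lower = ci[1], upper = ci[2])
}

# Toy data: the response depends on the first column only.
set.seed(42)
X <- matrix(rnorm(200 * 3), ncol = 3)
y <- X[, 1] + rnorm(200, sd = 0.5)

# Toy statistic: reduction in RMSE when the first column is used as the
# prediction instead of the mean of y.
rmse_drop <- function(X, y) {
  sqrt(mean((y - mean(y))^2)) - sqrt(mean((y - X[, 1])^2))
}

result <- paired_bootstrap_ci(X, y, rmse_drop, B = 500)
print(result)
```

To adapt the sketch, `statistic` could wrap a call that computes the subSAGE estimate on the resampled test rows. Whether that matches the bootstrap procedure used in the preprint should be checked against the preprint itself.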