PDP variance feature importance #758
Conversation
Force-pushed from b5d271d to a66af24.
Codecov Report
Additional details and impacted files:
@@ Coverage Diff @@
## master #758 +/- ##
==========================================
- Coverage 75.72% 75.51% -0.22%
==========================================
Files 72 73 +1
Lines 8223 8477 +254
==========================================
+ Hits 6227 6401 +174
- Misses 1996 2076 +80
Flags with carried forward coverage won't be shown.
Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks.
Force-pushed from 9ecbbbd to c242e83.
Thanks Robert, looks great!
Haven't reviewed the plotting as you've been thinking of improvements.
`HistGradientBoostingRegressor`, `DecisionTreeRegressor`, `RandomForestRegressor`."""

def __init__(self,
             predictor: Union[BaseEstimator, Callable[[np.ndarray], np.ndarray]],
I see why you've unified these two types of models into a single implementation, since it's easy to select the right PD implementation. I'm a little unsure about the discrepancy with PD that it introduces. On the other hand, it's nice having just one public interface for this.
Can discuss offline.
I think that this is a good approach now.
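For reference, a minimal, hypothetical sketch (not the PR's actual code) of how a single public interface can accept both a fitted sklearn estimator and a plain prediction function, and pick the matching partial dependence backend internally:

```python
# Hypothetical helper; the name and the backend labels are illustrative only.
from typing import Callable, Union

import numpy as np
from sklearn.base import BaseEstimator


def select_pd_backend(predictor: Union[BaseEstimator, Callable[[np.ndarray], np.ndarray]]) -> str:
    """Return which PD computation to use for the given predictor."""
    if isinstance(predictor, BaseEstimator):
        # white-box path: the fitted estimator itself is available
        return 'sklearn-partial-dependence'
    # black-box path: only a prediction function is available
    return 'brute-force-partial-dependence'
```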
self.pd_explainer = PartialDependenceClass(predictor=predictor,
                                           feature_names=feature_names,
                                           categorical_names=categorical_names,
                                           target_names=target_names,
                                           verbose=verbose)
I'm just wondering if we should catch exceptions from this initialization and re-raise with more specific exceptions? Or would all exceptions raised from the underlying class be clear enough for a user of this method?
I think the exceptions are clear enough in the PD class.
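For completeness, a sketch of the alternative considered above: wrap the delegated initialization and re-raise with added context, chaining the original exception. The helper name is hypothetical, and the PR ultimately keeps the underlying exceptions as they are.

```python
def build_pd_explainer(factory, **kwargs):
    """Construct the underlying explainer, re-raising failures with extra context."""
    try:
        return factory(**kwargs)
    except Exception as err:
        # chain the original exception so the user still sees the root cause
        raise RuntimeError(
            'Failed to initialize the underlying PartialDependence explainer; '
            'check the predictor and the feature metadata.'
        ) from err
```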
Returns
-------
An array of size `F x T x N1 x ... N(k-1)`, where `F` is the number of explained features, `T` is the number
Not sure I understand why it goes up to `N(k-1)` and not `N(k)`.
Because we compute the variance along the `Nk` axis, which is thus removed. For a 2D matrix, if we compute the variance along the second axis, then the result will be 1D.
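A toy numpy illustration of that shape reduction (shapes here are made up for the example): the variance is taken along the last axis, i.e. the grid axis, so that axis disappears from the result.

```python
import numpy as np

pd_values = np.random.rand(4, 2, 50)     # F=4 features, T=2 targets, N_k=50 grid points
importance = np.var(pd_values, axis=-1)  # variance over the grid axis
print(importance.shape)                  # (4, 2) -- the N_k axis is gone
```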
params.update({'kind': Kind.AVERAGE.value})  # type: ignore[dict-item]

# compute partial dependence for each feature
pd_explanation = self.pd_explainer.explain(**params)  # type: ignore[arg-type]
Interesting question on user experience - the `verbose` flag is passed through to the underlying explainer. I assume the calculation after we have `pd_explanation` is comparatively quick, so it's fine to leave this as is?
Needs to be addressed somehow. One way to do it is to add a new progress bar for the computation of the variance with an appropriate description. Unfortunately, the `PD` progress bar does not have any description. Not sure if it is best to add a description for the `PD` bar in this PR, or to add a progress bar without a description for PDV now and add descriptions for both in a later PR.
I don't mind where the PRs go too much. Would the proposal be to have 2 progress bars showing for this method, or to disable the internal `PD` progress bar? Where is most of the time spent during the computation?
Most of the time during the computation is spent in `PartialDependence`. The rest is relatively fast. My proposal would be to have 2 progress bars for this method.
If that's the case then perhaps it's easiest and simplest (for the user) to stick with the internal progress bar? Or perhaps we add the 2nd progress bar (with a different description), but disable the internal one?
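A sketch of the "two progress bars" option under discussion, assuming `tqdm` is available: the inner PD explainer would keep its own (undescribed) bar, while the variance step gets a second bar with its own description. The function and argument names below are illustrative only, not the PR's code.

```python
import numpy as np
from tqdm import tqdm


def compute_variances(pd_values_per_feature, verbose=True):
    """Variance step with its own labelled progress bar."""
    variances = []
    for pd_values in tqdm(pd_values_per_feature,
                          desc='PD variance',
                          disable=not verbose):
        variances.append(np.var(pd_values, axis=-1))
    return variances
```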
pd_values=pd_explanation.data['pd_values']).T,
}

def _compute_feature_interaction(self,
Remind me if this all works fine for mixed data, i.e. `(num, cat)`?
Don't think I understand exactly what you mean. There shouldn't be any mixed data since the feature importance is defined for a single feature.
This is feature interaction so 2 features? How does it work for mixed features?
Sorry, my bad ... the feature interaction for mixed features works the same: fix one feature, let the other vary, and take the variance. If the fixed feature is numerical and the one that varies is categorical, then the standard deviation computation uses the range statistic. On the other hand, if the categorical one is fixed and the numerical one varies, then the standard deviation computation uses the unbiased definition of the std (divided by `N-1`). There shouldn't be any problem, and as you can see, the method uses the internal `_compute_pd_variance`, which is aware of the feature type.
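A hedged sketch of that type-aware spread measure: the unbiased standard deviation (`ddof=1`, i.e. divided by `N-1`) for numerical grids, and a range-based statistic for categorical grids (the range rule from the PD-variance paper); the exact constant and function name used in the PR may differ.

```python
import numpy as np


def _compute_pd_variance_sketch(pd_values: np.ndarray, is_categorical: bool) -> np.ndarray:
    """Spread of the PD values along the last (grid) axis."""
    if is_categorical:
        # range statistic for categorical grids
        return (pd_values.max(axis=-1) - pd_values.min(axis=-1)) / 4
    # unbiased standard deviation for numerical grids
    return pd_values.std(axis=-1, ddof=1)
```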
Force-pushed from 57715ef to e41deb6.
…ing functionality. Solved docs errors.
Force-pushed from 0a174ae to 7e11875.
Force-pushed from 90701d3 to 3022518.
Nice work! Don't really have many comments regarding the code; also enjoyed the examples and method description. I left quite a few minor comments there for improving the text and presentation.
I'm noting that the new plotting functions have an impact on the test coverage, so as a follow-up we should start addressing those as part of #760.
doc/source/overview/algorithms.md
Outdated
@@ -14,6 +14,8 @@ The following table summarizes the capabilities of the current algorithms:
|Method|Models|Exp. types|Classification|Regression|Tabular|Text|Image|Cat. data|Train|Dist.|
|:---|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---|:---:|
|[ALE](../methods/ALE.ipynb)|BB|global|✔|✔|✔| | | |✔| |
|[PartialDependence](../methods/PartialDependence.ipynb)|BB WB|global|✔|✔|✔| | |✔|✔| |
|[PartialDependenceVariance](../methods/PartialDependenceVariance.ipynb)|BB WB|global|✔|✔|✔| | |✔|✔| |
Unfortunately, with this line the content of the page extends beyond the formatted page. One fix would be to increase the width of the white part in the CSS style page. Another I would suggest is to shorten the names of the explainers, e.g. `PartialDependence` -> `Partial Dependence` and `PartialDependenceVariance` -> `PD Variance`, since the names in this table are descriptive, not exact class names.
…nation and included it in the plotting functionality.
… importance and interaction.
…ually computed importance using PD.
Force-pushed from 284886a to 964e9c3.
@@ -0,0 +1,650 @@
{
the the -> the (can you search the doc, as I think there are a few places where this happens)
Reply via ReviewNB
Nice work! I left a couple of comments for the notebooks, otherwise all looks done.
Force-pushed from 7730899 to f975340.
Force-pushed from f975340 to ca3067b.
This PR implements the method described here to compute feature importance and feature interactions. The main idea for the computation of both measures is to analyse the variance of the partial dependence plots, thus the implementation relies on the `PartialDependence` explainer. It also adds plotting functionality for the feature importance and feature interactions. Furthermore, since it is recommended to analyse the results in tandem with the partial dependence plots, the explanation object is designed to be compatible with the `pd_plot` functionality.
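A hedged usage sketch based on this description; the exact class name, constructor arguments, and `explain` signature should be checked against the merged alibi documentation rather than taken from here.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

from alibi.explainers import PartialDependenceVariance

# fit any regressor; its prediction function is enough for the black-box path
X, y = load_diabetes(return_X_y=True)
model = RandomForestRegressor(n_estimators=50).fit(X, y)

explainer = PartialDependenceVariance(predictor=model.predict)
exp_importance = explainer.explain(X, method='importance')    # variance-based feature importance
exp_interaction = explainer.explain(X, method='interaction')  # pairwise feature interactions
```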