
PDP variance feature importance #758

Merged

Conversation

RobertSamoilescu
Collaborator

This PR implements the method described here to compute feature importance and feature interactions.

The main idea for the computation of both measures is to analyse the variance of the partial dependence plots; thus the implementation relies on the PartialDependence explainer.

It also adds plotting functionalities for the feature importance and feature interactions. Furthermore, since it is recommended to analyse the results in tandem with the partial dependence plots, the explanation object is designed to be compatible with the pd_plot functionality.
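To illustrate the core idea, here is a rough sketch using scikit-learn's `partial_dependence` (illustrative only, not the alibi implementation itself): the importance of a feature is measured by the spread of its partial dependence curve.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

importances = []
for f in range(X.shape[1]):
    pd_res = partial_dependence(model, X, features=[f], kind='average')
    pd_curve = pd_res['average'][0]                 # PD curve of feature f
    importances.append(np.std(pd_curve, ddof=1))    # spread of the curve ~ importance

print(np.argsort(importances)[::-1])                # features ranked by importance
```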


codecov bot commented Sep 14, 2022

Codecov Report

Merging #758 (ca3067b) into master (313c760) will decrease coverage by 0.21%.
The diff coverage is 41.89%.

Additional details and impacted files


@@            Coverage Diff             @@
##           master     #758      +/-   ##
==========================================
- Coverage   75.72%   75.51%   -0.22%     
==========================================
  Files          72       73       +1     
  Lines        8223     8477     +254     
==========================================
+ Hits         6227     6401     +174     
- Misses       1996     2076      +80     
| Flag | Coverage Δ |
|:---|:---|
| macos-latest-3.10 | 75.47% <41.50%> (?) |
| macos-latest-3.10.6 | ? |
| ubuntu-latest-3.10 | 75.47% <41.50%> (?) |
| ubuntu-latest-3.10.6 | ? |
| ubuntu-latest-3.7 | 75.20% <41.26%> (-0.33%) ⬇️ |
| ubuntu-latest-3.8 | 75.25% <41.26%> (-0.24%) ⬇️ |
| ubuntu-latest-3.9 | 75.34% <41.26%> (-0.24%) ⬇️ |
| windows-latest-3.9 | 73.41% <41.26%> (-0.18%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|:---|:---|
| alibi/explainers/pd_variance.py | 40.72% <40.72%> (ø) |
| alibi/api/defaults.py | 100.00% <100.00%> (ø) |
| alibi/explainers/__init__.py | 100.00% <100.00%> (ø) |
| alibi/utils/missing_optional_dependency.py | 94.44% <0.00%> (+0.15%) ⬆️ |
| alibi/explainers/partial_dependence.py | 48.16% <0.00%> (+0.21%) ⬆️ |
| alibi/models/tensorflow/autoencoder.py | 100.00% <0.00%> (+43.47%) ⬆️ |
| alibi/models/tensorflow/cfrl_models.py | 100.00% <0.00%> (+75.67%) ⬆️ |


Contributor

@jklaise left a comment

Thanks Robert, looks great!

Haven't reviewed the plotting as you've been thinking of improvements.

`HistGradientBoostingRegressor`, `DecisionTreeRegressor`, `RandomForestRegressor`."""

def __init__(self,
predictor: Union[BaseEstimator, Callable[[np.ndarray], np.ndarray]],
Contributor

I see why you've unified these two types of models into a single implementation, since it's easy to select the right PD implementation; I'm a little unsure about the discrepancy with PD that it introduces. On the other hand, it's nice having just one public interface for this.
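A hypothetical sketch of the kind of dispatch being discussed (the backend names and the rule are illustrative, not the exact alibi internals):

```python
from typing import Callable, Union

import numpy as np
from sklearn.base import BaseEstimator

def _select_pd_backend(predictor: Union[BaseEstimator, Callable[[np.ndarray], np.ndarray]]) -> str:
    # sklearn estimators can use a white-box PD path, while plain callables
    # fall back to the black-box prediction-function path.
    return 'whitebox' if isinstance(predictor, BaseEstimator) else 'blackbox'
```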

Collaborator Author

Can discuss offline.

Contributor

I think that this is a good approach now.

Comment on lines +80 to +84
self.pd_explainer = PartialDependenceClass(predictor=predictor,
feature_names=feature_names,
categorical_names=categorical_names,
target_names=target_names,
verbose=verbose)
Contributor

I'm just wondering if we should catch exceptions from this initialization and re-raise with more specific exceptions? Or would all exceptions raised from the underlying class be clear enough for a user of this method?

Collaborator Author

I think the exceptions are clear enough in the PD class.


Returns
-------
An array of size `F x T x N1 x ... N(k-1)`, where `F` is the number of explained features, `T` is the number
Contributor

Not sure I understand why it goes up to N(k-1) not N(k).

Collaborator Author

Because we compute the variance along the Nk axis, so that dimension is removed. For a 2D matrix, if we compute the variance along the second axis, then the result will be 1D.
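A small numpy illustration of the shape reduction (dimensions chosen arbitrarily):

```python
import numpy as np

# e.g. F=3 explained features, T=2 targets, a 10 x 20 grid for a pair of features
pd_values = np.random.rand(3, 2, 10, 20)
variance = np.var(pd_values, axis=-1)   # variance along the last grid axis
print(variance.shape)                   # (3, 2, 10): the last dimension is gone
```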

params.update({'kind': Kind.AVERAGE.value}) # type: ignore[dict-item]

# compute partial dependence for each feature
pd_explanation = self.pd_explainer.explain(**params) # type: ignore[arg-type]
Contributor

Interesting question on user experience - the verbose flag is passed through to the underlying explainer. I assume the calculation after we have pd_explanation is comparatively quick so it's fine to leave this as is?

Collaborator Author

Needs to be addressed somehow. One way to do it is to add a new progress bar for the computation of the variance with an appropriate description. Unfortunately, the PD progress bar does not have any description. Not sure if it is best to add a description for the PD in this PR, or to add a progress bar without description for PDV and later open another PR to add a description for both.

Contributor

I don't mind too much where the PRs go. Would the proposal be to have 2 progress bars showing for this method, or disabling the internal PD progress bar? Where is most of the time spent during the computation?

Collaborator Author

Most of the computation time is spent in the PartialDependence part; the rest is relatively fast. My proposal would be to have 2 progress bars for this method.

Contributor

If that's the case then perhaps it's easiest and simplest (for the user) to stick with the internal progress bar? Or perhaps we add the 2nd progress bar (with a different description), but disable the internal one?
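A minimal sketch of that second option, assuming a tqdm-based outer bar with its own description while the internal PD bar stays disabled (illustrative only):

```python
from tqdm import tqdm

verbose = True
features = list(range(5))
for f in tqdm(features, desc='PD variance', disable=not verbose):
    pass  # compute the PD for feature f with the internal PD bar turned off, then its variance
```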

pd_values=pd_explanation.data['pd_values']).T,
}

def _compute_feature_interaction(self,
Contributor

Remind me if this all works fine for mixed data, i.e. (num, cat)?

Collaborator Author

Don't think I understand exactly what you mean. There shouldn't be any mixed data since the feature importance is defined for a single feature.

Contributor

This is feature interaction so 2 features? How does it work for mixed features?

Collaborator Author

Sorry, my bad ... the feature interaction for mixed features works the same: fix one feature, let the other vary, and take the variance. If the fixed feature is numerical and the one that varies is categorical, then the standard deviation computation uses the range statistic. On the other hand, if the categorical one is fixed and the numerical one varies, then the standard deviation computation uses the unbiased definition of the std (divided by N-1). There shouldn't be any problem, and as you can see, the method uses the internal _compute_pd_variance which is aware of the feature type.
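A rough sketch of the two statistics mentioned above, following Greenwell et al. (2018); the helper name is illustrative, and the choice depends on the type of the feature that varies:

```python
import numpy as np

def _pd_spread(pd_values: np.ndarray, categorical: bool) -> float:
    if categorical:
        # range statistic: range of the PD values divided by 4
        return float((pd_values.max() - pd_values.min()) / 4)
    # unbiased standard deviation of the PD values (divided by N-1)
    return float(np.std(pd_values, ddof=1))
```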


Contributor

@jklaise left a comment

Nice work! I don't really have many comments regarding the code; I also enjoyed the examples and method description, and left quite a few minor comments there for improving the text and presentation.

I'm noting that the new plotting functions have an impact on the test coverage, so as a follow-up we should start addressing those as part of #760.


@@ -14,6 +14,8 @@ The following table summarizes the capabilities of the current algorithms:
|Method|Models|Exp. types|Classification|Regression|Tabular|Text|Image|Cat. data|Train|Dist.|
|:---|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---|:---:|
|[ALE](../methods/ALE.ipynb)|BB|global|✔|✔|✔| | | |✔| |
|[PartialDependence](../methods/PartialDependence.ipynb)|BB WB|global|✔|✔|✔| | |✔|✔| |
|[PartialDependenceVariance](../methods/PartialDependenceVariance.ipynb)|BB WB|global|✔|✔|✔| | |✔|✔| |
Contributor

Unfortunately with this line the content of the page extends beyond the formatted page (see screenshot).
One fix would be to increase the width of the white part in the CSS style page. Another would be to shorten the names of the explainers, e.g. PartialDependence -> Partial Dependence and PartialDependenceVariance -> PD Variance, since the names in this table are descriptive, not exact class names.


@@ -0,0 +1,650 @@
{
Contributor

the the -> the (can you search the doc, as I think there are a few places where this happens)



@@ -0,0 +1,650 @@
{
Contributor

Backticks for the `explain` arguments are still required.



Contributor

@jklaise left a comment

Nice work! I left a couple of comments for the notebooks, otherwise all looks done.

@RobertSamoilescu RobertSamoilescu merged commit 6a46fa7 into SeldonIO:master Oct 26, 2022