-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
explainability: new module #44
base: main
Are you sure you want to change the base?
Conversation
JochenSiegWork
commented
Jul 5, 2024
bf9714d
to
3869edb
Compare
cdbece2
to
21a13bf
Compare
9d236b9
to
d6d880e
Compare
- Add proof of concept for Explainer class and explanation data structures to express explanations for feature vectors and molecules. - Add Christian W. Feldmanns visualization code for shap weighted heatmaps of the molecular structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- move to experimental
- only ignorer errors where required.
Overall this is a lot to look through and my brain started to nope out...
I think we can leave it as is and improve later. I still think the text below the explanations are confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry still haven't finished everything. Just submitting a WIP
return feature_matrix | ||
|
||
|
||
def _convert_to_array(value: Any) -> npt.NDArray[np.float64]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could use numpy.atleast_1d instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
return atom_weights | ||
|
||
|
||
ShapExplanation: TypeAlias = list[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I read this correctly TypeAlias
is deprecated https://docs.python.org/3/library/typing.html#typing.TypeAlias
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe:
ShapExplanation = Sequence[SHAPFeatureExplanation | SHAPFeatureAndAtomExplanation]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If its obvious some varaibles aren't required to be explicitly typed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the name reads to me like this was an implemented class. Maybe ExplanationList or something like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I removed the type and now it's explicitly written as list[SHAPFeatureExplanation | SHAPFeatureAndAtomExplanation]
. However, I sticked with the list
and not Sequence
type, since we use only lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notebook: After cell [16]:
We hypothesis -> We hypothesi(z/s)e?
raise ValueError("Value is not a scalar or numpy array.") | ||
|
||
|
||
def _get_prediction_function(pipeline: Pipeline | BaseEstimator) -> Any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-> Callable[[npt.Arraylike], npt.Arraylike]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
return atom_weights | ||
|
||
|
||
ShapExplanation: TypeAlias = list[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe:
ShapExplanation = Sequence[SHAPFeatureExplanation | SHAPFeatureAndAtomExplanation]
return atom_weights | ||
|
||
|
||
ShapExplanation: TypeAlias = list[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If its obvious some varaibles aren't required to be explicitly typed.
|
||
# determine type of returned explanation | ||
featurization_element = self.featurization_subpipeline.steps[-1][1] # type: ignore[union-attr] | ||
self.return_element_type_: type[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type hints for class/instance variables should be typed outside of functions. It would be best to do this above the init function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
return atom_weights | ||
|
||
|
||
ShapExplanation: TypeAlias = list[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the name reads to me like this was an implemented class. Maybe ExplanationList or something like this?
import shap | ||
from scipy.sparse import issparse, spmatrix | ||
from sklearn.base import BaseEstimator | ||
from typing_extensions import override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better use typing.override
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if isinstance(explainer, SHAPTreeExplainer) and isinstance( | ||
estimator, GradientBoostingClassifier | ||
): | ||
# there is currently a bug in SHAP's TreeExplainer for GradientBoostingClassifier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe log a warning or info so me might remember this :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need a warning since the current test code will break if they fix the bug. So we'll know when the dependency updates. This should be safely captured by the CI/CD pipeline.
All comments on the notebook are addressed |
- pylint fails to parse f-strings with same quotation marks as the string and in the curly brackets, e.g. f"{"foo"}" but f"{'foo'}" would work.
Callable[[npt.NDArray[np.float64]], npt.NDArray[np.float64]] | ||
] = [] | ||
if function_list is not None: | ||
self.function_list: list[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type hints for class/instance vars should be defined outside of functions.
...
I read a bit into this and it seems that its not actually a PEP, but only done in this way in PEP 526.
The PEP itself ist now also marked as historical document.
But the rest of the molpipeline code is done in the same way and I think that it's still a good Idea.
Current examples still do it this way: https://typing.readthedocs.io/en/latest/spec/class-compat.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
almost there
Now we are waiting for numba Python 3.13 support which is needed for SHAP. The release status can be seen here numba/numba#9896 |