Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

explainability: new module #44

Open
wants to merge 84 commits into
base: main
Choose a base branch
from
Open

Conversation

JochenSiegWork
Copy link
Collaborator

- Add proof of concept for Explainer class and explanation
  data structures to express explanations for feature
  vectors and molecules.
- Add Christian W. Feldmanns visualization code for shap
  weighted heatmaps of the molecular structure.

@JochenSiegWork JochenSiegWork marked this pull request as draft July 5, 2024 09:40
@JochenSiegWork JochenSiegWork self-assigned this Jul 5, 2024
@JochenSiegWork JochenSiegWork force-pushed the explainability_module branch 2 times, most recently from 9d236b9 to d6d880e Compare November 21, 2024 12:24
Copy link
Collaborator

@c-w-feldmann c-w-feldmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • move to experimental
  • only ignorer errors where required.

Overall this is a lot to look through and my brain started to nope out...
I think we can leave it as is and improve later. I still think the text below the explanations are confusing.

molpipeline/explainability/explainer.py Outdated Show resolved Hide resolved
molpipeline/explainability/visualization/utils.py Outdated Show resolved Hide resolved
tests/test_explainability/test_shap_explainers.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@c-w-feldmann c-w-feldmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry still haven't finished everything. Just submitting a WIP

return feature_matrix


def _convert_to_array(value: Any) -> npt.NDArray[np.float64]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could use numpy.atleast_1d instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return atom_weights


ShapExplanation: TypeAlias = list[
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read this correctly TypeAlias is deprecated https://docs.python.org/3/library/typing.html#typing.TypeAlias

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

ShapExplanation = Sequence[SHAPFeatureExplanation | SHAPFeatureAndAtomExplanation]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If its obvious some varaibles aren't required to be explicitly typed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the name reads to me like this was an implemented class. Maybe ExplanationList or something like this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I removed the type and now it's explicitly written as list[SHAPFeatureExplanation | SHAPFeatureAndAtomExplanation]. However, I sticked with the list and not Sequence type, since we use only lists.

Copy link
Collaborator

@c-w-feldmann c-w-feldmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notebook: After cell [16]:
We hypothesis -> We hypothesi(z/s)e?

raise ValueError("Value is not a scalar or numpy array.")


def _get_prediction_function(pipeline: Pipeline | BaseEstimator) -> Any:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> Callable[[npt.Arraylike], npt.Arraylike]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return atom_weights


ShapExplanation: TypeAlias = list[
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

ShapExplanation = Sequence[SHAPFeatureExplanation | SHAPFeatureAndAtomExplanation]

return atom_weights


ShapExplanation: TypeAlias = list[
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If its obvious some varaibles aren't required to be explicitly typed.


# determine type of returned explanation
featurization_element = self.featurization_subpipeline.steps[-1][1] # type: ignore[union-attr]
self.return_element_type_: type[
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type hints for class/instance variables should be typed outside of functions. It would be best to do this above the init function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return atom_weights


ShapExplanation: TypeAlias = list[
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the name reads to me like this was an implemented class. Maybe ExplanationList or something like this?

import shap
from scipy.sparse import issparse, spmatrix
from sklearn.base import BaseEstimator
from typing_extensions import override
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better use typing.override

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if isinstance(explainer, SHAPTreeExplainer) and isinstance(
estimator, GradientBoostingClassifier
):
# there is currently a bug in SHAP's TreeExplainer for GradientBoostingClassifier
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe log a warning or info so me might remember this :D

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need a warning since the current test code will break if they fix the bug. So we'll know when the dependency updates. This should be safely captured by the CI/CD pipeline.

@JochenSiegWork
Copy link
Collaborator Author

Notebook: After cell [16]: We hypothesis -> We hypothesi(z/s)e?

All comments on the notebook are addressed

- pylint fails to parse f-strings with same quotation marks
  as the string and in the curly brackets, e.g. f"{"foo"}" but
  f"{'foo'}" would work.
Callable[[npt.NDArray[np.float64]], npt.NDArray[np.float64]]
] = []
if function_list is not None:
self.function_list: list[
Copy link
Collaborator

@c-w-feldmann c-w-feldmann Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type hints for class/instance vars should be defined outside of functions.
...
I read a bit into this and it seems that its not actually a PEP, but only done in this way in PEP 526.

The PEP itself ist now also marked as historical document.
But the rest of the molpipeline code is done in the same way and I think that it's still a good Idea.
Current examples still do it this way: https://typing.readthedocs.io/en/latest/spec/class-compat.html

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Collaborator

@c-w-feldmann c-w-feldmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

almost there

@JochenSiegWork
Copy link
Collaborator Author

Now we are waiting for numba Python 3.13 support which is needed for SHAP. The release status can be seen here numba/numba#9896

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants