The White Box Project introduces a range of methods for opening up the black box of machine learning. It is based on Interpretable Machine Learning by Christoph Molnar [1]. I recommend reading the book first and then working through this project. If you are an R user, you can find the R code used in the examples here.
The Korean translation can be found here. The translation was carried out in consultation with the author.
If you find a mistranslation, please report it to wogur379@gmail.com or open an issue. Thank you.
The goal is to fit black box models to a variety of datasets and to build a pipeline of analysis reports using interpretability methods.
numpy == 1.17.3
scikit-learn == 0.21.2
xgboost == 0.90
tensorflow == 1.14.0
- Titanic: Machine Learning from Disaster (Classification) [2]
- Cervical Cancer (Classification) [3]
- House Prices: Advanced Regression Techniques (Regression) [4]
- Bike Sharing (Regression) [5]
- Youtube Spam (Classification & NLP) [6]
The parameters used to train the models can be found here.
- Random Forest (RF)
- XGBoost (XGB)
- LightGBM (LGB)
- Deep Neural Network (DNN)
Model-specific methods [ English , Korean ]
- Linear Regression [ English , Korean ]
- Logistic Regression [ English , Korean ]
- GLM, GAM and more [ English , Korean ]
- Decision Tree [ English , Korean ]
- Decision Rules [ English , Korean ]
- RuleFit [ English , Korean ]
- Other Interpretable Models [ English , Korean ]
Model-agnostic methods [ English , Korean ]
- Partial Dependence Plot (PDP) [ English , Korean ]
- Individual Conditional Expectation (ICE) [ English , Korean ]
- Accumulated Local Effects (ALE) Plot [ English , Korean ]
- Feature Interaction [ English , Korean ]
- Permutation Feature Importance [ English , Korean ]
- Global Surrogate [ English , Korean ]
- Local Surrogate (LIME) [ English , Korean ]
- Scoped Rules (Anchors) [ English , Korean ]
- Shapley Values [ English , Korean ]
- SHAP (SHapley Additive exPlanations) [ English , Korean ]
Interpretable Models
Name | Packages |
---|---|
Linear Regression | scikit-learn statsmodels |
Logistic Regression | scikit-learn statsmodels |
Ridge Regression | scikit-learn statsmodels |
Lasso Regression | scikit-learn statsmodels |
Generalized Linear Model (GLM) | statsmodels |
Generalized Additive Model (GAM) | statsmodels pyGAM |
Decision Tree | scikit-learn |
Bayesian Rule Lists | skater |
RuleFit | rulefit |
Skope-rules | skope-rules |
Model-Agnostic Methods
Name | Packages |
---|---|
Partial Dependence Plot (PDP) | skater scikit-learn |
Individual Conditional Expectation (ICE) Plot | PyCEbox |
Feature Importance | skater |
Local Surrogate | skater lime |
Global Surrogate | skater |
Scoped Rules (Anchors) | alibi |
SHapley Additive exPlanations (SHAP) | shap |
Example-Based Explanations
Name | Packages |
---|---|
Contrastive Explanations Method (CEM) | alibi |
Counterfactual Instances | alibi |
Prototype Counterfactuals | MMD-critic |
Influence Instances | influence-release |
scikit-learn is a Python module for machine learning built on top of SciPy and is distributed under the 3-Clause BSD license.
The project was started in 2007 by David Cournapeau as a Google Summer of Code project, and since then many volunteers have contributed. See the About us page for a list of core contributors.
It is currently maintained by a team of volunteers.
scikit-learn is also available through conda, provided by Anaconda.
- Documentation : https://scikit-learn.org/stable/
- Github Repository : https://github.com/scikit-learn/scikit-learn
Installation
# Pip
pip install -U scikit-learn
# Conda
conda install scikit-learn
import sklearn
statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and statistical data exploration. An extensive list of result statistics is available for each estimator. The results are tested against existing statistical packages to ensure that they are correct. The package is released under the open source Modified BSD (3-clause) license. The online documentation is hosted at statsmodels.org.
statsmodels is also available through conda, provided by Anaconda.
- Documentation : https://www.statsmodels.org/stable/index.html
- Github Repository : https://github.com/statsmodels/statsmodels
Installation
# Pip
pip install statsmodels
# Conda
conda install -c conda-forge statsmodels
import statsmodels
pyGAM is a package for building Generalized Additive Models in Python, with an emphasis on modularity and performance. The API will be immediately familiar to anyone with experience of scikit-learn or scipy.
- Documentation : https://pygam.readthedocs.io/en/latest/notebooks/quick_start.html
- Github Repository : https://github.com/dswah/pyGAM
Installation
# Pip
pip install pygam
# Conda
conda install -c conda-forge pygam
import pygam
Skater is an open source unified framework that enables model interpretation for all forms of models, to help build the interpretable machine learning systems often needed for real-world use cases. Skater supports algorithms that demystify the learned structures of a black box model both globally (inference on the basis of a complete data set) and locally (inference about an individual prediction).
- Documentation : https://oracle.github.io/Skater/index.html
- Github Repository : https://github.com/oracle/Skater
Installation
# Option 1: without rule lists and without deepinterpreter
pip install -U skater
# Option 2: without rule lists and with deepinterpreter:
pip3 install --upgrade tensorflow
sudo pip install keras
pip install -U skater
# Option 3: For everything included
conda install gxx_linux-64
pip3 install --upgrade tensorflow
sudo pip install keras
sudo pip install -U --no-deps --force-reinstall --install-option="--rl=True" skater==1.1.1b1
# Conda
conda install -c conda-forge Skater
import skater
PDPbox is a Python partial dependence plot toolbox.
This repository is inspired by ICEbox. The goal is to visualize the impact of certain features towards model prediction for any supervised learning algorithm using partial dependence plots R1 R2. PDPbox now supports all scikit-learn algorithms.
- Documentation : https://pdpbox.readthedocs.io/en/latest/index.html#
- Github Repository : https://github.com/SauceCat/PDPbox
Installation
# Pip
pip install pdpbox
import pdpbox
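To make the idea behind a partial dependence plot concrete, here is a minimal NumPy sketch. It does not use the PDPbox API; the `partial_dependence` helper and the `toy_model` black box are hypothetical names introduced only for illustration. The curve is obtained by forcing one feature to each grid value for every row and averaging the model's predictions.

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value          # overwrite the feature for every row
        averages.append(predict(X_mod).mean())
    return np.array(averages)

def toy_model(X):
    # Hypothetical black box: y = 2 * x0 + x1 ** 2
    return 2.0 * X[:, 0] + X[:, 1] ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
grid = np.linspace(-1.0, 1.0, 5)

pdp = partial_dependence(toy_model, X, feature=0, grid=grid)
print(pdp)
```

Because the toy model is additive in `x0`, the resulting curve is a straight line with slope 2, shifted by the average of `x1 ** 2` over the data.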
This project is about explaining what machine learning classifiers (or models) are doing. At the moment, we support explaining individual predictions for text classifiers or classifiers that act on tables (numpy arrays of numerical or categorical data) or images, with a package called lime (short for local interpretable model-agnostic explanations). Lime is based on the work presented in this paper (bibtex here for citation).
- Documentation : https://lime-ml.readthedocs.io/en/latest/index.html
- Github Repository : https://github.com/marcotcr/lime
Installation
# Pip
pip install lime
import lime
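The local surrogate idea can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the lime package's API or its full algorithm: the `local_surrogate` helper, the Gaussian perturbation scheme, the exponential kernel, and the `toy_model` black box are all choices made here for the example. The sketch perturbs an instance, weights the samples by proximity, and fits a weighted linear model whose coefficients serve as the explanation.

```python
import numpy as np

def local_surrogate(predict, x, n_samples=500, scale=0.1, kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear model around the instance `x`."""
    rng = np.random.default_rng(seed)
    # Sample perturbed points in the neighbourhood of x.
    Z = x + rng.normal(scale=scale, size=(n_samples, x.size))
    y = predict(Z)
    # Points closer to x receive larger weights.
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Weighted least squares via row scaling by sqrt(w).
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]

def toy_model(X):
    # Hypothetical black box: y = x0 ** 2 + 3 * x1
    return X[:, 0] ** 2 + 3.0 * X[:, 1]

intercept, slopes = local_surrogate(toy_model, np.array([1.0, 0.0]))
print(slopes)
```

Around the point (1, 0) the toy model behaves locally like `2 * x0 + 3 * x1`, so the fitted slopes should be close to 2 and 3.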
A Python implementation of individual conditional expectation plots inspired by R's ICEbox. Individual conditional expectation plots were introduced in Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation (arXiv:1309.6392).
- Documentation : http://austinrochford.github.io/PyCEbox/docs/
- Github Repository : https://github.com/AustinRochford/PyCEbox
Installation
# Pip
pip install pycebox
import pycebox
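As a rough illustration of what an ICE plot shows, the sketch below computes one curve per instance with plain NumPy. It does not use the PyCEbox API; `ice_curves` and `toy_model` are hypothetical names chosen for this example. The toy model contains an interaction, so the individual curves have different slopes, which an averaged curve would hide.

```python
import numpy as np

def ice_curves(predict, X, feature, grid):
    """Return an (n_instances, n_grid) array with one ICE curve per row of X."""
    curves = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value      # vary one feature, keep the rest fixed
        curves[:, j] = predict(X_mod)
    return curves

def toy_model(X):
    # Hypothetical black box with an interaction: y = x0 * x1
    return X[:, 0] * X[:, 1]

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
grid = np.linspace(-1.0, 1.0, 5)

curves = ice_curves(toy_model, X, feature=0, grid=grid)
average_curve = curves.mean(axis=0)   # averaging the ICE curves gives the PDP
print(curves.shape)
```

Each row of `curves` is a line whose slope equals that instance's `x1` value, while `average_curve` flattens these differences into a single line.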
Implementation of a rule-based prediction algorithm based on [the RuleFit algorithm from Friedman and Popescu (PDF)](http://statweb.stanford.edu/~jhf/ftp/RuleFit.pdf).
- Github Repository : https://github.com/christophM/rulefit
Installation
# Pip
pip install git+https://github.com/christophM/rulefit.git
import rulefit
Skope-rules is a Python machine learning module built on top of scikit-learn and distributed under the 3-Clause BSD license.
Skope-rules aims at learning logical, interpretable rules for "scoping" a target class, i.e. detecting with high precision instances of this class.
Skope-rules is a trade off between the interpretability of a Decision Tree and the modelization power of a Random Forest.
- Documentation : https://skope-rules.readthedocs.io/en/latest/index.html
- Github Repository : https://github.com/scikit-learn-contrib/skope-rules
Installation
# Pip
pip install skope-rules
import skrules
Alibi is an open source Python library aimed at machine learning model inspection and interpretation. The initial focus of the library is on black-box, instance-based model explanations.
- Documentation : https://docs.seldon.io/projects/alibi/en/latest/#
- Github Repository : https://github.com/SeldonIO/alibi
Installation
# Pip
pip install alibi
import alibi
This code replicates the experiments from the following paper:
Pang Wei Koh and Percy Liang. "Understanding Black-box Predictions via Influence Functions." International Conference on Machine Learning (ICML), 2017.
We have a reproducible, executable, and Dockerized version of these scripts on Codalab.
The datasets for the experiments can also be found at the Codalab link.
- Github Repository : https://github.com/kohpangwei/influence-release
SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions (see papers for details and citations).
- Documentation : https://shap.readthedocs.io/en/latest/#
- Github Repository : https://github.com/slundberg/shap
Installation
# Pip
pip install shap
# Conda
conda install -c conda-forge shap
import shap
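The Shapley values that SHAP builds on can be computed exactly for a tiny example by enumerating every coalition of features. The sketch below is pure Python and does not use the shap package; `shapley_values`, `toy_model`, and the single-baseline value function are assumptions made for illustration (the shap library itself averages over background data and uses faster approximations).

```python
from itertools import combinations
from math import factorial

def shapley_values(value, n_features):
    """Exact Shapley values for a coalition value function `value(frozenset)`."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley formula.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when joining coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_model(x):
    # Hypothetical black box: y = 2 * x0 + 3 * x1 + x0 * x1
    return 2 * x[0] + 3 * x[1] + x[0] * x[1]

x = [1.0, 1.0]
baseline = [0.0, 0.0]

def value(coalition):
    # Features in the coalition take their real value, the rest the baseline.
    mixed = [x[j] if j in coalition else baseline[j] for j in range(len(x))]
    return toy_model(mixed)

phi = shapley_values(value, n_features=2)
print(phi)  # [2.5, 3.5]
```

The two values sum to 6, the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is split evenly between the two features.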
This method was proposed in this paper.
Abstract
Example-based explanations are widely used in the effort to improve the interpretability of highly complex distributions. However, prototypes alone are rarely sufficient to represent the gist of the complexity. In order for users to construct better mental models and understand complex data distributions, we also need criticism to explain what are not captured by prototypes. Motivated by the Bayesian model criticism framework, we develop MMD-critic which efficiently learns prototypes and criticism, designed to aid human interpretability. A human subject pilot study shows that the MMD-critic selects prototypes and criticism that are useful to facilitate human understanding and reasoning. We also evaluate the prototypes selected by MMD-critic via a nearest prototype classifier, showing competitive performance compared to baselines.
- Github Repository : https://github.com/BeenKim/MMD-critic
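The prototype half of the method described in the abstract can be sketched as a greedy search that picks the points minimising the maximum mean discrepancy (MMD) between the prototypes and the full data set. This is a simplified NumPy illustration, not the code from the repository above: the function names, the RBF kernel, its `gamma` value, and the toy data are assumptions, and the criticism selection step is left out.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

def greedy_prototypes(X, n_prototypes, gamma=0.5):
    """Greedily pick row indices of X that minimise squared MMD to the data."""
    K = rbf_kernel(X, X, gamma)
    col_mean = K.mean(axis=0)      # mean similarity of each candidate to all data
    selected = []
    for _ in range(n_prototypes):
        best, best_score = None, -np.inf
        for c in range(len(X)):
            if c in selected:
                continue
            S = selected + [c]
            # Negative squared MMD, up to a term that does not depend on S.
            score = 2.0 * col_mean[S].mean() - K[np.ix_(S, S)].mean()
            if score > best_score:
                best, best_score = c, score
        selected.append(best)
    return selected

# Toy data: two well-separated one-dimensional clusters of 20 points each.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 0.5, size=(20, 1)),
                    rng.normal(10.0, 0.5, size=(20, 1))])

prototypes = greedy_prototypes(X, n_prototypes=2)
print(X[prototypes].ravel())
```

With two clusters and two prototypes, the second term of the score penalises picking two similar points, so the search ends up with one prototype from each cluster.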
[1] Molnar, Christoph. "Interpretable machine learning. A Guide for Making Black Box Models Explainable", 2019. https://christophm.github.io/interpretable-ml-book/.
[2] Kaggle Competition : Titanic: Machine Learning from Disaster
[3] Kelwin Fernandes, Jaime S. Cardoso, and Jessica Fernandes. 'Transfer Learning with Partial Observability Applied to Cervical Cancer Screening.' Iberian Conference on Pattern Recognition and Image Analysis. Springer International Publishing, 2017. [Link]
[4] Kaggle Competition : House Prices: Advanced Regression Techniques
[5] Fanaee-T, Hadi, and Gama, Joao, "Event labeling combining ensemble detectors and background knowledge", Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg. [Link]
[6] Alberto, T.C., Lochter J.V., Almeida, T.A. TubeSpam: Comment Spam Filtering on YouTube. Proceedings of the 14th IEEE International Conference on Machine Learning and Applications (ICMLA'15), 1-6, Miami, FL, USA, December, 2015. [Link]
[7] Lundberg, Scott M., and Su-In Lee. “A unified approach to interpreting model predictions.” Advances in Neural Information Processing Systems. 2017. (Korean Version)