learn_mof_ox_state

Tools to train and test a voting classifier that predicts oxidation states in MOFs, for example to replicate our work [1]. If you are only interested in using a pre-trained model, use the oximachinerunner package.

⚠️ Warning: The code looks for the COMET_API_KEY environment variable in order to track your experiments when you retrain the model, so you need to export it. If you do not want to track experiments, remove the corresponding lines from the code. You might also want to consider other tracking options such as Weights & Biases.
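As a minimal sketch of how one might guard the tracking code instead of deleting it, the snippet below checks whether COMET_API_KEY is set. The helper name `comet_tracking_enabled` is hypothetical and not part of this package; it only inspects the environment and does not import comet.ml.

```python
import os


def comet_tracking_enabled(env=None):
    """Return True if a non-empty COMET_API_KEY is present.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    This helper is illustrative and does not exist in the package.
    """
    env = os.environ if env is None else env
    return bool(env.get("COMET_API_KEY", "").strip())


if __name__ == "__main__":
    if comet_tracking_enabled():
        print("COMET_API_KEY found: experiment tracking can be used.")
    else:
        print("COMET_API_KEY not set: export it or run without tracking.")
```

A training script could branch on this check and skip the tracking calls when it returns False.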

Installation

To install the software with all dependencies, you can use

pip install git+https://github.com/kjappelbaum/learn_mof_ox_state.git

The installation should take a few seconds.

Note that the models were fitted using scikit-learn==0.21.3, so you should ideally use this version. For better compatibility with the other dependencies (matminer, apricot), which depend on newer versions of scikit-learn, we patched the model by adding the _strategy attribute to the DummyClassifier used to initialize the GradientBoostingClassifier and by adding the n_samples_fit_ attribute to the KNeighborsClassifier. If you plan to do further development, it might be advisable to bump all dependencies before training a new model.
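To see whether the current environment matches the version the models were fitted with, a check along the following lines could be used. This is a sketch: the helper names are hypothetical, and it only compares version strings rather than applying or verifying the patch described above.

```python
from importlib import metadata

EXPECTED_SKLEARN = "0.21.3"  # version named in the note above


def installed_sklearn_version():
    """Return the installed scikit-learn version, or None if it is missing."""
    try:
        return metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        return None


def sklearn_version_matches(installed, expected=EXPECTED_SKLEARN):
    """Plain string comparison; no semantic version ordering is attempted."""
    return installed is not None and installed.strip() == expected


if __name__ == "__main__":
    version = installed_sklearn_version()
    if sklearn_version_matches(version):
        print("scikit-learn matches the version used for fitting.")
    else:
        print(f"scikit-learn is {version}; the models were fitted with {EXPECTED_SKLEARN}.")
```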

Usage

The functions in this package require inputs (features and labels) that can be generated with our oximachine_featurizer Python package. The full datasets, which can be used to train a model, and a pre-trained model are deposited on the MaterialsCloud Archive (doi: 10.24435/materialscloud:2019.0085/v1). The analysis command-line interfaces can be used to reproduce our findings based on the data deposited there. The training CLI can, for example, be used as

  python machine_learn_oxstates/learnmofox/train_ensemble_classifier.py {featurespath} {labelspath} {modelpath} {metricsoutpath} standard soft isotonic 40000 20 none --train_one_fold
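If you prefer to launch the training from Python, the command above can be assembled as an argument list. This is a sketch under assumptions: the file names in the usage line are placeholders, the helper `build_train_command` is hypothetical, and the trailing arguments are copied verbatim from the example because their meaning is not documented here.

```python
import shlex

SCRIPT = "machine_learn_oxstates/learnmofox/train_ensemble_classifier.py"
# Copied unchanged from the example command above.
FIXED_ARGS = ["standard", "soft", "isotonic", "40000", "20", "none", "--train_one_fold"]


def build_train_command(features_path, labels_path, model_path, metrics_out_path):
    """Return the example training command as a list of arguments."""
    paths = [features_path, labels_path, model_path, metrics_out_path]
    return ["python", SCRIPT, *map(str, paths), *FIXED_ARGS]


# Placeholder paths; replace them with your own files and directories.
cmd = build_train_command("features.npy", "labels.npy", "models", "metrics")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains the script path.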

Some experiments we ran, together with code and data hashes, can also be found at comet.ml.
For testing a pre-trained model, we recommend using our web app, whose code can be found, along with the Docker images, in another GitHub repository. There is also a small Python package, oximachinerunner, that allows you to run inference on crystal structures.

File contents

Training

The training can, depending on the training set size, take hours.

train_calibrate_voting_classifier_no_track.py: to run the training without comet.ml
train_calibrate_voting_classifier.py: train a voting classifier (with optimized hyperparameters) and track the experiments with comet.ml
train_ensemble_classifier.py: run the hyperparameter optimization for the ensemble of models
utils.py: contains the custom voting classifier class and some utils
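For readers unfamiliar with voting classifiers, the following sketch illustrates the general idea of soft voting: the class probabilities predicted by several models are averaged and the class with the highest mean probability wins. It is a generic illustration, not the custom voting classifier class from utils.py, and the function name `soft_vote` is hypothetical.

```python
def soft_vote(probabilities, weights=None):
    """Average per-model class probabilities for one sample.

    probabilities: list of per-model lists, one probability per class.
    weights: optional per-model weights (defaults to equal weights).
    Returns (index of the winning class, averaged probabilities).
    """
    n_models = len(probabilities)
    if n_models == 0:
        raise ValueError("need at least one model")
    weights = [1.0] * n_models if weights is None else list(weights)
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[k] for w, p in zip(weights, probabilities)) / total
        for k in range(n_classes)
    ]
    winner = max(range(n_classes), key=averaged.__getitem__)
    return winner, averaged


# Three hypothetical models scoring one sample over three classes.
winner, averaged = soft_vote([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.4, 0.3]])
print(winner, averaged)
```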

Analysis

The runtime for the tests depends on whether they require retraining the model (permutation significance), which can take several hours, or whether they only involve evaluating the model for some data points, which will take minutes.

feature_importance_cli.py: command-line tool to calculate feature importance with permutation or SHAP
farm_learning_curves.py: command-line tool to run learning curves
bias_variance_cli.py: run a bias-variance decomposition analysis with mlxtend
permutation_significance.py: tool to run a permutation significance test (permutes the labels and measures the metrics to see whether the model learned something meaningful)
run_combinatorial_study.py: train models on different feature subsets
metrics.py: contains helper functions to calculate metrics
bootstrapped_metrics.py: functions to calculate a bootstrapped learning curve point
test_model.py: command-line tool to run some basic tests
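The idea behind a permutation significance test can be sketched as follows: the model is refitted on shuffled labels many times, and the fraction of shuffled runs that score at least as well as the real labels gives an empirical p-value. This is a toy illustration with a one-dimensional nearest-class-mean classifier scored on its training data; the function names and the data are assumptions and do not reflect the implementation in permutation_significance.py.

```python
import random


def nearest_mean_accuracy(xs, ys):
    """Fit a 1D nearest-class-mean classifier and return its training accuracy."""
    classes = sorted(set(ys))
    means = {
        c: sum(x for x, y in zip(xs, ys) if y == c) / ys.count(c) for c in classes
    }
    predictions = [min(classes, key=lambda c: abs(x - means[c])) for x in xs]
    return sum(p == y for p, y in zip(predictions, ys)) / len(ys)


def permutation_p_value(fit_and_score, xs, ys, n_permutations=99, seed=0):
    """Return (score on true labels, empirical p-value from label permutations)."""
    rng = random.Random(seed)
    true_score = fit_and_score(xs, ys)
    at_least_as_good = 0
    for _ in range(n_permutations):
        shuffled = list(ys)
        rng.shuffle(shuffled)  # break the link between features and labels
        if fit_and_score(xs, shuffled) >= true_score:
            at_least_as_good += 1
    # The +1 terms count the unpermuted run itself.
    return true_score, (at_least_as_good + 1) / (n_permutations + 1)


# Toy data: two well-separated groups, so the labels carry real signal.
xs = list(range(10)) + list(range(20, 30))
ys = [0] * 10 + [1] * 10
score, p_value = permutation_p_value(nearest_mean_accuracy, xs, ys)
print(score, p_value)
```

A small p-value indicates that the score on the real labels is unlikely to arise when the labels carry no information about the features.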

Example usage

The use of the main functions of this package is shown in the Jupyter notebook in the examples directory. It contains some example structures and the corresponding output, which should be produced within seconds.

References

[1] Jablonka, Kevin Maik; Ongari, Daniele; Moosavi, Seyed Mohamad; Smit, Berend (2020): Using Collective Knowledge to Assign Oxidation States. ChemRxiv. Preprint. https://doi.org/10.26434/chemrxiv.11604129.v1

Latest commit: Kevin M. Jablonka, "docs: updating example notebook", Dec 17, 2020 (62e0163); 178 commits in total.