ART-AttentiveFP-CI

Project summary

This repository contains the code and data for the study "Machine Learning Model for Catalytic Asymmetric Reactions of Simple Alkenes: From Model to Chemical Insights." The project uses a manually curated literature dataset of asymmetric transformations of alkenes (ART: AlkeneReactionTriad) to train a deep learning model that predicts reaction outcomes, in particular enantioselectivity. These catalytic enantioselective transformations of alkenes yield important building blocks such as cyclopropanes, aziridines, and arylated alkenes. Several machine learning (ML) models are developed on this dataset with different featurization techniques: one-hot encoding, molecular fingerprints, SMILES, and molecular graphs. Optuna is used for hyperparameter tuning of these ML models.

Data

The dataset contains 376 reactions, which differ in their reacting partners: the alkene, the chiral ligand, and the substrate. The 'ART_ind.csv' file lists reaction examples with a SMILES string for each component and the corresponding enantiomeric excess (ee). The 'ART_30_splits.xlsx' file contains the data for 30 different splits.
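As a rough illustration of how a file of this kind might be read, the sketch below parses a small inline CSV with Python's standard library. The column names (`alkene`, `ligand`, `substrate`, `ee`) and the two example rows are placeholders, not the actual header or contents of 'ART_ind.csv'; adjust them to match the real file.

```python
import csv
import io

# Placeholder rows standing in for 'ART_ind.csv'.
# Column names and values are made up for illustration only.
SAMPLE_CSV = """alkene,ligand,substrate,ee
C=Cc1ccccc1,CC(C)[C@@H](N)CO,CCOC(=O)C=[N+]=[N-],92.0
C=CCCCC,N[C@@H](Cc1ccccc1)C(=O)O,CCOC(=O)C=[N+]=[N-],75.5
"""


def load_reactions(handle):
    """Return a list of dicts with SMILES strings and ee parsed as float."""
    rows = []
    for row in csv.DictReader(handle):
        row["ee"] = float(row["ee"])
        rows.append(row)
    return rows


reactions = load_reactions(io.StringIO(SAMPLE_CSV))
mean_ee = sum(r["ee"] for r in reactions) / len(reactions)
print(len(reactions), "reactions, mean ee =", mean_ee)
```

To read the real file instead, the same function could be called on a handle opened with `open("ART_ind.csv", newline="")`, provided the column names are updated accordingly.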

Environment Setup

conda env create -f environment.yml
conda activate ART-AttentiveFP-CI
pip install dgl-cu110
pip install dgllife==0.2.8
pip install optuna
pip install rdkit
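After installation, a quick check that the packages are visible to the active environment can save time before opening the notebooks. The snippet below is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the list contains the import names of the packages installed above.

```python
from importlib.util import find_spec

# Import names of the packages installed in the steps above.
REQUIRED = ("dgl", "dgllife", "optuna", "rdkit")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

This only confirms that each package can be located; it does not verify versions or CUDA compatibility.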

Demo & Instructions for use

Notebook1 showcases the training of a deep neural network (DNN) model using fingerprint techniques and Optuna for hyperparameter tuning.

Notebook2 illustrates the training of a DNN model using one-hot encoding and Optuna for hyperparameter tuning.
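For readers unfamiliar with one-hot encoding of reactions, the following sketch shows the general idea in plain Python: each reaction component gets its own vocabulary, and the per-component one-hot vectors are concatenated into one feature vector. The component labels and helper names are invented for illustration and are not taken from the notebook.

```python
def build_vocab(values):
    """Map each distinct value to a column index, in sorted order."""
    return {value: index for index, value in enumerate(sorted(set(values)))}


def one_hot(value, vocab):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    vector = [0] * len(vocab)
    vector[vocab[value]] = 1
    return vector


# Made-up reactions, each given as (alkene, ligand, substrate) labels.
reactions = [
    ("alkene_A", "ligand_1", "substrate_X"),
    ("alkene_B", "ligand_1", "substrate_Y"),
    ("alkene_A", "ligand_2", "substrate_X"),
]

# One vocabulary per component (column-wise over the reactions).
vocabs = [build_vocab(column) for column in zip(*reactions)]


def encode(reaction):
    """Concatenate the one-hot vectors of all components of a reaction."""
    features = []
    for value, vocab in zip(reaction, vocabs):
        features.extend(one_hot(value, vocab))
    return features


X = [encode(reaction) for reaction in reactions]
print(X)
```

The resulting feature length equals the total number of distinct alkenes, ligands, and substrates, so it grows with the diversity of the dataset rather than with molecular size.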

Notebook3 presents the training of Random Forest, SVM, Decision Tree, and Gradient Boosting models using one-hot encoding and Optuna for hyperparameter tuning.

Notebook4 details the training of the AttentiveFP model using the AttentiveFPAtomFeaturizer, which includes one-hot encodings for atom type, degree, hybridization, formal charge, and other relevant properties.

Notebook5 covers the training of the AttentiveFP-CI model, also utilizing the AttentiveFPAtomFeaturizer, but with a different approach to handling class imbalance in the loss function.
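The exact loss used in Notebook5 is not described here. As a hedged illustration of what handling imbalance in a regression loss can look like, the sketch below bins the ee values and weights each sample's squared error by the inverse frequency of its bin, so that sparsely populated ee ranges contribute more. The bin width, function names, and example numbers are all assumptions, and this is not necessarily the approach taken by AttentiveFP-CI.

```python
from collections import Counter


def bin_index(ee, width=20.0):
    """Assign an ee value to a fixed-width bin (bin width is an assumption)."""
    return int(ee // width)


def inverse_frequency_weights(targets, width=20.0):
    """Weight each sample by 1 / (size of its bin), rescaled to mean 1."""
    bins = [bin_index(t, width) for t in targets]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def weighted_mse(predictions, targets, weights):
    """Mean squared error in which each sample carries its own weight."""
    total = sum(w * (p - t) ** 2 for p, t, w in zip(predictions, targets, weights))
    return total / sum(weights)


# Made-up example: three high-ee reactions and one low-ee reaction.
targets = [95.0, 92.0, 90.0, 30.0]
predictions = [90.0, 90.0, 90.0, 50.0]

weights = inverse_frequency_weights(targets)
loss = weighted_mse(predictions, targets, weights)
print(weights, loss)
```

In this toy example the single low-ee reaction receives three times the weight of each high-ee reaction, so its prediction error dominates the loss.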

Optuna is used for hyperparameter tuning in all of these notebooks to identify the most promising hyperparameter sets for each ML model.

