Synthesizability-stoi-CGNF

Synthesizability-stoi-CGNF is a Python code that predicts a synthesizability score, a quantitative metric of synthesizability, for inorganic crystal compositions. It uses a partially supervised machine learning protocol (PU learning) with the CGNF (Composition Graph Neural Fingerprint) atomic embedding method developed by Prof. Yousung Jung's group (contact: yousung.jung@snu.ac.kr).

Developers

Jidon Jang, Juhwan Noh

Prerequisites

Python 3
NumPy
PyTorch
pymatgen

Publication

Jidon Jang, Juhwan Noh, Lan Zhou, Geun Ho Gu, John M. Gregoire, and Yousung Jung, "Synthesizability of materials stoichiometry using semi-supervised learning", Matter, 2024, 7(6), 2294-2312 (DOI: 10.1016/j.matt.2024.05.002)

Usage

[1] Define a customized data format and prepare the atomic embedding vector file for CGNF generation

To input crystal compositions to Synthesizability-stoi-CGNF, you need to define a customized dataset and pre-generate CGNF as pickle files for bootstrap aggregating in semi-supervised learning. Note that this is required for both training and prediction. The following files are needed to generate CGNF:

id_prop.csv: a CSV file with two columns covering positive data (synthesizable) and unlabeled data (not yet synthesized). The first column records an inorganic composition (the formula string format of the Composition class in the pymatgen package is recommended), and the second column records the label (1 = positive, 0 = unlabeled) according to whether the composition has already been synthesized.

cgcnn_hd_rcut4_nn8.element_embedding.json: a JSON file containing atomic embedding vectors for generation of CGNF
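As a minimal sketch of how such a file could be assembled: the helper write_id_prop, the output name id_prop_example.csv, the header-less layout, and the example formulas below are all assumptions for illustration, not part of this repository. If pymatgen is installed, Composition(formula).formula is used to normalize each formula string; otherwise the string is written as given.

```python
import csv

try:
    # Optional: pymatgen normalizes formula strings, e.g. "Fe2O3" -> "Fe2 O3".
    from pymatgen.core import Composition

    def normalize(formula: str) -> str:
        return Composition(formula).formula
except ImportError:
    def normalize(formula: str) -> str:
        # Fallback when pymatgen is unavailable: keep the string as given.
        return formula.strip()


def write_id_prop(path, positives, unlabeled):
    """Write (composition, label) rows: 1 = positive, 0 = unlabeled.

    Assumption: the file has no header row.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for formula in positives:
            writer.writerow([normalize(formula), 1])
        for formula in unlabeled:
            writer.writerow([normalize(formula), 0])


# Placeholder formulas for illustration only, not taken from the authors' data.
write_id_prop(
    "id_prop_example.csv",
    positives=["NaCl", "Fe2O3"],
    unlabeled=["Na2Fe3O7"],
)
```

Check the data-loading code in this repository before relying on the header-less layout assumed here.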

[2] Train a Synthesizability-stoi-CGNF model

python main_PU_learning.py --bag 100 --data id_prop.csv --embedding cgcnn_hd_rcut4_nn8.element_embedding.json --split ./split

Loads composition information from 'id_prop.csv' and generates data split files for PU learning in the 'split' folder.
After training, prediction results for the test-unlabeled data are written as a CSV file for each iteration.
The bootstrap-aggregated result is saved as 'test_results_ensemble_100models.csv'.
You can change the number of bootstrap samples with the '--bag' option.
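The bagging idea behind these steps can be sketched as follows. This is a generic PU bagging scheme under stated assumptions, not the code in main_PU_learning.py: it assumes each bag treats a random sample of unlabeled compositions (as many as there are positives) as negatives, scores the held-out unlabeled compositions, and that the ensemble score is the mean over the bags that scored a composition. The names draw_bag and aggregate_bags, and the scores used, are hypothetical.

```python
import random
from collections import defaultdict


def draw_bag(positives, unlabeled, rng):
    """Sample unlabeled items to act as negatives for one bag.

    Assumption: the number of sampled negatives equals the number of
    positives; the remaining unlabeled items are held out for scoring.
    """
    k = min(len(positives), len(unlabeled))
    negatives = rng.sample(unlabeled, k)
    chosen = set(negatives)
    held_out = [u for u in unlabeled if u not in chosen]
    return negatives, held_out


def aggregate_bags(per_bag_scores):
    """Average each composition's score over the bags that scored it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for scores in per_bag_scores:
        for formula, score in scores.items():
            totals[formula] += score
            counts[formula] += 1
    return {formula: totals[formula] / counts[formula] for formula in totals}


rng = random.Random(0)  # fixed seed so the split is reproducible
positives = ["P1", "P2"]
unlabeled = ["U1", "U2", "U3", "U4", "U5"]
negatives, held_out = draw_bag(positives, unlabeled, rng)

# Made-up per-bag scores standing in for the per-iteration prediction files.
bags = [
    {"U1": 0.9, "U2": 0.2},
    {"U1": 0.7},
    {"U2": 0.4, "U1": 0.8},
]
ensemble = aggregate_bags(bags)
```

Averaging only over the bags in which a composition was held out means it is never scored by a model that saw it as a training negative.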

[3] Predict synthesizability of new crystals with pre-trained models

python predict_PU_learning.py --bag 100 --data id_prop_test.csv --embedding cgcnn_hd_rcut4_nn8.element_embedding.json --modeldir ./models

Loads composition information for the test materials from 'id_prop_test.csv' and the pre-trained models from the 'models' folder.
Predicts the synthesizability of each composition in 'id_prop_test.csv' using the loaded models.
The bootstrap-aggregated result is saved as 'test_results_ensemble_100models.csv'.
