For minGPT (Python version: 3.8), install the required packages:
pip install -r requirements.txt
For diffusion1D (Python version: 3.8), install the required packages denoising_diffusion_pytorch, rdkit, deepchem, and transformers:
pip install rdkit deepchem transformers
cd diffusion1D/model
pip install -e .
For diffusionLM (Python version: 3.8), install the required packages (the local improved-diffusion, the customized transformers, and supporting libraries):
pip install mpi4py
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -e diffusionLM/improved-diffusion/
pip install -e diffusionLM/transformers/
pip install spacy==3.2.6
pip install datasets==2.0.0
pip install huggingface_hub==0.16.4
pip install wandb deepchem torchsummary
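After installation, a quick import check can confirm that the pinned environment is usable (an optional sanity check, not part of the repository):

```python
import torch
import transformers
import datasets
import spacy
import deepchem

# Version should match the pinned install command above.
print(torch.__version__)          # expected: 1.12.1+cu113
print(torch.cuda.is_available())  # True if the CUDA 11.3 build found a GPU
```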
Prepare the data used for training as a .csv file with two columns separated by "\t" (a loading sketch is given after this list):
- 1st column: "mol_smiles" (SMILES string of the monomer)
- 2nd column: "conductivity" ("1" = high conductivity, "0" = low conductivity)
- The datasets are stored in .json format; see `diffusionLM/datasets` for examples.
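For the .csv format, the sketch below shows one way to load and check the file with pandas (the file name example_data.csv is a placeholder, not a file shipped with this repository):

```python
import pandas as pd

# Load the tab-separated training data; the file name is a placeholder.
df = pd.read_csv("example_data.csv", sep="\t")

# Expected columns: "mol_smiles" (monomer SMILES) and
# "conductivity" (1 = high conductivity, 0 = low conductivity).
assert {"mol_smiles", "conductivity"}.issubset(df.columns)
print(df["conductivity"].value_counts())
```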
The pipelines consist of the following steps:
- data preprocessing (data_config)
- build the model (model_config)
- train the model (train_config)
- generate candidates (generate_config)
- evaluation (six metrics: validity, novelty, uniqueness, synthesizability, similarity, and diversity)
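As a rough illustration, validity, uniqueness, and novelty are commonly computed with RDKit along the lines of the sketch below. This is a generic sketch of the standard definitions (the function name is illustrative), not the exact evaluation code used in this repository:

```python
from rdkit import Chem

def basic_generation_metrics(generated_smiles, train_smiles):
    """Sketch of validity / uniqueness / novelty for generated SMILES."""
    # Validity: fraction of generated strings RDKit can parse into a molecule.
    valid = [s for s in generated_smiles if Chem.MolFromSmiles(s) is not None]
    validity = len(valid) / max(len(generated_smiles), 1)

    # Uniqueness: fraction of valid molecules that are distinct (canonical SMILES).
    canonical = {Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in valid}
    uniqueness = len(canonical) / max(len(valid), 1)

    # Novelty: fraction of unique molecules not present in the training set.
    train_mols = (Chem.MolFromSmiles(s) for s in train_smiles)
    train_canonical = {Chem.MolToSmiles(m) for m in train_mols if m is not None}
    novelty = len(canonical - train_canonical) / max(len(canonical), 1)

    return validity, uniqueness, novelty
```

Synthesizability, similarity, and diversity typically rely on additional tooling (e.g., synthetic accessibility scores and fingerprint-based Tanimoto similarity) and are not sketched here.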
The demos are shown in `minGPT_pipeline.ipynb`, `diffusion1D_pipeline.ipynb`, and `diffusionLM_pipeline.ipynb`.
- For `minGPT_pipeline.ipynb` and `diffusion1D_pipeline.ipynb`, all the steps in the pipeline can be executed in the notebook.
- For `diffusionLM_pipeline.ipynb`, the notebook generates the bash scripts for training and generation. The scripts will be stored under `diffusionLM/improved-diffusion`.
To run the training:
cd diffusionLM/improved-diffusion
bash train_conditional.sh   # or: bash train_unconditional.sh
The model checkpoints will be stored in `diffusionLM/improved-diffusion/diffusion_models`.
To run the generation:
cd diffusionLM/improved-diffusion
bash generate_conditional.sh   # or: bash generate_unconditional.sh
The generated output will be stored in `diffusionLM/improved-diffusion/generation_outputs`.
The checkpoints of the pretrained minGPT model at different epochs can be obtained here: https://drive.google.com/drive/folders/1M1VjgUnFDospbmVSnr17JdUcUa-_4O79?usp=sharing. Please put the checkpoint files under `minGPT/ckpts/`.
The checkpoints of the pretrained diffusion1D model at different epochs can be obtained here: https://drive.google.com/drive/folders/1kFnKtnmuQLTNDZ7BJG2ZhoJKGWoXlI--?usp=sharing. Please put the checkpoint files under `diffusion1D/ckpts/`.
The checkpoints of the pretrained diffusionLM model at different epochs can be obtained here: https://drive.google.com/drive/folders/1ndLNhRZu8TL2Ni7VL8Q9GRAeX9fFVOq0?usp=sharing. Please put the whole checkpoint folder and files under `diffusionLM/improved-diffusion/diffusion_models/`.
The GitHub repositories referenced for this code include:
https://github.com/karpathy/minGPT
https://github.com/lucidrains/denoising-diffusion-pytorch
https://github.com/XiangLi1999/Diffusion-LM
In this work, we copied the minGPT model from Karpathy's original repository (https://github.com/karpathy/minGPT) at commit 37baab7 (Jan 8, 2023). This unchanged copy is saved at https://github.com/TRI-AMDD/PolyGen/tree/main/minGPT/model.
If you use PolyGen, please cite the following:
@article{lei2023self,
title={A self-improvable Polymer Discovery Framework Based on Conditional Generative Model},
author={Lei, Xiangyun and Ye, Weike and Yang, Zhenze and Schweigert, Daniel and Kwon, Ha-Kyung and Khajeh, Arash},
journal={arXiv preprint arXiv:2312.04013},
year={2023}
}
@article{yang2023novo,
title={De novo design of polymer electrolytes with high conductivity using gpt-based and diffusion-based generative models},
author={Yang, Zhenze and Ye, Weike and Lei, Xiangyun and Schweigert, Daniel and Kwon, Ha-Kyung and Khajeh, Arash},
journal={arXiv preprint arXiv:2312.06470},
year={2023}
}