Official implementation of "Gradient Boosted and Statistical Feature Selection Pipeline for Materials Property Predictions"

Songyosk/GBFS4MPPML

GBFS4MPPML

Official implementation of "Gradient Boosted and Statistical Feature Selection Workflow for Materials Property Predictions"

J. Chem. Phys. 159, 194106 (2023)

By S. G. Jung, G. Jung & J. M. Cole

Introduction

The scripts in this repository were used to generate the results presented in the paper above.

Jupyter notebooks are provided alongside the scripts to demonstrate the functionality they contain.

For each material property, there are two Jupyter notebooks:

(i) Featurize

This notebook demonstrates how the features are generated using the various descriptors described in the manuscript. The descriptors we use are widely recognised, and there are various ways to generate these features. This step can be skipped if you already have a set of features you wish to use for your chemical data.
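As a rough illustration of what composition-based featurization involves, the sketch below turns a chemical formula into fraction-weighted statistics of one elemental property. It is not the descriptor code used in the notebooks: the helper names (`parse_formula`, `composition_features`) and the small Pauling-electronegativity table are hypothetical, and the parser only handles simple formulae without parentheses or hydrates.

```python
import re
from typing import Dict

# Hypothetical lookup table (Pauling electronegativities) for a few elements;
# a real featurizer would cover the whole periodic table and many properties.
PAULING_EN: Dict[str, float] = {"Na": 0.93, "Cl": 3.16, "Fe": 1.83, "O": 3.44}


def parse_formula(formula: str) -> Dict[str, float]:
    """Parse a simple formula such as 'Fe2O3' into {element: amount}.

    Parentheses, charges and hydrates are not handled in this sketch.
    """
    composition: Dict[str, float] = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        # A missing subscript means one atom of that element.
        composition[symbol] = composition.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return composition


def composition_features(formula: str, table: Dict[str, float]) -> Dict[str, float]:
    """Fraction-weighted mean plus min, max and range of one elemental property."""
    composition = parse_formula(formula)
    total = sum(composition.values())
    values = [table[el] for el in composition]
    mean = sum(table[el] * n / total for el, n in composition.items())
    return {
        "mean": mean,
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }
```

For example, `composition_features("Fe2O3", PAULING_EN)["mean"]` evaluates to approximately 2.796, i.e. (2 × 1.83 + 3 × 3.44) / 5. Repeating this for many elemental properties and statistics yields the kind of composition-based feature vector the notebook produces.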

(ii) GBFS

This notebook walks through the proposed workflow, as illustrated in the figure below. Our approach is to use a pre-defined local path where the relevant data are stored and new data files are saved; see the provided Jupyter notebooks for examples. Each function requires pre-defined parameters, such as the name of the target variable, a list of features and the type of problem.
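The sketch below is a simplified illustration of the idea behind combining gradient-boosted importances with a statistical filter; it is not the repository's implementation, and the paper's statistical stage may use different tests. All names (`GBFSConfig`, `select_features`, `corr_threshold`) are hypothetical. It assumes feature importances have already been obtained from a fitted gradient-boosted model, ranks features by importance, discards zero-importance features, and then drops any feature whose absolute Pearson correlation with an already-kept, more important feature exceeds a threshold.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Sequence


@dataclass
class GBFSConfig:
    """Hypothetical bundle of the pre-defined parameters each step needs."""
    data_path: str               # pre-defined local directory for inputs and outputs
    target: str                  # name of the target variable
    features: List[str]          # candidate feature names
    problem: str = "regression"  # type of problem
    corr_threshold: float = 0.9  # |r| above which a feature counts as redundant


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(
    columns: Dict[str, Sequence[float]],
    importances: Dict[str, float],
    cfg: GBFSConfig,
) -> List[str]:
    """Keep informative, mutually non-redundant features, most important first."""
    # Step 1: rank by gradient-boosted importance, dropping zero-importance features.
    ranked = sorted(
        (f for f in cfg.features if importances.get(f, 0.0) > 0.0),
        key=lambda f: importances[f],
        reverse=True,
    )
    # Step 2: greedy correlation filter against features already kept.
    kept: List[str] = []
    for f in ranked:
        if all(abs(pearson(columns[f], columns[k])) <= cfg.corr_threshold for k in kept):
            kept.append(f)
    return kept
```

In practice, the importances could come from the `feature_importances_` attribute of a fitted estimator such as scikit-learn's `GradientBoostingRegressor` or LightGBM's `LGBMRegressor`. With columns `a = [1, 2, 3, 4]`, `b = [2, 4, 6, 8.1]`, `c = [4, 1, 3, 2]` and importances 0.5, 0.3 and 0.2, the sketch keeps `a` and `c`, because `b` is almost perfectly correlated with the more important `a` while `c` has r = -0.4.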

Data

The data sets are available from the sources listed in the Table of Datasets.

Workflow

Figure 1: Overview of the project pipeline.

Acknowledgements

J.M.C. conceived the overarching project. S.G.J. and J.M.C. designed the study. S.G.J. developed the workflow, performed the data acquisition and featurization, the statistical analyses, the model pre-training and fine-tuning, and analysed the data under the Ph.D. supervision of J.M.C. G.J. assisted with the data gathering and the development of artificial neural networks for the material-property predictions. S.G.J. drafted the manuscript with assistance from J.M.C. All authors read and approved the final agreed manuscript.

J.M.C. is grateful for the BASF/Royal Academy of Engineering Research Chair in Data-Driven Molecular Engineering of Functional Materials, which is partly sponsored by the Science and Technology Facilities Council (STFC) via the ISIS Neutron and Muon Source; this chair supports a PhD studentship (for S.G.J.). STFC is also thanked for a PhD studentship that is sponsored by its Scientific Computing Department (for G.J.).

License

License: MIT
