GP_qsar

Wrapper around scikit-learn and GPyTorch Gaussian process models that enables their use as a scoring function in REINVENT and in active learning. Use in REINVENT4 is identical to that of serialised QSARtuna models.

Features:

  • Automates feature selection, hyperparameter tuning, and kernel selection
  • Produces predictions from SMILES
  • Evaluates single-sample acquisition functions for use in active learning
  • Evaluates batch-selection acquisition functions for use in active learning
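
To make the single-sample acquisition functions above concrete, here is a minimal sketch of two common choices, upper confidence bound (UCB) and expected improvement (EI), computed from a Gaussian process predictive mean and standard deviation. This is not GP_qsar's own code: the function names, the `beta` parameter, and the maximisation convention are assumptions made for illustration, and the package's formulas may differ.

```python
import math


def upper_confidence_bound(mean, std, beta=2.0):
    """UCB score: predictive mean plus beta times the predictive std.

    `beta` is a hypothetical exploration weight chosen for this sketch.
    """
    return mean + beta * std


def expected_improvement(mean, std, best_observed):
    """Expected improvement over `best_observed`, assuming maximisation."""
    if std <= 0.0:
        # With no predictive uncertainty, the improvement is deterministic.
        return max(mean - best_observed, 0.0)
    z = (mean - best_observed) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (mean - best_observed) * cdf + std * pdf


# Made-up predictive means and standard deviations, not model output.
means = [1.0, 1.5, 0.8]
stds = [0.5, 0.1, 1.0]

ucb_scores = [upper_confidence_bound(m, s) for m, s in zip(means, stds)]
ei_scores = [expected_improvement(m, s, best_observed=1.2) for m, s in zip(means, stds)]
```

Both scores reward a high predicted value and a high uncertainty, so the third candidate (lowest mean, largest standard deviation) receives the highest UCB score here.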

Example .toml config files for using these models in REINVENT4 are provided in /example_config.

Installation

  • Clone this repository and change into it
git clone https://github.com/cmwoodley/GP_qsar.git
cd GP_qsar
  • Install GP_qsar
pip install .

Example Usage

from gp_qsar import GP_qsar
import numpy as np

# Toy dataset for simple example

smiles = np.array([
    "CCO", "C1CCCCC1", "O=C=O", "CC(C)C",
    "C1=CC=CC=C1", "CCN(CC)CC", "C1=CC(=O)NC(=O)N1", "CC(C)O",
    "C#N", "C=O", "O=C(O)C", "CC(C)CC",
    "NCCO", "CC(=O)O", "C1CC1", "O=S(=O)(O)O",
    "CNC", "C=CC", "CCOCC", "CCOC"
])

test_smiles = [
    "C1CCOC1",  # Tetrahydrofuran (THF)
    "N#CCN",    # Aminoacetonitrile
    "CC(C)CO",  # Isobutanol
    "C1=CC(=O)OC=C1",  # 2H-Pyran-2-one (2-pyrone)
    "C=C",      # Ethene (ethylene)
]

y = [
    3.14, 2.718, 1.618, 0.577,
    6.022, 9.81, 1.414, 2.302,
    0.693, 4.669, 0.007, 299792.458,
    1.732, 42.0, 0.001, 8.314,
    1.96, 0.333, 0.618, 1.12
]

# Initialise model
model = GP_qsar(smiles, y)

# Generate predictions 
predictions = model.predict_from_smiles(test_smiles)
predictions_std = model.predict_from_smiles(test_smiles, uncert=True) # Generate with uncertainty

# Evaluate acquisition function
UCB = model.evaluate_acquisition_functions(test_smiles, "UCB")
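
In an active learning loop, acquisition scores like these are typically used to decide which molecules to test next. The sketch below ranks candidates by score and keeps the top k. It assumes the scores form a one-dimensional sequence with one value per SMILES, in the same order as the input; the return type of `evaluate_acquisition_functions` is not documented here, so check it before relying on this. The helper `select_top_k` and the placeholder scores are hypothetical and not part of GP_qsar.

```python
def select_top_k(smiles, scores, k):
    """Return the k SMILES with the highest scores, best first.

    Hypothetical helper: assumes `scores` holds one number per SMILES.
    """
    if len(smiles) != len(scores):
        raise ValueError("smiles and scores must have the same length")
    ranked = sorted(zip(smiles, scores), key=lambda pair: float(pair[1]), reverse=True)
    return [s for s, _ in ranked[:k]]


candidates = ["C1CCOC1", "N#CCN", "CC(C)CO"]
placeholder_scores = [0.2, 0.9, 0.5]  # made-up values, not model output

batch = select_top_k(candidates, placeholder_scores, k=2)
```

Taking the top k by an independent score ignores redundancy between the selected molecules, which is the issue batch-selection acquisition functions are designed to address.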

To do

  • Add teach functionality to re-train models with newly acquired data
  • Actually implement metadata to show model performance
  • Store names of selected features in some meaningful way
  • Improve testing framework
  • Add an install option that makes gpytorch an optional dependency, because CUDA is large
