
Improved MCS and Regression to Classification

@swansonk14 swansonk14 released this 26 May 21:26
· 12 commits to main since this release
e4737ee

MCS

The maximum common substructure (MCS) similarity function in molecular_similarities.py now accepts three additional parameters that control the MCS calculation: match_valences, ring_matches_ring_only, and complete_rings_only (see https://www.rdkit.org/docs/source/rdkit.Chem.MCS.html for their meaning). The same options are available on the command line when running chemfunc nearest_neighbor.
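To illustrate what these three options do, here is a minimal sketch that passes them straight to RDKit. It is not the code in molecular_similarities.py: the function name mcs_similarity, the use of rdkit.Chem.rdFMCS (rather than the rdkit.Chem.MCS module linked above), and the normalization by the smaller molecule's atom count are all assumptions made for the example.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS


def mcs_similarity(
    smiles_1: str,
    smiles_2: str,
    match_valences: bool = False,
    ring_matches_ring_only: bool = False,
    complete_rings_only: bool = False,
) -> float:
    """Hypothetical helper: MCS size divided by the smaller molecule's atom count."""
    mol_1 = Chem.MolFromSmiles(smiles_1)
    mol_2 = Chem.MolFromSmiles(smiles_2)

    # The snake_case options map onto RDKit's camelCase keyword arguments.
    mcs = rdFMCS.FindMCS(
        [mol_1, mol_2],
        matchValences=match_valences,
        ringMatchesRingOnly=ring_matches_ring_only,
        completeRingsOnly=complete_rings_only,
    )

    # Assumed normalization; the library's own definition may differ.
    return mcs.numAtoms / min(mol_1.GetNumAtoms(), mol_2.GetNumAtoms())


# Toluene vs. benzene, requiring that ring atoms only match complete rings.
similarity = mcs_similarity(
    "Cc1ccccc1",
    "c1ccccc1",
    ring_matches_ring_only=True,
    complete_rings_only=True,
)
```

With ring_matches_ring_only and complete_rings_only enabled, a ring atom cannot be matched to a chain atom and a partially matched ring is not counted, which tends to give more chemically meaningful substructures at the cost of smaller matches.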

Regression to Classification

The regression_to_classification.py script now includes a delete_class_indices flag that removes all data points assigned to the given class indices. The primary use case is building binary classification datasets with a gap between the active and inactive categories. For example, setting thresholds = [0.4, 0.6] and delete_class_indices = {1} labels data < 0.4 as 0, deletes data between 0.4 and 0.6 (originally labeled 1), and labels data >= 0.6 as 1 (originally labeled 2).
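The thresholding, deletion, and relabeling described above can be sketched in a few lines of plain Python. This is an illustration of the behavior, not the script's implementation: the function name and signature are hypothetical, and placing a value exactly equal to a threshold in the higher class is an assumption chosen to match the "< 0.4" and ">= 0.6" wording.

```python
from bisect import bisect_right


def regression_to_classification(values, thresholds, delete_class_indices=frozenset()):
    """Hypothetical sketch: bin values by thresholds, drop some classes, relabel the rest."""
    thresholds = sorted(thresholds)
    num_classes = len(thresholds) + 1

    # Classes that survive deletion are renumbered consecutively from 0.
    kept_classes = sorted(set(range(num_classes)) - set(delete_class_indices))
    relabel = {old: new for new, old in enumerate(kept_classes)}

    labeled = []
    for value in values:
        # bisect_right counts the thresholds <= value, so a value equal to a
        # threshold lands in the higher class (assumed boundary convention).
        original_class = bisect_right(thresholds, value)
        if original_class in relabel:
            labeled.append((value, relabel[original_class]))
    return labeled


example = regression_to_classification(
    values=[0.1, 0.4, 0.5, 0.6, 0.9],
    thresholds=[0.4, 0.6],
    delete_class_indices={1},
)
print(example)  # [(0.1, 0), (0.6, 1), (0.9, 1)]
```

The values 0.4 and 0.5 fall in the original class 1 and are dropped, while 0.6 and 0.9 move from the original class 2 to the new class 1.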