Improved MCS and Regression to Classification
MCS
The maximum common substructure (MCS) similarity function in molecular_similarities.py now accepts additional parameters that modify the MCS calculation: match_valences, ring_matches_ring_only, and complete_rings_only (see https://www.rdkit.org/docs/source/rdkit.Chem.MCS.html). These parameters are also available on the command line when running chemfunc nearest_neighbor.
Regression to Classification
The regression_to_classification.py script now includes a delete_class_indices flag for deleting specific class indices. The primary use case is building binary classification datasets with a gap between the active and inactive categories. For example, setting thresholds = [0.4, 0.6] and delete_class_indices = {1} labels data < 0.4 as 0, labels data >= 0.6 as 1 (originally labeled 2), and deletes data with 0.4 <= value < 0.6 (originally labeled 1).
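To make the thresholding and relabeling concrete, here is a minimal sketch of the logic described above. It is not chemfunc's implementation: the function name regression_to_classification_sketch is hypothetical, and it assumes the inputs are a plain Python list of values rather than whatever file format the script reads. It assumes a value equal to a threshold falls into the higher class, which matches the "< 0.4" and ">= 0.6" boundaries in the example.

```python
from bisect import bisect_right
from typing import Iterable, Optional


def regression_to_classification_sketch(
    values: Iterable[float],
    thresholds: list[float],
    delete_class_indices: Optional[set[int]] = None,
) -> list[tuple[int, int]]:
    """Hypothetical helper (not chemfunc's API): bin regression values into classes.

    Returns (original_position, new_label) pairs for the rows that are kept.
    """
    thresholds = sorted(thresholds)
    delete_class_indices = delete_class_indices or set()
    num_classes = len(thresholds) + 1

    # Map each surviving original class index to a contiguous new label,
    # e.g. with classes {0, 1, 2} and {1} deleted: 0 -> 0 and 2 -> 1.
    kept = [c for c in range(num_classes) if c not in delete_class_indices]
    relabel = {old: new for new, old in enumerate(kept)}

    result = []
    for position, value in enumerate(values):
        # bisect_right places a value equal to a threshold in the higher class:
        # value < t0 -> 0, t0 <= value < t1 -> 1, value >= t1 -> 2.
        original_class = bisect_right(thresholds, value)
        if original_class in relabel:  # rows in deleted classes are dropped
            result.append((position, relabel[original_class]))
    return result


# Example from the text: thresholds = [0.4, 0.6], delete_class_indices = {1}.
labels = regression_to_classification_sketch([0.1, 0.45, 0.6, 0.9, 0.39], [0.4, 0.6], {1})
print(labels)  # [(0, 0), (2, 1), (3, 1), (4, 0)]
```

The value 0.45 falls in the deleted middle class and is dropped, while 0.6 lands in the original class 2 and is relabeled 1. Relabeling the surviving classes to contiguous indices is what turns the three-class split into a binary dataset.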