Python implementation of the main clustering results of the paper "On the best way to cluster NCI-60 molecules".
- Python 3.9.7
- venv
- Create the virtual environment
python -m venv .venv
- Activate the virtual environment
- On Windows, run:
.venv\Scripts\activate.bat
- On Linux or macOS, run:
source .venv/bin/activate
- Install all the necessary packages
python -m pip install -r requirements.txt
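Once the environment is activated, a quick check can confirm that the interpreter in use is the one from .venv and that its version is in the 3.9 series listed above. The following is a minimal sketch using only the standard library; the helper names in_virtualenv and version_matches are hypothetical and are not part of this repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def version_matches(expected=(3, 9)) -> bool:
    """Compare the running major.minor version against the expected one."""
    return tuple(sys.version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtualenv())
    print("Python 3.9 interpreter:", version_matches())
```

If the first line prints False, the environment was probably not activated in the current shell.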
-
Download and store the required files from the NCI-60 Growth Inhibition Data.
- The required files contain the endpoints calculated from concentration curves ("CANCER60GI50_Oct2020.LST", for instance) and SMILES ("Chem2D_Jun2016.smi", for instance). Other releases of both files are also available for download.
-
In the file main.py, replace the following lines.
- Replace with the directory where the downloaded data is stored
dir_working = '/home/hernadez/Documents/NCI60_data/'
- Replace only if the files are different from those suggested in the download step above
file_nci60 = 'CANCER60GI50_Oct2020.LST'
file_smiles = 'Chem2D_Jun2016.smi'
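Before running the script, it can help to confirm that both input files are actually present in the configured directory. The following is a minimal sketch, assuming the variable names shown above; the helper missing_inputs is hypothetical and is not part of main.py.

```python
from pathlib import Path


def missing_inputs(dir_working, filenames):
    """Return the file names that are not present in dir_working."""
    base = Path(dir_working)
    return [name for name in filenames if not (base / name).is_file()]


if __name__ == "__main__":
    # Values copied from this README; adjust them to match your main.py.
    dir_working = '/home/hernadez/Documents/NCI60_data/'
    file_nci60 = 'CANCER60GI50_Oct2020.LST'
    file_smiles = 'Chem2D_Jun2016.smi'

    missing = missing_inputs(dir_working, [file_nci60, file_smiles])
    print("Missing files:", missing if missing else "none")
```

An empty result means both files were found where main.py is expected to look for them.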
-
By default, the number of clusters (k) is set to 7 and outlier removal is enabled. Both values can be modified in the following lines.
k = 7 # number of clusters
outliers = True # to remove outliers
To run:
python main.py
Output: a folder is created containing the clustering assignments ([NSC, SMILES, Cluster ID]) and the corresponding clustering quality metrics.
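The clustering assignment can be summarised by counting how many molecules fall into each cluster. The sketch below assumes the assignment is stored as a comma-separated file with a header row named NSC, SMILES, Cluster ID; the actual file name, delimiter and header written by main.py may differ, and the helper cluster_sizes is hypothetical.

```python
import csv
import io
from collections import Counter


def cluster_sizes(handle, cluster_column="Cluster ID"):
    """Count how many molecules were assigned to each cluster."""
    # Assumption: the file has a header row containing cluster_column.
    reader = csv.DictReader(handle)
    return Counter(row[cluster_column] for row in reader)


if __name__ == "__main__":
    # Tiny made-up sample, used only to illustrate the assumed layout.
    sample = io.StringIO(
        "NSC,SMILES,Cluster ID\n"
        "1,CCO,0\n"
        "2,CCN,1\n"
        "3,CCC,0\n"
    )
    for cluster, size in sorted(cluster_sizes(sample).items()):
        print(f"cluster {cluster}: {size} molecules")
```

To use it on real output, pass an open file handle (for example, open(path, newline="")) in place of the in-memory sample.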
- Saiveth Hernández-Hernández, email: saiveth.hernadez@inserm.fr
- Pedro J. Ballester, email: p.ballester@imperial.ac.uk