- The SMILES notations of 10,000 de novo generated molecules from the DrugGEN model can be downloaded from here. (The SMILES notations of 10,000 de novo generated molecules from the DrugGEN-NoTarget model are also available here.)
- We ran our deep learning-based drug/compound-target protein interaction prediction system (DEEPScreen) on the molecules generated by the DrugGEN model. DEEPScreen predicted 5,700 of them as active against AKT1, 130 of which received the highest confidence score (SMILES notations of DEEPScreen predicted actives).
- At the same time, we conducted a molecular docking analysis on the de novo molecules generated by DrugGEN and by other target-based generation models, including RELATION, TRIOMPHE-BOA, and ResGen, as well as on real AKT1 inhibitors, using the crystal structure of AKT1. For the DrugGEN model, a total of 1,600 molecules exhibited sufficiently low binding free energies (< -8 kcal/mol). The corresponding molecules can be found here.
- In parallel, we filtered the 10,000 de novo generated molecules from the DrugGEN model using the Lipinski, Veber, and PAINS filters. Of these, 4,127 passed the filters, and their SMILES notations can be found here.
- Finally, de novo molecules that effectively target the AKT1 protein were selected via expert curation from the set of molecules that had binding free energies lower than -8 kcal/mol and were predicted as active against AKT1 by DEEPScreen (SMILES notations of the expert selected de novo AKT1 inhibitor molecules).
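The rule-based filtering and the pre-selection step described above can be illustrated with a minimal sketch. It is not the code used in this repository: the `ScoredMolecule` record, its field names, and the example values are hypothetical, and the descriptors are assumed to have been computed beforehand (for example with a cheminformatics toolkit). The Lipinski check below requires all four criteria to hold, whereas some implementations tolerate one violation. PAINS filtering needs substructure matching and is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredMolecule:
    """Hypothetical record holding precomputed values for one molecule."""
    name: str
    mol_weight: float       # molecular weight in Da
    logp: float             # octanol-water partition coefficient
    h_donors: int           # hydrogen-bond donors
    h_acceptors: int        # hydrogen-bond acceptors
    rotatable_bonds: int
    tpsa: float             # topological polar surface area in square angstroms
    docking_score: float    # binding free energy estimate in kcal/mol
    predicted_active: bool  # assumed label from the activity predictor


def passes_lipinski(m: ScoredMolecule) -> bool:
    # Rule of five, applied strictly (no violation allowed in this sketch).
    return (m.mol_weight <= 500 and m.logp <= 5
            and m.h_donors <= 5 and m.h_acceptors <= 10)


def passes_veber(m: ScoredMolecule) -> bool:
    # Veber criteria: at most 10 rotatable bonds and TPSA of at most 140.
    return m.rotatable_bonds <= 10 and m.tpsa <= 140


def drug_like(molecules):
    """Keep molecules that pass both rule-based filters."""
    return [m for m in molecules if passes_lipinski(m) and passes_veber(m)]


def shortlist(molecules, max_score=-8.0):
    """Keep molecules that dock below the threshold AND are predicted active."""
    return [m for m in molecules
            if m.docking_score < max_score and m.predicted_active]


# Made-up example values, for illustration only.
examples = [
    ScoredMolecule("MOL_A", 350.4, 2.1, 2, 5, 4, 85.0, -9.2, True),
    ScoredMolecule("MOL_B", 612.7, 6.3, 1, 7, 12, 150.0, -8.5, True),
    ScoredMolecule("MOL_C", 298.3, 1.4, 3, 4, 3, 70.0, -6.1, True),
    ScoredMolecule("MOL_D", 410.5, 3.0, 2, 6, 5, 95.0, -8.9, False),
]

print([m.name for m in drug_like(examples)])
print([m.name for m in shortlist(examples)])
```

The shortlist produced this way would only be the input to expert curation; the final manual selection cannot be reproduced by a threshold.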
Glide (Schrödinger Suite) was used to dock AKT1 inhibitors, 10K randomly sampled ChEMBL molecules, and DrugGEN-generated molecules, using the AKT1 crystal structure (PDB ID: 4GV1) as the reference protein. The top 1,000 docking scores for each set are available here. In addition, the docking results of the crystal structure and the selected de novo molecule (MOL_01_027820) were visualized using PyMOL and saved as PDB files.
The simulation analyses were conducted for the AKT1-Capivasertib complex (crystal structure: 4GV1) and the AKT1-MOL_02_027820 complex (consisting of the 4GV1 protein and a de novo generated molecule) using the Simulation Interactions Diagram module integrated into Maestro (Desmond, Schrödinger Suite). MD files for the AKT1-Capivasertib complex and the AKT1-MOL_02_027820 complex have been shared on Google Drive.
This script takes three arguments:
- gen_smiles: A list of SMILES strings representing the de novo generated molecules. Molecules should be found under a column named "SMILES".
- ref_smiles_1: A list of SMILES strings representing the reference molecules for novelty calculation (e.g. ChEMBL molecules).
- ref_smiles_2: A list of SMILES strings representing the reference molecules for novelty calculation (e.g. selected inhibitors).
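As an illustration of the expected input layout, the sketch below reads a "SMILES" column from tabular text. It assumes the inputs are CSV files, which is not stated above; the helper name `read_smiles` is hypothetical and is not part of the script.

```python
import csv
import io


def read_smiles(handle, column="SMILES"):
    """Return the non-empty entries of `column` from a CSV file handle."""
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f'expected a column named "{column}"')
    # Skip rows where the cell is missing or blank.
    return [row[column].strip() for row in reader
            if row[column] and row[column].strip()]


# In-memory stand-in for a file such as one opened with open(path, newline="").
demo = io.StringIO("ID,SMILES\n1,CCO\n2,\n3,c1ccccc1\n")
print(read_smiles(demo))
```

Raising an error when the column is missing surfaces a mislabeled input file early, instead of silently producing an empty molecule list.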
The script calculates the following metrics:
- Validity: The fraction of valid molecules in the generated set.
- Uniqueness: The fraction of unique molecules in the generated set.
- Novelty: The fraction of molecules in the generated set that are not present in the reference sets.
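- Internal Diversity: Based on the average Tanimoto similarity between all pairs of molecules in the generated set (conventionally reported as one minus this average, so higher values indicate a more diverse set).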
- QED: The average QED (quantitative estimate of drug-likeness) score of the molecules in the generated set.
- SA: The average SA (synthetic accessibility) score of the molecules in the generated set.
- FCD: The Fréchet ChemNet Distance (FCD) between the generated set and each of the two reference sets.
- Fragment Similarity: The average fragment similarity score of the molecules in the generated set against both reference sets.
- Scaffold Similarity: The average scaffold similarity score of the molecules in the generated set against both reference sets.
- Lipinski: The fraction of molecules in the generated set that pass the Lipinski filter.
- Veber: The fraction of molecules in the generated set that pass the Veber filter.
- PAINS: The fraction of molecules in the generated set that pass the PAINS filter.
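Three of the set-level metrics listed above can be sketched in plain Python. This is not the script's implementation. It assumes the SMILES strings have already been validated and canonicalized, that fingerprints are given as sets of on-bit indices, and that internal diversity follows the common "one minus mean pairwise Tanimoto similarity" convention. Here novelty is computed over unique generated molecules; other definitions use a different denominator. All function names and example values are hypothetical.

```python
from itertools import combinations


def uniqueness(valid_smiles):
    """Fraction of distinct entries among the valid generated molecules."""
    if not valid_smiles:
        return 0.0
    return len(set(valid_smiles)) / len(valid_smiles)


def novelty(valid_smiles, reference_smiles):
    """Fraction of unique generated molecules absent from the reference set."""
    unique = set(valid_smiles)
    if not unique:
        return 0.0
    return len(unique - set(reference_smiles)) / len(unique)


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def internal_diversity(fingerprints):
    """One minus the mean Tanimoto similarity over all unordered pairs."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    mean_similarity = sum(tanimoto(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_similarity


generated = ["CCO", "CCO", "CCN", "c1ccccc1"]
reference = ["CCN"]
fingerprints = [{1, 2, 3}, {2, 3, 4}, {7, 8}]  # made-up on-bit indices

print(uniqueness(generated))          # 3 distinct out of 4
print(novelty(generated, reference))  # 2 of the 3 distinct ones are new
print(internal_diversity(fingerprints))
```

Averaging over all pairs is quadratic in the number of molecules, so for sets of 10,000 molecules a vectorized or sampled computation would likely be preferable.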