This repo contains all source code and descriptions for Structure-Based Virtual Screening (SBVS).
A graph-based algorithm is implemented for the docking-based SBVS problem.
- Docking is performed with third-party software (e.g., VINA).
- Ligand atoms and selected protein atoms form a graph.
  - Protein atoms are selected by Euclidean distance, using an arbitrary cutoff distance.
- Atom embedding is performed for HAG-Net.
  - Embeddings are randomly initialized.
- A graph-based classification network built on HAG-Net is used for this classification problem.
- TODO:
  - Atom pretraining
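The graph construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repo's implementation: `select_pocket_atoms`, `build_complex_graph`, and `init_atom_embeddings` are hypothetical names; a protein atom is assumed to be kept when it lies within the cutoff of at least one ligand atom; edges are assumed to join any two nodes closer than a second, hypothetical `edge_cutoff`; and Gaussian initialization is just one possible random embedding scheme.

```python
import math
import random
from typing import List, Tuple

Coord = Tuple[float, float, float]


def select_pocket_atoms(ligand: List[Coord], protein: List[Coord], cutoff: float) -> List[int]:
    """Return indices of protein atoms within `cutoff` (Euclidean) of any ligand atom.

    Assumption: "within the cutoff of at least one ligand atom" is the selection rule.
    """
    return [
        j for j, p in enumerate(protein)
        if any(math.dist(p, q) <= cutoff for q in ligand)
    ]


def build_complex_graph(ligand: List[Coord], protein: List[Coord],
                        cutoff: float, edge_cutoff: float):
    """Build a ligand + pocket graph.

    Nodes: all ligand atoms first, then the selected protein atoms.
    Edges (assumption): every node pair closer than the hypothetical `edge_cutoff`.
    """
    pocket_idx = select_pocket_atoms(ligand, protein, cutoff)
    nodes = list(ligand) + [protein[j] for j in pocket_idx]
    edges = [
        (a, b)
        for a in range(len(nodes))
        for b in range(a + 1, len(nodes))
        if math.dist(nodes[a], nodes[b]) <= edge_cutoff
    ]
    return pocket_idx, nodes, edges


def init_atom_embeddings(num_atom_types: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Randomly initialized embedding table, one row per atom type (Gaussian init is an assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_atom_types)]


# Usage with hypothetical coordinates: two ligand atoms, three protein atoms.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
protein = [(3.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
pocket_idx, nodes, edges = build_complex_graph(ligand, protein, cutoff=4.0, edge_cutoff=2.0)
print(pocket_idx, edges)  # [0, 2] [(0, 1), (1, 2)]
```

The all-pairs loops are quadratic in atom count; for whole proteins a spatial index such as a k-d tree would be the usual way to find neighbors within the cutoff.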
NoDecoy: an in-house dataset that excludes all decoy samples.
Dataset | Data Size | Positive Samples | Positive Sample Ratio |
---|---|---|---|
NoDecoy | 309581 | 245890 | 79.43% |
All models are trained with random 5-fold cross-validation, and the mean results are listed.
Model | Dataset | AUC | EF @ 2% | EF @ 20% | Accuracy |
---|---|---|---|---|---|
HAG-Net | NoDecoy | 0.89 | 1.26 | 1.25 | 85.80% |
- EF: Enrichment Factor
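The metrics in the table can be computed from per-sample scores and binary labels. The sketch below is an assumption about the evaluation, not the repo's code: `enrichment_factor` takes EF@x% as the active rate among the top-ranked x% of samples divided by the active rate of the whole set (rounding the top-x% count to the nearest integer), and `roc_auc` uses the pairwise (Mann-Whitney) form with ties counted as half. Under this definition EF cannot exceed N / n_pos; for the NoDecoy set above that is 309581 / 245890 ≈ 1.26, so the reported EF values sit at or near their ceiling.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], labels: Sequence[int], fraction: float) -> float:
    """EF@fraction: active rate in the top-ranked `fraction` / active rate overall.

    Assumption: the top-x% count is rounded to the nearest integer (at least 1).
    """
    n = len(scores)
    n_top = max(1, round(n * fraction))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_top = sum(labels[i] for i in ranked[:n_top])
    return (hits_top / n_top) / (sum(labels) / n)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(random positive scores higher than random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def max_enrichment_factor(labels: Sequence[int]) -> float:
    """Upper bound of EF (reached when the top fraction is all actives): N / n_pos."""
    return len(labels) / sum(labels)


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(enrichment_factor(scores, labels, 0.2))  # ≈ 3.33, equal to the maximum for this toy set
print(roc_auc(scores, labels))                 # ≈ 0.952 (20/21)
```

The pairwise AUC is quadratic in sample count; for a set of roughly 300k samples a sort-based rank computation would be the practical choice.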
Non-docking models are implemented for the SBVS problem. The ligand is modeled by a graph-based model (HAG-Net), while the protein/pocket is modeled by either a sequence-based or a graph-based model, giving two variants.
- NonDockingGG: graph-based (pocket) + graph-based (ligand)
- NonDockingSG: sequence-based (pocket) + graph-based (ligand)
- Ligand and pocket representations are produced by two separate neural networks.
- The interaction between the two representations is modeled by an MLP classification network.
- Protein/pocket sequences are modeled with amino acids as the basic units.
- Sequence pretraining is performed with a masked language modeling (masked LM) task.
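Amino-acid tokenization and input masking for the masked LM pretraining task can be sketched as below. The vocabulary (20 standard residues plus `[PAD]`, `[UNK]`, `[MASK]`), the 15% default masking rate, and the `-100` ignore label (PyTorch's default `ignore_index` for `CrossEntropyLoss`) are assumptions borrowed from common BERT-style setups, not details confirmed by this repo. Unlike BERT, this simplified version always replaces a selected position with `[MASK]` (no 80/10/10 split).

```python
import random
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"          # 20 standard residues
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[MASK]"]  # hypothetical special tokens
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + list(AMINO_ACIDS))}
MASK_ID = VOCAB["[MASK]"]
IGNORE_INDEX = -100  # label for unmasked positions, excluded from the loss


def tokenize(seq: str) -> List[int]:
    """Map each residue letter to an id; non-standard letters become [UNK]."""
    return [VOCAB.get(ch, VOCAB["[UNK]"]) for ch in seq.upper()]


def mask_tokens(ids: List[int], mask_prob: float = 0.15, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Replace each position with [MASK] with probability `mask_prob`.

    Returns (inputs, labels); labels hold the original id only at masked positions,
    so the loss is computed only where the model must recover the residue.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in ids:
        if rng.random() < mask_prob:
            inputs.append(MASK_ID)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(IGNORE_INDEX)
    return inputs, labels


ids = tokenize("MKTAYIAKQR")
inputs, labels = mask_tokens(ids, mask_prob=0.15, seed=42)
```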
NoDecoy: an in-house dataset that excludes all decoy samples.
Dataset | Data Size | Positive Samples | Positive Sample Ratio |
---|---|---|---|
NoDecoy | 284792 | 222687 | 78.19% |
All models are trained with random 5-fold cross-validation, and the mean results are listed.
Model | Dataset | AUC | EF @ 2% | EF @ 20% | Accuracy |
---|---|---|---|---|---|
NonDockingGG | NoDecoy | 0.937 | 1.279 | 1.277 | 89.0% |
NonDockingSG | NoDecoy | 0.923 | 1.279 | 1.276 | 87.6% |
- EF: Enrichment Factor
- Models are trained for 50 epochs; further training might improve performance.
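The random 5-fold cross-validation with averaged metrics, used for both result tables, can be sketched like this. `kfold_indices`, `cross_validate`, and the `train_and_eval` callback are hypothetical; the callback stands in for training a model on the training indices and returning a dict of metrics (e.g. AUC, EF, accuracy) on the held-out fold. The README does not say whether the splits were stratified by label, so this sketch uses a plain shuffled split.

```python
import random
from statistics import mean
from typing import Callable, Dict, List

Metrics = Dict[str, float]


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal them into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(n: int, train_and_eval: Callable[[List[int], List[int]], Metrics],
                   k: int = 5, seed: int = 0) -> Metrics:
    """Run k-fold CV and return the mean of each metric across folds."""
    folds = kfold_indices(n, k, seed)
    per_fold = []
    for i, test_idx in enumerate(folds):
        # Training set = every fold except the held-out one.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        per_fold.append(train_and_eval(train_idx, test_idx))
    return {name: mean(m[name] for m in per_fold) for name in per_fold[0]}
```

A plain shuffle keeps the sketch short; with positive ratios near 80%, a label-stratified split would keep each fold's ratio close to the overall one.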