528 changes: 528 additions & 0 deletions org-Polaris/astex_diverse_set/01.dataset_creation.ipynb

Large diffs are not rendered by default.

8 changes: 8 additions & 0 deletions org-Polaris/astex_diverse_set/dataset_readme.md
@@ -0,0 +1,8 @@
## Background

The Astex Diverse Set is a well-established and commonly used benchmark for evaluating docking methods. Published in 2007, it is a set of hand-picked, relevant, diverse, and high-quality protein–ligand complexes from the PDB. The complexes were downloaded from the PDB as MMTF files, and PyMOL was used to remove solvents and all occurrences of the ligand of interest. The proteins were then saved with their cofactors in PDB files, and the ligands in SDF files.

## Data source
- Reference: [Hartshorn et al.](https://pubs.acs.org/doi/abs/10.1021/jm061277y)
- Original: https://zenodo.org/records/8278563
- Polaris: polaris-public/polaris-recipes/org-polaris/posebusters/posebusters_paper_data/astex_diverse_set
21 changes: 21 additions & 0 deletions org-Polaris/astex_diverse_set/env.yml
@@ -0,0 +1,21 @@
channels:
- conda-forge

dependencies:
# scientific
- rdkit == 2023.9.6
- auroris == 0.1.4
- polaris == 0.8.5

# visualization
- umap-learn==0.5.5
- seaborn==0.13.2

# notebooks
- jupyterlab==4.2.1

# utils
- gcsfs == 2024.6.1
- openpyxl == 3.1.2


509 changes: 509 additions & 0 deletions org-Polaris/posebusters/01.dataset_creation.ipynb

Large diffs are not rendered by default.

1,117 changes: 1,117 additions & 0 deletions org-Polaris/posebusters/02.benchmark_creation.ipynb

Large diffs are not rendered by default.

25 changes: 25 additions & 0 deletions org-Polaris/posebusters/benchmark_readme.md
@@ -0,0 +1,25 @@
![logo](https://posebusters.readthedocs.io/en/latest/_static/logo_square.png)
## Background
The PoseBusters dataset is a new set of carefully selected, publicly available crystal complexes from the PDB. It is a diverse set of recent, high-quality protein–ligand complexes that contain drug-like molecules. It contains only complexes released since 2021 and therefore does not contain any complexes present in the PDBbind General Set v2020, which was used to train many docking methods.


[Buttenschoen et al.](https://pubs.rsc.org/en/content/articlehtml/2024/sc/d3sc04185a) outline the steps used to select the 308 unique proteins and 308 unique ligands in the PoseBusters dataset. The complexes were downloaded from the PDB as MMTF files, and PyMOL was used to remove solvents and all occurrences of the ligand of interest. The proteins were saved with their cofactors in PDB format, and the ligands in SDF format.

## Benchmark description

This is a zero-shot benchmark that contains only a test set of 308 proteins and ligands for evaluation.

PoseBusters offers a series of ligand checkers, known as 'PoseBusters Checkers', to filter out undesired docked ligand conformers. We recommend applying these filters before uploading results to the Polaris Hub.

Upload only the ligand extracted from the docking output for evaluation, and make sure that the receptor (protein) coordinates are aligned with the original crystal structure.
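
One possible way to meet the alignment requirement is to superimpose the docked receptor onto the crystal receptor and apply the same rigid transform to the ligand. The sketch below uses the Kabsch algorithm and assumes that matched receptor atoms (for example, Cα atoms in the same order) are already available as NumPy arrays. The function names `kabsch_transform` and `apply_transform` are hypothetical and are not part of Polaris or PoseBusters.

```python
import numpy as np


def kabsch_transform(mobile, target):
    """Return (R, t) such that mobile @ R.T + t best fits target (least squares).

    Both inputs are (N, 3) arrays with atoms in the same order.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    if mobile.shape != target.shape or mobile.ndim != 2 or mobile.shape[1] != 3:
        raise ValueError("mobile and target must both have shape (N, 3)")

    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)

    # 3x3 covariance matrix between the centered point sets.
    H = (mobile - mobile_center).T @ (target - target_center)
    U, _, Vt = np.linalg.svd(H)

    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = target_center - R @ mobile_center
    return R, t


def apply_transform(coords, R, t):
    """Apply the rigid transform (R, t) to an (M, 3) coordinate array."""
    return np.asarray(coords, dtype=float) @ R.T + t


# Usage with synthetic coordinates (assumption: real coordinates would come
# from the docked complex and the reference crystal structure).
rng = np.random.default_rng(0)
receptor_crystal = rng.normal(size=(20, 3))
ligand_crystal = rng.normal(size=(5, 3))

theta = np.deg2rad(30.0)
rotation_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
shift = np.array([1.0, -2.0, 0.5])

# Simulate a docking output whose frame differs from the crystal frame.
receptor_docked = receptor_crystal @ rotation_z.T + shift
ligand_docked = ligand_crystal @ rotation_z.T + shift

R, t = kabsch_transform(receptor_docked, receptor_crystal)
ligand_aligned = apply_transform(ligand_docked, R, t)
```

The transform is fitted on the receptor only and then applied to the ligand, so the ligand pose is moved into the crystal frame without being fitted to the reference ligand itself.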

This benchmark uses the RMSD coverage (≤2Å) metric, which calculates the percentage of molecules whose RMSD to the reference is at most 2Å. For more details, see the implementation in [polaris](https://github.com/polaris-hub/polaris/blob/6e402f9d58d80d0ffe0b499cfef69a0c28c0427c/polaris/evaluate/metrics/docking_metrics.py#L39).
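
To illustrate the metric, here is a minimal sketch in plain Python. It assumes that predicted and reference ligand atoms are listed in the same order and ignores molecular symmetry, so it may differ from the Polaris implementation linked above, which remains the authoritative definition. The function names and toy values are hypothetical.

```python
import math


def rmsd(pred_coords, ref_coords):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z)."""
    if len(pred_coords) != len(ref_coords) or not pred_coords:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(
        (p - r) ** 2
        for pred_atom, ref_atom in zip(pred_coords, ref_coords)
        for p, r in zip(pred_atom, ref_atom)
    )
    return math.sqrt(squared_sum / len(pred_coords))


def rmsd_coverage(rmsds, threshold=2.0):
    """Fraction of poses whose RMSD is at most `threshold` (in Å)."""
    rmsds = list(rmsds)
    if not rmsds:
        raise ValueError("at least one RMSD value is required")
    return sum(1 for value in rmsds if value <= threshold) / len(rmsds)


# Toy example: four hypothetical per-ligand RMSD values in Å.
example_rmsds = [0.5, 2.0, 3.1, 1.9]
coverage = rmsd_coverage(example_rmsds)
print(f"RMSD coverage (<= 2 Å): {coverage:.0%}")
```

Multiply the returned fraction by 100 to report it as a percentage.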


## Data source
- Original: https://zenodo.org/records/8278563
- Polaris: gs://polaris-public/polaris-recipes/org-polaris/posebusters/posebusters_paper_data/posebusters_benchmark_set

## Other links
- Paper: https://pubs.rsc.org/en/content/articlelanding/2024/sc/d3sc04185a
- Github: https://github.com/maabuu/posebusters/tree/main
13 changes: 13 additions & 0 deletions org-Polaris/posebusters/dataset_readme.md
@@ -0,0 +1,13 @@
## Background
The PoseBusters dataset is a new set of carefully selected, publicly available crystal complexes from the PDB. It is a diverse set of recent, high-quality protein–ligand complexes that contain drug-like molecules. It contains only complexes released since 2021 and therefore does not contain any complexes present in the PDBbind General Set v2020, which was used to train many docking methods.


[Buttenschoen et al.](https://pubs.rsc.org/en/content/articlehtml/2024/sc/d3sc04185a) list the steps used to select the 308 unique proteins and 308 unique ligands in the PoseBusters dataset. The complexes were downloaded from the PDB as MMTF files, and PyMOL was used to remove solvents and all occurrences of the ligand of interest. The proteins were then saved with their cofactors in PDB files, and the ligands in SDF files.


## Data source
- Original: https://zenodo.org/records/8278563
- Polaris: gs://polaris-public/polaris-recipes/org-polaris/posebusters/posebusters_paper_data/posebusters_benchmark_set

## Other links
- Github: https://github.com/maabuu/posebusters/tree/main
21 changes: 21 additions & 0 deletions org-Polaris/posebusters/env.yml
@@ -0,0 +1,21 @@
channels:
- conda-forge

dependencies:
# scientific
- rdkit == 2023.9.6
- auroris == 0.1.4
- polaris == 0.8.0

# visualization
- umap-learn==0.5.5
- seaborn==0.13.2

# notebooks
- jupyterlab==4.2.1

# utils
- gcsfs == 2024.6.1
- openpyxl == 3.1.2