polaris-hub · zhu0619 · Aug 20, 2024 · Aug 27, 2024 · Sep 6, 2024 · Sep 7, 2024
diff --git a/org-Polaris/astex_diverse_set/01.dataset_creation.ipynb b/org-Polaris/astex_diverse_set/01.dataset_creation.ipynb
diff --git a/org-Polaris/astex_diverse_set/dataset_readme.md b/org-Polaris/astex_diverse_set/dataset_readme.md
@@ -0,0 +1,8 @@
+## Background
+
+The Astex Diverse set is a well-established, commonly used benchmark for evaluating docking methods. Published in 2007, it is a set of hand-picked, relevant, diverse, and high-quality protein–ligand complexes from the PDB. The complexes were downloaded from the PDB as MMTF files, and PyMOL was used to remove solvents and all occurrences of the ligand of interest. The proteins were then saved with their cofactors in PDB files and the ligands in SDF files.
+
+## Data source
+- Reference: [Hartshorn et al.](https://pubs.acs.org/doi/abs/10.1021/jm061277y)
+- Original: https://zenodo.org/records/8278563
+- Polaris: gs://polaris-public/polaris-recipes/org-polaris/posebusters/posebusters_paper_data/astex_diverse_set
diff --git a/org-Polaris/astex_diverse_set/env.yml b/org-Polaris/astex_diverse_set/env.yml
@@ -0,0 +1,21 @@
+channels:
+  - conda-forge
+
+dependencies:
+  # scientific
+  - rdkit == 2023.9.6
+  - auroris == 0.1.4
+  - polaris == 0.8.5
+
+  # visualization
+  - umap-learn==0.5.5
+  - seaborn==0.13.2
+
+  # notebooks
+  - jupyterlab==4.2.1
+
+  # utils
+  - gcsfs == 2024.6.1
+  - openpyxl == 3.1.2
+
+
diff --git a/org-Polaris/posebusters/01.dataset_creation.ipynb b/org-Polaris/posebusters/01.dataset_creation.ipynb
diff --git a/org-Polaris/posebusters/02.benchmark_creation.ipynb b/org-Polaris/posebusters/02.benchmark_creation.ipynb
diff --git a/org-Polaris/posebusters/benchmark_readme.md b/org-Polaris/posebusters/benchmark_readme.md
@@ -0,0 +1,25 @@
+![logo](https://posebusters.readthedocs.io/en/latest/_static/logo_square.png)
+## Background
+The PoseBusters dataset is a new set of carefully selected, publicly available crystal complexes from the PDB. It is a diverse set of recent high-quality protein–ligand complexes that contain drug-like molecules. It only contains complexes released since 2021 and therefore does not contain any complexes present in the PDBbind General Set v2020, which was used to train many of the docking methods.
+
+
+[Buttenschoen et al.](https://pubs.rsc.org/en/content/articlehtml/2024/sc/d3sc04185a) outline the steps used to select the 308 unique proteins and 308 unique ligands in the PoseBusters dataset. The complexes were downloaded from the PDB as MMTF files, and PyMOL was used to remove solvents and all occurrences of the ligand of interest. The proteins were saved with their cofactors in PDB format, while the ligands were saved in SDF format.
+
+## Benchmark description
+
+This is a zero-shot benchmark that contains only a test set of 308 proteins and ligands for evaluation.
+
+PoseBusters offers a series of ligand checkers, known as 'PoseBusters Checkers', to filter out undesired docked ligand conformers. It is recommended to apply these filters before uploading results to the Polaris Hub.
+
+Only the ligand extracted from the docking output should be uploaded for evaluation. Before extracting it, make sure that the receptor (protein) coordinates have been aligned with the original crystal structure.
+
+This benchmark uses the metric RMSD coverage (≤2Å), which calculates the percentage of molecules with an RMSD of at most 2Å compared to the reference. For more details, see the documentation on [polaris](https://github.com/polaris-hub/polaris/blob/6e402f9d58d80d0ffe0b499cfef69a0c28c0427c/polaris/evaluate/metrics/docking_metrics.py#L39).
+
+
+## Data source
+- Original: https://zenodo.org/records/8278563
+- Polaris: gs://polaris-public/polaris-recipes/org-polaris/posebusters/posebusters_paper_data/posebusters_benchmark_set
+
+## Other links
+- Paper: https://pubs.rsc.org/en/content/articlelanding/2024/sc/d3sc04185a
+- GitHub: https://github.com/maabuu/posebusters/tree/main
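
The RMSD coverage metric described in benchmark_readme.md above can be illustrated with a minimal, dependency-free sketch. This is not the polaris implementation; the linked `docking_metrics.py` remains the authoritative definition. The function names `in_place_rmsd` and `coverage_at_threshold`, the `threshold` parameter, and the toy coordinates are assumptions made for illustration. The sketch also assumes that atoms are listed in the same order in both poses, ignores molecular symmetry, and performs no alignment (in line with the README's requirement that the receptor frame already matches the crystal structure).

```python
import math


def in_place_rmsd(pred_coords, ref_coords):
    """RMSD between two poses given as equally ordered lists of (x, y, z) tuples.

    Assumption: no alignment and no symmetry correction is applied.
    """
    if len(pred_coords) != len(ref_coords) or not ref_coords:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared_error = sum(
        (p - r) ** 2
        for pred_atom, ref_atom in zip(pred_coords, ref_coords)
        for p, r in zip(pred_atom, ref_atom)
    )
    return math.sqrt(squared_error / len(ref_coords))


def coverage_at_threshold(rmsds, threshold=2.0):
    """Fraction of poses whose RMSD is at most `threshold` (in Å)."""
    if not rmsds:
        raise ValueError("need at least one RMSD value")
    return sum(1 for value in rmsds if value <= threshold) / len(rmsds)


# Toy data (hypothetical): one reference pose and two predicted poses.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
close_pose = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]  # shifted by 1 Å along x
far_pose = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]    # shifted by 3 Å along x

rmsds = [in_place_rmsd(pose, reference) for pose in (close_pose, far_pose)]
coverage = coverage_at_threshold(rmsds)
```

The sketch returns a fraction between 0 and 1; multiply by 100 to express it as the percentage mentioned in the README.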
diff --git a/org-Polaris/posebusters/dataset_readme.md b/org-Polaris/posebusters/dataset_readme.md
@@ -0,0 +1,13 @@
+## Background
+The PoseBusters dataset is a new set of carefully selected, publicly available crystal complexes from the PDB. It is a diverse set of recent high-quality protein–ligand complexes that contain drug-like molecules. It only contains complexes released since 2021 and therefore does not contain any complexes present in the PDBbind General Set v2020, which was used to train many of the docking methods.
+
+
+[Buttenschoen et al.](https://pubs.rsc.org/en/content/articlehtml/2024/sc/d3sc04185a) list the steps used to select the 308 unique proteins and 308 unique ligands in the PoseBusters dataset. The complexes were downloaded from the PDB as MMTF files, and PyMOL was used to remove solvents and all occurrences of the ligand of interest. The proteins were then saved with their cofactors in PDB files and the ligands in SDF files.
+
+
+## Data source
+- Original: https://zenodo.org/records/8278563
+- Polaris: gs://polaris-public/polaris-recipes/org-polaris/posebusters/posebusters_paper_data/posebusters_benchmark_set
+
+## Other links
+- GitHub: https://github.com/maabuu/posebusters/tree/main
diff --git a/org-Polaris/posebusters/env.yml b/org-Polaris/posebusters/env.yml
@@ -0,0 +1,21 @@
+channels:
+  - conda-forge
+
+dependencies:
+  # scientific
+  - rdkit == 2023.9.6
+  - auroris == 0.1.4
+  - polaris == 0.8.0
+
+  # visualization
+  - umap-learn==0.5.5
+  - seaborn==0.13.2
+
+  # notebooks
+  - jupyterlab==4.2.1
+
+  # utils
+  - gcsfs == 2024.6.1
+  - openpyxl == 3.1.2
+
+