diff --git a/docs/databases_klifs.rst b/docs/databases_klifs.rst index 1b262760..229adaf6 100644 --- a/docs/databases_klifs.rst +++ b/docs/databases_klifs.rst @@ -1,36 +1,39 @@ -Databases: KLIFS +OpenCADD-KLIFS ================ -Once you have installed the package, you will have access (among others) -to the ``opencadd.databases.klifs`` module. +Once you have installed the ``opencadd`` package, you will have access (among others) +to the ``opencadd.databases.klifs`` module (OpenCADD-KLIFS). +In case you wish to install only the dependencies relevant to OpenCADD-KLIFS, please follow the installation instructions `here `_. -This module offers a simple API to interact with data from KLIFS remotely and locally. +OpenCADD-KLIFS offers a simple API to interact with data from KLIFS remotely and locally. +Find a detailed tutorial at the `TeachOpenCADD platform `_ on the KLIFS database and on how to apply the module OpenCADD-KLIFS to an example research question. What is KLIFS and who created it? --------------------------------- -"KLIFS is a kinase database that dissects experimental structures of catalytic kinase domains and the way kinase inhibitors interact with them. The KLIFS structural alignment enables the comparison of all structures and ligands to each other. Moreover, the KLIFS residue numbering scheme capturing the catalytic cleft with 85 residues enables the comparison of the interaction patterns of kinase-inhibitors, for example, to identify crucial interactions determining kinase-inhibitor selectivity." + KLIFS is a kinase database that dissects experimental structures of catalytic kinase domains and the way kinase inhibitors interact with them. The KLIFS structural alignment enables the comparison of all structures and ligands to each other. 
Moreover, the KLIFS residue numbering scheme capturing the catalytic cleft with 85 residues enables the comparison of the interaction patterns of kinase-inhibitors, for example, to identify crucial interactions determining kinase-inhibitor selectivity. -- KLIFS database: https://klifs.net -- KLIFS online service: https://klifs.net/swagger +- KLIFS database: https://klifs.net (official), https://dev.klifs.net/ (developmental) +- KLIFS online service: https://klifs.net/swagger (official), https://dev.klifs.net/swagger_v2 (developmental, used here) - KLIFS citation: `Nucleic Acids Res. (2021), 49, D1, D562–D569 `_ What does ``opencadd.databases.klifs`` offer? --------------------------------------------- -This module allows you to access KLIFS data such as information about kinases, structures, ligands, interaction fingerprints, bioactivities. -On the one hand, you can query the KLIFS webserver directly. +This module allows you to access KLIFS data such as information about +kinases, structures, structural conformations, modified residues, ligands, drugs, interaction fingerprints, and bioactivities. +On the one hand, you can query the KLIFS webserver directly. On the other hand, you can query your local KLIFS download. -We provide identical APIs for the remote and local queries and streamline all output into standardized ``pandas`` DataFrames for easy and quick downstream manipulation. +We provide identical APIs for the remote and local queries and streamline all output into standardized ``pandas`` DataFrames for easy and quick downstream data analyses. Work with KLIFS data from KLIFS server (remotely) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The ``opencadd.databases.klifs.remote`` submodule offers you to access KLIFS data from the KLIFS server. -Our API relies on the REST API and OpenAPI (Swagger) specification at https://dev.klifs.net/swagger_v2/ to dynamically generate a Python client with ``bravado``. 
+Our API relies on the REST API and OpenAPI (formerly Swagger API) specification at https://dev.klifs.net/swagger_v2/ to dynamically generate a Python client with ``bravado``. Example for ``opencadd``'s API to access remote data: @@ -39,13 +42,13 @@ Example for ``opencadd``'s API to access remote data: from opencadd.databases.klifs import setup_remote # Set up remote session - remote = setup_remote() + session = setup_remote() # Get all kinases that are available remotely - remote.kinases.all_kinases() + session.kinases.all_kinases() # Get kinases by kinase name - remote.kinases.by_kinase_name(["EGFR", "BRAF"]) + session.kinases.by_kinase_name(["EGFR", "BRAF"]) Work with KLIFS data from disc (locally) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -76,19 +79,19 @@ Example for ``opencadd``'s API to access local data: from opencadd.databases.klifs import setup_local # Set up local session - local = setup_local("../../opencadd/tests/databases/data/KLIFS_download") + session = setup_local("../../opencadd/tests/databases/data/KLIFS_download") # Get all kinases that are available locally - local.kinases.all_kinases() + session.kinases.all_kinases() # Get kinases by kinase name - local.kinases.by_kinase_name(["EGFR", "BRAF"]) + session.kinases.by_kinase_name(["EGFR", "BRAF"]) How is ``opencadd.databases.klifs`` structured? ---------------------------------------------------------- -The module's structure looks like this, trying to use the same API for both modules ``local`` and ``remote`` whenever possible: +The module's structure looks like this, using the same API for both modules ``local`` and ``remote`` whenever possible: .. code-block:: console @@ -105,51 +108,51 @@ The module's structure looks like this, trying to use the same API for both modu ├── utils.py # Defines utility functions. └── exceptions.py # Defines exceptions. 
-This structure mirrors the KLIFS Swagger API structure in the following way to access different kinds of information both remotely and locally: +This structure mirrors the KLIFS OpenAPI structure in the following way to access different kinds of information both remotely and locally: - ``kinases`` - Get information about kinases (groups, families, names). - - In KLIFS swagger API called ``Information``: https://dev.klifs.net/swagger_v2/#/Information + - In KLIFS OpenAPI called ``Information``: https://dev.klifs.net/swagger_v2/#/Information - ``ligands`` - Get ligand information. - - In KLIFS swagger API called ``Ligands``: https://dev.klifs.net/swagger_v2/#/Ligands + - In KLIFS OpenAPI called ``Ligands``: https://dev.klifs.net/swagger_v2/#/Ligands - ``structures`` - Get structure information. - - In KLIFS swagger API called ``Structures``: https://dev.klifs.net/swagger_v2/#/Structures + - In KLIFS OpenAPI called ``Structures``: https://dev.klifs.net/swagger_v2/#/Structures - ``bioactivities`` - Get bioactivity information. - - In KLIFS swagger API part of ``Ligands``: https://dev.klifs.net/swagger_v2/#/Ligands + - In KLIFS OpenAPI part of ``Ligands``: https://dev.klifs.net/swagger_v2/#/Ligands - ``interactions`` - Get interaction information. - - In KLIFS swagger API called ``Interactions``: https://dev.klifs.net/swagger_v2/#/Interactions + - In KLIFS OpenAPI called ``Interactions``: https://dev.klifs.net/swagger_v2/#/Interactions - ``pocket`` - Get interaction information. - - In KLIFS swagger API part of ``Interactions``: https://dev.klifs.net/swagger_v2/#/Interactions + - In KLIFS OpenAPI part of ``Interactions``: https://dev.klifs.net/swagger_v2/#/Interactions/get_interactions_match_residues - ``coordinates`` - Get structural data (structure coordinates). 
- - In KLIFS swagger API part of ``Structures``: https://dev.klifs.net/swagger_v2/#/Structures + - In KLIFS OpenAPI part of ``Structures``: https://dev.klifs.net/swagger_v2/#/Structures - ``conformations`` - Get information on structure conformations. - - In KLIFS swagger API part of ``Structures``: https://dev.klifs.net/swagger_v2/#/Structures/get_structure_conformation + - In KLIFS OpenAPI part of ``Structures``: https://dev.klifs.net/swagger_v2/#/Structures/get_structure_conformation - ``modified_residues`` - Get information on residue modifications in structures. - - In KLIFS swagger API part of ``Structures``: https://dev.klifs.net/swagger_v2/#/Structures/get_structure_modified_residues + - In KLIFS OpenAPI part of ``Structures``: https://dev.klifs.net/swagger_v2/#/Structures/get_structure_modified_residues diff --git a/docs/index.rst b/docs/index.rst index 2dd89243..adf4f8d2 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -35,17 +35,18 @@ OpenCADD is a Python package for structural cheminformatics! :caption: User guide installing + installing_opencadd_klifs .. toctree:: :maxdepth: 1 - :caption: Input/output formats + :caption: IO formats io tutorials/io .. toctree:: :maxdepth: 1 - :caption: Structure: Superposition + :caption: OpenCADD-superposition superposition tutorials/mda @@ -54,14 +55,14 @@ OpenCADD is a Python package for structural cheminformatics! .. toctree:: :maxdepth: 1 - :caption: Structure: Pocket + :caption: OpenCADD-pocket structure_pocket tutorials/structure_pocket .. 
toctree:: :maxdepth: 1 - :caption: Databases: KLIFS + :caption: OpenCADD-KLIFS databases_klifs tutorials/databases_klifs diff --git a/docs/installing_opencadd_klifs.rst b/docs/installing_opencadd_klifs.rst new file mode 100644 index 00000000..379a291d --- /dev/null +++ b/docs/installing_opencadd_klifs.rst @@ -0,0 +1,29 @@ +Installing OpenCADD-KLIFS only +============================== + +In case you would like to install the dependencies for the OpenCADD-KLIFS module only, please follow these instructions. + +.. note:: + + We are assuming you have a working ``mamba`` installation on your computer. + If this is not the case, please refer to the `official documentation `_. + + +Install from the conda package +------------------------------ + +1. Create a new conda environment called ``opencadd-klifs`` that contains only the dependencies required by OpenCADD-KLIFS:: + + mamba create -n opencadd-klifs bravado pandas tqdm rdkit biopandas + +2. Activate the new conda environment:: + + conda activate opencadd-klifs + +3. Install ``opencadd`` without any dependencies (all dependencies relevant to OpenCADD-KLIFS were installed in step 1):: + + mamba install opencadd --no-deps + + If you are planning on working with Jupyter notebooks, install JupyterLab and IPyWidgets:: + + mamba install jupyterlab ipywidgets diff --git a/opencadd/databases/klifs/core.py b/opencadd/databases/klifs/core.py index bd61ac30..37378f69 100644 --- a/opencadd/databases/klifs/core.py +++ b/opencadd/databases/klifs/core.py @@ -1443,7 +1443,7 @@ class DrugsProvider(BaseProvider): """ Class for drugs requests. - From the KLIFS Swagger API: + From the KLIFS OpenAPI: https://dev.klifs.net/swagger_v2/#/Ligands/get_drug_list > The drug list endpoint returns a list of all annotated kinase ligands that are either > approved or are/have been in clinical trials. 
diff --git a/opencadd/tests/databases/test_sync_klifs_and_opencadd.py b/opencadd/tests/databases/test_sync_klifs_and_opencadd.py index ec359b58..b2d47830 100644 --- a/opencadd/tests/databases/test_sync_klifs_and_opencadd.py +++ b/opencadd/tests/databases/test_sync_klifs_and_opencadd.py @@ -9,7 +9,7 @@ class TestSyncKlifsSwaggerWithOpencadd: """ - Test if opencadd is up-to-date with the KLIFS Swagger API (remote!). + Test if opencadd is up-to-date with the KLIFS OpenAPI (remote!). """ def _test_klifs_model(self, data_opencadd, data_klifs): diff --git a/papers/opencadd-klifs/opencadd_klifs_toc.png b/papers/opencadd-klifs/opencadd_klifs_toc.png new file mode 100644 index 00000000..78573454 Binary files /dev/null and b/papers/opencadd-klifs/opencadd_klifs_toc.png differ diff --git a/papers/opencadd-klifs/paper.bib b/papers/opencadd-klifs/paper.bib new file mode 100644 index 00000000..d1ea7459 --- /dev/null +++ b/papers/opencadd-klifs/paper.bib @@ -0,0 +1,152 @@ +@article{Cohen:2021, + title={Kinase drug discovery 20 years after imatinib: progress and future directions}, + author={Cohen, Philip and Cross, Darren and J{\"a}nne, Pasi A.}, + journal={Nature Reviews Drug Discovery}, + volume={20}, + number={7}, + pages={551-569}, + year={2021}, + doi={10.1038/s41573-021-00195-4}, +} + +@article{Kooistra:2017, + author = {Kooistra, Albert J. and Volkamer, Andrea}, + title = {{Kinase-Centric Computational Drug Development}}, + journal = {Annu. Rep. Med. 
Chem.}, + volume = {50}, + pages = {197--236}, + year = {2017}, + doi = {10.1016/BS.ARMC.2017.08.001}, +} + +@article{Kanev:2021, + title = "{KLIFS: an overhaul after the first 5 years of supporting kinase research}", + author = {Kanev, Georgi K and {de Graaf}, Chris and Westerman, Bart A and {de Esch}, Iwan J P and Kooistra, Albert J}, + journal = {Nucleic Acids Research}, + volume = {49}, + number = {D1}, + pages = {D562-D569}, + year = {2021}, + doi = {10.1093/nar/gkaa895}, +} + +@article{vanLinden:2014, + author={van Linden, Oscar P. J. and Kooistra, Albert J. and Leurs, Rob and de Esch, Iwan J. P. and de Graaf, Chris}, + title={KLIFS: A Knowledge-Based Structural Database To Navigate Kinase--Ligand Interaction Space}, + journal={Journal of Medicinal Chemistry}, + volume={57}, + number={2}, + pages={249-277}, + year={2014}, + doi={10.1021/jm400378w}, +} + +@article{Raschka:2017, + title = {BioPandas: Working with molecular structures in pandas DataFrames}, + author = {Sebastian Raschka}, + journal = {The Journal of Open Source Software}, + volume = {2}, + number = {14}, + year = {2017}, + doi = {10.21105/joss.00279}, +} + +@inproceedings{Kluyver:2016, + booktitle = {Positioning and Power in Academic Publishing: Players, Agents and Agendas}, + editor = {Fernando Loizides and Birgit Schmidt}, + title = {Jupyter Notebooks - a publishing format for reproducible computational workflows}, + author = {Thomas Kluyver and Benjamin Ragan-Kelley and Fernando P{\'e}rez and Brian Granger and Matthias Bussonnier and Jonathan Frederic and Kyle Kelley and Jessica Hamrick and Jason Grout and Sylvain Corlay and Paul Ivanov and Dami{\'a}n Avila and Safia Abdalla and Carol Willing and Jupyter development team}, + publisher = {IOS Press}, + year = {2016}, + pages = {87--90}, + url = {https://eprints.soton.ac.uk/403913/}, +} + +@misc{klifsswagger, + author = {KLIFS}, + title = {{KLIFS OpenAPI}}, + year = 2021, + publisher = {https://dev.klifs.net}, + url = 
{https://dev.klifs.net/swagger_v2/}, +} + +@misc{bravado, + author = {bravado}, + title = {{bravado}}, + year = 2021, + publisher = {GitHub}, + journal = {GitHub repository}, + url = {https://github.com/Yelp/bravado}, +} + +@misc{pandas, + author = {{The pandas development team}}, + title = {pandas-dev/pandas: Pandas}, + year = 2020, + publisher = {Zenodo}, + journal = {Zenodo repository}, + doi = {10.5281/zenodo.3509134}, +} + +@misc{rdkit, + author = {RDKit}, + title = {{RDKit: Open-Source Cheminformatics}}, + year = 2021, + publisher = {RDKit}, + journal = {RDKit website}, + url = {http://www.rdkit.org}, +} + +@misc{kissim, + author = {{KiSSim}}, + title = {{KiSSim: Subpocket-based fingerprint for kinase pocket comparison}}, + year = 2021, + publisher = {GitHub}, + journal = {GitHub repository}, + url = {https://github.com/volkamerlab/kissim}, +} + +@misc{teachopencadd, + author = {{TeachOpenCADD}}, + title = {{TeachOpenCADD: a teaching platform for computer-aided drug design (CADD) using open source packages and data}}, + year = 2021, + publisher = {GitHub}, + journal = {GitHub repository}, + url = {https://github.com/volkamerlab/teachopencadd}, +} + +@misc{opencadd_pocket, + author = {{OpenCADD}}, + title = {{OpenCADD-Pocket: Identification and analysis of protein (sub)pockets}}, + year = 2021, + publisher = {GitHub}, + journal = {GitHub repository}, + url = {https://github.com/volkamerlab/opencadd}, +} + +@misc{kinoml, + author = {{OpenKinome}}, + title = {{KinoML: Structure-informed machine learning for kinase modeling}}, + year = 2021, + publisher = {GitHub}, + journal = {GitHub repository}, + url = {https://github.com/openkinome/kinoml}, +} + +@misc{plipify, + author = {{PLIPify}}, + title = {{PLIPify: Protein-ligand interaction frequencies across multiple structures}}, + year = 2021, + publisher = {GitHub}, + journal = {GitHub repository}, + url = {https://github.com/volkamerlab/plipify}, +} + +@misc{volkamerlab, + author = {{Volkamer Lab}}, + title = 
{{Volkamer Lab website}}, + year = 2021, + publisher = {Volkamer Lab}, + journal = {Volkamer Lab website}, + url = {https://volkamerlab.org/}, +} diff --git a/papers/opencadd-klifs/paper.md b/papers/opencadd-klifs/paper.md new file mode 100644 index 00000000..250c1686 --- /dev/null +++ b/papers/opencadd-klifs/paper.md @@ -0,0 +1,61 @@ +--- +title: 'OpenCADD-KLIFS: A Python package to fetch kinase data from the KLIFS database' +tags: + - Python + - KLIFS + - kinase +authors: + - name: Dominique Sydow^[corresponding author] + orcid: 0000-0003-4205-8705 + affiliation: 1 + - name: Jaime Rodríguez-Guerra + orcid: 0000-0001-8974-1566 + affiliation: 1 + - name: Andrea Volkamer + affiliation: 1 + orcid: 0000-0002-3760-580X +affiliations: + - name: _In Silico_ Toxicology and Structural Bioinformatics, Institute of Physiology, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Augustenburger Platz 1, 13353 Berlin, Germany + index: 1 +date: 27 October 2021 +bibliography: paper.bib +--- + +# Summary + +Protein kinases are involved in most aspects of cell life due to their role in signal transduction. Dysregulated kinases can cause severe diseases such as cancer, inflammatory and neurodegenerative diseases, which has made them a frequent target in drug discovery for the last decades [@Cohen:2021]. +The immense research on kinases has led to an increasing amount of kinase resources [@Kooistra:2017]. +Among them is the KLIFS database, which focuses on storing and analyzing structural data on kinases and interacting drugs and other small molecules [@Kanev:2021]. +The OpenCADD-KLIFS Python module offers a convenient integration of the KLIFS data into workflows to facilitate computational kinase research. 
+ +# Statement of need + +[OpenCADD-KLIFS](https://opencadd.readthedocs.io/en/latest/databases_klifs.html) (``opencadd.databases.klifs``) is a part of the [OpenCADD](https://opencadd.readthedocs.io/) package, a collection of Python modules for structural cheminformatics. +This module offers access to KLIFS data [@Kanev:2021] such as information about kinases, structures, ligands, +interaction fingerprints, and bioactivities. +KLIFS thereby focuses especially on the ATP binding site, defined as a set of 85 residues and aligned across all structures using a multiple sequence alignment (MSA) [@vanLinden:2014]. +With OpenCADD-KLIFS, KLIFS data can be queried either locally from a KLIFS download or remotely from the KLIFS webserver. +The presented module provides identical APIs for the remote and local queries for KLIFS data and streamlines all output into +standardized Pandas DataFrames [@pandas] to allow for easy and quick downstream data analyses (\autoref{fig:opencadd_klifs_toc}). This Pandas-focused setup is ideal to work with in Jupyter notebooks [@Kluyver:2016]. + + +![OpenCADD-KLIFS fetches KLIFS data [@Kanev:2021] offline from a KLIFS download or online from the KLIFS database and formats the output as user-friendly Pandas DataFrames [@pandas].\label{fig:opencadd_klifs_toc}](opencadd_klifs_toc.png) + +The KLIFS database offers a REST API compliant with the OpenAPI specification [@klifsswagger]. Our module OpenCADD-KLIFS uses bravado [@bravado] to dynamically generate a Python client based on the OpenAPI definitions and adds wrappers to enable the following functionalities: + +- A session is set up, which allows access to various KLIFS *data sources* by different *identifiers* with the API ``session.data_source.by_identifier``. *Data sources* currently include kinases, structures and annotated conformations, modified residues, pockets, ligands, drugs, and bioactivities; *identifiers* refer to kinase names, PDB IDs, KLIFS IDs, and more. 
For example, ``session.structures.by_kinase_name`` fetches information on all structures for a query kinase. +- The same API is used for local and remote sessions. +- The returned data follows the same schema regardless of the session type (local/remote); all results obtained with bravado are formatted as Pandas DataFrames with standardized column names, data types, and handling of missing data. +- Files with the structural 3D coordinates deposited on KLIFS include full complexes or selections such as proteins, pockets, ligands, and more. These files can be downloaded to disc or loaded via biopandas [@Raschka:2017] or RDKit [@rdkit]. + +OpenCADD-KLIFS is especially convenient whenever users are interested in multiple or more complex queries such as "fetching all structures for the kinase EGFR in the DFG-in conformation" or "fetching the measured bioactivity profiles for all ligands that are structurally resolved in complex with EGFR". Formatting the output as DataFrames facilitates subsequent filtering steps and DataFrame merges in case multiple KLIFS datasets need to be combined. +OpenCADD-KLIFS is currently used in several projects from the Volkamer Lab [@volkamerlab] including TeachOpenCADD [@teachopencadd], OpenCADD-pocket [@opencadd_pocket], KiSSim [@kissim], KinoML [@kinoml], and PLIPify [@plipify]. +For example, OpenCADD-KLIFS is applied in a [TeachOpenCADD tutorial](https://projects.volkamerlab.org/teachopencadd/talktorials/T012_query_klifs.html) to demonstrate how to fetch all kinase-ligand interaction profiles for all available EGFR kinase structures to visualize the per-residue interaction types and frequencies with only a few lines of code. + +# Acknowledgements + +We thank the whole KLIFS team for providing such a great kinase resource with an easy-to-use API and especially Albert Kooistra for his help with questions and wishes regarding the KLIFS database. +We thank David Schaller for his feedback on the OpenCADD-KLIFS module. 
+We acknowledge the contributors involved in software programs and packages used by OpenCADD-KLIFS, such as bravado, RDKit, Pandas, Jupyter, Pytest, and Sphinx. + +# References \ No newline at end of file diff --git a/paper.md b/papers/opencadd-superposition/paper-superposer.md similarity index 100% rename from paper.md rename to papers/opencadd-superposition/paper-superposer.md