#
NOTE: This repo has now been archived. 2019-11-07
BioThings Explorer integrates a variety of biomedical API resources and make them available through a central access point. By addressing the data interoperability issue across different APIs through SmartAPI and JSON-LD technologies, BioThings Explorer allows users to perform federated queries across multiple different databases as well as conduct linked biomedical knowledge discovery, e.g. find all drugs which interacts with proteins produces by genes involved in a specific biological pathway.
The main documentation of this site could be organized into the following sections.
Consider a real use case here: a biologist would like to find information about all drugs which might target a specific protein. You could find all API endpoints which can connect from gene to drug using the BioThings Explorer pyton client:
In [1]: from biothings_explorer import APILocator
In [2]: locator = APILocator()
In [3]: print(locator.locate_apis_by_input_output_semantic_types(input_semantic_type='gene', output_semantic_type='chemical'))
Here is an example of the output:
[{
"endpoint": "https://pharos.nih.gov/idg/api/v1/targets({geneid})",
"predicate": "GeneOrGeneProductToChemicalAssociation",
"object": {
"semantic_type": "chemical",
"prefix": "pharos.ligand"
},
"subject": {
"semantic_type": "gene",
"prefix": "pharos.target"
}
},
{
"endpoint": "http://dgidb.genome.wustl.edu/api/v2/interactions.json?genes={genesymbol}",
"predicate": "GeneOrGeneProductToChemicalAssociation",
"object": {
"semantic_type": "chemical",
"prefix": "chembl.compound"
},
"subject": {
"semantic_type": "gene",
"prefix": "hgnc.symbol"
}
},
{
"endpoint": "http://MyChem.info/v1/querybydrugtarget",
"predicate": "GeneOrGeneProductToChemicalAssociation",
"object": {
"semantic_type": "chemical",
"prefix": "inchikey"
},
"subject": {
"semantic_type": "gene",
"prefix": "uniprot"
}
}]
Hint
For how to access all semantic types used in BioThings Explorer, please reference: :ref:`semantic_type`
You could also use BioThings Explorer to find API endpoints which takes gene as input :
In [1]: from biothings_explorer import APILocator
In [2]: locator = APILocator()
In [3]: print(locator.locate_apis_by_input_semantic_type_only('gene'))
Or find all API endpoints which produce drug/chemical as output:
In [1]: from biothings_explorer import APILocator
In [2]: locator = APILocator()
In [3]: print(locator.locate_apis_by_output_semantic_type_only('chemical'))
BioThings Explorer also supports finding APIs based on input/output prefixes.
For example, you could find APIs which can connect from hgnc.symbol to chembl.compound through the BioThings Explorer Python client.
In [1]: from biothings_explorer import APILocator
In [2]: locator = APILocator()
In [3]: print(locator.locate_apis_by_input_output_prefix(input_prefix='hgnc.symbol', output_prefix='chembl.compound'))
Hint
For how to access all prefixes used in BioThings Explorer, please reference: :ref:`uri_prefix`
You could also use BioThings Explorer to find API endpoints which takes uniprot id as input :
In [1]: from biothings_explorer import APILocator
In [2]: locator = APILocator()
In [3]: print(locator.locate_apis_by_input_prefix_only(input_prefix='uniprot'))
Or find all API endpoints which produce ncbigene id as output:
In [1]: from biothings_explorer import APILocator
In [2]: locator = APILocator()
In [3]: print(locator.locate_apis_by_output_prefix_only(input_prefix='ncbigene'))
The amount of publically available biomedical data is growing at a tremendous pace. Meanwhile, RESTful APIs has become a popular way for data providers as well as data curators to distribute their data for public access. However, all these APIs and the biological knowledge underneath them are fundamentally unconnected. By solving the interoperability issue across different APIs, BioThings Explorer manages to stitch together individual APIs and build them into a network of linked web services. The following table list current public API resources which has been integrated by BioThings Explorer.
API | URL | Description |
---|---|---|
MyGene.info | http://mygene.info | Gene Annotation Service |
MyChem.info | http://mychem.info | Drug/Chemical Annotation Service |
MyDisease.info | http://mydisease.info | Disease Annotation Service |
Reactome | https://reactome.org | Pathway Analysis Service |
DGIdb | http://dgidb.org | Drug Gene Interaction Database |
BioLink | https://api.monarchinitiative.org/api/ | Linked Biological Knowledge |
Disease Ontology | http://disease-ontology.org/ | Disease Ontology API Service |
Pharos | https://pharos.nih.gov/idg/index | Drug Gene Disease Interaction |
EBI OLS | https://www.ebi.ac.uk/ols/index | Ontology Lookup Service |
ChEMBL | https://www.ebi.ac.uk/chembl/ | Database of Bioactive Drug-Like Small Molecules |
PubChem | https://pubchem.ncbi.nlm.nih.gov/ | Open Chemistry Database |
Taxonomy | https://t.biothings.io | Taxonomy Annotation Service |
HGNC | https://www.genenames.org/ | Curated online Repo Gene Nomenclature |
Note
If you are interested in integrating your own/interested API resource in BioThings Explorer, you could visit SmartAPI Website to create a SmartAPI specification for your API. You could also place an issue at BioThings Explorer Github Repo to let us know.
To retrieve all integrated APIs in BioThings Explorer using the pyton client:
In [1]: from biothings_explorer import MetaData
In [2]: metadata = MetaData()
In [3]: print(metadata.list_all_api_resources())
Uniform Resource Identifiers (URIs) is a string of characters designed for unambiguous identification of resources. URIs are used in BioThings Explorer to uniquely identify a biological entity. For example, ncbigene ids is represented by http://identifiers.org/ncbigene/ as its unique identifier. Identifiers.org provides stable, persistent and resolvable URIs for the identification of life science data, and is chosen as the default URI repository in the implementation of BioThings Explorer. Meanwhile, there are also cases where existing URIs could not be found in existing URI repos. In this case, we will create a temporary URI for it. Prefix is essentially a short form of URI. Every URI has its corresponding prefix. Users of BioThings Explorer could use prefixes to navigate through the service.
To retrieve all prefixes used in BioThings Explorer using the pyton client:
In [1]: from biothings_explorer import MetaData
In [2]: metadata = MetaData()
In [3]: print(metadata.list_all_prefixes())
Every URI/prefix used within BioThings Explorer is associated with a specific semantic type, e.g. gene, variant, chemical, etc. By implementing this feature, users can easily find the right API to use based on the scenario.
To retrieve all prefixes used in BioThings Explorer using the pyton client:
In [1]: from biothings_explorer import MetaData
In [2]: metadata = MetaData()
In [3]: print(metadata.list_all_semantic_types())
Predicate specifies the relationship between the input and the output. Users could use predicate to filter the results from BioThings Explorer. For example, the predicate between a drug and a gene/protein could be inhibit, block, agnoize, etc..
To retrieve all predicates used in BioThings Explorer using the pyton client:
In [1]: from biothings_explorer import MetaData
In [2]: metadata = MetaData()
In [3]: print(metadata.list_all_predicates())
BioThings Explorer makes it simple for users who wants to query different databases. All you need to do is to tell BioThings Explorer what you have (e.g. hgnc.symbol: CXCR4) and what you want back (e.g. chembl.compound). BioThings Explorer will automatically help you locate the API(s) which can connect from your input to output and fetch the results for you.
The following code retrieves all chembl compound IDs which are related to gene symbol CXCR4.
In [1]: from biothings_explorer_test import fetch_output
In [2]: data = fetch_output(input_prefix='hgnc.symbol', input_value='CXCR4', output_prefix='chembl.compound')
When querying data from multiple different databases, one key obstacle is that databases tend to use different accession numbers. For example, some databases use ncbigene IDs to represent genes while other choose to use hgnc symbols. In this case, an additional ID Conversion step needs to be done in order to fetch the results.
BioThings Explorer makes it easy for users since it includes functions to automatically find synonyms for common biological IDs (e.g. gene, chemical, disease identifiers). So users no longer needs to perform the ID conversion step themselves.
The following code retrieves all chembl compound IDs which are related to gene symbol CXCR4 and also taking into account all the synonyms of chembl compound IDs.
In [1]: from biothings_explorer_test import fetch_output
In [2]: data = fetch_output(input_prefix='hgnc.symbol', input_value='CXCR4', output_prefix='chembl.compound', enable_semantic_search=True)
Note
There might be cases where synonyms could not be found. For example, disease IDs (e.g. DOID, OMIM disease ID) doesn't always have a 1-1 match.
- python >=2.6 (including python3)
- requests (install using "pip install requests")
Either install from source, like:
git clone https://github.com/biothings/biothings_explorer.git
cd biothings_explorer
python setup.py install
or use pip, like:
pip install biothings_explorer
or directly from our repository, like:
pip install git+https://github.com/biothings/biothings_explorer.git#egg=biothings_explorer
.. toctree:: :maxdepth: 2 doc/code