v2 Data Sources

Release V2.0.0 Knowledge Graph Data Sources

Release: v2.0.0

Data Access: https://console.cloud.google.com/storage/browser/pheknowlator/archived_builds/release_v2.0.0
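
The Data Access link above is a Cloud Storage console browser URL. As a minimal sketch, assuming the usual console URL layout (`/storage/browser/<bucket>/<prefix>`), the equivalent `gs://` URI can be derived as follows. The helper name `console_url_to_gs_uri` is hypothetical and not part of the project; whether the bucket permits anonymous listing is not stated here.

```python
from urllib.parse import urlparse


def console_url_to_gs_uri(console_url: str) -> str:
    """Convert a Cloud Storage console browser URL into a gs:// URI.

    Hypothetical helper for illustration; it only rewrites the string and
    does not contact Cloud Storage.
    """
    path = urlparse(console_url).path
    marker = "/storage/browser/"
    if not path.startswith(marker):
        raise ValueError(f"not a Cloud Storage browser URL: {console_url}")
    # Everything after the marker is "<bucket>/<object prefix>".
    bucket_and_prefix = path[len(marker):].strip("/")
    return f"gs://{bucket_and_prefix}"


release_uri = console_url_to_gs_uri(
    "https://console.cloud.google.com/storage/browser/"
    "pheknowlator/archived_builds/release_v2.0.0"
)
print(release_uri)
```

The resulting URI is the form expected by command-line tools and client libraries that address Cloud Storage objects.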

Dependencies:

Data_Preparation.ipynb documents the creation of all generated data
Ontology_Cleaning.ipynb documents all ontology cleaning and preprocessing

Rationale: The goal of this build was to create a knowledge graph that represents human disease mechanisms and includes the central dogma. This release draws on many of the sources used in the initial release, plus new data from the Comparative Toxicogenomics Database and experimental data from the Human Protein Atlas.

Ontologies
Data Sources

ONTOLOGIES

Cell Ontology

Homepage: GitHub
Citation:

Bard J, Rhee SY, Ashburner M. An ontology for cell types. Genome Biology. 2005;6(2):R21

Usage: Utilized to connect transcripts and proteins to cells. Additionally, the edges between this ontology and its dependencies are utilized:

ChEBI
GO
PATO
PRO
RO
UBERON

Type	Source Column	Metadata Variable Name
*Node Metadata*
protein
	DB_Object_Symbol	dbxref
	With_Or_From	dbxref
	DB_Object_Name	synonym
	DB_Object_Synonym	synonym
go-bp go-cc go-mf
	Aspect	GOA_Aspect
*Edge Metadata*
protein-go-bp protein-go-cc protein-go-mf
	Qualifier	GOA_Qualifier
	DB_Reference	GOA_DB_Reference
	EvidenceCode	GOA_EvidenceCode
	Taxon	GOA_Taxon
	AssignedBy	GOA_AssignedBy
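
The edge metadata rows in this table follow a visible naming pattern: each source column is stored under a variable named `<source>_<column>`. A minimal sketch of that pattern is shown here. The function name `prefix_edge_metadata` and the example row values are hypothetical and used only for illustration; this is not the project's own build code.

```python
# Source columns listed under "Edge Metadata" in the table above.
GOA_EDGE_COLUMNS = ["Qualifier", "DB_Reference", "EvidenceCode", "Taxon", "AssignedBy"]


def prefix_edge_metadata(row: dict, source: str, columns: list) -> dict:
    """Return {"<source>_<column>": value} for each listed column found in row."""
    return {f"{source}_{col}": row[col] for col in columns if col in row}


# Made-up annotation row for illustration only; not taken from the release data.
example_row = {
    "Qualifier": "enables",
    "DB_Reference": "PMID:0000000",
    "EvidenceCode": "IDA",
    "Taxon": "taxon:9606",
    "AssignedBy": "UniProt",
    "Unused": "ignored",
}

edge_metadata = prefix_edge_metadata(example_row, "GOA", GOA_EDGE_COLUMNS)
for name, value in edge_metadata.items():
    print(name, value)
```

Columns that are not listed in the table, such as `Unused` in the example row, are dropped rather than prefixed.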

Type	Source Column	Metadata Variable Name
*Node Metadata*
disease
	DiseaseName	synonym
	DiseaseID	dbxref
*Edge Metadata*
disease-phenotype
	Reference	HPO_Reference
	Evidence	HPO_Evidence
	Frequency	HPO_Frequency
	Sex	HPO_Sex
	Modifier	HPO_Modifier
	Aspect	HPO_Aspect
	Biocuration	HPO_Biocuration

Type	Source Column	Metadata Variable Name
*Node Metadata*
chemical
	ChemicalName	synonym
	ChemicalID	dbxref
	CasRN	dbxref
gene rna protein
	GeneSymbol	synonym
	GeneSymbol	dbxref
	OrganismID	CTD_OrganismID
*Edge Metadata*
chemical-gene chemical-rna chemical-protein
	Interaction	CTD_Interaction
	InferenceActions	CTD_InferenceActions
	PubMedIDs	CTD_PubMedIDs
	OrganismID	CTD_OrganismID
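
In the node metadata rows above, several source columns map to the same variable (for example, both `ChemicalID` and `CasRN` map to `dbxref`). One way to handle this, sketched here under the assumption that values for a shared variable are simply collected into a list, is shown below. The function name `collect_node_metadata` and the placeholder row values are hypothetical.

```python
from collections import defaultdict

# (source column, metadata variable) pairs for chemical nodes, from the table above.
CTD_CHEMICAL_NODE_MAP = [
    ("ChemicalName", "synonym"),
    ("ChemicalID", "dbxref"),
    ("CasRN", "dbxref"),
]


def collect_node_metadata(row: dict, column_map: list) -> dict:
    """Group source-column values by metadata variable, skipping blanks and duplicates."""
    metadata = defaultdict(list)
    for source_column, variable in column_map:
        value = row.get(source_column)
        if value and value not in metadata[variable]:
            metadata[variable].append(value)
    return dict(metadata)


# Placeholder row for illustration only; not a real database record.
example_row = {
    "ChemicalName": "example chemical",
    "ChemicalID": "MESH:X000000",
    "CasRN": "00-00-0",
}

node_metadata = collect_node_metadata(example_row, CTD_CHEMICAL_NODE_MAP)
print(node_metadata)
```

Keeping the mapping as a list of pairs rather than a dictionary lets one source column appear more than once, which the table requires for `GeneSymbol` (mapped to both `synonym` and `dbxref`).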

Type	Source Column	Metadata Variable Name
*Node Metadata*
gobp gocc gomf
	GOTermName	synonym
	Ontology	CTD_Ontology
*Edge Metadata*
chemical-gobp chemical-gocc chemical-gomf
	HighestGOLevel	CTD_HighestGOLevel
	Pvalue	CTD_Pvalue
	CorrectedPValue	CTD_CorrectedPValue
	TargetMatchQty	CTD_TargetMatchQty
	TargetTotalQty	CTD_TargetTotalQty
	BackgroundMatchQty	CTD_BackgroundMatchQty
	BackgroundTotalQty	CTD_BackgroundTotalQty

Type	Source Column	Metadata Variable Name
*Node Metadata*
pathway
	PathwayName	synonym
	PathwayID	dbxref

Type	Source Column	Metadata Variable Name
*Edge Metadata*
gene-gene
	Weight	GeneMania_Weight

Type	Source Column	Metadata Variable Name
*Node Metadata*
pathway
	ReactomeID	dbxref
	Species	Reactome_Species
*Edge Metadata*
chemical-pathway
	EvidenceID	Reactome_EvidenceID
	Species	Reactome_Species

Type	Source Column	Metadata Variable Name
*Node Metadata*
pathway
	DBReference	dbxref
	TaxonID	Reactome:TaxonID
gobp gocc gomf
	GOID	dbxref
	Aspect	Reactome_Aspect
	TaxonID	Reactome_TaxonID
*Edge Metadata*
gobp-pathway pathway-gocc pathway-gomf
	Qualifier	Reactome_Qualifier
	EvidenceCode	Reactome_EvidenceCode
	TaxonID	Reactome_TaxonID
	AssignedBy	Reactome_AssignedBy

Type	Source Column	Metadata Variable Name
*Node Metadata*
protein
	protein1	dbxref
	protein2	dbxref
*Edge Metadata*
protein-protein
	combined_score	STRING_combined_score

Type	Source Column	Metadata Variable Name
*Node Metadata*
protein
	UniProt_ID	dbxref
	UniProt_Entry_Name	dbxref
*Edge Metadata*
protein-catalyst protein-cofactor
	Status	Uniprot_Status

Type	Source Column	Metadata Variable Name
*Node Metadata*
disease
	diseaseName	synonym
	diseaseId	dbxref
	diseaseSemanticType	DisGeNET_diseaseSemanticType
	diseaseClass	DisGeNET_diseaseClass
phenotype
	diseaseName	synonym
	diseaseId	dbxref
	diseaseSemanticType	DisGeNET_diseaseSemanticType
	diseaseClass	DisGeNET_diseaseClass
gene
	geneSymbol	dbxref
*Edge Metadata*
gene-disease
	DSI	DisGeNET_DSI
	DPI	DisGeNET_DPI
	score	DisGeNET_score
	EI	DisGeNET_EI
	YearInitial	DisGeNET_YearInitial
	YearFinal	DisGeNET_YearFinal
	NofPmids	DisGeNET_NofPmids
	NofSnps	DisGeNET_NofSnps
	source	DisGeNET_source

Type	Source Column	Metadata Variable Name
*Node Metadata*
protein
	UniProtIDs	dbxref
rna
	Ensembl_IDs	dbxref
anatomy/cell
	Anatomy	GTEx_Anatomy
*Edge Metadata*
protein-anatomy/cell rna-anatomy/cell
	Expression_Value	GTEx_Expression_Value
	Subcellular_Location	GTEx_Subcellular_Location

Type	Source Column	Metadata Variable Name
*Node Metadata*
protein
	UniProtIDs	dbxref
rna
	Ensembl_IDs	dbxref
anatomy/cell
	Anatomy	HPA_Anatomy
*Edge Metadata*
protein-anatomy/cell rna-anatomy/cell
	Expression_Value	HPA_Expression_Value
	Subcellular_Location	HPA_Subcellular_Location
