glinks
G-Links is a rapid data "broker" service that collects and adds related information to a given gene (or gene set).
With the availability of numerous curated databases, researchers can now make efficient use of the multitude of biological data by integrating these resources through hyperlinks and cross-references. A large proportion of bioinformatics research, however, consists of labor-intensive fetching, parsing, and merging of datasets and functional annotations from dispersed databases and web-based services. Data integration is therefore one of the key challenges of bioinformatics. Here we present G-Links, a gateway server for querying and retrieving gene annotation data. The system supports rapid querying with numerous gene IDs from multiple databases or with nucleotide/amino acid sequences, by internally centralizing gene annotations based on UniProt entries. It first converts the query into a UniProt ID, either by ID conversion or by sequence similarity search, and then returns related annotations and cross-references. Moreover, users can run external web-based tools on the query gene. G-Links is implemented as a RESTful service, so users can easily access it from any web browser. The service and its documentation are freely available at https://link.g-language.org/.
- Tutorial document (English) is available from here. (PDF, 10 pages)
- Tutorial document (Japanese) is available from here. (PDF, 12 pages)
This section describes the G-Links URI syntax conventions; for usage examples, scroll below. G-Links is provided through a REST interface. Database cross-reference information related to a given gene ID or sequence (nucleotide or amino acid) can be accessed through an HTTP GET/POST request using a unique URI.
URL Syntax of G-Links. G-Links is implemented as a RESTful service that can be queried by altering the URL.
HTML output example for BRCA1_HUMAN (UniProt ID of the BRCA1 gene in humans). By default, accessing G-Links with a web browser displays the results as interactive HTML, with a related image gallery implemented with CoverFlow (https://imageflow.finnrudolph.de/) at the top, followed by a large table of annotations and cross-references.
- GENE
- /format=[FORMAT]
- tsv (Tabular)
- nt (Notation3)
- rdf (RDF)
- html (HTML)
- json (JSON)
- slim (Tabular format without URL (URI))
- /filter=[FILTER] (FILTER="DB_NAME:keyword" or "DB_NAME" or ":keyword")
- Filter genes by database name or keywords.
- https://link.g-language.org/NC_000913/filter=GeneID (Genes which have a GeneID entry)
- https://link.g-language.org/NC_000913/filter=:transport (Genes which are related to "transport")
- https://link.g-language.org/NC_000913/filter=GO_process:transport (Genes which have GO_process entries related to "transport")
- Multiple filters can be combined ("AND" filtering). The separator is "|".
- /extract=[EXTRACT]
- Extract report items by DB or column name.
- https://link.g-language.org/hsa:128/extract=Pfam (convert KEGG Gene ID to Pfam ID)
- https://link.g-language.org/NC_000913/extract=GO_process (report only GO_process)
- Multiple extract terms can be combined ("OR" filtering). The separator is "|".
- https://link.g-language.org/9606/extract=DISEASE|KEGG_Disease (report only the DISEASE section and KEGG_Disease)
- /evalue=[E-VALUE THRESHOLD]
- default: /evalue=1e-70
- E-value threshold for similarity search by BLAT against Swiss-Prot
- This option is valid only when sequence data is given as GENE.
- /identity=[IDENTITY THRESHOLD]
- default: /identity=0.98
- Identity threshold for similarity search by BLAT against Swiss-Prot
- This option is valid only when sequence data is given as GENE.
- /direct=0
- If "/direct=1" is given, the service shows information related to the top-hit UniProt ID ("feeling lucky").
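The option syntax above can be composed programmatically. The following is a minimal sketch, not part of G-Links itself: the helper name `build_glinks_url` and its keyword-argument interface are assumptions made for illustration. It joins multiple gene IDs with "," and multiple filter/extract terms with "|", following the conventions listed above.

```python
def build_glinks_url(genes, base="https://link.g-language.org", **options):
    """Compose a G-Links URL (hypothetical helper, not an official client).

    genes   -- one ID/sequence as a string, or a list of IDs
    options -- /key=value options such as format, filter, extract, evalue
    """
    if isinstance(genes, str):
        genes = [genes]
    # Multiple gene IDs are comma-separated in the GENE part of the URL.
    parts = [",".join(genes)]
    for key, value in options.items():
        # A list of terms becomes a "|"-separated value (multiple filters
        # or multiple extract terms).
        if isinstance(value, (list, tuple)):
            value = "|".join(str(v) for v in value)
        parts.append(f"{key}={value}")
    return base + "/" + "/".join(parts)


if __name__ == "__main__":
    print(build_glinks_url("eco:b2699", format="nt", extract="GOslim"))
    print(build_glinks_url(["RECA_ECOLI", "RUVB_ECOLI"], extract="GOslim_process"))
```

Options are emitted in the order they are passed, so the caller controls the order of the URL segments. No percent-encoding is applied here; whether the server requires it for characters such as "|" is not stated in this document.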
Overview of supported databases and web services in G-Links:
Detailed list is as follows:
- https://link.g-language.org/GeneID:947170
- Information related to GeneID:947170, in tabular format.
- https://link.g-language.org/eco:b2699/format=nt/extract=GOslim
- GO slim for eco:b2699 (KEGG Gene), in N-Triple format.
- https://link.g-language.org/MMQESATETISNSSMNQNGMSTLSSQLDAGSRDGRSSG...
- Information table for the UniProt IDs reported by a BLAT search of the given amino acid sequence against Swiss-Prot.
- https://link.g-language.org/9606/format=text/filter=:cancer|KEGG_Disease/extract=DISEASE|KEGG_Disease
- DISEASE information for the gene sets related to cancer.
- https://gist.github.com/cory-ko/4753253
- Perl script to get information related to the top-hit UniProt ID from a BLAT search against Swiss-Prot.
- https://gist.github.com/cory-ko/4753374
- Ruby script to get "DISEASE" and SNP (dbSNP & SNPedia) information about H. sapiens "cancer" genes which have a "GOslim component" annotation related to "metabolic", in slim TSV format.
- https://gist.github.com/cory-ko/4753373
- Python script to get information related to hsa:128 and RECA_ECOLI in Notation-3 format.
- https://gist.github.com/cory-ko/4755115
- Java script to get information related to hsa:128 and RECA_ECOLI in TSV format.
One of the strengths of G-Links is its programmatic access. For example, the GO slim classification of all E. coli genes for the GO:Process ontology can be retrieved from the following URL:
- https://link.g-language.org/NC_000913/extract=GOslim_process
This result is shown as a formatted HTML page when viewed in a browser, but when it is accessed from the command line or from a program, the result is automatically returned as a TSV file. Using this, a simple combination of UNIX commands can produce a classification summary of all genes in E. coli by GOslim:Process terms. Here is an example:
$ curl -v https://link.g-language.org/NC_000913/extract=GOslim_process |grep \# |cut -f 2,3 |grep GO: |sort |uniq -c |sort -rn
Here, G-Links is accessed from the command line, with "curl -v" writing the result to standard output, and the sections containing GO terms and their descriptions are extracted ("|grep # |cut -f 2,3 |grep GO:"). Then the terms are sorted and counted ("|sort |uniq -c") and printed in descending order ("|sort -rn").
The following is the output of the above commands:
1056 GO:0009058 biosynthetic process
1032 GO:0008150 biological_process
860 GO:0034641 cellular nitrogen compound metabolic process
636 GO:0044281 small molecule metabolic process
526 GO:0006810 transport
484 GO:0006950 response to stress
381 GO:0005975 carbohydrate metabolic process
374 GO:0009056 catabolic process
285 GO:0055085 transmembrane transport
273 GO:0006259 DNA metabolic process
257 GO:0006520 cellular amino acid metabolic process
190 GO:0051186 cofactor metabolic process
169 GO:0006629 lipid metabolic process
127 GO:0006464 cellular protein modification process
127 GO:0006091 generation of precursor metabolites and energy
98 GO:0006790 sulfur compound metabolic process
96 GO:0042592 homeostatic process
92 GO:0032196 transposition
84 GO:0006399 tRNA metabolic process
79 GO:0007165 signal transduction
76 GO:0071554 cell wall organization or biogenesis
72 GO:0022607 cellular component assembly
63 GO:0006412 translation
52 GO:0034655 nucleobase-containing compound catabolic process
50 GO:0051301 cell division
50 GO:0007155 cell adhesion
50 GO:0006457 protein folding
45 GO:0048870 cell motility
43 GO:0006461 protein complex assembly
39 GO:0007049 cell cycle
37 GO:0040011 locomotion
32 GO:0051604 protein maturation
31 GO:0071941 nitrogen cycle metabolic process
31 GO:0051276 chromosome organization
21 GO:0061024 membrane organization
19 GO:0019748 secondary metabolic process
18 GO:0044403 symbiosis, encompassing mutualism through parasitism
17 GO:0000003 reproduction
15 GO:0022618 ribonucleoprotein complex assembly
14 GO:0007059 chromosome segregation
14 GO:0002376 immune system process
9 GO:0006605 protein targeting
7 GO:0008219 cell death
5 GO:0042254 ribosome biogenesis
4 GO:0006397 mRNA processing
3 GO:0000902 cell morphogenesis
2 GO:0048646 anatomical structure formation involved in morphogenesis
2 GO:0030198 extracellular matrix organization
2 GO:0030154 cell differentiation
1 GO:0065003 macromolecular complex assembly
1 GO:0015979 photosynthesis
1 GO:0007267 cell-cell signaling
1 GO:0007010 cytoskeleton organization
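The same counting step can be done inside a program instead of a shell pipeline. The sketch below mirrors the grep/cut/sort/uniq chain on TSV text that has already been downloaded. The function name `count_terms` is an assumption for illustration, and the sample input in the usage part is made up to exercise the logic; it is not real G-Links output.

```python
from collections import Counter


def count_terms(tsv_text, marker="GO:"):
    """Mimic: grep '#' | cut -f 2,3 | grep GO: | sort | uniq -c | sort -rn"""
    counts = Counter()
    for line in tsv_text.splitlines():
        # grep '#': keep only lines that contain a hash character.
        if "#" not in line:
            continue
        # cut -f 2,3: keep the second and third tab-separated fields.
        # Like cut, pass a line through unchanged if it has no tab at all.
        if "\t" in line:
            selected = "\t".join(line.split("\t")[1:3])
        else:
            selected = line
        # grep GO: keep only entries that mention the marker.
        if marker in selected:
            counts[selected] += 1
    # sort -rn: highest count first; ties in reverse term order.
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)


if __name__ == "__main__":
    # Made-up sample rows, only to demonstrate the counting.
    sample = (
        "# a\tGO:0006950\tresponse to stress\n"
        "# b\tGO:0006950\tresponse to stress\n"
        "# c\tGO:0006259\tDNA metabolic process\n"
    )
    for term, n in count_terms(sample):
        print(n, term)
```

Passing `marker="ko"` would correspond to the KEGG BRITE variant of the pipeline.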
If you have a specific set of genes, such as RECA_ECOLI, RUVB_ECOLI, LEXA_ECOLI, and UMUD_ECOLI, that may be overrepresented in a microarray experiment, running the same routine with this list of genes produces the Gene Ontology classification of the genes of interest.
$ curl -v https://link.g-language.org/RECA_ECOLI,RUVB_ECOLI,LEXA_ECOLI,UMUD_ECOLI/extract=GOslim_process |grep \# |cut -f 2,3 |grep GO: |sort |uniq -c |sort -rn
This will produce:
4 GO:0006950 response to stress
4 GO:0006259 DNA metabolic process
3 GO:0008150 biological_process
2 GO:0009058 biosynthetic process
1 GO:0051276 chromosome organization
1 GO:0048870 cell motility
1 GO:0034641 cellular nitrogen compound metabolic process
These counts can then be used to test for enrichment, for example with Fisher's exact test, to calculate Gene Ontology enrichment scores.
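As a rough illustration of such a test, the sketch below computes a one-sided Fisher's exact test p-value (the upper tail of the hypergeometric distribution) with the standard library only. The function name `fisher_enrichment_p` is an assumption, and the background size used in the usage part is a placeholder: this document does not state the total number of annotated E. coli genes, so substitute the real figure before interpreting the result.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value for over-representation.

    k -- genes in the query set annotated with the term
    n -- size of the query set
    K -- genes in the background annotated with the term
    N -- size of the background
    """
    # Probability of observing k or more annotated genes when drawing
    # n genes without replacement from a background of N with K annotated.
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


if __name__ == "__main__":
    # "response to stress": 4 of the 4 query genes, 484 genes genome-wide
    # (counts taken from the outputs above).
    # N=4000 is a PLACEHOLDER background size, not a value from G-Links.
    p = fisher_enrichment_p(k=4, n=4, K=484, N=4000)
    print(f"p = {p:.3g}")
```

When many GO terms are tested at once, the resulting p-values would normally also need a multiple-testing correction.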
If an alternative classification is desired, simply change the extracted term from GOslim_process to, for example, the KEGG BRITE hierarchy (KEGG_Brite).
$ curl -v https://link.g-language.org/NC_000913/extract=KEGG_Brite |grep \# |cut -f 2,3 |grep ko |sort |uniq -c |sort -rn
This will produce:
1452 ko00001 KEGG Orthology (KO)
1017 ko01000 Enzymes
838 ko00002 KEGG pathway modules
358 ko01000 Enzymes
282 ko02000 Transporters
197 ko02000 Transporters
129 ko03000 Transcription factors
89 ko03400 DNA repair and recombination proteins
84 ko03016 Transfer RNA biogenesis
65 ko01002 Peptidases
61 ko02035 Bacterial motility proteins
58 ko02022 Two-component system
57 ko03011 Ribosome
56 ko03011 M00178 Ribosome, bacteria
52 ko02044 Secretion system
49 ko03009 Ribosome biogenesis
45 ko01007 Amino acid related enzymes
44 ko00002 KEGG pathway modules
39 ko01005 Lipopolysaccharide biosynthesis proteins
33 ko01001 Protein kinases
31 ko03011 M00179 Ribosome, archaea
28 ko03036 Chromosome
27 ko03110 Chaperones and folding catalysts
27 ko01003 Glycosyltransferases
26 ko03036 Chromosome
26 ko03032 DNA replication proteins
25 ko02044 Secretion system
20 ko03110 Chaperones and folding catalysts
20 ko01004 Lipid biosynthesis proteins
19 ko03009 Ribosome biogenesis
15 ko03012 Translation factors
13 ko02044 M00331 Type II general secretion system
12 ko02044 M00335 Sec (secretion) system
11 ko02000 M00240 Iron complex transport system
11 ko01002 Peptidases
10 ko03021 Transcription machinery
10 ko03021 Transcription machinery
10 ko02035 Bacterial motility proteins
10 ko02000 M00324 Dipeptide transport system
9 ko03400 M00260 DNA polymerase III complex, bacteria
9 ko03032 M00260 DNA polymerase III complex, bacteria
9 ko02000 M00306 PTS system, fructose-specific II-like component
8 ko03400 DNA repair and recombination proteins
8 ko03032 DNA replication proteins
8 ko03000 Transcription factors
8 ko00194 Photosynthesis proteins
7 ko02000 M00221 Putative simple sugar transport system
7 ko01006 Prenyltransferases
6 ko02000 M00439 Oligopeptide transport system
6 ko02000 M00239 Peptides/nickel transport system
6 ko02000 M00237 Branched-chain amino acid transport system
6 ko01005 Lipopolysaccharide biosynthesis proteins
5 ko03016 Transfer RNA biogenesis
5 ko03012 Translation factors
5 ko02000 M00440 Nickel transport system
5 ko02000 M00279 PTS system, galactitol-specific II component
5 ko02000 M00229 Arginine transport system
5 ko02000 M00185 Sulfate transport system
4 ko03400 M00183 RNA polymerase, bacteria
4 ko03021 M00183 RNA polymerase, bacteria
4 ko02044 M00336 Twin-arginine translocation (Tat) system
4 ko02022 Two-component system
4 ko02000 M00349 Microcin C transport system
4 ko02000 M00348 Glutathione transport system
4 ko02000 M00300 Putrescine transport system
4 ko02000 M00299 Spermidine/putrescine transport system
4 ko02000 M00283 PTS system, ascorbate-specific II component
4 ko02000 M00238 D-Methionine transport system
4 ko02000 M00230 Glutamate/aspartate transport system
4 ko02000 M00226 Histidine transport system
4 ko02000 M00225 Lysine/arginine/ornithine transport system
4 ko02000 M00222 Phosphate transport system
4 ko02000 M00219 AI-2 transport system
4 ko02000 M00209 Osmoprotectant transport system
4 ko02000 M00198 Putative sn-glycerol-phosphate transport system
4 ko02000 M00197 Putative fructooligosaccharide transport system
4 ko02000 M00194 Maltose/maltodextrin transport system
4 ko02000 M00193 Putative spermidine/putrescine transport system
4 ko02000 M00189 Molybdate transport system
3 ko04812 Cytoskeleton proteins
3 ko02035 M00506 CheA-CheYBV (chemotaxis) two-component regulatory system
3 ko02030 M00506 CheA-CheYBV (chemotaxis) two-component regulatory system
3 ko02022 M00506 CheA-CheYBV (chemotaxis) two-component regulatory system
3 ko02022 M00474 RcsC-RcsD-RcsB (capsule synthesis) two-component regulatory system
3 ko02001 Solute carrier family
3 ko02000 M00436 Sulfonate transport system
3 ko02000 M00435 Taurine transport system
3 ko02000 M00320 Lipopolysaccharide export system
3 ko02000 M00287 PTS system, galactosamine-specific II component
3 ko02000 M00280 PTS system, glucitol/sorbitol-specific II component
3 ko02000 M00276 PTS system, mannose-specific II component
3 ko02000 M00275 PTS system, cellobiose-specific II component
3 ko02000 M00274 PTS system, mannitol-specific II component
3 ko02000 M00259 Heme transport system
3 ko02000 M00255 Lipoprotein-releasing system
3 ko02000 M00254 ABC-2 type transport system
3 ko02000 M00248 Putative antibiotic transport system
3 ko02000 M00242 Zinc transport system
3 ko02000 M00241 Vitamin B12 transport system
3 ko02000 M00234 Cystine transport system
3 ko02000 M00232 General L-amino acid transport system
3 ko02000 M00227 Glutamine transport system
3 ko02000 M00217 D-Allose transport system
3 ko02000 M00215 D-Xylose transport system
3 ko02000 M00214 Methyl-galactoside transport system
3 ko02000 M00213 L-Arabinose transport system
3 ko02000 M00212 Ribose transport system
3 ko02000 M00210 Putative ABC transport system
3 ko02000 M00208 Glycine betaine/proline transport system
3 ko02000 M00207 Putative multiple sugar transport system
3 ko02000 M00192 Putative thiamine transport system
3 ko02000 M00191 Thiamine transport system
3 ko01008 Polyketide biosynthesis proteins
2 ko04040 Ion channels
2 ko02044 M00429 Competence-related DNA transformation transporter
2 ko02042 Bacterial toxins
2 ko02022 M00502 GlrK-GlrR (amino sugar metabolism) two-component regulatory system
2 ko02022 M00500 AtoS-AtoC (complexed poly-(R)-3-hydroxybutyrate biosynthesis) two-component regulatory system
2 ko02022 M00499 HydH-HydG (metal tolerance) two-component regulatory system
2 ko02022 M00497 GlnL-GlnG (nitrogen regulation) two-component regulatory system
2 ko02022 M00488 DcuS-DcuR (aerobic C4-dicarboxylate metabolism) two-component regulatory system
2 ko02022 M00486 CitA-CitB (citrate fermentation) two-component regulatory system
2 ko02022 M00477 EvgS-EvgA (acid and drug tolerance) two-component regulatory system
2 ko02022 M00475 BarA-UvrY (central carbon metabolism) two-component regulatory system
2 ko02022 M00473 UhpB-UhpA (hexose phosphates uptake) two-component regulatory system
2 ko02022 M00472 NarQ-NarP (nitrate respiration) two-component regulatory system
2 ko02022 M00471 NarX-NarL (nitrate respiration) two-component regulatory system
2 ko02022 M00456 ArcB-ArcA (anoxic redox control) two-component regulatory system
2 ko02022 M00455 TorS-TorR (trimethylamine N-oxide respiration) two-component regulatory system
2 ko02022 M00454 KdpD-KdpE (potassium transport) two-component regulatory system
2 ko02022 M00453 QseC-QseB (quorum sensing) two-component regulatory system
2 ko02022 M00452 CusS-CusR (copper tolerance) two-component regulatory system
2 ko02022 M00451 BasS-BasR (antimicrobial peptide resistance) two-component regulatory system
2 ko02022 M00450 BaeS-BaeR (envelope stress response) two-component regulatory system
2 ko02022 M00449 CreC-CreB (phosphate regulation) two-component regulatory system
2 ko02022 M00447 CpxA-CpxR (envelope stress response) two-component regulatory system
2 ko02022 M00446 RstB-RstA two-component regulatory system
2 ko02022 M00445 EnvZ-OmpR (osmotic stress response) two-component regulatory system
2 ko02022 M00444 PhoQ-PhoP (magnesium transport) two-component regulatory system
2 ko02022 M00434 PhoR-PhoB (phosphate starvation response) two-component regulatory system
2 ko02000 M00303 PTS system, N-acetylmuramic acid-specific II component
2 ko02000 M00272 PTS system, arbutin-, cellobiose-, and salicin-specific II component
2 ko02000 M00270 PTS system, trehalose-specific II component
2 ko02000 M00266 PTS system, maltose and glucose-specific II component
2 ko02000 M00265 PTS system, glucose-specific II component
2 ko02000 M00258 Putative ABC transport system
2 ko02000 M00256 Cell division transport system
2 ko02000 M00224 Putative phosphonate transport system
2 ko02000 M00223 Phosphonate transport system
2 ko02000 M00211 Putative ABC transport system
1 ko04121 Ubiquitin system
1 ko04090 Cellular antigens
1 ko03051 Proteasome
1 ko02044 M00571 AlgE-type Mannuronan C-5-Epimerase transport system
1 ko02044 M00339 RaxAB-RaxC type I secretion system
1 ko02044 M00326 RTX toxin transport system
1 ko02000 M00491 Putative arabinogalactan oligomer transport system
1 ko02000 M00325 alpha-Hemolysin/cyclolysin transport system
1 ko02000 M00305 PTS system, 2-O-A-mannosyl-D-glycerate-specific II component
1 ko02000 M00277 PTS system, N-acetylgalactosamine-specific II component
1 ko02000 M00273 PTS system, fructose-specific II component
1 ko02000 M00271 PTS system, beta-glucosides-specific II component
1 ko02000 M00268 PTS system, arbutin-like II component
1 ko02000 M00267 PTS system, N-acetylglucosamine-specific II component
1 ko02000 M00190 Iron(III) transport system
1 ko00194 Photosynthesis proteins
The G-Links database is updated once every six months. The next update is Feb 2016.
- UniProt : 2015_10
- idmapping, Swiss-Prot, TrEMBL, taxonomic_divisions
- GEO : 2015_10
- GeoDb_blob82
- Enzyme : 16-Sep-2015
- PharmGKB : 2015-10-04
- Genes, RSID mapping
- PID : 2012-9-18 (latest)
- BIOGRID : 3.4.129
- Gene Ontology : 2015-09-24
- Data and images from UniProt: Creative Commons Attribution-NoDerivs
- Data and images from Coxpresdb: Creative Commons Attribution 2.1
- Data and images from PDB: Free of all copyright restrictions
- Data and images from STRING API: N.A. for individual items
- Data and images from KEGG: Commercial users should obtain a license
- G-language Maps
- Institute for Advanced Biosciences
- E-Cell Simulation Environment
- E.coli multi-omics database
- Database of bacterial replication terminus
Kazuharu Arakawa, Ph.D.
G-language Project Leader Associate Professor
Institute for Advanced Biosciences Keio University
997-0017 Japan Tel/Fax: +81-235-29-0800 gaou@sfc.keio.ac.jp