Hello,
Thank you tons for what you have already done. My personal interest is in parsing the XML exports of gene records, such as https://www.ncbi.nlm.nih.gov/gene?Db=gene&Cmd=DetailsSearch&Term=7161 (Send To File -> XML format).
I would happily contribute whatever is missing to support this kind of parsing. Is there an easy way to get me started, or shall I just come up with something that I hope is compatible with your plans?
Many thanks!
This is how it looks:
<?xml version="1.0"?>
<!DOCTYPE Entrezgene-Set PUBLIC "-//NLM//DTD NCBI-Entrezgene, 21st January 2005//EN" "https://www.ncbi.nlm.nih.gov/data_specs/dtd/NCBI_Entrezgene.dtd">
<Entrezgene-Set>
<Entrezgene>
<Entrezgene_track-info>
<Gene-track>
<Gene-track_geneid>7161</Gene-track_geneid>
<Gene-track_status value="live">0</Gene-track_status>
<Gene-track_create-date>
<Date>
<Date_std>
<Date-std>
<Date-std_year>1998</Date-std_year>
<Date-std_month>8</Date-std_month>
<Date-std_day>13</Date-std_day>
</Date-std>
</Date_std>
</Date>
</Gene-track_create-date>
<Gene-track_update-date>
<Date>
<Date_std>
<Date-std>
<Date-std_year>2024</Date-std_year>
<Date-std_month>12</Date-std_month>
<Date-std_day>10</Date-std_day>
<Date-std_hour>8</Date-std_hour>
<Date-std_minute>46</Date-std_minute>
<Date-std_second>0</Date-std_second>
</Date-std>
</Date_std>
</Date>
</Gene-track_update-date>
</Gene-track>
</Entrezgene_track-info>
<Entrezgene_type value="protein-coding">6</Entrezgene_type>
<Entrezgene_source>
<BioSource>
<BioSource_genome value="genomic">1</BioSource_genome>
<BioSource_origin value="natural">1</BioSource_origin>
<BioSource_org>
<Org-ref>
<Org-ref_taxname>Homo sapiens</Org-ref_taxname>
<Org-ref_common>human</Org-ref_common>
<Org-ref_db>
<Dbtag>
<Dbtag_db>taxon</Dbtag_db>
<Dbtag_tag>
<Object-id>
<Object-id_id>9606</Object-id_id>
</Object-id>
</Dbtag_tag>
</Dbtag>
</Org-ref_db>
<Org-ref_orgname>
<OrgName>
<OrgName_name>
<OrgName_name_binomial>
<BinomialOrgName>
<BinomialOrgName_genus>Homo</BinomialOrgName_genus>
<BinomialOrgName_species>sapiens</BinomialOrgName_species>
</BinomialOrgName>
</OrgName_name_binomial>
</OrgName_name>
<OrgName_attrib>specified</OrgName_attrib>
<OrgName_lineage>Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo</OrgName_lineage>
<OrgName_gcode>1</OrgName_gcode>
<OrgName_mgcode>2</OrgName_mgcode>
<OrgName_div>PRI</OrgName_div>
</OrgName>
</Org-ref_orgname>
</Org-ref>
</BioSource_org>
<BioSource_subtype>
<SubSource>
<SubSource_subtype value="chromosome">1</SubSource_subtype>
<SubSource_name>1</SubSource_name>
</SubSource>
</BioSource_subtype>
</BioSource>
</Entrezgene_source>
<Entrezgene_gene>
<Gene-ref>
<Gene-ref_locus>TP73</Gene-ref_locus>
<Gene-ref_desc>tumor protein p73</Gene-ref_desc>
<Gene-ref_maploc>1p36.32</Gene-ref_maploc>
<Gene-ref_db>
<Dbtag>
<Dbtag_db>HGNC</Dbtag_db>
<Dbtag_tag>
<Object-id>
<Object-id_str>HGNC:12003</Object-id_str>
</Object-id>
</Dbtag_tag>
</Dbtag>
<Dbtag>
<Dbtag_db>Ensembl</Dbtag_db>
<Dbtag_tag>
<Object-id>
<Object-id_str>ENSG00000078900</Object-id_str>
</Object-id>
</Dbtag_tag>
</Dbtag>
<Dbtag>
<Dbtag_db>MIM</Dbtag_db>
<Dbtag_tag>
<Object-id>
<Object-id_id>601990</Object-id_id>
</Object-id>
</Dbtag_tag>
</Dbtag>
<Dbtag>
<Dbtag_db>AllianceGenome</Dbtag_db>
<Dbtag_tag>
<Object-id>
<Object-id_str>HGNC:12003</Object-id_str>
</Object-id>
</Dbtag_tag>
</Dbtag>
</Gene-ref_db>
<Gene-ref_syn>
<Gene-ref_syn_E>P73</Gene-ref_syn_E>
<Gene-ref_syn_E>CILD47</Gene-ref_syn_E>
</Gene-ref_syn>
<Gene-ref_formal-name>
<Gene-nomenclature>
<Gene-nomenclature_status value="official"/>
<Gene-nomenclature_symbol>TP73</Gene-nomenclature_symbol>
<Gene-nomenclature_name>tumor protein p73</Gene-nomenclature_name>
<Gene-nomenclature_source>
<Dbtag>
<Dbtag_db>HGNC</Dbtag_db>
<Dbtag_tag>
<Object-id>
<Object-id_str>HGNC:12003</Object-id_str>
</Object-id>
</Dbtag_tag>
</Dbtag>
</Gene-nomenclature_source>
</Gene-nomenclature>
</Gene-ref_formal-name>
</Gene-ref>
</Entrezgene_gene>
<Entrezgene_prot>
<Prot-ref>
<Prot-ref_name>
<Prot-ref_name_E>p53-like transcription factor</Prot-ref_name_E>
<Prot-ref_name_E>p53-related protein</Prot-ref_name_E>
</Prot-ref_name>
<Prot-ref_desc>tumor protein p73</Prot-ref_desc>
</Prot-ref>
</Entrezgene_prot>
<Entrezgene_summary>This gene encodes a member of the p53 family of transcription factors involved in cellular responses to stress and development. It maps to a region on chromosome 1p36 that is frequently deleted in neuroblastoma and other tumors, and thought to contain multiple tumor suppressor genes. The demonstration that this gene is monoallelically expressed (likely from the maternal allele), supports the notion that it is a candidate gene for neuroblastoma. Many transcript variants resulting from alternative splicing and/or use of alternate promoters have been found for this gene, but the biological validity and the full-length nature of some variants have not been determined. [provided by RefSeq, Feb 2011]</Entrezgene_summary>
<Entrezgene_location>
<Maps>
<Maps_display-str>1p36.32</Maps_display-str>
<Maps_method>
<Maps_method_map-type value="cyto"/>
</Maps_method>
</Maps>
</Entrezgene_location>
<Entrezgene_gene-source>
<Gene-source>
<Gene-source_src>LocusLink</Gene-source_src>
<Gene-source_src-int>7161</Gene-source_src-int>
<Gene-source_src-str2>7161</Gene-source_src-str2>
</Gene-source>
</Entrezgene_gene-source>
<Entrezgene_locus>
<Gene-commentary>
<Gene-commentary_type value="genomic">1</Gene-commentary_type>
<Gene-commentary_heading>Reference GRCh38.p14 Primary Assembly</Gene-commentary_heading>
<Gene-commentary_label>Chromosome 1 Reference GRCh38.p14 Primary Assembly</Gene-commentary_label>
<Gene-commentary_accession>NC_000001</Gene-commentary_accession>
<Gene-commentary_version>11</Gene-commentary_version>
<Gene-commentary_seqs>
<Seq-loc>
<Seq-loc_int>
<Seq-interval>
<Seq-interval_from>3652515</Seq-interval_from>
<Seq-interval_to>3736200</Seq-interval_to>
<Seq-interval_strand>
<Na-strand value="plus"/>
</Seq-interval_strand>
<Seq-interval_id>
<Seq-id>
<Seq-id_gi>568815597</Seq-id_gi>
</Seq-id>
</Seq-interval_id>
</Seq-interval>
</Seq-loc_int>
</Seq-loc>
</Gene-commentary_seqs>
<Gene-commentary_products>
<Gene-commentary>
<Gene-commentary_type value="mRNA">3</Gene-commentary_type>
<Gene-commentary_heading>Reference</Gene-commentary_heading>
<Gene-commentary_label>transcript variant 1</Gene-commentary_label>
<Gene-commentary_accession>NM_005427</Gene-commentary_accession>
<Gene-commentary_version>4</Gene-commentary_version>
<Gene-commentary_genomic-coords>
<Seq-loc>
<Seq-loc_mix>
<Seq-loc-mix>
<Seq-loc>
<Seq-loc_int>
<Seq-interval>
<Seq-interval_from>3652515</Seq-interval_from>
<Seq-interval_to>3652640</Seq-interval_to>
<Seq-interval_strand>
<Na-strand value="plus"/>
</Seq-interval_strand>
<Seq-interval_id>
<Seq-id>
<Seq-id_gi>568815597</Seq-id_gi>
</Seq-id>
</Seq-interval_id>
</Seq-interval>
</Seq-loc_int>
</Seq-loc>
...
@smoe I welcome any changes you see fit. I tried to follow the original ASN.1 files as closely as possible. Many of the Entrezgene tags are not part of Bioseq, but the types in general.rs and the sequence tags will still apply.
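To make the idea of mirroring the ASN.1 structure concrete, here is a minimal sketch of how the `Gene-track` element from the sample above might map onto Rust types. Everything in it is an assumption for illustration: the names `DateStd`, `GeneTrack`, `text_of`, `attr_of` and `parse_gene_track` are hypothetical and not taken from the crate, and the string-search helpers only stand in for a real XML parser so that the example needs nothing beyond the standard library.

```rust
// Hypothetical types for illustration only; they are not the crate's API.

/// Mirrors the `Date-std` element. Month and day are modeled as optional here.
#[derive(Debug, PartialEq)]
pub struct DateStd {
    pub year: u32,
    pub month: Option<u32>,
    pub day: Option<u32>,
}

/// Mirrors the parts of `Gene-track` that appear in the sample export.
#[derive(Debug, PartialEq)]
pub struct GeneTrack {
    pub geneid: u64,
    /// Taken from the `value` attribute, e.g. "live".
    pub status: String,
    pub create_date: DateStd,
}

/// Returns the trimmed text between the first `<tag>` and the next `</tag>`.
/// Naive on purpose: no attributes on the tag, no entity decoding,
/// and no support for an element nested inside itself.
pub fn text_of<'a>(xml: &'a str, tag: &str) -> Option<&'a str> {
    let open = format!("<{}>", tag);
    let close = format!("</{}>", tag);
    let start = xml.find(&open)? + open.len();
    let end = xml[start..].find(&close)? + start;
    Some(xml[start..end].trim())
}

/// Returns the value of `attr` on the first `<tag ...>` start tag.
pub fn attr_of<'a>(xml: &'a str, tag: &str, attr: &str) -> Option<&'a str> {
    let open = format!("<{} ", tag);
    let tag_start = xml.find(&open)? + open.len();
    let tag_end = xml[tag_start..].find('>')? + tag_start;
    let attrs = &xml[tag_start..tag_end];
    let key = format!("{}=\"", attr);
    let val_start = attrs.find(&key)? + key.len();
    let val_end = attrs[val_start..].find('"')? + val_start;
    Some(&attrs[val_start..val_end])
}

/// Builds a `GeneTrack` from a fragment containing a `<Gene-track>` element.
/// Returns `None` if a required element is missing or not numeric.
pub fn parse_gene_track(xml: &str) -> Option<GeneTrack> {
    let track = text_of(xml, "Gene-track")?;
    let created = text_of(track, "Gene-track_create-date")?;
    Some(GeneTrack {
        geneid: text_of(track, "Gene-track_geneid")?.parse().ok()?,
        status: attr_of(track, "Gene-track_status", "value")?.to_string(),
        create_date: DateStd {
            year: text_of(created, "Date-std_year")?.parse().ok()?,
            month: text_of(created, "Date-std_month").and_then(|s| s.parse().ok()),
            day: text_of(created, "Date-std_day").and_then(|s| s.parse().ok()),
        },
    })
}
```

Calling `parse_gene_track` on the `<Gene-track>` fragment of the export above would be expected to yield gene ID 7161 with status `live`. A real contribution would presumably reuse the crate's existing date and identifier types and its XML reader rather than these stand-ins.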