This repository has been archived by the owner on Apr 19, 2023. It is now read-only.

[BUG] Failing to parse SRA metadata #360

Open
dweemx opened this issue Aug 31, 2021 · 0 comments
Labels
bug Something isn't working

dweemx commented Aug 31, 2021

Describe the bug
pysradb fails to parse the XML metadata returned for the SRP187520 project. This is probably a bug in pysradb, but we also keep an issue here for tracking purposes.

To Reproduce
Steps to reproduce the behavior:

  1. Configure with these options:
nextflow config vib-singlecell-nf/vsn-pipelines -r develop -entry sra
  2. Run using this entry point:
nextflow -C nextflow.config run vib-singlecell-nf/vsn-pipelines -entry sra
  3. See error:
Error executing process > 'sra:DOWNLOAD_FROM_SRA:SRA_TO_METADATA (1)'

Caused by:
  Process `sra:DOWNLOAD_FROM_SRA:SRA_TO_METADATA (1)` terminated with an error exit status (1)

Command executed:

  ~/.nextflow/assets/vib-singlecell-nf/vsn-pipelines/src/utils/bin/sra_to_metadata.py             SRP187520                          --sample-filter "Drop-Seq.*"             --output "SRP187520_metadata.tsv"

Command exit status:
  1

Command output:
  Using NCBi's esearch and esummary interface to query...

Command error:
  Traceback (most recent call last):
    File "/opt/venv/lib/python3.6/site-packages/pysradb/sraweb.py", line 91, in xml_to_json
      json = xmltodict.parse(xml)["root"]
    File "/opt/venv/lib/python3.6/site-packages/xmltodict.py", line 327, in parse
      parser.Parse(xml_input, True)
  xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 1761
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "~/.nextflow/assets/vib-singlecell-nf/vsn-pipelines/src/utils/bin/sra_to_metadata.py", line 90, in <module>
      sample_attribute=True
    File "/opt/venv/lib/python3.6/site-packages/pysradb/sraweb.py", line 240, in sra_metadata
      exps_json[uid] = self.xml_to_json(exps_xml[uid])
    File "/opt/venv/lib/python3.6/site-packages/pysradb/sraweb.py", line 93, in xml_to_json
      raise RuntimeError("Unable to parse xml: {}".format(xml))
  RuntimeError: Unable to parse xml: <root><Summary><Title>GSM3639563: 10x_wing.disc_rep1; Drosophila melanogaster; RNA-Seq</Title><Platform instrument_model="NextSeq 550">ILLUMINA</Platform><Statistics total_runs="1" total_spots="157920657" total_bases="23688098550" total_size="10433488343" load_done="true" cluster_name="public"/></Summary><Submitter acc="SRA856056" center_name="GEO" contact_name="Gene Expression Omnibus (GEO), NCBI, NLM, NIH, htt" lab_name=""/><Experiment acc="SRX5464686" ver="1" status="public" name="GSM3639563: 10x_wing.disc_rep1; Drosophila melanogaster; RNA-Seq"/><Study acc="SRP187520" name="Gene expression atlas of a developing tissue by single cell expression correlation analysis"/><Organism taxid="7227" ScientificName="Drosophila melanogaster"/><Sample acc="SRS4438562" name=""/><Instrument ILLUMINA="NextSeq 550"/><Library_descriptor><LIBRARY_STRATEGY>RNA-Seq</LIBRARY_STRATEGY><LIBRARY_SOURCE>TRANSCRIPTOMIC</LIBRARY_SOURCE><LIBRARY_SELECTION>cDNA</LIBRARY_SELECTION><LIBRARY_LAYOUT>                 <PAIRED/>               </LIBRARY_LAYOUT><LIBRARY_CONSTRUCTION_PROTOCOL>Larvae were dissected in Schneider's medium in batches of 5 animals (to prevent hypoxia) and transferred into a tube containing Schneider's medium on ice for a maximum time of 30 minutes. To isolate single cells TrypLE (10x) was added and the tissue incubated for 15 minutes in a water bath at 37°C , with gentle mixing every 5 minutes. Schneider's medium was then added to the loosened tissue pellets, followed by gentle mechanical dissociation using a P1000 pipette. The cell suspension was then passed through a 10 μM cell strainer to remove undigested tissue and cell clumps. 
10x libraries were prepared according to 10x Genomica instructions accompanying the Single Cell 3' Library & Gel Bead Kit v2 (CG00052_SingleCell3_ReagentKitv2UserGuide_RevD).</LIBRARY_CONSTRUCTION_PROTOCOL></Library_descriptor><Bioproject>PRJNA525603</Bioproject><Biosample>SAMN11054382</Biosample></root>

Work dir:
  ~/work/7f/35f7c4c67b309cfc708133f5fa14bd

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
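
A possible cause, not confirmed in this issue: the payload above contains a bare `&` in `Single Cell 3' Library & Gel Bead Kit v2`, and an unescaped ampersand in character data is one common trigger for expat's "not well-formed (invalid token)" error. The sketch below uses only the standard library (`xml.parsers.expat`, the same parser `xmltodict` calls in the traceback) on a shortened, hypothetical stand-in for the payload. The names `RAW`, `is_well_formed` and `escape_bare_ampersands` are illustrative and are not part of pysradb.

```python
import re
import xml.parsers.expat

# Hypothetical, shortened stand-in for the XML in the error above.
RAW = "<root><P>Single Cell 3' Library & Gel Bead Kit v2</P></root>"

# An ampersand that does not start an entity or character reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def is_well_formed(xml_text):
    """Return True if expat can parse xml_text as a complete document."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


def escape_bare_ampersands(xml_text):
    """Replace bare '&' with '&amp;', leaving existing references untouched."""
    return BARE_AMP.sub("&amp;", xml_text)


if __name__ == "__main__":
    print(is_well_formed(RAW))
    print(is_well_formed(escape_bare_ampersands(RAW)))
```

If this is indeed the cause, the fix would belong in pysradb, at the point where the XML string is built and before it is passed to `xmltodict.parse`. Escaping with a regular expression is only a workaround sketch and would not handle other markup characters such as a literal `<` in free-text fields.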


@dweemx dweemx added the bug Something isn't working label Aug 31, 2021