Error during fastq parsing #30

Closed
jsabban opened this issue Aug 21, 2024 · 4 comments

jsabban commented Aug 21, 2024

Hi!

I get an error when I use a FASTQ file as input, and I do not understand why. I am running the Docker image through Singularity.
I ran this command:

singularity run \
src.sif toulligqc \
-a sequencing_summary.txt \
--output-directory output \
-p pass_barcode01.pod5 \
-q pass_barcode01.fastq \
-l 'barcode01,barcode02,barcode03'

The output is:

ToulligQC version 2.7
* Initialize extractors
* Start Toulligqc info extractor
* End of Toulligqc info extractor (done in 0m0.00s)
* Start Pod5 extractor
* End of Pod5 extractor (done in 0m0.07s)
* Start fastq extractor
Processed: 4000read [00:02, 1843.16read/s]
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pandas/core/internals/construction.py", line 939, in _finalize_columns_and_data
    columns = _validate_or_indexify_columns(contents, columns)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pandas/core/internals/construction.py", line 986, in _validate_or_indexify_columns
    raise AssertionError(
AssertionError: 4 columns passed, passed data had 3 columns

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/toulligqc", line 33, in <module>
    sys.exit(load_entry_point('toulligqc==2.7', 'console_scripts', 'toulligqc')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/toulligqc-2.7-py3.12.egg/toulligqc/toulligqc.py", line 422, in main
    extractor.init()
  File "/usr/local/lib/python3.12/dist-packages/toulligqc-2.7-py3.12.egg/toulligqc/fastq_extractor.py", line 60, in init
    self.dataframe_1d = self._load_fastq_data()
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/toulligqc-2.7-py3.12.egg/toulligqc/fastq_extractor.py", line 264, in _load_fastq_data
    fq_df = pd.DataFrame(fq_df, columns=columns)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 806, in __init__
    arrays, columns, index = nested_data_to_arrays(
                             ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pandas/core/internals/construction.py", line 520, in nested_data_to_arrays
    arrays, columns = to_arrays(data, columns, dtype=dtype)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pandas/core/internals/construction.py", line 845, in to_arrays
    content, columns = _finalize_columns_and_data(arr, columns, dtype)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pandas/core/internals/construction.py", line 942, in _finalize_columns_and_data
    raise ValueError(err) from err
ValueError: 4 columns passed, passed data had 3 columns

Can someone help me, please?
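The final ValueError says that four column names were supplied for rows that hold only three values. As a minimal sketch in plain Python (not ToulligQC's own code), a check like the one below can locate such rows before they reach `pd.DataFrame`. The function name `find_mismatched_rows` and the column names in the example are hypothetical and chosen only for illustration.

```python
def find_mismatched_rows(rows, columns):
    """Return (index, row) pairs whose field count differs from len(columns)."""
    expected = len(columns)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Hypothetical column names and rows, for illustration only.
columns = ["read_id", "length", "mean_qscore", "barcode"]
rows = [
    ("read-a", 1200, 14.2, "barcode01"),
    ("read-b", 950, 12.8),  # one field missing
]

for index, row in find_mismatched_rows(rows, columns):
    print(f"row {index} has {len(row)} fields, expected {len(columns)}: {row}")
```

A row that comes back from this check is one that would make the DataFrame constructor fail in the way shown in the traceback.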

@alihamraoui
Member

Hi @jsabban,

It seems there might be a problem with the format of your FASTQ file. To help find it, could you please provide the first 4 lines of your FASTQ file?

Ali
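For anyone who wants to pull out the first record as requested above, here is a small sketch in Python. It assumes an uncompressed, plain-text FASTQ file; the function name `first_fastq_record` is hypothetical.

```python
from itertools import islice


def first_fastq_record(path):
    """Return the first four lines (one FASTQ record) without trailing newlines."""
    with open(path, "r", encoding="utf-8") as handle:
        # islice stops after four lines, so the rest of the file is never read.
        return [line.rstrip("\n") for line in islice(handle, 4)]
```

Calling `first_fastq_record("pass_barcode01.fastq")` and printing each element gives the header, sequence, separator, and quality lines of the first read.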

@jsabban
Author

jsabban commented Aug 21, 2024

Hi @alihamraoui, here are the first 4 lines of my FASTQ file:

@1a790b27-c54a-44b2-b6bb-0066afaf968e runid=6b2d611263610f8f4f521413ba294547e26a6308 read=42 ch=391 start_time=2024-04-16T16:37:38.785785+02:00 flow_cell_id=FAX94246 protocol_group_id=EMPHY sample_id=EMPHY-MLTPXreq-T16-L20-NBD114-24 barcode=barcode01 barcode_alias=barcode01 parent_read_id=1a790b27-c54a-44b2-b6bb-0066afaf968e basecall_model_version_id=dna_r10.4.1_e8.2_400bps_sup@v4.2.0
CCAATTACGTCGTTGTAGTCCAGCAAATACGTTTGTCACACAAACTTCATATTCTGGGCAACTCGGAGAGCGACTTAATGAAACATTTAAAAGATATAAACTACAATGGAAAATACACTTGCACTCATTGAAAACCCTACTCAAGGGCTTAAAACATTATCCGGAACAATTAGTTGGGAAGCTTTTAAACAAAACTGCTTTAAAGGATTTACAACAACGTACTGCACTCTTTACCAGTTGGTATAAAGTCGAATGTTCTGCCAACTTAAACCAGACCAACGTAGTTTAACGAATCGTTACGGGTATAATCCAAACTATCAGCAAACTGGGTCAAGTTCCATGAACCGATTTAAAGATCGTATTAAAGCCCGATTACGCGAATCTACTCACCAAGCACACCCGCTTTATGCAATGTCTGGTTCAATCCGTGGAACCTTTGGAAGCCGTGGGTATGAAACACGACTTCAACGTCCGCTGACTGGTCCTGTAACACAAATGAGTCAGGAATTTTTAAATCTAGTTTAAGTAAATTTCTAAGGGTATCCCATTTTATTGGTACCCCTAGAAATTTTTATATTTAAACATCATATTGTTTTGCGATATACTGGTAGAAAGCTATAATATAAACGAGCATCGTGGATAATAATGGAACCGTTATTAGACGAGAGCTTTGAAAGAAATGTTGAGGATTTCATATGTCAGGATCTACAGGAGAACGTCCTTTTAGTGACATCGTTACTAGTATTCGTTACTGGGTAATTCACAGTGTTACAATTCCGTCACTCTTTATTGCAGGATGGCTTTTTGTAAGTACTGGTTTAGCTTACGATGTATTTGGAACACCGCGTCCAAACGAATACTTTACGGATA
+
EGSSIHGJEDFGSKIIISSRIHILKLKSJOCEFKJMFHLGHEGFEFGEFS<<;;;;FHQHGJG66200588----/--/(((((7520))78:988899:CDGIENGILSKRKLSHLFBC@CLJFKSKMISNSSDABABJMISSSKSSMIGMJIIHJHIJGA56EJE>=77+***+11*54000/*.34@BJSOIINGSLMHJKKKOGMSLFCCCCGSIGSSISGOHFKINLFGGMMLIHSKSS11111=44445C64444DDNNSIISNIHLKJSSLKMMISHFJSLJMOIOKHCJHEG.--.HMBBA/-)))'''''13:;HSIJKLSNJKRKIKPLEEISSHMKISOSFIKSOKMSSKKHISSKSHOLJE>,,,,,FKOEGIHFHFNLMSOSSLNIJKELQMSISOKOSJSHMSGGEL==>=>FIIKSSSJRQMLIHSKJFSSKJMOSLJKKSOS=<<<<CBFSNGOSGSHJGLILSIGSRSSKKOSSKKMSNJJNSSKEJKHLIOLSHKKSP000000FGLSEKHJJJKRJSIRSKKPSHKSINLQSSSMOJJG<G?)<)))DC@EGIILNPFKSSSSKGSGSILSSENSSKSSGFHGNJSHNGGI:::::>QSFIHKSKSMLKKSHJKIJKSHSLSLHSSHGSJJILGSHKJSJJNSIFABBFFECD<??>1:;>@EGEPOJLKMLSNSFSIJS55655644333AAILKPKKISJHPSSLSSMSNHSMPSMGDGDCDKGF22222<<<<@JMSMHJSSIJMSKSNSJJSSSMHSMSOKRSMKD@@@@@SKPGSNRSSSKSIKGGJSSSSDDCDFJK@?@?@CFGFSLGM<77DDDGHJFENSHGEEJSEBDG@CB@>70.,)'&

@alihamraoui
Member

Hi @jsabban,

Thank you for reporting this issue.

I noticed that newer FASTQ headers use the sample_id key instead of sampleid, which seems to be part of the problem.

I’ve addressed this issue in commit ab76466.

A new version (2.7.1) will be available tomorrow!

Thank you,
Ali
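To illustrate the key difference described above, here is a minimal sketch of a header parser that accepts either spelling. It is not the code from commit ab76466; the function names `parse_fastq_header` and `get_sample_id` are hypothetical, and the header in the example is a shortened version of the one posted in this thread.

```python
def parse_fastq_header(header):
    """Split a FASTQ header line into (read_id, {key: value})."""
    if header.startswith("@"):
        header = header[1:]
    fields = header.split()
    read_id = fields[0]
    # Split on the first "=" only, so values containing "=" or "@" stay intact.
    tags = dict(field.split("=", 1) for field in fields[1:] if "=" in field)
    return read_id, tags


def get_sample_id(tags):
    """Return the sample identifier under either key spelling, or None if absent."""
    for key in ("sample_id", "sampleid"):
        if key in tags:
            return tags[key]
    return None


header = (
    "@1a790b27-c54a-44b2-b6bb-0066afaf968e "
    "runid=6b2d611263610f8f4f521413ba294547e26a6308 read=42 ch=391 "
    "sample_id=EMPHY-MLTPXreq-T16-L20-NBD114-24 barcode=barcode01"
)
read_id, tags = parse_fastq_header(header)
print(read_id, get_sample_id(tags), tags.get("barcode"))
```

Looking up both spellings in one place means a missing key yields `None` rather than a shorter row, which is the kind of mismatch the traceback reports.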

@jsabban
Author

jsabban commented Aug 21, 2024

Whoa, so quick! Thank you for the fix 😃
