Compressed data error #20

Closed
koriege opened this issue Nov 7, 2024 · 4 comments

Comments

@koriege

koriege commented Nov 7, 2024

Hi,
I use your tool on a regular basis and stumbled over an error that sometimes occurs when I process genomic data files with a certain tool and redirect the uncompressed data stream into a compressor (gzip, pigz, bgzip) while asynchronously creating a gztool index with line numbers in parallel. I tested gztool versions 1.4.3 and 1.6.1.

[tool] -in [file] -out /dev/stdout | bgzip -kc -@ [threads] | tee -i >(gztool -v 0 -f -i -x -C -I [file].gzi) > [file].gz | cat
gztool -L [offset] -R 1 [file].gz

Decompressing [file].gz does not return any error and also other post-processing tools never complained about the input. The index gztool -l [file].gz looks fine to me too, but certain line offsets used to retrieve data can lead to ERROR: Compressed data error in '[file].gz' when using the following command.

gztool -L [offset] -R 1 [file].gz

Unfortunately, I am unable to reproduce this issue with toy examples, but I always run into the same error when re-running the command above on the same input.

It would be great if you could have a look into it. Please find attached one of my seemingly erroneous files. Offset issues start to happen at line 38234850 (gztool -L 38234850 -R 1 erroneous_data.gz).
erroneous_data.gz
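
To narrow down which line offsets are affected, a small probe loop may help. This is only a sketch: `probe_lines` is a hypothetical helper name, and it assumes that a failed extraction either makes gztool exit with a non-zero status or print the "Compressed data error" message on stderr.

```shell
# Hypothetical helper: print every line number in [first, last] whose
# extraction fails. Assumption: gztool signals the failure through a
# non-zero exit status and/or the "Compressed data error" message on stderr.
probe_lines() (
    file=$1 first=$2 last=$3 step=${4:-1}
    line=$first
    while [ "$line" -le "$last" ]; do
        status=0
        # Same invocation as in the report; the extracted data is discarded
        # and only stderr is captured.
        err=$(gztool -L "$line" -R 1 "$file" 2>&1 >/dev/null) || status=$?
        case $err in
            *"Compressed data error"*) status=1 ;;
        esac
        [ "$status" -eq 0 ] || echo "$line"
        line=$((line + step))
    done
)
```

For example, `probe_lines erroneous_data.gz 38234850 38234860` would list the failing offsets in that range, one per line.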

Thank you so much!

@circulosmeos
Owner

circulosmeos commented Nov 11, 2024

Hi @koriege
Thank you very much for your detailed report and the data file. I will have to study this error carefully.

Meanwhile, you can add -p to the gztool command line together with -L, and it will patch the supposed "error" itself 👍
Maybe also -v0 so it doesn't bother you with details about the patching.
From your example: gztool -v0 -p -L [offset] -R 1 [file].gz
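
If you would rather apply -p only when it is needed, a wrapper along these lines might work. It is a sketch, not a tested recipe: `extract_line` is a hypothetical name, and it assumes gztool exits with a non-zero status when the plain extraction hits the error.

```shell
# Hypothetical wrapper: try the plain extraction first and retry with -p
# only if it fails. Assumption: gztool returns a non-zero exit status on
# the "Compressed data error".
extract_line() (
    file=$1 line=$2
    if out=$(gztool -v0 -L "$line" -R 1 "$file" 2>/dev/null); then
        # Plain extraction worked; print the captured line.
        printf '%s\n' "$out"
    else
        # Fall back to the patched extraction suggested above.
        gztool -v0 -p -L "$line" -R 1 "$file"
    fi
)
```

Usage: `extract_line [file].gz [offset]`. Passing -p on every call is simpler and should behave the same for this use.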

It is very rewarding to know about these uses of gztool 😊

@koriege
Author

koriege commented Nov 11, 2024

Oh, I overlooked the -p parameter. This solves it for now. You may close this issue, which indeed seems to be related to the gzip blocks introduced by bgzip (I am using v1.16 installed via conda).

@circulosmeos
Owner

But gztool should manage bgzip blocks correctly!
I will keep this open, as this shouldn't happen anyway.

@circulosmeos
Owner

I've found where the error resided and just released v1.7.0 to patch it.
Thanks for your example file! 👍

@circulosmeos circulosmeos self-assigned this Nov 27, 2024
@circulosmeos circulosmeos added the bug Something isn't working label Nov 27, 2024