Hello,

I tried compressing a large BCF file that I use as a reference, and it seems something goes wrong, because the decompressed file is different from (much larger than) the input file.

My BCF file has 1,000,000 diploid phased samples and 2,271,035 variant entries; it is roughly 10 GB.

I compressed it with gtshark, and the result was a 5.8 MB `_db` file and a 26 MB `_gt` file. This seemed a bit suspicious because those sizes are very small for such an input. The compression finished in about 12 hours, with no error message and no error exit code.
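The invocations were roughly of this form (a sketch: subcommand names are assumed from the GTShark README, and the file names here are placeholders):

```shell
# compress-db writes two files with _db and _gt suffixes
gtshark compress-db ref.bcf ref_arch        # produces ref_arch_db and ref_arch_gt
gtshark decompress-db ref_arch ref_out.bcf  # reconstructs the BCF from the archive
```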
The compression log was:

```
meta size: 60
header size: 516
samples size: 3068156
chrom size: 796
pos size: 1201908
id size: 796
ref size: 796
alt size: 796
qual size: 796
filter size: 1788
info size: 1736548
Processing time: 43270.2 seconds.
```
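As a sanity check (my own arithmetic, not gtshark output), the per-stream sizes above nearly sum to the file size that decompression later reports opening, so the `_db` archive itself looks complete:

```shell
# Sum of the per-stream sizes reported during compression
echo $((60 + 516 + 3068156 + 796 + 1201908 + 796 + 796 + 796 + 796 + 1788 + 1736548))
# prints 6012956; decompression reports "Opening file of size: 6013013",
# a difference of only 57 bytes, plausibly archive framing/offsets.
```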
I launched the decompression; it has been running for more than a day, and the output BCF is already more than 26 GB. This seems off, because the input file was about 10 GB. I checked that the output file is indeed BCF (internally gzip-compressed).

The output of the software was:

```
Opening file of size: 6013013
Opening file of size: 27086378
2271035
Processing time: 129542 seconds.
```

No error messages or anything.
Have you run gtshark on large BCF files (millions of samples × millions of variants)?

I am sorry, but I cannot share the BCF file because of its size. I'll run some tests on the output file and keep you up to date on what I find.
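For those tests, one way to compare the decoded content rather than the compressed bytes (bcftools commands; file names are placeholders): BCF file sizes can legitimately differ just from a different internal BGZF/gzip compression level, so checksumming the uncompressed VCF text is more meaningful than comparing file sizes.

```shell
# Record counts should match exactly (2,271,035 in my case)
bcftools view -H ref.bcf     | wc -l
bcftools view -H ref_out.bcf | wc -l

# Checksum the uncompressed VCF text; identical sums mean identical content
bcftools view ref.bcf     | md5sum
bcftools view ref_out.bcf | md5sum
```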
Regards.
Rick