tabix failure with a tab-separated file #1165
This is the same problem as #1085 — your file matches the pattern for a FASTQ index file. If this were to be fixed, the practical way to do it would probably be to recognise these obscure format types only if the filename has |
jmarshall added a commit to jmarshall/htslib that referenced this issue (Oct 31, 2021)
Format detection to date uses only the stream contents, as filenames are not always available (e.g., when reading from standard input) or may be inaccurate or unexpected. However there are a very few cases where the filename extension is important:

* FASTA/Q indexes (uncommon for hts_open()) are a particular case of 5/6-column BED files (comparatively common). We don't want to misrecognise any actual BED files as FASTA/Q indexes, so require a .fai/.fqi extension for the latter -- which are unlikely to appear on standard input anyway, so filenames will usually be available.
* GZI indexes have not previously been recognised, as they have no magic numbers. They can now be recognised by their .gzi extension.

Fixes samtools#1085, fixes samtools#1165, and fixes samtools#1347.
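To illustrate why the six-column file in this issue was rejected while the thirteen-column one was accepted, here is a minimal Python sketch of the idea described in the commit message above. It is not htslib's actual detection code: the function names `looks_like_fasta_index` and `detect` are hypothetical, the check is simplified to "a name followed by four or five integer columns" (the .fai and .fqi column counts), and it assumes the first line of the decompressed, tab-separated data is passed in.

```python
import os


def looks_like_fasta_index(first_line: str) -> bool:
    """Content-only check (simplified, hypothetical): a name column
    followed by 4 (.fai-like) or 5 (.fqi-like) integer columns."""
    fields = first_line.rstrip("\n").split("\t")
    if len(fields) not in (5, 6):
        return False
    return all(f.isdigit() for f in fields[1:])


def detect(filename: str, first_line: str, use_extension: bool) -> str:
    """Return a coarse format label for the first line of a file.

    With use_extension=False only the contents are inspected, which is
    the behaviour the commit message describes as the status quo.
    With use_extension=True a .fai/.fqi suffix is also required.
    """
    if looks_like_fasta_index(first_line):
        if not use_extension:
            return "fasta/q index"
        ext = os.path.splitext(filename or "")[1]
        if ext in (".fai", ".fqi"):
            return "fasta/q index"
    return "text"


# First data line of each file from the report (tab-separated).
six_columns = "1\t12807\t1363539\t2\t1\t1"
thirteen_columns = (
    "1\t12807\t1363539\t0.46048791990606\t606\t0.130356089752489\t"
    "1.01605651970555\t34814\t0.270825944513021\t2\t1\t1\t-6.79066486181859"
)

print(detect("segments_edit.txt.gz", six_columns, use_extension=False))
print(detect("segments_edit.txt.gz", six_columns, use_extension=True))
print(detect("segments_edit2.txt.gz", thirteen_columns, use_extension=False))
```

Under these assumptions, the six-column line is taken for an index when only the contents are checked and falls back to plain text once the extension is required, while the thirteen-column line never matches the pattern in either mode.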
whitwham pushed a commit that referenced this issue (Nov 3, 2021)
Hi,
I tried to index a tab-separated file, but it did not work, and I cannot work out from the manual what is wrong.
(tabix & bgzip version: 1.10.2)
What I did:
$ cat segments_edit.txt
1 12807 1363539 2 1 1
1 1375071 2390715 2 1 1
1 2390840 2391074 13 13 0
1 2391081 2606687 3 3 0
1 2606722 2607162 10 8 2
1 2608327 2769359 3 3 0
1 2769933 4692525 2 1 1
1 4692613 4693465 5 4 1
1 4693471 5727630 2 1 1
1 5727636 5729995 9 9 0
$ bgzip -c segments_edit.txt > segments_edit.txt.gz
$ tabix -s 1 -b 2 -e 3 segments_edit.txt.gz
[E::hts_hopen] Failed to open file segments_edit.txt.gz
[E::hts_open_format] Failed to open file "segments_edit.txt.gz" : Exec format error
Couldn't understand format of "segments_edit.txt.gz"
Curiously, a similar tab-separated file with more columns did not produce the error.
$ cat segments_edit2.txt
1 12807 1363539 0.46048791990606 606 0.130356089752489 1.01605651970555 34814 0.270825944513021 2 1 1 -6.79066486181859
1 1375071 2390715 0.498852332976964 1100 0.10633779244189 1.02051517964992 37882 0.268514575387998 2 1 1 -6.73678537439279
1 2390840 2391074 0.149660750644182 9 0.136341483048997 1.74919776595488 18 0.29637515971167 13 13 0 -6.73342046642509
1 2391081 2606687 0.312823414314777 352 0.105477395509564 1.02408066029078 8861 0.248499967924125 3 3 0 -6.76450678203741
1 2606722 2607162 0.310212098970814 23 0.130361901245888 0.814279114141102 61 0.296649338537346 10 8 2 -6.7356238928564
1 2608327 2769359 0.313026985360921 625 0.106897888180431 1.02576493748744 7895 0.223811722067425 3 3 0 -6.76170437072213
1 2769933 4692525 0.494485945916869 2558 0.0913973159663615 1.02104811250325 67219 0.265606106221456 2 1 1 -6.74269965061846
1 4692613 4693465 0.353173170862764 45 0.126153578040915 0.815532434283755 89 0.223744293072089 5 4 1 -6.73753695859768
1 4693471 5727630 0.498894084261619 1228 0.0763217597526189 1.01600832273797 39203 0.255378732663932 2 1 1 -6.73573321327665
1 5727636 5729995 0.191394836620607 94 0.108008545041011 0.855271544591458 220 0.277547127775007 9 9 0 -6.74512739178415
$ bgzip -c segments_edit2.txt > segments_edit2.txt.gz
$ tabix -s 1 -b 2 -e 3 segments_edit2.txt.gz
What am I doing wrong?
Thank you in advance.