Fix on-the-fly indexing of VCF w.r.t virtual offsets. #1837

Merged
merged 1 commit into from
Sep 12, 2024

Conversation

jkbonfield
Contributor

@jkbonfield jkbonfield commented Sep 12, 2024

When using bcftools view --write-index -o out.vcf.gz, the virtual file offsets recorded in the index can differ depending on whether we do a bgzf_tell before or after a flush. Specifically, the offset could point to the last byte in the current BGZF block or to the first byte in the next BGZF block. Ultimately both of these resolve to the same physical location, but in some situations the former may mean attempting to read zero bytes (the remainder of the BGZF block). This has been known in the past to be misinterpreted as an EOF. (See samtools/samtools#1861)

It also means the contents of the index produced by --write-index and a separate bcftools index command can yield different results, albeit both representing the same data.
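As a rough illustration of why the two offsets differ numerically yet address the same data, here is a minimal Python sketch. It relies on the BGZF convention that a virtual offset packs the compressed block start into the upper bits and the position within the uncompressed block into the low 16 bits. The block layout (`BLOCKS`) and the helpers `make_voffset`, `split_voffset` and `canonical` are hypothetical names invented for this example; this is not htslib code and not the change made in this PR.

```python
# Sketch only: the block sizes and helper names are assumptions made
# for illustration, not part of htslib.

def make_voffset(coffset, uoffset):
    """Pack a compressed block start and an in-block offset into one integer."""
    return (coffset << 16) | uoffset

def split_voffset(voffset):
    """Unpack a virtual offset into (compressed block start, in-block offset)."""
    return voffset >> 16, voffset & 0xFFFF

# Hypothetical file layout:
# block start -> (compressed size, uncompressed length)
BLOCKS = {0: (1200, 5000), 1200: (900, 4000)}

def canonical(voffset, blocks=BLOCKS):
    """Map an end-of-block offset onto the start of the following block."""
    coffset, uoffset = split_voffset(voffset)
    csize, ulen = blocks[coffset]
    if uoffset == ulen:
        # Nothing is left to read in this block, so the next byte of data
        # lives at in-block offset 0 of the block that follows.
        return make_voffset(coffset + csize, 0)
    return voffset

# A tell taken before the flush: still inside the first block, at its end.
end_of_block = make_voffset(0, 5000)
# A tell taken after the flush: at the start of the second block.
start_of_next = make_voffset(1200, 0)
```

In this sketch `end_of_block` and `start_of_next` are different integers, so two indexes built from them would not be byte-identical, yet `canonical` maps both to the same value. A reader that seeks to `end_of_block` finds zero bytes remaining in the first block before it moves on, which is the situation described above as being mistaken for an EOF.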

The fix for the samtools / bcftools issue above (#1672), which addressed the multi-threaded case, inadvertently recreated the bug in the non-multi-threaded case.

Fixes samtools/bcftools#2267

@daviesrob daviesrob merged commit b66c6d2 into samtools:develop Sep 12, 2024
9 checks passed
Successfully merging this pull request may close these issues.

BCFtools "--write-index=tbi" and bcftools index --tbi generate different index files