Fix a couple small VCF auto-indexing bugs. #1581

Merged
merged 1 commit into from
Mar 13, 2023

jkbonfield
Contributor

  1. sam_idx_save wasn't validating that the file is BGZF. Calling this function on uncompressed data is invalid usage, but we should check anyway.

    Note this is triggered by a bcftools bug where -o foo.vcf.gz##idx##foo.vcf.gz.csi writes VCF rather than VCF.gz, because the "filename" doesn't end in .gz.

  2. Add the hts_idx_amend_last calls to vcf_write, as we did previously for SAM/BAM.

    This isn't technically a requirement, as all it does is change virtual offsets to an alternate form that gives the same file offset (see the comments above hts_idx_amend_last), but doing so means the auto-built indices match those produced by a standalone index command.

    This fix isn't complete, as it hasn't been done for BCF yet. However, it comes under the "nicety" category and isn't really fixing a bug, so we can figure out how to tidy up BCF later (plus VCF.gz is basically the universal format).
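
For readers unfamiliar with why two different virtual offsets can refer to the same place, here is a minimal sketch of the idea behind item 2. It assumes only the standard BGZF virtual offset layout (block start in the upper 48 bits, offset within the uncompressed block in the lower 16). All function names are hypothetical and are not htslib API, and the code does not reproduce what hts_idx_amend_last itself does; it only shows that "end of one block" and "start of the next block" are interchangeable positions.

```c
#include <assert.h>
#include <stdint.h>

/* BGZF virtual offset: the upper 48 bits hold the file offset of the start
 * of a compressed block, the lower 16 bits hold the offset within that
 * block's uncompressed data. */
uint64_t make_voffset(uint64_t block_start, uint32_t within_block)
{
    assert(block_start < (1ULL << 48));
    assert(within_block <= 0xFFFF);
    return (block_start << 16) | within_block;
}

uint64_t voffset_block_start(uint64_t voff)
{
    return voff >> 16;
}

uint32_t voffset_within_block(uint64_t voff)
{
    return (uint32_t)(voff & 0xFFFF);
}

/* Hypothetical helper (not part of htslib): if voff points at the very end
 * of a block that holds block_ulen uncompressed bytes and occupies
 * block_clen bytes on disk, return the equivalent virtual offset at the
 * start of the next block. Otherwise return voff unchanged. */
uint64_t normalise_to_next_block(uint64_t voff, uint32_t block_ulen,
                                 uint32_t block_clen)
{
    if (voffset_within_block(voff) == block_ulen)
        return make_voffset(voffset_block_start(voff) + block_clen, 0);
    return voff;
}
```

For a block that starts at file offset 0 and stores 1000 uncompressed bytes in 300 compressed bytes, make_voffset(0, 1000) and make_voffset(300, 0) identify the same point in the data. An index built while writing and an index built by a standalone pass can therefore differ in these values while still being equally usable, which is the mismatch item 2 removes for VCF.gz.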

@daviesrob daviesrob merged commit 839a2e9 into samtools:develop Mar 13, 2023