Extra low-level CRAM manipulation functions. #1771

Merged 2 commits on May 3, 2024

jkbonfield

Provide extra CRAM container manipulations and index queries.

Added to support extra functionality to samtools cat.

  • Some internal cram functions are no longer static as they're called
    from cram_external.c, but they don't have HTSLIB_EXPORT and aren't
    an official part of the API.
    These are cram_to_bam and cram_next_slice.

  • New public CRAM APIs:
    These facilitate manipulation at the container level: seeking to specific byte offsets, and addressing containers as the n^th container listed in the index.

    cram_container_get_coords returns refid, start and span fields from the opaque cram_container struct.

    cram_filter_container copies a container but applies region based filtering, as already specified in the cram_fd with a range request. (Note we currently also provide cram_copy_slice, but may want to add a cram_copy_container for consistency.)

    cram_index_extents queries an index to return byte offsets of the first and last container overlapping a specified region.

    cram_num_containers_between queries an index to report the number of indexed containers covering a range, along with their container numbers (starting at 0 for the first).

    cram_num_containers is a simplified cram_num_containers_between that only counts, and does so over the entire file.

    cram_container_num2offset returns the byte offset for the n^th container. cram_container_offset2num does the reverse.

  • A new cram_skip_container function, which is currently internal only but may be useful externally in the future. It's used by cram_filter_container when it detects that it will filter out everything.

  • cram_index_query now copes with HTS_IDX_NOCOOR (-2) and maps it over to refid -1.
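
To illustrate the container-number / byte-offset relationship described above, here is a minimal self-contained sketch. It does not call htslib: `toy_index` and the `toy_*` functions are hypothetical stand-ins, and the choice to express the range in `toy_num_between` as a pair of byte offsets is an assumption made for the example, not a statement about the real function signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a CRAM index: the byte offset of each indexed
 * container, sorted ascending.  Not htslib's real data structure. */
typedef struct {
    const int64_t *offset;  /* offset[i] = file offset of container i */
    int64_t n;              /* number of indexed containers */
} toy_index;

/* n-th container (0-based) -> byte offset, or -1 if n is out of range. */
int64_t toy_num2offset(const toy_index *idx, int64_t n) {
    assert(idx != NULL);
    return (n >= 0 && n < idx->n) ? idx->offset[n] : -1;
}

/* Byte offset -> container number, or -1 if no indexed container starts
 * exactly at pos.  Binary search over the sorted offsets. */
int64_t toy_offset2num(const toy_index *idx, int64_t pos) {
    int64_t lo = 0, hi = idx->n - 1;
    while (lo <= hi) {
        int64_t mid = lo + (hi - lo) / 2;
        if (idx->offset[mid] == pos) return mid;
        if (idx->offset[mid] < pos) lo = mid + 1;
        else hi = mid - 1;
    }
    return -1;
}

/* Count containers whose start offset lies in [cstart, cend] and report
 * the first and last container numbers.  Returns 0 and leaves the outputs
 * untouched when nothing falls in the range. */
int64_t toy_num_between(const toy_index *idx, int64_t cstart, int64_t cend,
                        int64_t *first, int64_t *last) {
    int64_t count = 0, f = -1, l = -1;
    for (int64_t i = 0; i < idx->n; i++) {
        if (idx->offset[i] < cstart) continue;
        if (idx->offset[i] > cend) break;   /* sorted, so nothing further */
        if (f < 0) f = i;
        l = i;
        count++;
    }
    if (count > 0) {
        if (first) *first = f;
        if (last)  *last = l;
    }
    return count;
}
```

Counting over the whole file is then just the special case where the range spans every offset, which mirrors how cram_num_containers is described relative to cram_num_containers_between.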

Also improved cram_index_query so it works on region HTS_IDX_NOCOOR too, rather than requiring a remapping to CRAM's -1.

cram_index_query_last loops over cram_index_query, passing the
previous index entry in "from", and scans to find the last
container.  A query on ref "*", however, arrives as reference
HTS_IDX_NOCOOR (-2) and fails the refid matching check.

This makes cram_index_query_last now work again for region "*".
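
The failure mode and the fix can be sketched in isolation. This is a simplified model, not the htslib code: `toy_entry`, `toy_normalise_refid` and `toy_query_last` are hypothetical names, and a plain linear scan stands in for the repeated cram_index_query calls. The two constants mirror the values quoted above (-2 for HTS_IDX_NOCOOR, -1 for CRAM's unmapped refid).

```c
#include <assert.h>

/* Values as quoted in the description above. */
#define TOY_IDX_NOCOOR    (-2)  /* how region "*" arrives from the caller */
#define TOY_CRAM_UNMAPPED (-1)  /* refid CRAM itself uses for unmapped data */

/* Hypothetical index entry: one per container. */
typedef struct {
    int  refid;
    long offset;
} toy_entry;

/* Map the caller-side "no coordinate" value onto CRAM's own refid. */
int toy_normalise_refid(int refid) {
    return refid == TOY_IDX_NOCOOR ? TOY_CRAM_UNMAPPED : refid;
}

/* Return the array position of the last entry for refid, or -1 if none.
 * Without the normalisation step a query for TOY_IDX_NOCOOR would never
 * match, because stored entries use TOY_CRAM_UNMAPPED. */
int toy_query_last(const toy_entry *e, int n, int refid) {
    int want = toy_normalise_refid(refid);
    int last = -1;
    for (int i = 0; i < n; i++)
        if (e[i].refid == want)
            last = i;
    return last;
}
```

Doing the mapping inside the query function, rather than at every call site, is the design choice the commit message describes: callers can pass the region "*" value straight through.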
@whitwham whitwham merged commit 7576aca into samtools:develop May 3, 2024
9 checks passed