Fix cram_dependent_data_series when FN is used. #1314
Closed
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
If we get into the primary loop of cram_decode_seq, iterating over feature (count FN) then we have unguarded code that always checks for sequence overlapping the reference. To do this, seq_pos must be set. Some data series we know how seq_pos is adjusted irrespective of whether we decode, eg BS is always +1, but others are strings and the only way we know how to update seq_pos is to decode them.
Hence added FN as having a dependency on SC, IN and BB.
Fixes samtools/samtools#1475
It's more questionable though in that bug report as to why FLAGS (BF data series) ever propogated to needing FN (cigar feature number). It's due to the
CRAM_CIGAR
check, but I'm unsure why it's there at present. It's obviously a bit overly cautious in this case. That said, it's simply a missing optimisation rather than an error.Edit: it's also harmless in all modern CRAMs. The cigar check means whenever we ask for BF we get RL added in for free, which is maybe wasted effort, but normally nothing for Illumina and one minor int data series for long read data. The only reason it expands so much on this data file is RL is stored in the CORE block, along with half a dozen other data series. This causes a dependency explosion, but really it's the file that is the most questionable here (it's likely old).