-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CRAM: formalize Slice Type with an enum #1274
Conversation
@@ -135,7 +134,7 @@ public void processContainer(final Container container, final ValidationStringen | |||
AlignmentSpan unmappedSpan = refSet.remove(SAMRecord.NO_ALIGNMENT_REFERENCE_INDEX); | |||
for (final int refId : new TreeSet<>(refSet.keySet())) { | |||
final AlignmentSpan span = refSet.get(refId); | |||
fakeSlice.sequenceId = refId; | |||
fakeSlice.setSequenceId(refId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also sets the SliceType, to ensure they stay in sync
Codecov Report
@@ Coverage Diff @@
## master #1274 +/- ##
===============================================
+ Coverage 67.728% 67.755% +0.027%
- Complexity 8225 8236 +11
===============================================
Files 560 562 +2
Lines 33493 33540 +47
Branches 5637 5638 +1
===============================================
+ Hits 22684 22725 +41
- Misses 8632 8633 +1
- Partials 2177 2182 +5
|
prevSeqId = Slice.MULTI_REFERENCE; | ||
break; | ||
default: | ||
if (prevSeqId != container.getSequenceId()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed the prevSeqId < 0
check: since we know that the slice type is single-ref, we also know that the current sequence ID here is 0+. So it will differ from any negative values.
// as defined in the specs: | ||
public int sequenceId = -1; | ||
private int sequenceId = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't modify this without also modifying type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments/questions; haven't looked at the tests yet.
src/main/java/htsjdk/samtools/cram/structure/slice/SliceType.java
Outdated
Show resolved
Hide resolved
src/main/java/htsjdk/samtools/cram/structure/slice/SliceType.java
Outdated
Show resolved
Hide resolved
2311d9a
to
a888718
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments - see what you think and maybe we can discuss when we meet.
b558805
to
b51fb3d
Compare
switch (containerContext.getType()) { | ||
case UNMAPPED: | ||
referenceBases = new byte[]{}; | ||
prevSeqId = SAMRecord.NO_ALIGNMENT_REFERENCE_INDEX; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
future candidate for turning into a ReferenceContext
@@ -208,7 +201,7 @@ protected boolean shouldFlushContainer(final SAMRecord nextRecord) { | |||
return true; | |||
} | |||
|
|||
if (refSeqIndex == Slice.MULTI_REFERENCE) { | |||
if (refSeqIndex == Slice.REF_ID_MULTIPLE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refSeqIndex is a future candidate for ReferenceContext
@@ -59,60 +60,55 @@ public ContainerParser(final SAMFileHeader samFileHeader) { | |||
return records; | |||
} | |||
|
|||
public Map<Integer, AlignmentSpan> getReferences(final Container container, final ValidationStringency validationStringency) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
someday we will merge a PR that collapses all these stupid things. I may even have such a thing open?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes: that's over at #1276
cramRecord.sequenceId = refIdCodec.readData(); | ||
} else { | ||
cramRecord.sequenceId = refId; | ||
cramRecord.sequenceId = refContext.getSequenceIdOrSentinel(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
watch out: can be unmapped here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps at some point CramRecord should have a ReferenceContext instead, or along with, a sequence ID.
1c87ef3
to
20f9eda
Compare
container.blockCount = 1; | ||
container.blocks = new Block[]{block}; | ||
container.landmarks = new int[0]; | ||
container.slices = new Slice[0]; | ||
container.alignmentSpan = Slice.NO_ALIGNMENT_SPAN; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these fields are now initialized to these values
final byte[] peek = new byte[4]; | ||
try { | ||
int character = inputStream.read(); | ||
if (character == -1) | ||
return false; | ||
return readContainerHeader(new ByteArrayInputStream((major >= 3 ? CramIO.ZERO_F_EOF_MARKER : CramIO.ZERO_B_EOF_MARKER))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was what the caller did when it received a false
from this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mildly not crazy about the recursion, but this change would make it directly recursive instead of indirectly, allow the otherwise unused method above to be deleted:
return readContainerHeader(new ByteArrayInputStream((major >= 3 ? CramIO.ZERO_F_EOF_MARKER : CramIO.ZERO_B_EOF_MARKER))); | |
return readContainerHeader(major, new ByteArrayInputStream((major >= 3 ? CramIO.ZERO_F_EOF_MARKER : CramIO.ZERO_B_EOF_MARKER))); |
@@ -56,13 +56,7 @@ private static Container readContainer(final int major, final InputStream inputS | |||
* @return a new {@link Container} object with container header values filled out but empty body (no slices and blocks). | |||
*/ | |||
public static Container readContainerHeader(final int major, final InputStream inputStream) { | |||
final Container container = new Container(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic moved inside ContainerHeaderIO.readContainerHeader()
20f9eda
to
d5b815f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a step in the right direction. Some comments and suggestions.
src/main/java/htsjdk/samtools/cram/structure/ContainerHeaderIO.java
Outdated
Show resolved
Hide resolved
final byte[] peek = new byte[4]; | ||
try { | ||
int character = inputStream.read(); | ||
if (character == -1) | ||
return false; | ||
return readContainerHeader(new ByteArrayInputStream((major >= 3 ? CramIO.ZERO_F_EOF_MARKER : CramIO.ZERO_B_EOF_MARKER))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment for another day: Its a bit confusing to have methods called readContainerHeader
and writeContainerHeader
that return/accept Container
s (but only partially populated ?), while Container
has a header
field, but the type is CramCompressionHeader
.
cramRecord.sequenceId = refIdCodec.readData(); | ||
} else { | ||
cramRecord.sequenceId = refId; | ||
cramRecord.sequenceId = refContext.getSequenceIdOrSentinel(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps at some point CramRecord should have a ReferenceContext instead, or along with, a sequence ID.
final byte[] peek = new byte[4]; | ||
try { | ||
int character = inputStream.read(); | ||
if (character == -1) | ||
return false; | ||
return readContainerHeader(new ByteArrayInputStream((major >= 3 ? CramIO.ZERO_F_EOF_MARKER : CramIO.ZERO_B_EOF_MARKER))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mildly not crazy about the recursion, but this change would make it directly recursive instead of indirectly, allow the otherwise unused method above to be deleted:
return readContainerHeader(new ByteArrayInputStream((major >= 3 ? CramIO.ZERO_F_EOF_MARKER : CramIO.ZERO_B_EOF_MARKER))); | |
return readContainerHeader(major, new ByteArrayInputStream((major >= 3 ? CramIO.ZERO_F_EOF_MARKER : CramIO.ZERO_B_EOF_MARKER))); |
9e286aa
to
c537ec9
Compare
- replace Slice.REF_ID_MULTIPLE/UNMAPPED with ReferenceContext constants - refs -> referenceBases - initializeFromRecords() - init Slice Aln Start and Span - don't return null - endPlusOne - rename getSequenceIdOrSentinel() to getSerializableId() and add toString() - rm unused readContainerHeader - test addiitonal Slice + Container fields
c537ec9
to
71f2729
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from some naming comments, the code changes themselves look pretty good. Mostly requests to update some of the tests.
Also, as it stands, the ReferenceContextType enum isn't really used anywhere except tests. I'm not sure if thats because we haven't yet changed all the code to use ReferenceContexts, or if its because its redundant with ReferenceContext
. Seems harmless to leave it for now, just wondering if we still need it, or if ReferenceContext
makes it redundant.
src/main/java/htsjdk/samtools/cram/encoding/reader/MultiRefSliceAlignmentSpanReader.java
Outdated
Show resolved
Hide resolved
src/test/java/htsjdk/samtools/cram/build/ContainerFactoryTest.java
Outdated
Show resolved
Hide resolved
src/test/java/htsjdk/samtools/cram/ref/ReferenceContextTest.java
Outdated
Show resolved
Hide resolved
src/test/java/htsjdk/samtools/cram/ref/ReferenceContextTest.java
Outdated
Show resolved
Hide resolved
Co-Authored-By: jmthibault79 <thibault@broadinstitute.org>
53bf87f
to
2e2fa48
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good, thx for the test changes. Still a couple of comments on additional code consolidation in the tests and test utils. If we don't resolve those as part of this PR, I'd request that we do so in the next PR, or even a separate PR specifically for that purpose.
Assert.assertTrue(slice.isMappedSingleRef()); | ||
} | ||
Assert.assertEquals(slice.nofRecords, expectedRecordCount); | ||
Assert.assertEquals(slice.bases, expectedBaseCount); | ||
} | ||
|
||
private CramCompressionRecord createRecord(final int index, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This createRecord
method is nearly identical to createMappedRecord
in CramCompressionRecordUtil
. getUnplacedRecords
is (completely?) identical to 'getUnmappedRecordsin
CramCompressionRecordUtil`. etc.
Assert.assertEquals(container.sequenceId, slice.sequenceId); | ||
Assert.assertTrue(container.getReferenceContext().isMappedSingleRef()); | ||
Assert.assertEquals(container.getReferenceContext().getType(), ReferenceContextType.SINGLE_REFERENCE_TYPE); | ||
Assert.assertEquals(container.getReferenceContext(), slice.getReferenceContext()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many of the blocks of asserts in these tests could be replaced with one of the ContainerFactoryTest.assertContainerState
methods. Not sure if SliceTests.assertSliceState
could be reuseable as well.
Description
Add a ReferenceContext member to Slice and Container, based on the discussion at samtools/hts-specs#370 and the related spec update at samtools/hts-specs#372.
Checklist