-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added TransientAttributes to GATKRead #5664
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5664 +/- ##
==============================================
+ Coverage 44.339% 86.838% +42.5%
- Complexity 19805 32325 +12520
==============================================
Files 1985 1991 +6
Lines 147675 149341 +1666
Branches 16235 16483 +248
==============================================
+ Hits 65477 129685 +64208
+ Misses 77055 13647 -63408
- Partials 5143 6009 +866
|
@jamesemery Once your ReferenceConfidence PR goes in, can you rebase this to remove the outdated/duplicated commit? |
dcbb452
to
6bc12eb
Compare
rebased the branch for your review @cmnbroad |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good - a few mostly minor changes and tests requested.
@@ -406,10 +406,10 @@ private static int countOpticalDuplicates(OpticalDuplicateFinder finder, List<Pa | |||
metrics.updateMetrics(read); | |||
// NOTE: we use the SAMRecord transientAttribute field here specifically to prevent the already | |||
// serialized read from being parsed again here for performance reasons. | |||
if (((SAMRecordToGATKReadAdapter) read).getTransientAttribute(OPTICAL_DUPLICATE_TOTAL_ATTRIBUTE_NAME)!=null) { | |||
if (read.getTransientAttribute(OPTICAL_DUPLICATE_TOTAL_ATTRIBUTE_NAME)!=null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a corresponding setTransientAttribute
call in MarkDuplicatesSpark
that has a redundant SAMRecordToGATKReadAdapter
cast that can now be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I missed that.
@@ -667,6 +667,14 @@ default int numCigarElements(){ | |||
*/ | |||
byte[] getAttributeAsByteArray( final String attributeName ); | |||
|
|||
/** | |||
* This is used to access the transient attribute store in the underlying SAMRecord. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is an interface, it should be prescriptive for implementers, and it definitely shouldn't mention SAMRecord
. The language should match that used in the set
method below, something like:
* This is used to access the transient attribute store in the underlying SAMRecord. | |
* This is used to access a transient attribute store provided by the underlying implementation. Transient attributes should not be serialized or written out with a record. |
* | ||
* NOTE: This is an advanced use case for GATKRead and you should probably use setAttribute() instead | ||
* @param key key under which the value will be stored | ||
* @param value value to be keyed upon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* @param value value to be keyed upon | |
* @param value value to stored |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still suggest this change ^^
* @param value value to be keyed upon | ||
*/ | ||
public void setTransientAttribute(final Object key, final Object value); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There really should be a clearTransientAttribute
method as well (SAMRecord has one so the GATK read adapter should pass that through).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a basic set/get/clear test in GATKReadAdaptersUnitTest
, and a roundtrip test in SAMRecordToGATKReadAdapterSerializerUnitTest to make sure we're really not serializing the attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. (not I haven't added a clearTransientAttributes() call because it doesn't exist in the underlying SAMRecord code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamesemery see SAMRecord.removeTransientAttribute
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmnbroad To clarify, I did not add a clearTransientAttribute_s_() method. There is no exposed sam record method for removing ALL the transient attributes at once like there is for normal attributes.
@cmnbroad Responded to your review comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of minor doc fixes, otherwise looks good. Feel free to merge when remaining comments are addressed.
@jamesemery - also note there was what appears to be a transient error on the last travis run, so we should make sure tests pass again before merging
@Override | ||
public void clearTransientAttribute( final String attributeName ) { | ||
clearCachedValues(); | ||
samRecord.setTransientAttribute(attributeName, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I was actually thinking we would call samRecord.removeTransientAttribute, but maybe this is more consistent with the rest of the GATKRead interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I was basing this off of our behavior for clearing regular attributes, i will change it to use the underlying samrecord method.
@@ -667,6 +667,14 @@ default int numCigarElements(){ | |||
*/ | |||
byte[] getAttributeAsByteArray( final String attributeName ); | |||
|
|||
/** | |||
* This is used to access a transient attribute store provided by the underlying implementation. Transient attributes should not be serialized or written out with a record. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Transient attributes should not -> Transient attributes will not
/** | ||
* This is used to access a transient attribute store provided by the underlying implementation. Transient attributes should not be serialized or written out with a record. | ||
* | ||
* NOTE: This is an advanced use case for GATKRead and you should probably use setAttribute() instead |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this doc is on getTransientAttribute
, setAttribute
-> getAttribute
* | ||
* NOTE: This is an advanced use case for GATKRead and you should probably use setAttribute() instead | ||
* @param key key under which the value will be stored | ||
* @param value value to be keyed upon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still suggest this change ^^
Resolves #5653
Blocked by #5607