Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added TransientAttributes to GATKRead #5664

Merged
merged 4 commits into from
Apr 22, 2019

Conversation

jamesemery
Copy link
Collaborator

Resolves #5653

Blocked by #5607

@codecov-io
Copy link

codecov-io commented Feb 12, 2019

Codecov Report

Merging #5664 into master will increase coverage by 42.5%.
The diff coverage is 100%.

@@              Coverage Diff               @@
##              master     #5664      +/-   ##
==============================================
+ Coverage     44.339%   86.838%   +42.5%     
- Complexity     19805     32325   +12520     
==============================================
  Files           1985      1991       +6     
  Lines         147675    149341    +1666     
  Branches       16235     16483     +248     
==============================================
+ Hits           65477    129685   +64208     
+ Misses         77055     13647   -63408     
- Partials        5143      6009     +866
Impacted Files Coverage Δ Complexity Δ
...broadinstitute/hellbender/utils/read/GATKRead.java 31.25% <ø> (+18.75%) 7 <0> (+3) ⬆️
.../SAMRecordToGATKReadAdapterSerializerUnitTest.java 100% <100%> (+94.118%) 4 <1> (+3) ⬆️
...kers/haplotypecaller/ReferenceConfidenceModel.java 92.952% <100%> (+2.643%) 86 <9> (+3) ⬆️
...transforms/markduplicates/MarkDuplicatesSpark.java 94.521% <100%> (ø) 36 <0> (ø) ⬇️
...forms/markduplicates/MarkDuplicatesSparkUtils.java 89.573% <100%> (+0.948%) 68 <0> (+1) ⬆️
...lbender/utils/read/SAMRecordToGATKReadAdapter.java 92.806% <100%> (+17.897%) 146 <1> (+38) ⬆️
...ools/walkers/contamination/ContaminationModel.java 87.5% <0%> (-4.891%) 39% <0%> (ø)
...er/tools/funcotator/FuncotatorIntegrationTest.java 84.052% <0%> (-2.075%) 114% <0%> (-2%)
...walkers/haplotypecaller/HaplotypeCallerEngine.java 78.547% <0%> (-0.074%) 74% <0%> (ø)
.../broadinstitute/hellbender/engine/LocusWalker.java 79.245% <0%> (ø) 15% <0%> (ø) ⬇️
... and 1089 more

@droazen droazen requested review from droazen and removed request for lbergelson March 15, 2019 17:15
@droazen droazen assigned droazen and unassigned lbergelson Mar 15, 2019
@droazen
Copy link
Contributor

droazen commented Mar 27, 2019

@jamesemery Once your ReferenceConfidence PR goes in, can you rebase this to remove the outdated/duplicated commit?

@jamesemery jamesemery force-pushed the je_addTransientAttributeFieldToGATKRead branch from dcbb452 to 6bc12eb Compare April 8, 2019 16:48
@jamesemery
Copy link
Collaborator Author

rebased the branch for your review @cmnbroad

@droazen droazen requested review from cmnbroad and removed request for droazen April 8, 2019 18:32
@droazen droazen assigned cmnbroad and unassigned droazen Apr 8, 2019
Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good - a few mostly minor changes and tests requested.

@@ -406,10 +406,10 @@ private static int countOpticalDuplicates(OpticalDuplicateFinder finder, List<Pa
metrics.updateMetrics(read);
// NOTE: we use the SAMRecord transientAttribute field here specifically to prevent the already
// serialized read from being parsed again here for performance reasons.
if (((SAMRecordToGATKReadAdapter) read).getTransientAttribute(OPTICAL_DUPLICATE_TOTAL_ATTRIBUTE_NAME)!=null) {
if (read.getTransientAttribute(OPTICAL_DUPLICATE_TOTAL_ATTRIBUTE_NAME)!=null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a corresponding setTransientAttribute call in MarkDuplicatesSpark that has a redundant SAMRecordToGATKReadAdapter cast that can now be removed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I missed that.

@@ -667,6 +667,14 @@ default int numCigarElements(){
*/
byte[] getAttributeAsByteArray( final String attributeName );

/**
* This is used to access the transient attribute store in the underlying SAMRecord.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is an interface, it should be prescriptive for implementers, and it definitely shouldn't mention SAMRecord. The language should match that used in the set method below, something like:

Suggested change
* This is used to access the transient attribute store in the underlying SAMRecord.
* This is used to access a transient attribute store provided by the underlying implementation. Transient attributes should not be serialized or written out with a record.

*
* NOTE: This is an advanced use case for GATKRead and you should probably use setAttribute() instead
* @param key key under which the value will be stored
* @param value value to be keyed upon
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @param value value to be keyed upon
* @param value value to stored

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still suggest this change ^^

* @param value value to be keyed upon
*/
public void setTransientAttribute(final Object key, final Object value);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There really should be a clearTransientAttribute method as well (SAMRecord has one so the GATK read adapter should pass that through).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a basic set/get/clear test in GATKReadAdaptersUnitTest, and a roundtrip test in SAMRecordToGATKReadAdapterSerializerUnitTest to make sure we're really not serializing the attributes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. (not I haven't added a clearTransientAttributes() call because it doesn't exist in the underlying SAMRecord code)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesemery see SAMRecord.removeTransientAttribute

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmnbroad To clarify, I did not add a clearTransientAttribute_s_() method. There is no exposed sam record method for removing ALL the transient attributes at once like there is for normal attributes.

@jamesemery
Copy link
Collaborator Author

@cmnbroad Responded to your review comments

Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor doc fixes, otherwise looks good. Feel free to merge when remaining comments are addressed.

@jamesemery - also note there was what appears to be a transient error on the last travis run, so we should make sure tests pass again before merging

@Override
public void clearTransientAttribute( final String attributeName ) {
clearCachedValues();
samRecord.setTransientAttribute(attributeName, null);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I was actually thinking we would call samRecord.removeTransientAttribute, but maybe this is more consistent with the rest of the GATKRead interface.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I was basing this off of our behavior for clearing regular attributes, i will change it to use the underlying samrecord method.

@@ -667,6 +667,14 @@ default int numCigarElements(){
*/
byte[] getAttributeAsByteArray( final String attributeName );

/**
* This is used to access a transient attribute store provided by the underlying implementation. Transient attributes should not be serialized or written out with a record.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Transient attributes should not -> Transient attributes will not

/**
* This is used to access a transient attribute store provided by the underlying implementation. Transient attributes should not be serialized or written out with a record.
*
* NOTE: This is an advanced use case for GATKRead and you should probably use setAttribute() instead
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this doc is on getTransientAttribute , setAttribute -> getAttribute

*
* NOTE: This is an advanced use case for GATKRead and you should probably use setAttribute() instead
* @param key key under which the value will be stored
* @param value value to be keyed upon
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still suggest this change ^^

@droazen droazen assigned jamesemery and unassigned cmnbroad Apr 22, 2019
@jamesemery jamesemery merged commit aa19925 into master Apr 22, 2019
@jamesemery jamesemery deleted the je_addTransientAttributeFieldToGATKRead branch April 22, 2019 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add setTransientAttribute() to GATKRead
5 participants