Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Sam Header Readgroup Barcode field #1210

Merged
merged 4 commits into from
Jan 29, 2019
Merged

Conversation

lbergelson
Copy link
Member

  • adding support in SAMReadGroupRecord for the BC attribute
  • this was added to the 1.6 spec

Checklist

  • Code compiles correctly
  • New tests covering changes and new functionality
  • All tests passing
  • Extended the README / documentation, if necessary
  • Is not backward compatible (breaks binary or source compatibility)

@codecov-io
Copy link

codecov-io commented Oct 26, 2018

Codecov Report

Merging #1210 into master will increase coverage by 0.014%.
The diff coverage is 83.333%.

@@               Coverage Diff               @@
##              master     #1210       +/-   ##
===============================================
+ Coverage     69.358%   69.372%   +0.014%     
- Complexity      8302      8308        +6     
===============================================
  Files            555       555               
  Lines          33118     33130       +12     
  Branches        5572      5575        +3     
===============================================
+ Hits           22970     22983       +13     
+ Misses          7886      7885        -1     
  Partials        2262      2262
Impacted Files Coverage Δ Complexity Δ
.../main/java/htsjdk/samtools/SAMReadGroupRecord.java 94.286% <83.333%> (-2.266%) 48 <5> (+5)
...samtools/util/AsyncBlockCompressedInputStream.java 76% <0%> (+4%) 13% <0%> (+1%) ⬆️

Copy link
Member

@magicDGS magicDGS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be nice to be able to configure the barcode separator somehow. Also, it might be possible to have a way to override the default statically, and like that an user can change it on startup to have a different implementation.

/**
* @return the List of barcodes associated with this read group or null
*/
public List<String> getBarcodes() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a method using a different barcode separator too (in case it is needed by other implementations), and the one without params should use the default BARCODE_SEPARATOR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@magicDGS
The spec recommends using - as the separator:

Barcode sequence identifying the sample or library. This value is the expected barcode bases
as read by the sequencing machine in the absence of errors. If there are several barcodes for
the sample/library (e.g., one on each end of the template), the recommended implementation
concatenates all the barcodes separating them with hyphens (‘-’).

My though was that we should support the recommended mode and not go out of the way to support other modes. You can always get around it by using getAttribute directly and parsing them yourself.

Are there implementations out there already? Is there a practical need for what you suggest or just a theoretical one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found some FASTQ files with + as barcode separator, that when are converted with custom scripts to SAM/BAM the @RG tag (per-read) keeps that symbol; thus, some users of ReadTools have asked me to support other separators, which I did with custom code, but as a recommendation in the specs it looks like having a method supporting other implementations will be nice for the library.

Anyway, I am planning to come back in the real near future to htsjk3, where we can take the decisions about supporting the "recommended" specs alone or all of them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that adding explicit support for non-standard barcode separators adds unnecessary complication. If you need to parse special ones you can always use getAttribute and setAttribute

/**
* The recommended separator for the {@link #BARCODE_TAG} when there are multiple bar codes associated with this read group.
*/
public static final String BARCODE_SEPARATOR = "-";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a better name will be DEFAULT_BARCODE_SEPARATOR.

src/main/java/htsjdk/samtools/SAMReadGroupRecord.java Outdated Show resolved Hide resolved
src/main/java/htsjdk/samtools/SAMReadGroupRecord.java Outdated Show resolved Hide resolved
src/main/java/htsjdk/samtools/SAMReadGroupRecord.java Outdated Show resolved Hide resolved
return null;
} else if( barcodeString.isEmpty()){
return Collections.emptyList();
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else unnecessary after return

src/main/java/htsjdk/samtools/SAMReadGroupRecord.java Outdated Show resolved Hide resolved
src/main/java/htsjdk/samtools/SAMReadGroupRecord.java Outdated Show resolved Hide resolved
src/main/java/htsjdk/samtools/SAMReadGroupRecord.java Outdated Show resolved Hide resolved
src/test/java/htsjdk/samtools/SAMReadGroupRecordTest.java Outdated Show resolved Hide resolved
Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion and one comment about merging headers with RGs with barcodes. It looks like the header merger code merges RGs that have the same ID and identical attributes, but keeps them separate (by remapping the colliding ID) if they have the same ID but different attributes. So I think RGs that are identical except for the barcode separator would be kept separate, even if the barcode were identical, which is probably as good as we can do.

@@ -51,6 +47,13 @@
public static final String PLATFORM_MODEL_TAG = "PM";
public static final String PLATFORM_UNIT_TAG = "PU";
public static final String READ_GROUP_SAMPLE_TAG = "SM";
public static final String BARCODE_TAG = "BC";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be added to STANDARD_TAGS (?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, good idea

src/test/java/htsjdk/samtools/SAMReadGroupRecordTest.java Outdated Show resolved Hide resolved
/**
* @return the List of barcodes associated with this read group or null
*/
public List<String> getBarcodes() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that adding explicit support for non-standard barcode separators adds unnecessary complication. If you need to parse special ones you can always use getAttribute and setAttribute

@@ -51,6 +47,13 @@
public static final String PLATFORM_MODEL_TAG = "PM";
public static final String PLATFORM_UNIT_TAG = "PU";
public static final String READ_GROUP_SAMPLE_TAG = "SM";
public static final String BARCODE_TAG = "BC";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, good idea

Copy link
Contributor

@yfarjoun yfarjoun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few nits. but fine otherwise

@yfarjoun
Copy link
Contributor

yfarjoun commented Dec 3, 2018

why the forced-push?

@yfarjoun yfarjoun added the Waiting for revisions This PR has received comments from reviewers and is waiting for the Author to respond label Dec 3, 2018
* adding support in SAMReadGroupRecord for the BC attribute
* this was added to the 1.6 spec
@lbergelson
Copy link
Member Author

lbergelson commented Jan 24, 2019

@yfarjoun Can you take a quick look at the latest changes here? I realized the constants of yours I just merged weren't public so I made them as and added a javadoc.

Co-Authored-By: lbergelson <louisb@broadinstitute.org>
Co-Authored-By: lbergelson <louisb@broadinstitute.org>
@lbergelson lbergelson merged commit 5ef9223 into master Jan 29, 2019
@lbergelson lbergelson deleted the lb_RG_BC branch January 29, 2019 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Waiting for revisions This PR has received comments from reviewers and is waiting for the Author to respond
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants