-
Notifications
You must be signed in to change notification settings - Fork 245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Sam Header Readgroup Barcode field #1210
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1210 +/- ##
===============================================
+ Coverage 69.358% 69.372% +0.014%
- Complexity 8302 8308 +6
===============================================
Files 555 555
Lines 33118 33130 +12
Branches 5572 5575 +3
===============================================
+ Hits 22970 22983 +13
+ Misses 7886 7885 -1
Partials 2262 2262
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be nice to be able to configure the barcode separator somehow. Also, it might be possible to have a way to override the default statically, and like that an user can change it on startup to have a different implementation.
/** | ||
* @return the List of barcodes associated with this read group or null | ||
*/ | ||
public List<String> getBarcodes() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a method using a different barcode separator too (in case it is needed by other implementations), and the one without params should use the default BARCODE_SEPARATOR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@magicDGS
The spec recommends using -
as the separator:
Barcode sequence identifying the sample or library. This value is the expected barcode bases
as read by the sequencing machine in the absence of errors. If there are several barcodes for
the sample/library (e.g., one on each end of the template), the recommended implementation
concatenates all the barcodes separating them with hyphens (‘-’).
My though was that we should support the recommended mode and not go out of the way to support other modes. You can always get around it by using getAttribute
directly and parsing them yourself.
Are there implementations out there already? Is there a practical need for what you suggest or just a theoretical one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found some FASTQ files with +
as barcode separator, that when are converted with custom scripts to SAM/BAM the @RG
tag (per-read) keeps that symbol; thus, some users of ReadTools have asked me to support other separators, which I did with custom code, but as a recommendation in the specs it looks like having a method supporting other implementations will be nice for the library.
Anyway, I am planning to come back in the real near future to htsjk3, where we can take the decisions about supporting the "recommended" specs alone or all of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that adding explicit support for non-standard barcode separators adds unnecessary complication. If you need to parse special ones you can always use getAttribute
and setAttribute
/** | ||
* The recommended separator for the {@link #BARCODE_TAG} when there are multiple bar codes associated with this read group. | ||
*/ | ||
public static final String BARCODE_SEPARATOR = "-"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a better name will be DEFAULT_BARCODE_SEPARATOR
.
return null; | ||
} else if( barcodeString.isEmpty()){ | ||
return Collections.emptyList(); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else
unnecessary after return
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One suggestion and one comment about merging headers with RGs with barcodes. It looks like the header merger code merges RGs that have the same ID and identical attributes, but keeps them separate (by remapping the colliding ID) if they have the same ID but different attributes. So I think RGs that are identical except for the barcode separator
would be kept separate, even if the barcode were identical, which is probably as good as we can do.
@@ -51,6 +47,13 @@ | |||
public static final String PLATFORM_MODEL_TAG = "PM"; | |||
public static final String PLATFORM_UNIT_TAG = "PU"; | |||
public static final String READ_GROUP_SAMPLE_TAG = "SM"; | |||
public static final String BARCODE_TAG = "BC"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be added to STANDARD_TAGS
(?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, good idea
/** | ||
* @return the List of barcodes associated with this read group or null | ||
*/ | ||
public List<String> getBarcodes() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that adding explicit support for non-standard barcode separators adds unnecessary complication. If you need to parse special ones you can always use getAttribute
and setAttribute
@@ -51,6 +47,13 @@ | |||
public static final String PLATFORM_MODEL_TAG = "PM"; | |||
public static final String PLATFORM_UNIT_TAG = "PU"; | |||
public static final String READ_GROUP_SAMPLE_TAG = "SM"; | |||
public static final String BARCODE_TAG = "BC"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few nits. but fine otherwise
why the forced-push? |
* adding support in SAMReadGroupRecord for the BC attribute * this was added to the 1.6 spec
@yfarjoun Can you take a quick look at the latest changes here? I realized the constants of yours I just merged weren't public so I made them as and added a javadoc. |
Co-Authored-By: lbergelson <louisb@broadinstitute.org>
Co-Authored-By: lbergelson <louisb@broadinstitute.org>
Checklist