Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Sam Header Readgroup Barcode field #1210

Merged
merged 4 commits into from
Jan 29, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 35 additions & 6 deletions src/main/java/htsjdk/samtools/SAMReadGroupRecord.java
Original file line number Diff line number Diff line change
Expand Up @@ -25,12 +25,9 @@


import htsjdk.samtools.util.Iso8601Date;
import htsjdk.samtools.util.SamConstants;

import java.util.Arrays;
import java.util.Date;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.*;

/**
* Header information about a read group.
Expand All @@ -51,6 +48,8 @@ public class SAMReadGroupRecord extends AbstractSAMHeaderRecord
public static final String PLATFORM_MODEL_TAG = "PM";
public static final String PLATFORM_UNIT_TAG = "PU";
public static final String READ_GROUP_SAMPLE_TAG = "SM";
public static final String BARCODE_TAG = "BC";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be added to STANDARD_TAGS (?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, good idea



/* Platform values for the @RG-PL tag */
public enum PlatformValue {
Expand All @@ -63,7 +62,7 @@ public enum PlatformValue {
new HashSet<String>(Arrays.asList(READ_GROUP_ID_TAG, SEQUENCING_CENTER_TAG, DESCRIPTION_TAG,
DATE_RUN_PRODUCED_TAG, FLOW_ORDER_TAG, KEY_SEQUENCE_TAG, LIBRARY_TAG,
PROGRAM_GROUP_TAG, PREDICTED_MEDIAN_INSERT_SIZE_TAG, PLATFORM_TAG, PLATFORM_MODEL_TAG,
PLATFORM_UNIT_TAG, READ_GROUP_SAMPLE_TAG));
PLATFORM_UNIT_TAG, READ_GROUP_SAMPLE_TAG, BARCODE_TAG));

public SAMReadGroupRecord(final String id) { mReadGroupId = id; }

Expand All @@ -90,6 +89,36 @@ public SAMReadGroupRecord(final String id, final SAMReadGroupRecord srcProgramRe
public String getPlatform() { return getAttribute(PLATFORM_TAG); }
public void setPlatform(final String platform) { setAttribute(PLATFORM_TAG, platform); }

/**
* @return the List of barcodes associated with this read group or null
*/
public List<String> getBarcodes() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a method using a different barcode separator too (in case it is needed by other implementations), and the one without params should use the default BARCODE_SEPARATOR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@magicDGS
The spec recommends using - as the separator:

Barcode sequence identifying the sample or library. This value is the expected barcode bases
as read by the sequencing machine in the absence of errors. If there are several barcodes for
the sample/library (e.g., one on each end of the template), the recommended implementation
concatenates all the barcodes separating them with hyphens (‘-’).

My though was that we should support the recommended mode and not go out of the way to support other modes. You can always get around it by using getAttribute directly and parsing them yourself.

Are there implementations out there already? Is there a practical need for what you suggest or just a theoretical one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found some FASTQ files with + as barcode separator, that when are converted with custom scripts to SAM/BAM the @RG tag (per-read) keeps that symbol; thus, some users of ReadTools have asked me to support other separators, which I did with custom code, but as a recommendation in the specs it looks like having a method supporting other implementations will be nice for the library.

Anyway, I am planning to come back in the real near future to htsjk3, where we can take the decisions about supporting the "recommended" specs alone or all of them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that adding explicit support for non-standard barcode separators adds unnecessary complication. If you need to parse special ones you can always use getAttribute and setAttribute

final String barcodeString = getAttribute(BARCODE_TAG);
if (barcodeString == null) {
return null;
} else if (barcodeString.isEmpty()) {
return Collections.emptyList();
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else unnecessary after return

return Arrays.asList(barcodeString.split(SamConstants.BARCODE_SEQUENCE_DELIMITER));
}
}

/**
* Set the barcodes associated with this ReadGroup.
* Note that an input of null results in unsetting the attribute while an empty list is set as a tag with an empty value.
* @param barcodes a list of barcodes to associate with this read group
*/
public void setBarcodes(final List<String> barcodes) {
if (barcodes == null) {
setAttribute(BARCODE_TAG, null);
} else {
if (barcodes.stream().anyMatch(String::isEmpty)) {
throw new IllegalArgumentException("A barcode must not be an empty String");
}
setAttribute(BARCODE_TAG, String.join(SamConstants.BARCODE_SEQUENCE_DELIMITER, barcodes));
}
}

public Date getRunDate() {
final String dt = getAttribute(DATE_RUN_PRODUCED_TAG);
if (dt == null) return null;
Expand Down
13 changes: 10 additions & 3 deletions src/main/java/htsjdk/samtools/util/SamConstants.java
Original file line number Diff line number Diff line change
Expand Up @@ -30,9 +30,16 @@
*/
public final class SamConstants {

//No need to instantiate this class since all the onstants should be static
//No need to instantiate this class since all the constants should be static
private SamConstants(){};

final static String BARCODE_SEQUENCE_DELIMITER = "-";
final static String BARCODE_QUALITY_DELIMITER = " ";
/**
* The recommended separator to use when specifying multiple barcodes together in the same tag.
*/
public static final String BARCODE_SEQUENCE_DELIMITER = "-";

/**
* The recommend separator to use when specifying multiple barcode quality scores together in the same tag.
*/
public static final String BARCODE_QUALITY_DELIMITER = " ";
}
23 changes: 23 additions & 0 deletions src/test/java/htsjdk/samtools/SAMReadGroupRecordTest.java
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,10 @@
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

import java.util.Arrays;
import java.util.Collections;
import java.util.Date;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Function;

Expand Down Expand Up @@ -145,4 +148,24 @@ public void testEqualsAndHashcode(final SAMReadGroupRecord rg, final Object othe
}
}

@DataProvider
public Object[][] getBarcodes() {
return new Object[][] {
{null, null},
{Collections.emptyList(), ""},
{Collections.singletonList("aa"), "aa"},
{Arrays.asList("aa", "ac"), "aa-ac"},
{Arrays.asList("aa", "ca", "gg"), "aa-ca-gg"}
};
}

@Test(dataProvider = "getBarcodes")
public void testGetAndSetBarcodes(List<String> barcodes, String encoded){
final SAMReadGroupRecord readGroup = new SAMReadGroupRecord("ReadGroup");
Assert.assertNull(readGroup.getBarcodes());
Assert.assertNull(readGroup.getAttribute(SAMReadGroupRecord.BARCODE_TAG));
readGroup.setBarcodes(barcodes);
Assert.assertEquals(readGroup.getBarcodes(), barcodes);
Assert.assertEquals(readGroup.getAttribute(SAMReadGroupRecord.BARCODE_TAG), encoded);
}
}