Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Several changes to metrics collections (AlignmentSummaryMetrics and WgsMetrics) + some fluff #1555

Merged
merged 12 commits into from
Aug 5, 2020

Conversation

yfarjoun
Copy link
Contributor

  • Added Clips (Soft and Hard) to AlignmentSummaryMetrics
  • Added read-length histogram and pdf output to AlignmentSummaryMetrics
  • Added Tests
  • Added a TEMP_OUTPUT_DIR to CommandLineProgramTest.java which is deleted once the derived class is done testing
  • Modified Two test classes to use TEMP_OUTPUT_DIR
  • Modified enum in CollectMultipleMetrics so that it will display what the options are and what each program does in the help.

Copy link
Contributor

@gbggrant gbggrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@pshapiro4broad pshapiro4broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not done yet but wanted to submit my comments so far

TEMP_OUTPUT_DIR.delete();
TEMP_OUTPUT_DIR.mkdir();
} catch (IOException e) {
throw new PicardException("Couldn't create temp directory") ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to not use RuntimeException() here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well...PicardException is a RuntimeException...but this was it's more clear that it comes from Picard...I don't mind either way....what's the argument for RuntimeException?

@@ -15,6 +20,24 @@
public static final File CHR_M_REFERENCE = new File(REFERENCE_TEST_DIR,"chrM.reference.fasta");
public static final File CHR_M_DICT = new File(REFERENCE_TEST_DIR,"chrM.reference.dict");


// A per-test-class directory that will be deleted after the tests are complete.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is OK, an improvement would be to make this lazy, as not every test needs to create temp output. E.g.,

protected File getTempDir() {
    if (TEMP_OUTPUT_DIR == null) {
    ...
}

final protected File TEMP_OUTPUT_DIR;
{
try {
TEMP_OUTPUT_DIR = File.createTempFile(FileUtils.getTempDirectory().getAbsolutePath(),this.getClass().getSimpleName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not

TEMP_OUTPUT_DIR = Files.createTempDirectory(getClass().getName()).toFile();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -51,8 +55,7 @@ public String getCommandLineProgramName() {
public void test() throws IOException {
final File input = new File(TEST_DATA_DIR, "summary_alignment_stats_test.sam");
final File reference = new File(TEST_DATA_DIR, "summary_alignment_stats_test.fasta");
final File outfile = File.createTempFile("alignmentMetrics", ".txt");
outfile.deleteOnExit();
final File outfile = File.createTempFile("test", ".txt", TEMP_OUTPUT_DIR);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is a common case. You could add createTempFile to CommandLineProgramTest to avoid some typing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not much less typing...but OK.

@DataProvider()
Object[][] TrueFalse() {
return new Object[][]{
new Object[]{true},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary array new: {{true}, {false}} should work here

@Test(dataProvider = "wgsDataProvider")
public void testSmallIntervals(final File input, final String reference_name,
final String useFastAlgorithm) throws IOException {
final File outfile = File.createTempFile("testSmallIntervals", ".wgs_metrics",TEMP_OUTPUT_DIR);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace

Suggested change
final File outfile = File.createTempFile("testSmallIntervals", ".wgs_metrics",TEMP_OUTPUT_DIR);
final File outfile = File.createTempFile("testSmallIntervals", ".wgs_metrics", TEMP_OUTPUT_DIR);


setBuilder.setReadLength(100);

setBuilder.addPair("all_in",0,200,200,false,false,"100M","100M",true,false,30);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace

Suggested change
setBuilder.addPair("all_in",0,200,200,false,false,"100M","100M",true,false,30);
setBuilder.addPair("all_in", 0, 200, 200, false, false, "100M", "100M", true, false, 30);

final File outfile = File.createTempFile("testIntervalOneRead", ".wgs_metrics", TEMP_OUTPUT_DIR);
final File intervals = new File(TEST_DIR, "smallIntervals.interval_list");
final int sampleSize = 1000;
final String[] args = new String[]{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary new

Suggested change
final String[] args = new String[]{
final String[] args = {

@@ -12,7 +12,7 @@

public class CollectOxoGMetricsTest {
private static final File TEST_DATA_DIR = new File("testdata/picard/sam/");
private static final File SAM_FILE = new File(TEST_DATA_DIR, "summary_alignment_stats_test.sam");
private static final File SAM_FILE = new File(CollectAlignmentSummaryMetricsTest.TEST_DATA_DIR, "summary_alignment_stats_test.sam");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it could lead to problems, if someone changed the file while editing the other test but didn't realize the file was used here as well. If you made a separate constant for each file shared like this (e.g., build the path to the file in CollectAlignmentSummaryMetricsTest as a constant), that would at least make it easier to find other tests that depended on that file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no different than what it was (regarding someone changing the file) but at least now there's a variable that is traced to where it's being used...I fail to see how building the path from scratch helps...I moved the files into a subdirectory and suddenly other tests started failing....with this construction it will not happen...


// Get Theoretical Het SNP Sensitivity
if (unfilteredBaseQHistogram != null && unfilteredDepthHistogram != null) {
if (unfilteredBaseQHistogram != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove this null check? If this is null, it will cause TheoreticalSensitivity.normalizeHistogram to throw a PicardException.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

already checked in the first line of test

Copy link
Member

@pshapiro4broad pshapiro4broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many minor comments, looks OK as far as I can tell otherwise.

* Class that counts reads that match various conditions
*/
public class IndividualAlignmentSummaryMetricsCollector implements PerUnitMetricCollector<AlignmentSummaryMetrics, Integer, SAMRecordAndReference> {
private long numPositiveStrand = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= 0 initializers unnecessary

private final Histogram<Integer> readLengthHistogram = new Histogram<>("count", "readLength");
private final Histogram<Integer> alignedReadLengthHistogram = new Histogram<>("count", "alignedReadLength");

private AlignmentSummaryMetrics metrics;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be final, I think

metrics.PCT_READS_ALIGNED_IN_PAIRS = MathUtil.divide((double) metrics.READS_ALIGNED_IN_PAIRS, (double) metrics.PF_READS_ALIGNED);
metrics.PCT_PF_READS_IMPROPER_PAIRS = MathUtil.divide((double) metrics.PF_READS_IMPROPER_PAIRS, (double) metrics.PF_READS_ALIGNED);
metrics.STRAND_BALANCE = MathUtil.divide(numPositiveStrand, (double) metrics.PF_READS_ALIGNED);
metrics.PCT_CHIMERAS = MathUtil.divide(this.chimeras, (double) this.chimerasDenominator);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't normally comment on this. but the code is inconsistent in its use. Why do these use this. but no other fields do? I think it would be better to omit them completely unless necessary for disambiguation purposes.

// Generally programs should not be accessing these directly but it might make things smoother
// to just set them anyway. These are set here to make sure that in case of a the derived class
// overrides
program.INPUT = input;

return program;
}
}
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary ;

Suggested change
};
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

final String METRICS_EXTENSION = ".insert_size_metrics";
final String PDF_EXTENSION = ".insert_size_histogram.pdf";

final List<String> OUTPUT_EXTENSIONS = CollectionUtil.makeList(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why makeList instead of Arrays.asList?

Suggested change
final List<String> OUTPUT_EXTENSIONS = CollectionUtil.makeList(
final List<String> OUTPUT_EXTENSIONS = Arrays.asList(

PDF_EXTENSION);

@Override
public String getHelpDoc() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is the same for most (all?) subclasses. If you make USAGE_SUMMARY and OUTPUT_EXTENSIONS either constructor arguments or abstract methods then the duplicate code can be removed.


@Override
public String getHelpDoc() {
return picard.analysis.CollectInsertSizeMetrics.USAGE_SUMMARY +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be CollectAlignmentSummaryMetrics.USAGE_SUMMARY

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops. good catch.

Yossi Farjoun added 12 commits August 4, 2020 16:30
- Added Clips (Soft and Hard) to AlignmentSummaryMetrics
- Added read-length histogram and pdf output to AlignmentSummaryMetrics
- Added Tests
- Added a TEMP_OUTPUT_DIR to CommandLineProgramTest.java which is delete once the derived class is done testing
- Modified Two test classes to use TEMP_OUTPUT_DIR
- Modified enum in CollectMultipleMetrics so that it will display what the options are and what each program does in the help.
- fixed fialing test (protect against empty histogram)
- fixing other tests failures
@yfarjoun yfarjoun force-pushed the yf_add_fold_penalty_to_wgs_metrics branch from d0f7b67 to d1d1754 Compare August 4, 2020 20:31
@yfarjoun
Copy link
Contributor Author

yfarjoun commented Aug 4, 2020

@pshapiro4broad thanks for the comments. I responded to (almost) all of them. please take another glance.

@yfarjoun
Copy link
Contributor Author

yfarjoun commented Aug 4, 2020

oh, you approved already...nm then. I'll wait till tests pass and merge.

@yfarjoun yfarjoun merged commit bd603aa into master Aug 5, 2020
@yfarjoun yfarjoun deleted the yf_add_fold_penalty_to_wgs_metrics branch August 5, 2020 02:13
@yfarjoun yfarjoun restored the yf_add_fold_penalty_to_wgs_metrics branch August 11, 2020 16:23
@yfarjoun yfarjoun deleted the yf_add_fold_penalty_to_wgs_metrics branch December 22, 2020 21:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants