Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make VariantContextWriterBuilder warn when indexing-on-the-fly is enabled for stream output #1328

Merged
merged 5 commits into from
Jun 17, 2019

Conversation

jamesemery
Copy link
Contributor

This is the first stage in resolving broadinstitute/gatk#5779. Currently in when we provide a stream (or any non-regular file to resolve) to the VariantContextWriterBuilder we set .setOutputFileType(OutputType.VCF). This sidesteps the code in the switch statement designed to this case and throw an exception if indexing on the fly is enabled. It turns out to be a pain point for the users because indexing on the fly is enabled by default. I propose that we push the responsibility of deciding if the output is a stream up to gatk, and then make the stream code in VariantContextWriterBuilder consistent with the BAM writer, which only warns and disables indexing if it suspects the file might be a stream.

@jamesemery
Copy link
Contributor Author

@cmnbroad

Copy link
Contributor

@pshapiro4broad pshapiro4broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK, a few minor comments.

@@ -347,6 +347,25 @@ public void testIndexingOnTheFly() {
}
}

@Test
public void testIndexingOnTheFlyForPathStream() throws IOException {
final VariantContextWriterBuilder builder = new VariantContextWriterBuilder()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent should be 4 spaces, not 2. I realize the rest of the file is using 2, but IMO we should move to a common convention for all files, starting with new/updated code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can reindent the file pretty easily if you would like? It seems orthogonal to this branch however.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just fix the lines that you're adding/modifying.

try (FileSystem fs = Jimfs.newFileSystem("test", Configuration.unix())) {
final Path vcfPath = IOUtil.getPath("/dev/null");
final Path vcfIdxPath = Tribble.indexPath(vcfPath);
// We explicitly setOutputFileType here to mimic gatk behavior where it pretends files that have no extension
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incomplete sentence, or extra that?

@@ -107,6 +104,7 @@
public static final EnumSet<Options> DEFAULT_OPTIONS = EnumSet.of(Options.INDEX_ON_THE_FLY);
public static final EnumSet<Options> NO_OPTIONS = EnumSet.noneOf(Options.class);
private static final OpenOption[] EMPTY_OPEN_OPTION_ARRAY = new OpenOption[0];
private final static Log log = Log.getInstance(VariantContextWriter.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

incorrect modifier order

Suggested change
private final static Log log = Log.getInstance(VariantContextWriter.class);
private static final Log log = Log.getInstance(VariantContextWriter.class);

import htsjdk.samtools.util.IOUtil;
import htsjdk.samtools.util.Md5CalculatingOutputStream;
import htsjdk.samtools.util.RuntimeIOException;
import htsjdk.samtools.util.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's better to avoid wildcard imports as it makes it harder to see what the code is doing when you're not using an IDE. You can change the IDE settings so it doesn't create wildcard imports.

final Path vcfPath = IOUtil.getPath("/dev/null");
final Path vcfIdxPath = Tribble.indexPath(vcfPath);
// We explicitly setOutputFileType here to mimic gatk behavior where it pretends files that have no extension
try (final VariantContextWriter writer = builder.setOutputPath(vcfPath).setOutputFileType(OutputType.VCF_STREAM).build()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this also work if you do not set the output type to a stream?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it won't. In that case it will end up in the same place where we are as of right now, where an output stream location will end up "UNSPECIFIED" and throw an exception unless the user explicitly defines what type of output it is.

Copy link
Contributor

@pshapiro4broad pshapiro4broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@codecov-io
Copy link

codecov-io commented Jun 5, 2019

Codecov Report

Merging #1328 into master will increase coverage by 0.17%.
The diff coverage is 40%.

@@              Coverage Diff               @@
##              master     #1328      +/-   ##
==============================================
+ Coverage     67.822%   67.992%   +0.17%     
- Complexity      8258      8344      +86     
==============================================
  Files            562       570       +8     
  Lines          33663     33826     +163     
  Branches        5644      5655      +11     
==============================================
+ Hits           22831     22999     +168     
+ Misses          8658      8645      -13     
- Partials        2174      2182       +8
Impacted Files Coverage Δ Complexity Δ
...antcontext/writer/VariantContextWriterBuilder.java 80.625% <40%> (-2.067%) 59 <0> (-1)
...s/cram/encoding/external/ExternalLongEncoding.java 57.143% <0%> (-9.524%) 2% <0%> (-1%)
src/main/java/htsjdk/variant/bcf2/BCFVersion.java 88.889% <0%> (-5.848%) 11% <0%> (+4%)
...amtools/reference/FastaReferenceWriterBuilder.java 79.518% <0%> (-4.118%) 38% <0%> (+13%)
...java/htsjdk/samtools/cram/structure/Container.java 88.043% <0%> (-2.433%) 27% <0%> (+5%)
...c/main/java/htsjdk/samtools/util/IntervalList.java 72.18% <0%> (-2.329%) 64% <0%> (-11%)
...samtools/cram/build/CramSpanContainerIterator.java 71.053% <0%> (-2.118%) 9% <0%> (ø)
...tsjdk/variant/variantcontext/writer/VCFWriter.java 81.72% <0%> (-1.234%) 19% <0%> (+1%)
...jdk/samtools/cram/build/CramContainerIterator.java 88.889% <0%> (-1.111%) 10% <0%> (ø)
src/main/java/htsjdk/samtools/util/Interval.java 63.333% <0%> (-0.952%) 28% <0%> (+1%)
... and 51 more

@yfarjoun yfarjoun merged commit f13d075 into samtools:master Jun 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants