-
Notifications
You must be signed in to change notification settings - Fork 245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make VariantContextWriterBuilder warn when indexing-on-the-fly is enabled for stream output #1328
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks OK, a few minor comments.
@@ -347,6 +347,25 @@ public void testIndexingOnTheFly() { | |||
} | |||
} | |||
|
|||
@Test | |||
public void testIndexingOnTheFlyForPathStream() throws IOException { | |||
final VariantContextWriterBuilder builder = new VariantContextWriterBuilder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indent should be 4 spaces, not 2. I realize the rest of the file is using 2, but IMO we should move to a common convention for all files, starting with new/updated code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can reindent the file pretty easily if you would like? It seems orthogonal to this branch however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just fix the lines that you're adding/modifying.
try (FileSystem fs = Jimfs.newFileSystem("test", Configuration.unix())) { | ||
final Path vcfPath = IOUtil.getPath("/dev/null"); | ||
final Path vcfIdxPath = Tribble.indexPath(vcfPath); | ||
// We explicitly setOutputFileType here to mimic gatk behavior where it pretends files that have no extension |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incomplete sentence, or extra that
?
@@ -107,6 +104,7 @@ | |||
public static final EnumSet<Options> DEFAULT_OPTIONS = EnumSet.of(Options.INDEX_ON_THE_FLY); | |||
public static final EnumSet<Options> NO_OPTIONS = EnumSet.noneOf(Options.class); | |||
private static final OpenOption[] EMPTY_OPEN_OPTION_ARRAY = new OpenOption[0]; | |||
private final static Log log = Log.getInstance(VariantContextWriter.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
incorrect modifier order
private final static Log log = Log.getInstance(VariantContextWriter.class); | |
private static final Log log = Log.getInstance(VariantContextWriter.class); |
src/main/java/htsjdk/variant/variantcontext/writer/VariantContextWriterBuilder.java
Show resolved
Hide resolved
import htsjdk.samtools.util.IOUtil; | ||
import htsjdk.samtools.util.Md5CalculatingOutputStream; | ||
import htsjdk.samtools.util.RuntimeIOException; | ||
import htsjdk.samtools.util.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it's better to avoid wildcard imports as it makes it harder to see what the code is doing when you're not using an IDE. You can change the IDE settings so it doesn't create wildcard imports.
final Path vcfPath = IOUtil.getPath("/dev/null"); | ||
final Path vcfIdxPath = Tribble.indexPath(vcfPath); | ||
// We explicitly setOutputFileType here to mimic gatk behavior where it pretends files that have no extension | ||
try (final VariantContextWriter writer = builder.setOutputPath(vcfPath).setOutputFileType(OutputType.VCF_STREAM).build()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this also work if you do not set the output type to a stream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it won't. In that case it will end up in the same place where we are as of right now, where an output stream location will end up "UNSPECIFIED" and throw an exception unless the user explicitly defines what type of output it is.
77438b3
to
bd5ff18
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Codecov Report
@@ Coverage Diff @@
## master #1328 +/- ##
==============================================
+ Coverage 67.822% 67.992% +0.17%
- Complexity 8258 8344 +86
==============================================
Files 562 570 +8
Lines 33663 33826 +163
Branches 5644 5655 +11
==============================================
+ Hits 22831 22999 +168
+ Misses 8658 8645 -13
- Partials 2174 2182 +8
|
This is the first stage in resolving broadinstitute/gatk#5779. Currently in when we provide a stream (or any non-regular file to resolve) to the
VariantContextWriterBuilder
we set.setOutputFileType(OutputType.VCF)
. This sidesteps the code in the switch statement designed to this case and throw an exception if indexing on the fly is enabled. It turns out to be a pain point for the users because indexing on the fly is enabled by default. I propose that we push the responsibility of deciding if the output is a stream up to gatk, and then make the stream code inVariantContextWriterBuilder
consistent with the BAM writer, which only warns and disables indexing if it suspects the file might be a stream.