-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix VCFFuncotation output order bug #6178
Conversation
@jonn-smith Handing this one off to you for now. I wasn't sure what to do about test exactly. |
@lbergelson The trouble with the large tests is that they can be automatically generated and have lots of output so it's hard to check the details. Do you have an example of where this was causing an error before? If so we can build a unit or integration test out of that variant to ensure this doesn't regress. If you can provide that kind of a locus I could go in and add in a test for it. |
I added a new test file with 1 line that exhibits the problem. If you run it with the old code the dbSnp annotations are shuffled. It's easy to see if you edit the output renderer to include the field name in the field value. |
I think a good test would be to add a new test vcf datasource with a number of fields, and the values of each field are just the field header. It's easy to assert that and obvious when there's an error. |
This PR also fixes the ordering issue when run under Java 11, see #6119 |
@@ -333,10 +335,13 @@ private static Funcotation mergeDuplicateFuncotationFactoryVariant(final Funcota | |||
final LinkedHashSet<String> allFieldNames = funcotation1.getFieldNames(); | |||
allFieldNames.addAll(funcotation2.getFieldNames()); | |||
|
|||
final Map<String, Object> mergedFieldsMap = allFieldNames.stream() | |||
.collect(Collectors.toMap(f -> f, f -> mergeFuncotationValue(f, funcotation1, funcotation2, VcfFuncotationFactory::renderFieldConflicts))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use the Collectors.toMap()
overloaded method that takes a Supplier
and pass in LinkedHashMap::new
. The default uses HashMap::new
.
In this case, using streaming doesn't add much, but for longer chains of stream operations this would be a good change. Might be worth having a utility function in GATK to make it easy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you could do that. You end up with a funny unused merge function if you do because this is assuming that you've pre-uniqued the inputs so I figured I'd just switch it to a for loop.
Fixes #6173 I added a single line test that shows the problem, and regenerated the other test files. I didn't write a specific unit test that proves it's solved. Feel free to do so if you think it's necessary. I validated the output by adding the field name to the output value and checking it by eye against the header lines. This could fairly simply made into a unit test if desired. I'm not sure if there are other large files that need to be regenerated. I had initially said this bug only affects vcf but I think it happens to Maf output as well.
98e7407
to
f62b704
Compare
@jonn-smith This should be good to go now. I added a non-regenerated integration test for the specific problem. |
@@ -696,6 +706,10 @@ public void exhaustiveArgumentTest(final String dataSourcesPath, | |||
arguments.addArgument(FuncotatorArgumentDefinitions.ANNOTATION_OVERRIDES_LONG_NAME, "Oreganno_Build:BUILDED_GOOD_REAL_BIG"); | |||
|
|||
// Run the beast: | |||
final File tmp = new File(System.getProperty("java.io.tmpdir")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
um.. this wasn't meant to be here.. let me fix that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thanks for adding in that test.
When the integration tests pass, feel free to merge
Fixes #6173
I added a single line test that shows the problem, and regenerated the other test files. I didn't write a specific unit test that proves it's solved. Feel free to do so if you think it's necessary.
I validated the output by adding the field name to the output value and checking it by eye against the header lines. This could fairly simply made into a unit test if desired.
I'm not sure if there are other large files that need to be regenerated.
I had initially said this bug only affects vcf but I think it happens to Maf output as well.
I removed a weakly typed method that allowed this bug to occur.