Cleaning up VCF<->ADAM pipeline #13
Conversation
Merged build triggered.
Merged build started.
Merged build finished.
One or more automated tests failed.
Merged build triggered.
Merged build started.
Merged build finished.
One or more automated tests failed.
@massie, can you look into @AmplabJenkins? There seems to be a Jenkins-side issue that is causing builds to fail.
Jenkins, test this please.
Merged build triggered.
Merged build started.
Merged build finished.
All automated tests passed.
Frank, are you going to rebase into a single commit?
Merged build triggered.
Merged build started.
Sorry about that; I thought I had rebased down to one commit last night but apparently messed up and rebased down to two! I've just fixed that, and we are down to a single commit.
Merged build finished.
All automated tests passed.
package edu.berkeley.cs.amplab.adam.commands

import edu.berkeley.cs.amplab.adam.util.{Args4j, Args4jBase}
Can you please sort these imports?
What would you like them sorted by?
It doesn't need to be alphabetical order or anything like that.
Mainly, I'd like to see the imports grouped together. For example, keep the Spark imports together, the ADAM imports together, the Hadoop imports together, etc.
It makes them easier to parse by eye when there are so many imports.
Sure, no problem. I'll clean them up!
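For illustration, a grouped layout along the lines requested might look like the sketch below. Only the Args4j import is taken from the diff above; the other package names are assumptions standing in for typical ADAM, Hadoop, and Spark imports.

// Hypothetical grouping: ADAM imports first, then Hadoop, then Spark,
// separated by blank lines rather than interleaved.
import edu.berkeley.cs.amplab.adam.util.{Args4j, Args4jBase}

import org.apache.hadoop.mapreduce.Job

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD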
Merged build triggered.
Merged build started.
Merged build finished.
All automated tests passed.
@@ -126,6 +126,18 @@
      <artifactId>picard</artifactId>
    </dependency>
    <dependency>
      <groupId>cofoja</groupId>
      <artifactId>cofoja</artifactId>
We need to drop a file called NOTICE.txt into the root directory that reads:
The "Contracts for Java" (aka cofoja) package included in ADAM is released
under the LGPL v3 license.
All other code released under Apache 2 or compatible license.
This file has been added.
Merged build triggered.
Merged build started.
Merged build finished.
All automated tests passed.
 * Save function for variant contexts. Disaggregates internal fields of variant context
 * and saves to Parquet files.
 *
 * @param[in] filePath Master file path for parquet files.
Actually, there are lots of places where '[in]' is being added and breaking the docs. Maybe you need to use sed here.
I show all the param[in] tags cleaned now:

adam fnothaft$ grep "param[in]" -R adam-commands/src/main/scala/edu/berkeley/cs/amplab/adam/

Sorry about that; I'm used to writing Doxygen, not Java/Scaladoc-style docs.
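For reference, the Scaladoc form the Doxygen-style @param[in] tags were changed to looks roughly like the sketch below; the enclosing object and the stub method body are hypothetical, not the actual ADAM code.

object VariantContextSaveDoc {
  /**
   * Save function for variant contexts. Disaggregates internal fields of variant context
   * and saves to Parquet files.
   *
   * @param filePath Master file path for Parquet files.
   */
  def save(filePath: String): Unit = {
    // Hypothetical stub; the real method writes the disaggregated fields to Parquet.
    ???
  }
}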
This code looks good to me from a "how" perspective, but I'll defer to you and @tdanford about the "what" and "why".
… that handles the unification of the core variant standard with additional variant/genotype annotations. This variant context also handles the unification of unnested variant and genotype info. In addition to this, the VCF<->ADAM conversion pipeline was upgraded to use Hadoop-BAM/GATK's VariantContext code, to eliminate text parsing. Also, an adam2vcf command was added to do conversion back. A feature to generate variant data from genotype data was suggested and implemented, along with a command.
Merged build triggered.
Merged build started.
Merged build finished.
All automated tests passed.
Cleaning up VCF<->ADAM pipeline
rm sparkTest wrapper
Added an internal ADAM variant context that handles the unification of the core variant standard with additional variant/genotype annotations. This variant context also handles the unification of unnested variant and genotype info. In addition, the VCF<->ADAM conversion pipeline was upgraded to use Hadoop-BAM/GATK's VariantContext code to eliminate text parsing. Also, an adam2vcf command was added.
This code is not yet ready to be merged (it needs more testing), but the changes are pretty big, both internally and to the schema for the variant/genotype formats, so I want to get the code out there. I will send a post about the format to the developer list tomorrow to start discussion. Once this code gets further test coverage, I will update and we can merge, provided the community agrees.
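As a rough illustration of the container described above, a variant context along these lines pairs one site with its variant record, the genotypes called at that site, and any extra annotations. All type and field names below are placeholders, not the Avro-generated classes from this pull request.

// Placeholder sketch of the "internal ADAM variant context" idea:
// one site paired with the core variant record, its genotypes, and
// any additional variant/genotype annotations.
case class VariantSketch(referenceName: String, position: Long, alleles: Seq[String])
case class GenotypeSketch(sampleId: String, alleles: Seq[String])
case class AnnotationSketch(key: String, value: String)

case class VariantContextSketch(
  position: Long,
  variant: VariantSketch,
  genotypes: Seq[GenotypeSketch],
  annotations: Seq[AnnotationSketch])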