Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleaning up VCF<->ADAM pipeline #13

Merged
merged 1 commit into from
Dec 9, 2013
Merged

Cleaning up VCF<->ADAM pipeline #13

merged 1 commit into from
Dec 9, 2013

Conversation

fnothaft
Copy link
Member

@fnothaft fnothaft commented Dec 4, 2013

Added internal ADAM variant context, that handles the unification of the core variant standard with additional variant/genotype annotations. This variant context also handles the unification of unnested variant and genotype info. In addition to this, the VCF<->ADAM conversion pipeline was upgraded to use Hadoop-BAM/GATK's VariantContext code, to eliminate text parsing. Also, an adam2vcf command was added.

This code is not yet ready to get merged in (it needs more testing), but the changes are pretty big both internally and to the schema for the variant/genotype formats, so I want to get the code out there. I will send a post about the format out to the developer list tomorrow to start discussion. Once this code gets further test coverage, I will update and we can merge, providing the community agrees.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

One or more automated tests failed
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM/52/

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

One or more automated tests failed
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM/53/

@fnothaft
Copy link
Member Author

fnothaft commented Dec 4, 2013

@massie, can you look into @AmplabJenkins? There seems to be a Jenkins-side issue that is causing builds to fail.

@fnothaft
Copy link
Member Author

fnothaft commented Dec 6, 2013

Calling @massie @tdanford @laserson to review.

@fnothaft
Copy link
Member Author

fnothaft commented Dec 6, 2013

Jenkins, test this please.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM/68/

@tdanford
Copy link
Contributor

tdanford commented Dec 6, 2013

Frank, are you going to rebase into a single commit?

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@fnothaft
Copy link
Member Author

fnothaft commented Dec 6, 2013

Sorry about that — I thought I had rebased down to 1 commit last night but apparently messed up and rebased down to 2! I've just fixed that — we are down to a single commit.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM/70/


package edu.berkeley.cs.amplab.adam.commands

import edu.berkeley.cs.amplab.adam.util.{Args4j, Args4jBase}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please sort these imports?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you like them sorted by?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't need to be alphabetical order or anything like that.

Mainly, I'd like to see the imports grouped together. For example, keep the Spark imports together, the ADAM imports together, the hadoop imports together, etc...

Makes it easier to parse with the eyes when there's so many imports.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, no problem. I'll clean them up!

@AmplabJenkins
Copy link

Merged build triggered.

@fnothaft
Copy link
Member Author

fnothaft commented Dec 6, 2013

@tdanford @massie I've just pushed the changes to flatten the schema and to add documentation where necessary. Let me know if you all have any other thoughts.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM/72/

@@ -126,6 +126,18 @@
<artifactId>picard</artifactId>
</dependency>
<dependency>
<groupId>cofoja</groupId>
<artifactId>cofoja</artifactId>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to drop a file called NOTICE.txt into the root directory that reads.

The "Contracts for Java" (aka cofoja) package included in ADAM is released
under the LGPL v3 license.

All other code released under Apache 2 or compatible license.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file has been added.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM/74/

* Save function for variant contexts. Disaggregates internal fields of variant context
* and saves to Parquet files.
*
* @param[in] filePath Master file path for parquet files.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there's lots of places where '[in]' are being added and breaking the docs. Maybe you need to use sed here.

@fnothaft
Copy link
Member Author

fnothaft commented Dec 7, 2013

I show all the param[in] cleaned now:

adam fnothaft$ grep "param[in]" -R adam-commands/src/main/scala/edu/berkeley/cs/amplab/adam/
adam fnothaft$ grep "param[in]" -R adam-commands/src/test/scala/edu/berkeley/cs/amplab/adam/
adam fnothaft$

Sorry about that - used to writing Doxygen, not java/scaladoc style docs.

@massie
Copy link
Member

massie commented Dec 7, 2013

This code looks good to me from a "how" perspective but I'll defer to you and @tdanford about the "what" and "why".

… that handles the unification of the core variant standard with additional variant/genotype annotations. This variant context also handles the unification of unnested variant and genotype info. In addition to this, the VCF<->ADAM conversion pipeline was upgraded to use Hadoop-BAM/GATK's VariantContext code, to eliminate text parsing. Also, an adam2vcf command was added to do conversion back. A feature to generate variant data from genotype data was suggested and implemented, along with a command.
@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM/77/

fnothaft added a commit that referenced this pull request Dec 9, 2013
@fnothaft fnothaft merged commit 5a943b9 into master Dec 9, 2013
@fnothaft fnothaft deleted the new-variant branch December 9, 2013 23:11
ryan-williams added a commit to ryan-williams/adam that referenced this pull request Apr 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants