Cleaning up VCF<->ADAM pipeline #13
Conversation
Merged build triggered.
Merged build started.
Merged build finished.
One or more automated tests failed.
Merged build triggered.
Merged build started.
Merged build finished.
One or more automated tests failed.
@massie, can you look into @AmplabJenkins? There seems to be a Jenkins-side issue that is causing builds to fail.
Jenkins, test this please.
Merged build triggered.
Merged build started.
Merged build finished.
All automated tests passed.
Frank, are you going to rebase into a single commit?
Merged build triggered.
Merged build started.
Sorry about that; I thought I had rebased down to one commit last night but apparently messed up and rebased down to two! I've just fixed that, and we are down to a single commit.
Merged build finished.
All automated tests passed.
package edu.berkeley.cs.amplab.adam.commands

import edu.berkeley.cs.amplab.adam.util.{Args4j, Args4jBase}
Can you please sort these imports?
What would you like them sorted by?
It doesn't need to be alphabetical order or anything like that.
Mainly, I'd like to see the imports grouped together. For example, keep the Spark imports together, the ADAM imports together, the Hadoop imports together, etc.
It makes them easier to parse by eye when there are so many imports.
Sure, no problem. I'll clean them up!
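For illustration, a grouped layout along the lines requested might look like the sketch below. Only the Args4j import is taken from the diff above; the other package names are assumptions standing in for typical ADAM, Hadoop, and Spark imports.

// Hypothetical grouping: ADAM imports first, then Hadoop, then Spark,
// separated by blank lines rather than interleaved.
import edu.berkeley.cs.amplab.adam.util.{Args4j, Args4jBase}

import org.apache.hadoop.mapreduce.Job

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD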
Merged build triggered.
Merged build started.
Merged build finished.
All automated tests passed.
@@ -126,6 +126,18 @@
      <artifactId>picard</artifactId>
    </dependency>
    <dependency>
      <groupId>cofoja</groupId>
      <artifactId>cofoja</artifactId>
We need to drop a file called NOTICE.txt into the root directory that reads:
The "Contracts for Java" (aka cofoja) package included in ADAM is released
under the LGPL v3 license.
All other code released under Apache 2 or compatible license.
This file has been added.
Merged build triggered.
Merged build started.
Merged build finished.
All automated tests passed.
 * Save function for variant contexts. Disaggregates internal fields of variant context
 * and saves to Parquet files.
 *
 * @param[in] filePath Master file path for parquet files.
Actually, there are lots of places where '[in]' is being added and breaking the docs. Maybe you need to use sed here.
I show all the param[in] tags cleaned now:

adam fnothaft$ grep "param[in]" -R adam-commands/src/main/scala/edu/berkeley/cs/amplab/adam/

Sorry about that; I'm used to writing Doxygen, not Java/Scaladoc-style docs.
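For reference, the Scaladoc form the Doxygen-style @param[in] tags were changed to looks roughly like the sketch below; the enclosing object and the stub method body are hypothetical, not the actual ADAM code.

object VariantContextSaveDoc {
  /**
   * Save function for variant contexts. Disaggregates internal fields of variant context
   * and saves to Parquet files.
   *
   * @param filePath Master file path for Parquet files.
   */
  def save(filePath: String): Unit = {
    // Hypothetical stub; the real method writes the disaggregated fields to Parquet.
    ???
  }
}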
This code looks good to me from a "how" perspective, but I'll defer to you and @tdanford about the "what" and "why".
… that handles the unification of the core variant standard with additional variant/genotype annotations. This variant context also handles the unification of unnested variant and genotype info. In addition to this, the VCF<->ADAM conversion pipeline was upgraded to use Hadoop-BAM/GATK's VariantContext code, to eliminate text parsing. Also, an adam2vcf command was added to do conversion back. A feature to generate variant data from genotype data was suggested and implemented, along with a command.
Merged build triggered.
Merged build started.
Merged build finished.
All automated tests passed.
Cleaning up VCF<->ADAM pipeline
rm sparkTest wrapper
Added an internal ADAM variant context that handles the unification of the core variant standard with additional variant/genotype annotations. This variant context also handles the unification of unnested variant and genotype info. In addition, the VCF<->ADAM conversion pipeline was upgraded to use Hadoop-BAM/GATK's VariantContext code to eliminate text parsing. Also, an adam2vcf command was added.
This code is not yet ready to be merged (it needs more testing), but the changes are pretty big, both internally and to the schema for the variant/genotype formats, so I want to get the code out there. I will send a post about the format to the developer list tomorrow to start discussion. Once this code gets further test coverage, I will update and we can merge, provided the community agrees.
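As a rough illustration of the container described above, a variant context along these lines pairs one site with its variant record, the genotypes called at that site, and any extra annotations. All type and field names below are placeholders, not the Avro-generated classes from this pull request.

// Placeholder sketch of the "internal ADAM variant context" idea:
// one site paired with the core variant record, its genotypes, and
// any additional variant/genotype annotations.
case class VariantSketch(referenceName: String, position: Long, alleles: Seq[String])
case class GenotypeSketch(sampleId: String, alleles: Seq[String])
case class AnnotationSketch(key: String, value: String)

case class VariantContextSketch(
  position: Long,
  variant: VariantSketch,
  genotypes: Seq[GenotypeSketch],
  annotations: Seq[AnnotationSketch])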