Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ADAMContext/GenomicRDD/pipe docs #1422

Merged
merged 3 commits into from
Mar 14, 2017

Conversation

fnothaft
Copy link
Member

@fnothaft fnothaft commented Mar 6, 2017

Resolves #1368. Also added latest citation information to the README.

@fnothaft fnothaft added this to the 0.22.0 milestone Mar 6, 2017
@coveralls
Copy link

coveralls commented Mar 6, 2017

Coverage Status

Coverage remained the same at 76.399% when pulling 8eddeb0 on fnothaft:issues/1368-pipe-markdown into 07c1982 on bigdatagenomics:master.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1836/
Test PASSed.

The `JavaADAMContext` class provides Java-friendly methods that are equivalent
to the `ADAMContext` methods. Specifically, these methods use Java types, and do
not make use of default parameters. In addition to the load/save methods
described above, the `ADAMContext` adds the implicit methods needed for using
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we explicitly say that the Scala and Java labels in the above function calls indicate whether they can be used with JavaADAMContext/ADAMContext?

<dependency>
<groupId>org.bdgenomics.adam</groupId>
<artifactId>adam-core${binary.version}</artifactId>
<version>0.21.0</version>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would prefer a ${version} or ${adam.version} variable here, so that the docs don't need to be updated on every new release

conversion from a `SparkContext` to an `ADAMContext`. To use this, import the
implicit, and call an `ADAMContext` method:

```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the syntax highlighting language used in Github-flavored markdown cause problems with pandoc generation? If not, it would be useful to have ```scala here (and elsewhere below)


In Java, instantiate a JavaADAMContext, which wraps an ADAMContext:

```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and ```java here


With an `ADAMContext`, you can load:

* Single reads as an `AlignmentRecordRDD`:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this be easier to read as a table?

With an `ADAMContext`, you can load:

* Single reads as an `AlignmentRecordRDD`:
* From SAM/BAM/CRAM using `loadBam` (Scala only)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If something says (Scala only) here, does that mean the method is not usable from Java or is simply inconvenient to do so? I've been working with some of these from Java without issue.

functional [transformations]{#transforming} across the whole dataset.

Around an `RDD`, ADAM adds metadata which describes the genome, samples, or
pipeline that a dataset came from. Specifically, ADAM supports the following
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't obvious which definition pipeline has here; this is about read groups?

We provide four implementations:

* `ADAMContext.sameTypeConversionFn`: For piped commands that do not change the
type of the `GenomicRDD` (e.g., `AlignmentRecordRDD` -> `AlignmentRecordRDD`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> → &rarr;

(this must be my favorite review comment ever)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

// save to vcf
variantContexts.saveAsVcf("hdfs://mynamenode/my/variants.vcf")
```

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be useful to add a section/paragraph describing how data streamed from GenomicRDDs to external apps is post-partitioning; performance implications thereof. I see now this is mentioned above on line 483.

Copy link
Member

@devin-petersohn devin-petersohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, just a couple of nits for examples/further clarification.

Although `GenomicRDD`s do not extend Apache Spark's `RDD` class, `RDD`
operations can be performed on them using the `transform` method. Currently,
we only support `RDD` to `RDD` transformations that keep the same type as
the base type of the `GenomicRDD`. To apply an `RDD` transform, use the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Add an example: i.e. We can transform AlignmentRecordRDDs to increment the position of the AlignmentRecord by 1 for every AlignmentRecord, but we cannot transform an AlignmentRecordRDD to a FeatureRDD.

I think it will make it much more clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, sounds good, will add.

@fnothaft
Copy link
Member Author

Addressed reviewer comments and pulled changes in from #1384.

@coveralls
Copy link

coveralls commented Mar 10, 2017

Coverage Status

Coverage remained the same at 76.399% when pulling 3f1d348 on fnothaft:issues/1368-pipe-markdown into 07c1982 on bigdatagenomics:master.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1849/
Test PASSed.

@heuermh
Copy link
Member

heuermh commented Mar 10, 2017

Should I rebase and merge as is or did you want to squash some of the commits first?

@heuermh
Copy link
Member

heuermh commented Mar 10, 2017

Feel free to merge in commits from #1435 here and update 60_building_apps.md with something similar to the changes in commit 446ae81.

@fnothaft fnothaft force-pushed the issues/1368-pipe-markdown branch from 3f1d348 to 6cbecf2 Compare March 14, 2017 06:35
@fnothaft fnothaft force-pushed the issues/1368-pipe-markdown branch from 6cbecf2 to 6ec6749 Compare March 14, 2017 06:38
@fnothaft
Copy link
Member Author

Hi @heuermh! This should be good to go now. I figured that #1435 and this PR were sufficiently distinct that it made more sense to just merge #1435 into trunk separately. I wanted to pull #1384 into this since we were touching the same section of the document and would conflict otherwise. Anyways, I've got this squashed down as desired.

@coveralls
Copy link

coveralls commented Mar 14, 2017

Coverage Status

Coverage remained the same at 76.396% when pulling 6ec6749 on fnothaft:issues/1368-pipe-markdown into 57264bb on bigdatagenomics:master.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1858/
Test PASSed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1859/

Build result: FAILURE

[...truncated 16 lines...] > /home/jenkins/git2/bin/git rev-parse origin/pr/1422/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains 8933ec928ae71b6fd796e053ba91083c0b56f46a # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1422/merge^{commit} # timeout=10Checking out Revision 8933ec928ae71b6fd796e053ba91083c0b56f46a (origin/pr/1422/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 8933ec928ae71b6fd796e053ba91083c0b56f46aFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.3.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@fnothaft
Copy link
Member Author

Jenkins, retest this please.

@coveralls
Copy link

coveralls commented Mar 14, 2017

Coverage Status

Coverage remained the same at 76.396% when pulling 6ec6749 on fnothaft:issues/1368-pipe-markdown into 57264bb on bigdatagenomics:master.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1861/
Test PASSed.

@heuermh heuermh merged commit 1cae769 into bigdatagenomics:master Mar 14, 2017
@heuermh
Copy link
Member

heuermh commented Mar 14, 2017

Thank you, @fnothaft @devin-petersohn!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants