Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added ReferenceMapping for Genotype, filterByOverlappingRegion for GenotypeRDDFunctions #470

Merged
merged 1 commit into from
Nov 14, 2014

Conversation

erictu
Copy link
Member

@erictu erictu commented Nov 9, 2014

No description provided.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@fnothaft
Copy link
Member

fnothaft commented Nov 9, 2014

Jenkins, add to whitelist.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/386/
Test PASSed.

@erictu erictu changed the title added ReferenceMapping for Genotype added ReferenceMapping for Genotype, filterByOverlappingRegion for GenotypeRDDFunctions Nov 9, 2014
@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/387/
Test PASSed.

@fnothaft
Copy link
Member

Thanks @erictu! This looks great. Can you squash down to a single commit?


def filterByOverlappingRegion(query: ReferenceRegion): RDD[Genotype] = {
def overlapsQuery(rec: Genotype): Boolean =
rec.getVariant.getContig.getContigName.toString == query.referenceName &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing this chain of getters makes me wonder if we should protect against null values? Using Option() and orNull?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the general direction we're moving in (e.g., #469) is to require the API caller to provide input with valid mappings.

From a practical standpoint though, variants are meaningless without being anchored to a reference position. If you get to the point where you have variants without mappings attached to them, you've done something wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, since you use the variant three times, you can define it once using...

val variant = Option(rec.getVariant)

And then use pattern matching or a simple conditional to skip genotypes without a variant

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Genotypes without a variant are ill formed... IMO, we shouldn't check for that case. If we did want to validate that genotypes had a variant attached, it should only be via an assert.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, variants are not nullable in the genotype schema:

record Genotype {
  /**
   The variant called at this site.
   */
  Variant variant;

https://github.com/bigdatagenomics/bdg-formats/pull/39/files changes this, although it is arguable as to whether that is a desirable change (desirable because it makes everything nullable, undesirable because genotypes without variants are meaningless).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you ever want to project a Genotype without the variant? If so, then you need to make it nullable. If not, then we don't need to worry about the null check here as you say.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, then you need to make it nullable. If not, then we don't need to worry about the null check here as you say.

Since it isn't nullable now, no reason to do the check.

Would you ever want to project a Genotype without the variant?

Let's discuss this on the PR that makes the change. Conceivably, you could, but not sure if it makes sense to allow people to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

-Matt

On Sun, Nov 9, 2014 at 9:38 PM, Frank Austin Nothaft <
notifications@github.com> wrote:

In
adam-core/src/main/scala/org/bdgenomics/adam/rdd/variation/VariationRDDFunctions.scala:

@@ -101,4 +102,12 @@ class GenotypeRDDFunctions(rdd: RDD[Genotype]) extends Serializable with Logging

 bySample

}
+

  • def filterByOverlappingRegion(query: ReferenceRegion): RDD[Genotype] = {
  • def overlapsQuery(rec: Genotype): Boolean =
  •  rec.getVariant.getContig.getContigName.toString == query.referenceName &&
    

If so, then you need to make it nullable. If not, then we don't need to
worry about the null check here as you say.

Since it isn't nullable now, no reason to do the check.

Would you ever want to project a Genotype without the variant?

Let's discuss this on the PR
https://github.com/bigdatagenomics/bdg-formats/pull/39/files that makes
the change. Conceivably, you could, but not sure if it makes sense to
allow people to.


Reply to this email directly or view it on GitHub
https://github.com/bigdatagenomics/adam/pull/470/files#r20064839.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/388/
Test PASSed.

@massie
Copy link
Member

massie commented Nov 12, 2014

If you squash this down to a single commit and rebase on master, I think we're ready to merge this.

added filterByOverlappingRegion for GenotypeRDDFunctions
@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/401/
Test PASSed.

fnothaft added a commit that referenced this pull request Nov 14, 2014
added ReferenceMapping for Genotype, filterByOverlappingRegion for GenotypeRDDFunctions
@fnothaft fnothaft merged commit 17cf66b into bigdatagenomics:master Nov 14, 2014
@fnothaft
Copy link
Member

Merged! Thanks @erictu!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants