-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added ReferenceMapping for Genotype, filterByOverlappingRegion for GenotypeRDDFunctions #470
Conversation
Can one of the admins verify this patch? |
Jenkins, add to whitelist. |
Test PASSed. |
Test PASSed. |
Thanks @erictu! This looks great. Can you squash down to a single commit? |
|
||
def filterByOverlappingRegion(query: ReferenceRegion): RDD[Genotype] = { | ||
def overlapsQuery(rec: Genotype): Boolean = | ||
rec.getVariant.getContig.getContigName.toString == query.referenceName && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing this chain of getters makes me wonder if we should protect against null values? Using Option()
and orNull
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the general direction we're moving in (e.g., #469) is to require the API caller to provide input with valid mappings.
From a practical standpoint though, variants are meaningless without being anchored to a reference position. If you get to the point where you have variants without mappings attached to them, you've done something wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, since you use the variant three times, you can define it once using...
val variant = Option(rec.getVariant)
And then use pattern matching or a simple conditional to skip genotypes without a variant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Genotypes without a variant are ill formed... IMO, we shouldn't check for that case. If we did want to validate that genotypes had a variant attached, it should only be via an assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, variants are not nullable in the genotype schema:
record Genotype {
/**
The variant called at this site.
*/
Variant variant;
https://github.com/bigdatagenomics/bdg-formats/pull/39/files changes this, although it is arguable as to whether that is a desirable change (desirable because it makes everything nullable, undesirable because genotypes without variants are meaningless).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you ever want to project a Genotype without the variant? If so, then you need to make it nullable. If not, then we don't need to worry about the null check here as you say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, then you need to make it nullable. If not, then we don't need to worry about the null check here as you say.
Since it isn't nullable now, no reason to do the check.
Would you ever want to project a Genotype without the variant?
Let's discuss this on the PR that makes the change. Conceivably, you could, but not sure if it makes sense to allow people to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
-Matt
On Sun, Nov 9, 2014 at 9:38 PM, Frank Austin Nothaft <
notifications@github.com> wrote:
In
adam-core/src/main/scala/org/bdgenomics/adam/rdd/variation/VariationRDDFunctions.scala:@@ -101,4 +102,12 @@ class GenotypeRDDFunctions(rdd: RDD[Genotype]) extends Serializable with Logging
bySample
}
+
- def filterByOverlappingRegion(query: ReferenceRegion): RDD[Genotype] = {
- def overlapsQuery(rec: Genotype): Boolean =
rec.getVariant.getContig.getContigName.toString == query.referenceName &&
If so, then you need to make it nullable. If not, then we don't need to
worry about the null check here as you say.Since it isn't nullable now, no reason to do the check.
Would you ever want to project a Genotype without the variant?
Let's discuss this on the PR
https://github.com/bigdatagenomics/bdg-formats/pull/39/files that makes
the change. Conceivably, you could, but not sure if it makes sense to
allow people to.—
Reply to this email directly or view it on GitHub
https://github.com/bigdatagenomics/adam/pull/470/files#r20064839.
Test PASSed. |
If you squash this down to a single commit and rebase on |
added filterByOverlappingRegion for GenotypeRDDFunctions
Test PASSed. |
added ReferenceMapping for Genotype, filterByOverlappingRegion for GenotypeRDDFunctions
Merged! Thanks @erictu! |
No description provided.