-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added is* genotype methods from HTS-JDK Genotype to RichGenotype #895
Conversation
Can one of the admins verify this patch? |
Jenkins, test this please |
/** | ||
* True if all observed alleles are ref; if any alleles are no-calls, this method will return false. | ||
*/ | ||
def isHomRef: Boolean = { getType == GenotypeType.HOM_REF } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Professing some Scala ignorance here, is ==
ok for enum values across different JVMs and serialization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure , == is used in this Scala Enum documentation example http://www.scala-lang.org/api/current/index.html#scala.Enumeration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it is.
Test FAILed. Build result: FAILUREGitHub pull request #895 of commit 45d68e8 automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/895/merge^{commit} # timeout=10 > git branch -a --contains 5bc300a5fc823ad8fed4efbfc9d5cc4e7bdbb6c6 # timeout=10 > git rev-parse remotes/origin/pr/895/merge^{commit} # timeout=10Checking out Revision 5bc300a5fc823ad8fed4efbfc9d5cc4e7bdbb6c6 (origin/pr/895/merge) > git config core.sparsecheckout # timeout=10 > git checkout -f 5bc300a5fc823ad8fed4efbfc9d5cc4e7bdbb6c6First time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.4.1,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'Test FAILed. |
Build errors from Jenkins, not sure what the scalariform ones mean
|
Mhm I should have checked the GenotypeType Enum definition. HOM_VAR is renamed HOM_ALT in BDG. https://github.com/bigdatagenomics/bdg-formats/blob/master/src/main/resources/avro/bdg.avdl#L486 And that is also what getType returns in the case of two equal alternative alleles. Checking for equality with I am also not sure what the scalariform error means. |
Code LGTM; will look at the Jenkins logs. |
Thank you @fnothaft . I noticed one other related thing, which might be included in this Pull request or I can open a ticket for it. The That means that for genotypes with an Shouldn't the diploid case for
With the above case for |
Hi @fnothaft . Also I believe that the It is currently limited to haploid or diploid: The HTS-JDK methods And also the biological concepts Except that heterozygous (mix of different alleles) is impossible in a haploid. I could take a shot at creating a getType method which should work for any ploidy level. In this pull request or another one. |
Out of respect for a former colleague, the term ploidy should be reserved for differences in number at the entire chromosome level. More than two observed alleles at a single position is better called something else, like copy number variation. Not sure why Jenkins hasn't retested this yet... |
Jenkins, retest this please |
Test FAILed. Build result: FAILUREGitHub pull request #895 of commit 1220f66 automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/895/merge^{commit} # timeout=10 > git branch -a --contains 5fb6aae089b8ed669c626532b8e36a3282757d0e # timeout=10 > git rev-parse remotes/origin/pr/895/merge^{commit} # timeout=10Checking out Revision 5fb6aae089b8ed669c626532b8e36a3282757d0e (origin/pr/895/merge) > git config core.sparsecheckout # timeout=10 > git checkout -f 5fb6aae089b8ed669c626532b8e36a3282757d0eFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.4.1,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'Test FAILed. |
Hi Michael thank you for retesting. With ploidy I mean difference in number at the entire chromosome, like in some animals, plants and fungi is always the case for every chromosome. With a tetraploid organism for example a variant caller would always output 4 alleles for a genotype. For example these allele combinations which also have genotypTypes:
For the above genotypes from a tetraploid organism this assertion would fail But maybe this should go into another issue / pull request. To not have to many things at the same time going on. Don't know by the way what the alleles should be for
I guess HOM_ALT or HET depending on if the OtherAlt alleles are different of the same? But there is no way to find this out? |
Right, we're on the same page then. Looks like Jenkins is complaining about source code formatting
Could you try running that script and then squashing the commits in this pull request? It might be best to move the additional changes into a separate issue/pull request, since there may be other places where we aren't handling e.g. tetraploid organisms correctly. Thanks! |
Ok thank you for the tip. I tried to squash my commits. Looks like it worked. I will create an issue or a pull request for taking care of polyploid genotypes and genotypes containing OtherAlt in getType. |
That looks good, thanks. I wonder if I need to kick Jenkins again. @fnothaft do we need to add @NeillGibson to some ACL somewhere for Jenkins to auto-trigger on his pull requests? |
Hi @NeillGibson and @heuermh! Sorry for the slow reply.
I agree.
That would be great! I agree RE: your comments about the concepts being valid for all copy numbers.
Agreed.
We shouldn't need to. I'll kick Jenkins again though. The PR builder hook is a bit flaky. |
Jenkins, test this please. |
Test PASSed. |
@heuermh is this good to merge by you? |
+1 |
Merged. Thanks @NeillGibson! |
I added the is* genotype methods from HTS-JDK Genotype to Adam RichGenotype.
This makes it easier to filter on a genotypeRDD.
Compare for instance
genotypeRDD.filter(_.isHom)
to
genotypeRDD.filter(gt => (gt.getType = GenotypeType.HOM_REF) || (gt.getType = GenotypeType.HOM_ALT) )
These methods are copied almost exactly from from
https://github.com/samtools/htsjdk/blob/master/src/java/htsjdk/variant/variantcontext/Genotype.java#L230
I did not add
isMixed
andisAvailable
since Adam seems to handle these Genotypes the same as GenotypeType.NO_CALL