Is the star allele (*) considered symbolic or not? (a discussion about VC types) #151
Comments
Tumbleweed? Does no one care, has no one thought about this, or does no one have a strong opinion one way or another? I'll put in a spec-change PR and see if that makes more people chime in.
There is already a star alternate allele. I raised an issue with the htsjdk allele class design a while back (see
On Sat, Sep 24, 2016 at 1:57 PM, Yossi Farjoun notifications@github.com
The star allele
The term "symbolic allele" refers primarily to anything enclosed in brackets
Strings like
The specification is currently vague about whether the use of
"Options are base (sic) Strings made up of the bases A,C,G,T,N,*, ..." makes it seem like
"The ‘*’ allele is reserved to indicate that the allele is missing due to an overlapping deletion." makes it seem like representation of spanning deletion should use
In the case of an insertion or deletion that coincides with the edge of a spanning deletion, the requirement to add an anchor base would mean that either the boundary of the spanning deletion is being implicitly moved, or the anchor base must also be added to the spanning deletion allele. Similarly, in the case of two partially overlapping deletions, you might want to add bases to each spanning deletion allele to indicate where the overlapping deletion stops.
The alternative is to disallow mixing
Note that the Octopus variant caller (from @dancooke) currently uses this "partial spanning deletion" notation.
The VCF spec discusses symbolic alleles as "an angle-bracketed ID String “<ID>”" (in 1.6.1.4), but the overlapping deletion allele is *. I suspect that the intention is that the star allele be considered a symbolic allele. The specific deletion which is overlapping can depend on the sample/genotype and thus cannot be said to be a specific allele which is simply not spelled out.
In HTSJDK a VariantContext has a "type", as does an Allele. This isn't spelled out in the VCF spec and so I'm not sure if other VCF parsers do this as well (and if they do, whether they use the same definitions...). The classification seems to be based on this.
Currently, since the star allele isn't considered symbolic, a VariantContext containing it is considered a SNP (all the alleles are of length 1). I would like to change that but am concerned that there are issues that I haven't considered.
Since Allele type and Variation type are not specified in the VCF spec (as far as I could see), different implementations are thus free to do what they wish, but I suspect that we should decide as a community how to approach this so that we can agree on the meaning of basic things like "how many SNPs does a VCF have?"
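To make the ambiguity concrete, here is a minimal sketch in plain Python. It is not htsjdk code and does not reproduce htsjdk's actual classification logic. The function names, the length-based typing rules, the type labels, and the example records are all assumptions made up for illustration. The only thing it tries to show is that the answer to "how many SNPs?" changes depending on whether * is treated as an ordinary length-1 allele or as a symbolic one.

```python
import re

# Angle-bracketed ID string such as <DEL> or <NON_REF>.
SYMBOLIC_ID = re.compile(r"^<[^<>]+>$")


def is_symbolic(allele, star_is_symbolic):
    """Return True if the allele counts as symbolic under the chosen policy."""
    if SYMBOLIC_ID.match(allele):
        return True
    return star_is_symbolic and allele == "*"


def alt_type(ref, alt, star_is_symbolic):
    """Classify one ALT allele relative to REF (toy, length-based rules)."""
    if is_symbolic(alt, star_is_symbolic):
        return "SYMBOLIC"
    if len(ref) != len(alt):
        return "INDEL"
    return "SNP" if len(ref) == 1 else "MNP"


def site_type(ref, alts, star_is_symbolic):
    """A site gets a single type only if all of its ALT alleles agree."""
    if not alts:
        return "NO_VARIATION"
    types = {alt_type(ref, alt, star_is_symbolic) for alt in alts}
    return types.pop() if len(types) == 1 else "MIXED"


def count_snps(records, star_is_symbolic):
    """Count sites whose type is SNP under the chosen policy."""
    return sum(
        site_type(ref, alts, star_is_symbolic) == "SNP" for ref, alts in records
    )


# Hypothetical (REF, ALT list) pairs, not taken from any real VCF.
RECORDS = [
    ("A", ["C"]),
    ("A", ["*"]),
    ("G", ["T", "*"]),
    ("AT", ["A"]),
]

if __name__ == "__main__":
    for policy in (False, True):
        print("star_is_symbolic =", policy, "-> SNP sites:", count_snps(RECORDS, policy))
```

With these made-up records, the site count drops from 3 SNPs to 1 once * is treated as symbolic: the site with only * becomes SYMBOLIC and the site with T and * becomes MIXED. Whether MIXED is the right label for that last case is exactly the kind of thing that would need to be agreed on.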