Extended API for symbolic alleles #18

Open
d-cameron opened this issue May 23, 2014 · 0 comments

Comments

@d-cameron
Contributor

Extending htsjdk to parse symbolic ALT alleles and return meaningful Allele objects based on the parsed result would help users of the API who work with structural variation. In particular, the following features would be useful:

  • separation of symbolic alleles into named structural variations, named contig insertions (both with sequence such as "A" and without ""), breakends and breakpoints
  • methods to expose the parsed breakpoint contig, position, directions
  • incorporation of all 31 SV headers into VCFStandardHeaderLines (or split out into VCFStructuralVariationHeaderLines)
  • methods to return the base call portions of symbolic alleles (i.e., extract the ACGT from ACGT[chr1:1[)
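
To make the breakend-related requests concrete, here is a minimal sketch of splitting a breakend ALT string into its base calls, mate contig, mate position and orientation. `BreakendSketch` and all of its members are hypothetical names for illustration, not existing htsjdk API, and the regular expression only covers the four bracketed breakend forms (t[p[, t]p], ]p]t, [p[t), assuming bases are limited to A, C, G, T and N.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not htsjdk API): splits a bracketed VCF breakend ALT into its parts. */
public final class BreakendSketch {
    // Groups: 1 = leading bases, 2 = bracket, 3 = mate contig, 4 = mate position,
    //         5 = bracket, 6 = trailing bases.
    private static final Pattern BREAKEND = Pattern.compile(
            "^([ACGTNacgtn]*)([\\[\\]])(.+):(\\d+)([\\[\\]])([ACGTNacgtn]*)$");

    public final String bases;              // the base call portion, e.g. "ACGT"
    public final String mateContig;         // e.g. "chr1"
    public final int matePosition;          // 1-based position on the mate contig
    public final boolean basesFirst;        // true when the bases come before the brackets
    public final boolean mateExtendsRight;  // true for '[' brackets, false for ']'

    private BreakendSketch(String bases, String mateContig, int matePosition,
                           boolean basesFirst, boolean mateExtendsRight) {
        this.bases = bases;
        this.mateContig = mateContig;
        this.matePosition = matePosition;
        this.basesFirst = basesFirst;
        this.mateExtendsRight = mateExtendsRight;
    }

    public static BreakendSketch parse(String alt) {
        Matcher m = BREAKEND.matcher(alt);
        // Both brackets must point the same way.
        if (!m.matches() || !m.group(2).equals(m.group(5))) {
            throw new IllegalArgumentException("Not a bracketed breakend: " + alt);
        }
        String prefix = m.group(1);
        String suffix = m.group(6);
        // Exactly one side of the brackets carries the base calls.
        if (prefix.isEmpty() == suffix.isEmpty()) {
            throw new IllegalArgumentException("Bases must be on exactly one side: " + alt);
        }
        boolean basesFirst = !prefix.isEmpty();
        return new BreakendSketch(
                basesFirst ? prefix : suffix,
                m.group(3),
                Integer.parseInt(m.group(4)),
                basesFirst,
                m.group(2).equals("["));
    }
}
```

With this sketch, `BreakendSketch.parse("ACGT[chr1:1[")` yields the bases `ACGT`, the mate contig `chr1` and the mate position `1`. Single breakends and named contig insertions would need separate handling.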

One possible design is to subclass Allele:
abstract Allele
Allele <- BaseCallAllele
Allele <- SymbolicAllele
Allele <- NoCallAllele
Allele <- MissingAllele
SymbolicAllele <- StructuralVariationAllele (with enum, subclass hierarchy or getters for SV type and subtypes (subsubtypes?) such as and DUP:TANDEM)
SymbolicAllele <- BreakendAllele
SymbolicAllele <- BreakpointAllele
SymbolicAllele <- NamedContigAllele
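
A rough Java sketch of one branch of this hierarchy follows, using the "getters for SV type and subtypes" option. The class names are taken from the list above, but the wrapper class `AlleleHierarchySketch`, the method names and the constructor signatures are assumptions for illustration, and the remaining subclasses are omitted for brevity.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of part of the proposed hierarchy; not the htsjdk implementation. */
public final class AlleleHierarchySketch {

    public abstract static class Allele {
        public abstract String getDisplayString();
        public boolean isSymbolic() { return false; }
    }

    /** Plain base calls such as "A" or "ACGT". */
    public static final class BaseCallAllele extends Allele {
        private final String bases;
        public BaseCallAllele(String bases) { this.bases = bases; }
        @Override public String getDisplayString() { return bases; }
    }

    public abstract static class SymbolicAllele extends Allele {
        @Override public boolean isSymbolic() { return true; }
    }

    /** Named structural variation in angle brackets, with colon-separated subtypes. */
    public static final class StructuralVariationAllele extends SymbolicAllele {
        private final List<String> idParts;

        public StructuralVariationAllele(String alt) {
            if (alt.length() < 3 || !alt.startsWith("<") || !alt.endsWith(">")) {
                throw new IllegalArgumentException("Expected <TYPE[:SUBTYPE...]>: " + alt);
            }
            // Strip the angle brackets and split the ID on ':'.
            String id = alt.substring(1, alt.length() - 1);
            this.idParts = Collections.unmodifiableList(Arrays.asList(id.split(":")));
        }

        /** Top-level SV type, e.g. "DUP". */
        public String getType() { return idParts.get(0); }

        /** Zero or more subtypes, e.g. ["TANDEM"]. */
        public List<String> getSubtypes() { return idParts.subList(1, idParts.size()); }

        @Override public String getDisplayString() {
            return "<" + String.join(":", idParts) + ">";
        }
    }
}
```

Exposing the subtypes as a list sidesteps the question of how deep the nesting goes; an enum or a subclass per SV type would give stronger typing at the cost of having to enumerate every type up front.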

The low-level design and implementation depend on resolving ambiguities in the VCF specifications, as there are multiple points at which the specifications are under-defined. For example, it is unclear whether "AA", "", "AA" are valid alleles.

@nh13 nh13 unassigned rpoplin Mar 6, 2015
@nh13 nh13 added the dev label Mar 6, 2015
@yfarjoun yfarjoun added this to the vcf 4.3 milestone Apr 1, 2019
vruano added a commit that referenced this issue May 18, 2019
More concretely, it refactors the Allele class:

* Is now an interface.
* Adds several methods to access properties of specific allele types, such as breakends and whole-contig insertions.
* Disentangles the current mixture of "display" bytes vs. base bytes in the bases array; now only bases are stored there, where applicable.
* Adds a "hidden" hierarchy of implementations for different allele subtypes.
* Improves reuse of instances for common variants (single-base and common symbolic alleles).
* Marks as deprecated several methods that seem unnecessary.
* Disables the possibility of changing an allele's bases and adds an API (BaseSequence interface) to access them efficiently
  without the need to clone a byte[] array each time.
* Makes it impossible to create alleles that have invalid bases (IUPAC codes that are not allowed by the VCF spec).
* Fixes (and also deprecates) some not-well-thought-out Allele methods that only make sense for some subtypes of alleles.
* Breakend API: access to contig, position, type of breakend, and bases.
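
As an illustration of the read-only base access and base validation described above, here is a minimal sketch. The commit's actual BaseSequence interface is not shown in this thread, so the methods `numberOfBases` and `baseAt`, the factory `of`, and the wrapper class `ImmutableBasesSketch` are all hypothetical; the accepted alphabet is assumed to be A, C, G, T and N in either case.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of immutable, validated base storage; not the API from the commit. */
public final class ImmutableBasesSketch {

    /** Read-only view over bases, so callers never receive the backing byte[]. */
    public interface BaseSequence {
        int numberOfBases();
        byte baseAt(int index);
    }

    private static final String ALLOWED = "ACGTNacgtn"; // assumed alphabet

    public static BaseSequence of(String bases) {
        // Private copy: nothing outside this method can reach or mutate it.
        final byte[] copy = bases.getBytes(StandardCharsets.US_ASCII);
        for (byte b : copy) {
            // Reject other IUPAC ambiguity codes (e.g. 'R') and any non-base byte.
            if (ALLOWED.indexOf(b) < 0) {
                throw new IllegalArgumentException("Invalid base: " + (char) b);
            }
        }
        return new BaseSequence() {
            @Override public int numberOfBases() { return copy.length; }
            @Override public byte baseAt(int index) { return copy[index]; }
        };
    }
}
```

Because callers only ever see single bytes through `baseAt`, no defensive clone of the array is needed on each access, which is the efficiency gain the commit message refers to.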