Extending htsjdk to parse symbolic ALT alleles and return meaningful Alleles based on the parsed result would be useful to structural variation users of the API. In particular, the following features would be useful:
* separation of symbolic alleles into named structural variations, named contig insertions (both with sequence such as "A" and without ""), breakends and breakpoints
* methods to expose the parsed breakpoint contig, position and directions
* incorporation of all 31 SV headers into VCFStandardHeaderLines (or split out into VCFStructuralVariationHeaderLines)
* methods to return the base call portions of symbolic alleles (i.e. extract the ACGT from ACGT[chr1:1[ )
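As a rough illustration of the last two points, the sketch below splits a breakend ALT string into its base-call portion, mate contig, mate position and the two direction flags. It is not htsjdk code: `BreakendSketch`, `Breakend` and `parse` are hypothetical names, and the sketch assumes the four breakend forms from the VCF specification (`t[p[`, `t]p]`, `]p]t`, `[p[t`) with at least one base from A, C, G, T or N.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: none of these names exist in htsjdk. */
final class BreakendSketch {

    /** Hypothetical value type holding the parsed parts of a breakend ALT. */
    static final class Breakend {
        final String bases;             // base-call portion, e.g. "ACGT"
        final String mateContig;        // contig named between the brackets
        final int matePosition;         // position named between the brackets
        final boolean basesFirst;       // true for t[p[ and t]p], false for ]p]t and [p[t
        final boolean mateExtendsRight; // true when the bracket is '[', false when it is ']'

        Breakend(String bases, String mateContig, int matePosition,
                 boolean basesFirst, boolean mateExtendsRight) {
            this.bases = bases;
            this.mateContig = mateContig;
            this.matePosition = matePosition;
            this.basesFirst = basesFirst;
            this.mateExtendsRight = mateExtendsRight;
        }
    }

    // Assumption: at least one base is required on the base-call side.
    private static final String BASES = "([ACGTNacgtn]+)";
    // A bracket, then contig:position. The greedy contig group lets contig names contain ':'.
    private static final String MATE = "([\\[\\]])(.+):(\\d+)";

    // t[p[ or t]p]: bases, bracket, contig:position, the same bracket again.
    private static final Pattern BASES_FIRST = Pattern.compile(BASES + MATE + "\\2");
    // ]p]t or [p[t: bracket, contig:position, the same bracket again, bases.
    private static final Pattern MATE_FIRST = Pattern.compile(MATE + "\\1" + BASES);

    /** Returns the parsed breakend, or empty if the string is not in one of the four forms. */
    static Optional<Breakend> parse(String alt) {
        Matcher m = BASES_FIRST.matcher(alt);
        if (m.matches()) {
            return Optional.of(new Breakend(m.group(1), m.group(3),
                    Integer.parseInt(m.group(4)), true, m.group(2).equals("[")));
        }
        m = MATE_FIRST.matcher(alt);
        if (m.matches()) {
            return Optional.of(new Breakend(m.group(4), m.group(2),
                    Integer.parseInt(m.group(3)), false, m.group(1).equals("[")));
        }
        return Optional.empty();
    }
}
```

With this sketch, `BreakendSketch.parse("ACGT[chr1:1[")` yields bases `ACGT`, mate contig `chr1` and mate position 1, while a plain base-call allele such as `A` yields an empty result.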
One possible design is to subclass Allele:
abstract Allele
Allele <- BaseCallAllele
Allele <- SymbolicAllele
Allele <- NoCallAllele
Allele <- MissingAllele
SymbolicAllele <- StructuralVariationAllele (with enum, subclass hierarchy or getters for SV type and subtypes (subsubtypes?) such as DUP:TANDEM)
SymbolicAllele <- BreakendAllele
SymbolicAllele <- BreakpointAllele
SymbolicAllele <- NamedContigAllele
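A minimal Java sketch of part of this hierarchy follows, using the "getters for SV type and subtypes" option. The class names are taken from the list above, but the method names (`getDisplayString`, `isSymbolic`, `getType`, `getSubtypes`) are assumptions made for illustration and may not match htsjdk's Allele API. NoCallAllele, MissingAllele, BreakendAllele, BreakpointAllele and NamedContigAllele are left out for brevity.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative only: mirrors the names proposed above, not htsjdk's actual classes. */
final class AlleleHierarchySketch {

    abstract static class Allele {
        abstract String getDisplayString();

        boolean isSymbolic() {
            return false;
        }
    }

    static final class BaseCallAllele extends Allele {
        private final String bases;

        BaseCallAllele(String bases) {
            this.bases = bases;
        }

        @Override
        String getDisplayString() {
            return bases;
        }
    }

    abstract static class SymbolicAllele extends Allele {
        @Override
        boolean isSymbolic() {
            return true;
        }
    }

    static final class StructuralVariationAllele extends SymbolicAllele {
        private final String type;           // e.g. "DUP"
        private final List<String> subtypes; // e.g. ["TANDEM"], possibly empty

        /** Takes the ID without angle brackets, e.g. "DUP:TANDEM". */
        StructuralVariationAllele(String id) {
            String[] parts = id.split(":");
            this.type = parts[0];
            this.subtypes = Collections.unmodifiableList(
                    Arrays.asList(parts).subList(1, parts.length));
        }

        String getType() {
            return type;
        }

        List<String> getSubtypes() {
            return subtypes;
        }

        @Override
        String getDisplayString() {
            String id = subtypes.isEmpty() ? type : type + ":" + String.join(":", subtypes);
            return "<" + id + ">";
        }
    }
}
```

Keeping the type and subtypes as strings avoids committing to a closed enum, at the cost of weaker type checking; either choice fits the options listed above.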
The low-level design and implementation depend on resolving ambiguities in the VCF specifications, which are under-defined at multiple points. For example, it is unclear whether "AA", "", "AA" are valid alleles.
More concretely, it refactors the Allele class:
* Allele is now an interface.
* Adds several methods to access properties of specific allele types, such as breakends and whole-contig insertions.
* Disentangles the current mixture of "display" bytes and base bytes in the bases array; bases are now stored that way only where it applies.
* Adds a "hidden" hierarchy of implementations for the different allele subtypes.
* Improves reuse of instances for common variants (single-base and common symbolic alleles).
* Marks as deprecated several methods that seem unnecessary.
* Disables changing an allele's bases and adds an API (the BaseSequence interface) to access them efficiently
without cloning a byte[] array each time.
* Makes it impossible to create alleles that have invalid bases (IUPAC codes that are not allowed by the VCF spec).
* Fixes (and also deprecates) some poorly thought-out Allele methods that only make sense for some allele subtypes.
* Adds a breakend API: access to the contig, position, type of breakend and bases.
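To illustrate the two points about immutable bases and base validation, here is a minimal sketch of a read-only base view. The PR's actual BaseSequence methods are not shown in this thread, so `numberOfBases` and `baseAt` are hypothetical, as are the `ImmutableBases` and `isValidBase` names. The sketch assumes that only A, C, G, T and N (in either case) are valid bases.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: a guess at the idea, not the PR's BaseSequence API. */
final class BaseSequenceSketch {

    /** Hypothetical read-only view; callers never receive the backing array. */
    interface BaseSequence {
        int numberOfBases();

        byte baseAt(int index);
    }

    static final class ImmutableBases implements BaseSequence {
        private final byte[] bases; // private copy, never exposed or modified

        ImmutableBases(String text) {
            byte[] copy = text.getBytes(StandardCharsets.US_ASCII);
            for (byte b : copy) {
                if (!isValidBase(b)) {
                    throw new IllegalArgumentException("invalid base: " + (char) b);
                }
            }
            this.bases = copy;
        }

        @Override
        public int numberOfBases() {
            return bases.length;
        }

        @Override
        public byte baseAt(int index) {
            return bases[index];
        }
    }

    /** Assumption: only A, C, G, T and N are accepted, so IUPAC codes such as R or Y are rejected. */
    static boolean isValidBase(byte b) {
        return "ACGTNacgtn".indexOf(b) >= 0;
    }
}
```

Reading bases through `baseAt` avoids handing out the backing array, so no defensive copy is needed on each access and the allele cannot be changed after construction.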