The format of alleles in VCF is not formally defined in the specifications. In particular, the current specifications leave the following edge cases open: `A*A*` is a valid alt allele but should not be, and it is unclear whether `A<contig>A` is (or should be) valid.
I propose the following grammar productions as a formal definition for the format of alleles:
ref:
base-string
alt:
allele
allele , alt
allele:
base-string
missing-allele
symbolic-allele
missing-allele:
*
symbolic-allele:
id-string
symbolic-insertion
breakend
breakpoint
symbolic-insertion:
base id-string
id-string base
id-string
breakend:
base-string null-allele
null-allele base-string
breakpoint:
base-string [ breakpoint-reference [
[ breakpoint-reference [ base-string
base-string ] breakpoint-reference ]
] breakpoint-reference ] base-string
breakpoint-reference:
contig-reference
contig-reference : digits
contig-reference:
id-string
contig-identifier
digits:
digit
digits digit
digit: one of
0 1 2 3 4 5 6 7 8 9
base: one of
A C G T N a c g t n
base-string:
base
base-string base
id-string:
< contig-identifier >
contig-identifier:
string-containing-no-whitespace-or-colon
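As a rough illustration of how these productions could be checked mechanically, here is a minimal Python sketch that transcribes them into regular expressions. It is not taken from any existing VCF library. The names `PRODUCTIONS` and `matching_productions` are made up for this example, and because `null-allele` is not defined above, the sketch assumes it is a single `.`.

```python
import re

# Terminals and helper fragments, transcribed from the productions above.
BASE = r"[ACGTNacgtn]"
BASE_STRING = rf"{BASE}+"
CONTIG_ID = r"[^\s:]+"   # string-containing-no-whitespace-or-colon
ID_STRING = rf"<{CONTIG_ID}>"
BP_REF = rf"(?:{ID_STRING}|{CONTIG_ID})(?::[0-9]+)?"
NULL_ALLELE = r"\."      # ASSUMPTION: null-allele is not defined in the proposal

# One regular expression per allele-level production (hypothetical table).
PRODUCTIONS = {
    "base-string": BASE_STRING,
    "missing-allele": r"\*",
    "id-string": ID_STRING,
    "symbolic-insertion": rf"{BASE}{ID_STRING}|{ID_STRING}{BASE}|{ID_STRING}",
    "breakend": rf"{BASE_STRING}{NULL_ALLELE}|{NULL_ALLELE}{BASE_STRING}",
    "breakpoint": (
        rf"{BASE_STRING}\[{BP_REF}\[|\[{BP_REF}\[{BASE_STRING}"
        rf"|{BASE_STRING}\]{BP_REF}\]|\]{BP_REF}\]{BASE_STRING}"
    ),
}
COMPILED = {name: re.compile(pattern) for name, pattern in PRODUCTIONS.items()}


def matching_productions(allele):
    """Return the names of all productions that match the whole allele string."""
    return [name for name, rx in COMPILED.items() if rx.fullmatch(allele)]


if __name__ == "__main__":
    for allele in ["A", "*", "<DEL>", "C<ctg1>", "G]17:198982]", "A*A*", "A<contig>A"]:
        print(allele, matching_productions(allele))
```

Under these assumptions both `A*A*` and `A<contig>A` match no production, so the grammar would reject them. An allele such as `<DEL>` matches both `id-string` and `symbolic-insertion`, because `id-string` is listed under both `symbolic-allele` and `symbolic-insertion`.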
The definition of contig-identifier is problematic. The spec currently allows [ ] < > . * as part of contig identifiers. Allowing the brackets is especially likely to cause difficulties for implementations, because alleles such as N[<[>[>[ are currently valid, and the string is an ambiguous reference to either a (contig-identifier) reference contig "" or a (id-string) named contig CHR.
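To make the ambiguity concrete, the sketch below enumerates the readings of the breakpoint reference in an allele of the form `base-string [ breakpoint-reference [`. It assumes the permissive `contig-identifier` definition above (anything without whitespace or a colon). The function name `readings` and the example allele `N[<CHR>[` are chosen for illustration only.

```python
import re

# Only the form "base-string [ breakpoint-reference [" is handled here.
# The reference is captured with the permissive contig-identifier definition.
BREAKPOINT = re.compile(
    r"(?P<bases>[ACGTNacgtn]+)\[(?P<ref>[^\s:]+)(?::(?P<pos>[0-9]+))?\["
)


def readings(allele):
    """List every (production, contig name) reading of the breakpoint reference."""
    m = BREAKPOINT.fullmatch(allele)
    if m is None:
        return []
    ref = m.group("ref")
    # Any such string is a legal contig-identifier under the permissive definition.
    out = [("contig-identifier", ref)]
    # If it is also wrapped in angle brackets, it is equally a legal id-string.
    if len(ref) > 2 and ref.startswith("<") and ref.endswith(">"):
        out.append(("id-string", ref[1:-1]))
    return out


if __name__ == "__main__":
    print(readings("N[<CHR>["))
    print(readings("N[chr[1:5["))
```

With these assumptions, `N[<CHR>[` has two readings: a reference contig literally named `<CHR>`, or a named contig `CHR`. The second call shows that a bracket inside the contig name (`chr[1`) is also accepted, which is the kind of input a hand-written parser that splits on brackets is likely to mishandle.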