Exomiser VCF output includes whitespace in the INFO field, which is forbidden in VCF<4.3 #486

Closed
ielis opened this issue Mar 17, 2023 · 2 comments

ielis commented Mar 17, 2023

Hi, I think there may be a bug in the VCF file produced by Exomiser.

Specifically, the EXOMISER_ACMG_DISEASE_NAME sub-field may contain a value such as "Presynaptic congenital myasthenic syndromes". Disease names frequently contain whitespace characters, which are not allowed in VCF<4.3.

Section 1.4.1 (8) of the VCF 4.2 specification forbids whitespace characters in INFO values:

INFO - additional information: (String, no whitespace, semicolons, or equals-signs permitted; commas are
permitted only as delimiters for lists of values) INFO fields are encoded as a semicolon-separated series of short
keys with optional values in the format: <key>=<data>[,data]. ...

However, the restriction was apparently lifted in VCF 4.3:

INFO - additional information: Semicolon-separated series of additional information fields, or the MISSING value ‘.’ if none are present.
...
Space characters are allowed in values.
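
For anyone who wants to check their own output against the 4.2 wording quoted above, here is a minimal sketch. The function name and the sample INFO string are made up for illustration and are not taken from Exomiser. It only looks at whitespace and stray equals-signs in values, because a stray semicolon cannot be told apart from a real delimiter once the string is split.

```python
import re

# Whitespace or '=' inside a value breaks the VCF 4.2 rule quoted above.
# (Per the 4.3 text, space characters in values would be acceptable.)
_FORBIDDEN_IN_VALUE = re.compile(r"[\s=]")


def invalid_info_entries(info: str) -> list[str]:
    """Return the INFO keys whose values are not valid under VCF 4.2."""
    if info == ".":  # missing value, nothing to check
        return []
    bad = []
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        # Flag-style entries such as "DB" have no '=' and no value to check.
        if sep and _FORBIDDEN_IN_VALUE.search(value):
            bad.append(key)
    return bad


# Hypothetical INFO column, loosely modelled on the report above.
sample = "DP=10;DB;DISEASE=Presynaptic congenital myasthenic syndromes"
print(invalid_info_entries(sample))
```

Running it over the INFO column of each record should list the offending keys for that record.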

julesjacobsen (Contributor) commented

@ielis Damn. I wanted to write out VCF 4.3 just to be able to do this, but HTSJDK will only do 4.2 and I forgot to add underscores back into the disease name...
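
As a rough illustration of "adding underscores back", the sketch below replaces each run of whitespace with a single underscore before the value is written. This is an assumption about the general approach, not the code from the referenced commit, and `to_vcf42_safe` is a hypothetical helper name.

```python
import re


def to_vcf42_safe(value: str) -> str:
    """Replace each run of whitespace with one underscore.

    Leading and trailing whitespace is dropped first so the result
    does not start or end with an underscore.
    """
    return re.sub(r"\s+", "_", value.strip())


print(to_vcf42_safe("Presynaptic congenital myasthenic syndromes"))
# Presynaptic_congenital_myasthenic_syndromes
```

Note that the mapping is lossy: a reader cannot tell whether an underscore in the output was a space or an underscore in the original disease name.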

julesjacobsen added a commit that referenced this issue Sep 19, 2023