bcftools norm NaN > nan #755
Comments
This could probably be fixed by using I am not sure if the fix shouldn't be on the GATK side though. There was another case when GATK was refusing to parse
We discussed this internally and the consensus is that this should be fixed on the GATK side. I am happy to accept pull requests that make it possible to work around this problem, similar to https://github.com/samtools/bcftools/blob/develop/misc/fix-broken-GATK-Double-vs-Integer
@pd3 While the 0 vs 0.0 is admittedly a dumb GATK issue,
C's The VCF spec actually says It seems to me that the sensible way forward is for the spec to be relaxed to allow any mixture of case and either Java input code will need special case code to input otherwise-cased NaNs and infinities, but that's inevitable given

Double read_a_double(String str) {
    try {
        return Double.valueOf(str);
    }
    catch (NumberFormatException e) {
        // Compare with equals(): == on Java strings tests identity, not content
        String lower = str.toLowerCase();
        if (lower.equals("nan")) return Double.NaN;
        else if … // etc for "inf" and "infinity"
        else throw e;
    }
}
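The fallback idea in the snippet above can be written out in full. What follows is only a sketch under stated assumptions: the class name `LenientDoubles` and the method `parse` are hypothetical, the handling of a leading sign and of the `inf`/`infinity` spellings is one guess at what the "etc" branch was meant to cover, and nothing here is taken from HTSJDK or GATK. It relies only on the documented behaviour that `Double.parseDouble` accepts the exact spellings `NaN` and `Infinity` and throws `NumberFormatException` for other casings.

```java
import java.util.Locale;

// Hypothetical helper, not part of HTSJDK or GATK.
final class LenientDoubles {
    private LenientDoubles() {}

    /** Parses str as a double, additionally accepting any-case "nan", "inf" and "infinity". */
    static double parse(String str) {
        try {
            // Fast path: ordinary numbers plus the exact spellings "NaN" and "Infinity".
            return Double.parseDouble(str);
        } catch (NumberFormatException e) {
            // Slow path: retry case-insensitively, remembering an optional sign.
            String s = str.trim().toLowerCase(Locale.ROOT);
            boolean negative = s.startsWith("-");
            if (negative || s.startsWith("+")) {
                s = s.substring(1);
            }
            switch (s) {
                case "nan":
                    return Double.NaN;
                case "inf":
                case "infinity":
                    return negative ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
                default:
                    throw e; // still not a number: propagate the original error
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("nan"));
        System.out.println(parse("-inf"));
    }
}
```

Keeping the strict parse as the first attempt means well-formed input pays no extra cost, and the original exception is rethrown unchanged for genuinely invalid text.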
@jmarshall I take your point. It seems like this should be nailed down in some floating point RFC, but if the C stdlib produces a mix of case and inf/infinity then I guess there isn't much hope for a standardized naming scheme... The reason this is an issue at all is that these failures are happening in library code where there isn't an easy way to change the parsing function. We can figure something out though... It always feels to me that using JEXL for specifying VCF filter expressions causes more problems than it's solved for us...
Having now looked at the stack trace in the linked issue, I see this is being parsed in some other non-bioinformatics library that's not HTSJDK. Oh… bad luck 😢
bcftools norm seems to be converting NaN values (specifically MQ=NaN) to nan. This is causing an error during downstream processing of the normalised VCF with GATK.
The initial problem was with version 1.2.1 (the version installed on our HPC system), but I've confirmed it still happens in the latest release, 1.7.
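Until the parsing side is fixed, one possible stopgap is to rewrite the affected INFO values back to `NaN` before handing the file to GATK. The sketch below is an assumption-laden illustration, not an existing bcftools or GATK tool: the class `InfoNanFixer` and the method `fixLine` are hypothetical names, and it only touches INFO keys the caller lists (for example `MQ`), so that String-typed fields that legitimately contain the text "nan" are left alone.

```java
import java.util.Set;

// Hypothetical stopgap filter, not part of bcftools or GATK.
final class InfoNanFixer {
    private InfoNanFixer() {}

    /** Returns the VCF line with any-case "nan" values of the given INFO keys rewritten to "NaN". */
    static String fixLine(String line, Set<String> keys) {
        if (line.startsWith("#")) {
            return line; // header lines are passed through untouched
        }
        String[] cols = line.split("\t", -1);
        if (cols.length < 8) {
            return line; // not a complete VCF record: leave it alone
        }
        String[] entries = cols[7].split(";", -1); // column 8 is INFO
        for (int i = 0; i < entries.length; i++) {
            int eq = entries[i].indexOf('=');
            if (eq < 0) {
                continue; // flag entry without a value
            }
            String key = entries[i].substring(0, eq);
            if (!keys.contains(key)) {
                continue;
            }
            String[] values = entries[i].substring(eq + 1).split(",", -1);
            for (int j = 0; j < values.length; j++) {
                if (values[j].equalsIgnoreCase("nan")) {
                    values[j] = "NaN";
                }
            }
            entries[i] = key + "=" + String.join(",", values);
        }
        cols[7] = String.join(";", entries);
        return String.join("\t", cols);
    }
}
```

Applied line by line to the normalised VCF with a key set containing `MQ`, this would turn `MQ=nan` back into `MQ=NaN` while leaving every other column and key unchanged.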