
floats vs doubles in VCF BCF

pd3 edited this page Jan 9, 2014 · 1 revision

While the VCF file format does not specify the range of numeric types, the BCF format allows at most 32-bit integers and 32-bit floats. The general consensus is that 32-bit integers are sufficient, but opinions about floats vary. The opinions expressed on the vcftools-spec mailing list were:

  1. 32-bit floats are sufficient, we can use log() when extended range is needed
  2. 32-bit floats are not sufficient, we must be able to express extended range explicitly (however, precision is sufficient)
  3. 32-bit floats are not sufficient, we need both extended range and precision
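
To make the first opinion concrete, here is a minimal sketch of how a value outside the 32-bit float range can still be stored after a log transform. The helper `to_float32` and the Phred-style scaling `-10 * log10(p)` are illustrative assumptions, not something the mailing list discussion prescribes.

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# A very small probability, representable as a double but not as a 32-bit float.
p = 1e-50

# Stored directly, the value underflows to zero in 32 bits.
stored_raw = to_float32(p)

# Stored on a log scale (assumed Phred-like scaling), it fits comfortably.
phred = -10 * math.log10(p)
stored_log = to_float32(phred)

# The original magnitude can be recovered from the log-scaled value.
recovered = 10 ** (-stored_log / 10)

print(stored_raw, stored_log, recovered)
```

The log transform preserves the magnitude, but the recovered value carries only 32-bit precision, which is the concern behind the third opinion.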

The following solutions have been proposed:

  1. restrict numeric types in VCF to 32 bits and recommend using log() when extended range is needed.
  2. dynamically detect the range and use doubles in BCF when necessary, similarly to how integer types are handled.
  3. introduce a new type 'double' to VCF and BCF
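
The second proposal could be sketched roughly as follows. This is a hypothetical illustration only: the function name `pick_float_width`, the returned labels, and the choice to treat values below the smallest normal 32-bit float as out of range are all assumptions, and BCF itself defines no 64-bit float type.

```python
import math

# Limits of IEEE 754 single precision.
FLT_MAX = (2 - 2.0 ** -23) * 2.0 ** 127   # largest finite 32-bit float
FLT_MIN_NORMAL = 2.0 ** -126              # smallest positive normal 32-bit float

def pick_float_width(values):
    """Return "float32" if every value fits the 32-bit float range, else "float64".

    Zeros, NaNs and infinities are representable in both widths and are skipped.
    Only range is checked, not precision.
    """
    for v in values:
        if v == 0.0 or math.isnan(v) or math.isinf(v):
            continue
        if not (FLT_MIN_NORMAL <= abs(v) <= FLT_MAX):
            return "float64"
    return "float32"

print(pick_float_width([0.5, 1e-10, 3e38]))
print(pick_float_width([0.5, 1e-50]))
```

A check like this addresses range only. Detecting when extra precision is needed is harder, since almost any decimal literal in a VCF is inexact in both widths.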

The following pros and cons have been mentioned:

  1. The need for higher precision seems only theoretical at this point: VCF producers such as GATK and samtools are happy with floats, no specific example was given to demonstrate the need for higher precision, and extended range can be achieved using log().
  2. Allowing extended precision leads to a new kind of problem when converting from BCF to VCF: how many significant digits should be output?
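
The significant-digits question can be illustrated with a small round-trip check. It relies on the known property that 9 significant decimal digits are enough to reproduce any 32-bit float exactly and 17 are enough for any 64-bit float; the helpers `to_float32` and `roundtrips` are hypothetical names introduced for this sketch.

```python
import struct

def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def roundtrips(x: float, digits: int, as_float32: bool) -> bool:
    """Write x as text with the given number of significant digits, parse it
    back at the stated width, and report whether the same value is recovered."""
    text = f"{x:.{digits}g}"
    back = float(text)
    if as_float32:
        back = to_float32(back)
    return back == x

x32 = to_float32(1 / 3)   # a value as it would be stored in a 32-bit field
x64 = 1 / 3               # the same quantity stored as a double

print(roundtrips(x32, 6, True), roundtrips(x32, 9, True))
print(roundtrips(x64, 9, False), roundtrips(x64, 17, False))
```

A converter that mixed widths would therefore need to know the stored width of each field to avoid either losing information or printing many digits of noise.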