Skip to content
Daniel Cameron edited this page May 21, 2019 · 3 revisions

Calculating variant allele fraction (VAF)

Calculating the VAF for a SV had traditionally been done by counting the split reads and read pairs (SR + RP) supporting the variant. This is problematic for the following reasons:

  • Read pairs that are discordantly mapped and contain a split read will be counted twice.
  • Reads that overlap the breakpoint but are aligned as soft-clipped reads will not be counted.

To resolve this issues, The gridss reports a supporting fragment count in the VF field. A fragment can support a variant either directly through split read, soft clipped read or discordant alignment of a read pair, indirectly through incorporation of one or both of the constituent reads in an assembly supporting the variant, or both directly and indirectly. The VF field reported by GRIDSS is a count of the unique fragments supporting the variant.

The REF and REFPAIR fields count the number of reads and read pairs that support the reference. Note that these fields are disjoint and read pair is only considered to support the reference if the spans the putative breakend, with each reads aligned to opposite side of the putative breakend position.

To calculate the VAF, the following formula should be used:

VAF = VF / (VF + REF + REFPAIR)

For events less than the library fragment size distribution, read pairs originating from the chromatid containing the SV will fall within the fragment size distribution thus contribute to the REFPAIR not RP. For this reason, read pairs should be ignored when calculating the VAF of small events:

VAF (small event) = VF / (VF + REF)

Each breakpoint has 2 VAFs

VAF is calculated per breakend. A breakpoint can be heterozygous at one breakend but homozyous at the other (e.g. a chr1-chr2 SV with a CN loss on the other chr1 chromatid).

REF and REFPAIR breakpoint homology handling

REF/REFPAIR are defined in terms of reads/read pairs spanning the putative breakend. This definition is subtly different to read depth: read depth counts the reads mapping to a given base, by REF/REFPAIR count reads that map to both sides of the putative breakend.

When a breakpoint has homology, the true position of the break cannot be determined. In such scenarios, read aligners will systematically align half the reads to each homology. In such cases, the read depth for a homozygous variant will looks heterozygous:

   |----del----|
****                 *****
**********     *********** read depth
   |-hom-|     |-hom-|
         ^^   ^^           min depth

To account for this alignment artifact, GRIDSS reports the lowest REF/REFPAIR value for each breakend. Note that this value can occur at any point in the homology interval (but is almost always one of the homology interval bounds). The calculated positions of the minimum at the two breakends does not have to be consistent with the breakpoint call. In the above example, the innermost positions would used and the REF would be correctly calculated as 0 on both sides.

Clone this wiki locally