-
Notifications
You must be signed in to change notification settings - Fork 71
Home
Calculating the VAF for a SV had traditionally been done by counting the split reads and read pairs (SR
+ RP
) supporting the variant. This is problematic for the following reasons:
- Read pairs that are discordantly mapped and contain a split read will be counted twice.
- Reads that overlap the breakpoint but are aligned as soft-clipped reads will not be counted.
To resolve this issues, The gridss reports a supporting fragment count in the VF
field. A fragment can support a variant either directly through split read, soft clipped read or discordant alignment of a read pair, indirectly through incorporation of one or both of the constituent reads in an assembly supporting the variant, or both directly and indirectly. The VF
field reported by GRIDSS is a count of the unique fragments supporting the variant.
The REF
and REFPAIR
fields count the number of reads and read pairs that support the reference. Note that these fields are disjoint and read pair is only considered to support the reference if the spans the putative breakend, with each reads aligned to opposite side of the putative breakend position.
To calculate the VAF, the following formula should be used:
VAF = VF
/ (VF
+ REF
+ REFPAIR
)
For events less than the library fragment size distribution, read pairs originating from the chromatid containing the SV will fall within the fragment size distribution thus contribute to the REFPAIR
not RP
. For this reason, read pairs should be ignored when calculating the VAF of small events:
VAF (small event) = VF
/ (VF
+ REF
)
VAF is calculated per breakend. A breakpoint can be heterozygous at one breakend but homozyous at the other (e.g. a chr1-chr2 SV with a CN loss on the other chr1 chromatid).
REF
/REFPAIR
are defined in terms of reads/read pairs spanning the putative breakend. This definition is subtly different to read depth: read depth counts the reads mapping to a given base, by REF
/REFPAIR
count reads that map to both sides of the putative breakend.
When a breakpoint has homology, the true position of the break cannot be determined. In such scenarios, read aligners will systematically align half the reads to each homology. In such cases, the read depth for a homozygous variant will looks heterozygous:
|----del----|
**** *****
********** *********** read depth
|-hom-| |-hom-|
^^ ^^ min depth
To account for this alignment artifact, GRIDSS reports the lowest REF
/REFPAIR
value for each breakend. Note that this value can occur at any point in the homology interval (but is almost always one of the homology interval bounds). The calculated positions of the minimum at the two breakends does not have to be consistent with the breakpoint call. In the above example, the innermost positions would used and the REF
would be correctly calculated as 0 on both sides.