\begin{abstract}
Next-generation sequencing is a powerful tool for detecting genetic variation. However, it is also error-prone, with error rates much higher than mutation rates. This can make mutation detection difficult. While increasing sequencing depth often helps, sequence-specific errors and other non-random biases cannot be detected simply by increasing depth. The problem of accurate genotyping is exacerbated when no reference genome or other auxiliary information is available.
I explore several methods for sensitively detecting mutations in non-model organisms, using a single \textit{Eucalyptus melliodora} individual as an example. I use the structure of the tree to find bounds on its somatic mutation rate and to evaluate several algorithms for variant calling. I find that conventional methods are suitable if the genome of a close relative can be adapted to the study organism. However, when the data are structured, a likelihood framework that accounts for this structure is more accurate. I use the techniques developed here to evaluate a reference-free variant calling algorithm.
I also use these data to evaluate a k-mer-based base quality score recalibrator (KBBQ), a tool I developed to recalibrate the base quality scores attached to sequencing data. Base quality scores can help detect errors in sequencing reads but are often inaccurate. The most popular method for correcting this issue requires a known set of variant sites, which is unavailable in most cases. Using simulated data, I show that errors in this set of variant sites can cause calibration errors. I then show that KBBQ accurately recalibrates base quality scores without requiring a reference or other auxiliary information and performs as well as other methods.
Finally, I use the \textit{Eucalyptus} data to investigate the impact of base quality score calibration on the quality of the resulting variant calls, and show that improved calibration increases the sensitivity and reduces the false positive rate of a variant calling algorithm.
\end{abstract}