
k-mers in an assembly

kjenike edited this page Feb 21, 2023 · 1 revision

Aside from pre-assembly statistics and validation, k-mers can be used for post-assembly quality control. Tools such as Merqury (Rhie et al. 2020, https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02134-9; source code at https://github.com/marbl/merqury) work by comparing the k-mers present in the raw sequencing reads with the k-mers present in the assembled genome. KAT, the K-mer Analysis Toolkit from Mapleson et al. (2017, https://academic.oup.com/bioinformatics/article/33/4/574/2664339), uses a similar approach.
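
To make the comparison concrete, here is a minimal Python sketch of the idea. It is not how Merqury or KAT are implemented (Merqury counts k-mers with its companion tool Meryl, while this toy uses an in-memory `Counter`). It counts canonical k-mers, treating a k-mer and its reverse complement as the same because reads come from both strands, then splits them into shared, reads-only and assembly-only sets. The function names and toy sequences are hypothetical.

```python
from collections import Counter

# Complement table for DNA; k-mers with other characters (e.g. N) are skipped below.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(sequences, k: int) -> Counter:
    """Count canonical k-mers across all sequences, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i : i + k]
            if _VALID.issuperset(kmer):
                counts[canonical(kmer)] += 1
    return counts


if __name__ == "__main__":
    # Toy data: the third read carries a sequencing error (A instead of T).
    reads = ["ACGTACGTTG", "CGTACGTTGA", "ACGTACGATG"]
    assembly = ["ACGTACGTTGA"]
    k = 5

    read_kmers = count_kmers(reads, k)
    asm_kmers = count_kmers(assembly, k)

    shared = read_kmers.keys() & asm_kmers.keys()
    reads_only = read_kmers.keys() - asm_kmers.keys()
    asm_only = asm_kmers.keys() - read_kmers.keys()
    print(f"shared={len(shared)} reads_only={len(reads_only)} asm_only={len(asm_only)}")
```

Shared k-mers support the assembly; k-mers found only in the reads may point to missing sequence (or to read errors), and k-mers found only in the assembly are likely assembly errors.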

The Merqury GitHub page has excellent tutorials, so I won't go into detail here; instead, I'll give a conceptual overview.

Benefits of k-mer-based assembly analysis:

  • It does not require mapping full reads back to the assembly.
  • It evaluates the whole genome rather than focusing on genes.

From these k-mers, Merqury computes "k-mer completeness": the percentage of "reliable" k-mers from the reads that are also present in the assembly. Reliable k-mers are those that are likely not sequencing errors, judged by where they fall in the k-mer spectrum we've been looking at in previous tutorials; in other words, they lie outside the low-multiplicity error peak.
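
Under the same toy assumptions, completeness can be sketched as follows. `first_valley` is a hypothetical, deliberately simple heuristic that takes the first local minimum of the k-mer histogram as the edge of the error peak; Merqury determines its own cutoff, so treat this only as an illustration of the idea, not as Merqury's rule.

```python
def first_valley(histogram: dict) -> int:
    """Multiplicity at the first local minimum of a k-mer histogram.

    `histogram` maps multiplicity -> number of distinct k-mers with that count.
    Hypothetical heuristic: everything below this valley is treated as the error peak.
    """
    mults = sorted(histogram)
    for prev, cur, nxt in zip(mults, mults[1:], mults[2:]):
        if histogram[cur] <= histogram[prev] and histogram[cur] < histogram[nxt]:
            return cur
    raise ValueError("no valley found in the histogram")


def kmer_completeness(read_counts: dict, assembly_kmers, min_count: int) -> float:
    """Percentage of reliable read k-mers (count >= min_count) present in the assembly."""
    reliable = {kmer for kmer, count in read_counts.items() if count >= min_count}
    if not reliable:
        raise ValueError("no read k-mers pass the min_count threshold")
    found = reliable & set(assembly_kmers)
    return 100.0 * len(found) / len(reliable)


if __name__ == "__main__":
    # Toy histogram: a large error peak at low counts, then a valley at 3.
    toy_histogram = {1: 1000, 2: 300, 3: 50, 4: 80, 5: 200, 6: 150}
    cutoff = first_valley(toy_histogram)  # 3 for this toy histogram

    # Toy read k-mer counts; "GGT" (seen once) falls in the error peak.
    read_counts = {"AAA": 10, "CCC": 12, "ACT": 8, "GGT": 1}
    assembly_kmers = {"AAA", "CCC", "TTG"}
    print(f"completeness = {kmer_completeness(read_counts, assembly_kmers, cutoff):.1f}%")
```

Here the error k-mer "GGT" is excluded before the percentage is taken, so a read error does not count against the assembly, while the reliable k-mer "ACT" missing from the assembly does.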

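Merqury also computes a consensus quality value (QV), a Phred-scaled (that is, -10 log10) estimate of the probability that a base in the assembly is erroneous. The estimate is derived from k-mers that appear in the assembly but not in the reads, since such k-mers most likely arise from assembly errors.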
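
The Merqury paper estimates the per-base error rate E from the k-mers found only in the assembly: with K_asm such k-mers out of K_total k-mers in the assembly, E = 1 - (1 - K_asm/K_total)^(1/k) and QV = -10 log10(E). The sketch below simply evaluates that formula; the function and argument names are hypothetical, and in practice Merqury obtains the counts itself.

```python
import math


def merqury_style_qv(asm_only_kmers: int, total_asm_kmers: int, k: int) -> float:
    """Phred-scaled consensus QV from k-mers found only in the assembly.

    Evaluates E = 1 - (1 - K_asm / K_total) ** (1 / k) and QV = -10 * log10(E),
    where K_asm is asm_only_kmers and K_total is total_asm_kmers.
    """
    if total_asm_kmers <= 0 or k <= 0:
        raise ValueError("total_asm_kmers and k must be positive")
    if not 0 <= asm_only_kmers <= total_asm_kmers:
        raise ValueError("asm_only_kmers must be between 0 and total_asm_kmers")
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers, so no errors detected
    if asm_only_kmers == total_asm_kmers:
        return 0.0  # every k-mer is unsupported: E = 1
    # log1p/expm1 compute 1 - (1 - x) ** (1 / k) accurately when x is tiny.
    error_rate = -math.expm1(math.log1p(-asm_only_kmers / total_asm_kmers) / k)
    return -10.0 * math.log10(error_rate)


if __name__ == "__main__":
    # Toy numbers: 21 assembly-only k-mers out of one million, with k = 21.
    print(f"QV = {merqury_style_qv(21, 1_000_000, 21):.2f}")
```

On the Phred scale, QV 30 corresponds to about one error per 1,000 bases and QV 40 to about one per 10,000.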

