k-mers in an assembly

Aside from pre-assembly statistics and validation, k-mers can also be used for post-assembly quality control. Tools such as Merqury (Rhie et al. 2020, https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02134-9), available at https://github.com/marbl/merqury, compare the k-mers present in the raw sequencing reads with the k-mers present in the assembled genome. KAT (Mapleson et al., https://academic.oup.com/bioinformatics/article/33/4/574/2664339) uses a similar approach.

The Merqury GitHub page has excellent tutorials, so I won't go into the details here, and instead will give a conceptual overview.

Benefits of k-mer based assembly analysis:

- Is not based on mapping the full reads back to an assembly.
- Evaluates the whole genome and is not gene-centric.

From the k-mers, Merqury computes the "k-mer completeness", which is the proportion of "reliable" k-mers from the reads that are also present in the assembly. Here, reliable k-mers are the k-mers that are likely not errors, based on their place in the k-mer spectra that we've been looking at in previous tutorials. In other words, they are the k-mers that fall outside the error peak of the spectrum.
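To make the idea of k-mer completeness concrete, here is a minimal Python sketch on toy sequences. It is not how Merqury is implemented: the function names, the fixed `min_count` threshold used to decide which k-mers are "reliable", and the toy reads are all my own assumptions for illustration. A real tool would also use canonical k-mers and pick the threshold from the k-mer spectrum.

```python
from collections import Counter


def count_kmers(seqs, k):
    """Count every k-mer (forward strand only) in a list of sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_completeness(read_counts, assembly_kmers, min_count):
    """Fraction of reliable read k-mers that also occur in the assembly.

    A k-mer is treated as reliable if it occurs at least `min_count`
    times in the reads (hypothetical stand-in for "outside the error peak").
    """
    reliable = {kmer for kmer, c in read_counts.items() if c >= min_count}
    if not reliable:
        raise ValueError("no reliable k-mers at this threshold")
    found = reliable & set(assembly_kmers)
    return len(found) / len(reliable)


# Toy data: the third read carries a k-mer (TAG) seen only once.
reads = ["ACGTAC", "ACGTAC", "ACGTAG"]
assembly = ["ACGTA"]
k = 3

read_counts = count_kmers(reads, k)
assembly_kmers = set(count_kmers(assembly, k))
completeness = kmer_completeness(read_counts, assembly_kmers, min_count=2)
print(f"k-mer completeness: {completeness:.2f}")
```

In this toy case the singleton k-mer is excluded as a likely error, and the assembly is missing one of the four remaining reliable k-mers, so it is not fully complete.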

Merqury also computes what they call a "quality value" (QV) or "consensus quality value", which is a Phred-scaled (log-scaled) probability that a base in the assembly is erroneous. A higher QV means a more accurate consensus.
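As a rough illustration of how such a value can be derived from k-mer counts, the sketch below follows the formula as I understand it from the Merqury paper: k-mers found in the assembly but not in the reads are treated as errors, the per-base probability of being correct is estimated as (shared / total)^(1/k), and the error probability is converted to a Phred scale. The function name, parameter names, and the example numbers are my own and purely illustrative.

```python
import math


def consensus_qv(assembly_only_kmers, total_assembly_kmers, k):
    """Phred-scaled consensus quality estimated from k-mer counts.

    assembly_only_kmers: k-mers present in the assembly but absent from the reads
    total_assembly_kmers: all k-mers in the assembly
    k: k-mer length
    """
    if total_assembly_kmers <= 0:
        raise ValueError("total_assembly_kmers must be positive")
    if not 0 <= assembly_only_kmers <= total_assembly_kmers:
        raise ValueError("assembly_only_kmers must be between 0 and the total")

    shared = total_assembly_kmers - assembly_only_kmers
    # Probability that a single base is correct, assuming a k-mer is
    # correct only if all k of its bases are correct.
    p_base_correct = (shared / total_assembly_kmers) ** (1 / k)
    error_rate = 1 - p_base_correct
    if error_rate == 0:
        return math.inf  # no unsupported k-mers were observed
    return -10 * math.log10(error_rate)


# Hypothetical numbers: 10 of 1000 assembly k-mers are not supported by reads.
qv = consensus_qv(assembly_only_kmers=10, total_assembly_kmers=1000, k=21)
print(f"QV: {qv:.1f}")
```

Because the scale is logarithmic, a QV of 30 corresponds to roughly one error per 1,000 bases and a QV of 40 to roughly one error per 10,000 bases.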

Table of contents

Introduction

Concept of k-mers

k-mer spectra analysis

📖 Introduction to K-mer spectra analysis
- ⚒ Generating k-mer spectra tutorial
📖 Basics of genome modeling
- ⚒ manual model fitting (for better understanding of the underlying model)
- ⚒ simple diploid
- ⚒ demonstrating the effect of sequencing error rate on k-mer coverage
📖 Common difficulties in characterisation of diploid genomes using k-mer spectra analysis
- ⚒ low coverage (pitfall) - to be merged
- ⚒ very homozygous diploid
- ⚒ highly heterozygous diploid
- ⚒ Genome size of a repetitive genome (pitfall)
- ⚒ Wrong ploidy (pitfall)
📖 Characterization of polyploid genomes using k-mer spectra analysis
- ⚒ Autotetraploid
- ⚒ Allotetraploid
- ⚒ Estimating ploidy (smudgeplot)
📖 Genome modeling as a quality control
- ⚒ Contamination (pitfall)
- ⚒ k-mers in an assembly (Merqury/KAT)
📖 Analysing genome skimming data

Separation of chromosomes

📖 Separate sub-genomes of an allopolyploid
📖 Separating chromosomes by comparison of sequencing libraries
- ⚒ Extracting sex chromosome k-mers from a male and female sample
- ⚒ Extract k-mers specific to germ-line restricted chromosomes
- ⚒ Matching k-mers to a reference (bwa-mem)
- ⚒ Matching k-mers to sequencing reads (cookiecutter)

Species assignment using short k-mers

📖 Identifying haplotypes within targeted amplicon sequencing datasets
- ⚒ Performing species assignment from targeted amplicon sequencing data

Others

🖥️ Installation of the kmer_tools conda environment
📖 Other k-mer resources
