k-mers in an assembly
Aside from pre-assembly statistics and validation, k-mers can be used for post-assembly quality control. Tools such as Merqury (Rhie et al. 2020, https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02134-9), which can be downloaded from https://github.com/marbl/merqury, compare the k-mers present in the raw sequencing reads with the k-mers present in the assembled genome. KAT from Mapleson et al. (https://academic.oup.com/bioinformatics/article/33/4/574/2664339) uses a similar approach.
The Merqury GitHub page has excellent tutorials, so I won't go into the details here, and instead will give a conceptual overview.
Benefits of k-mer-based assembly analysis:
- It does not rely on mapping full reads back to the assembly.
- It evaluates the whole genome rather than being gene-centric.
From the k-mers, Merqury computes the "k-mer completeness", which is the fraction of "reliable" k-mers from the reads that are also present in the assembly. Here, reliable k-mers are those that are unlikely to be sequencing errors, judged by their position in the k-mer spectra that we've been looking at in previous tutorials: they are the k-mers that fall outside the low-coverage error peak of the spectrum.
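To make the idea concrete, here is a minimal Python sketch of k-mer completeness on toy data. It is not how Merqury is implemented: the function names, the toy reads, and the fixed `min_count` cutoff standing in for "outside the error peak" are all assumptions for illustration, and it ignores canonical (reverse-complement) k-mers for simplicity.

```python
from collections import Counter

def count_kmers(seqs, k):
    """Count every k-mer (forward strand only) in a list of sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def kmer_completeness(read_counts, assembly_kmers, min_count):
    """Fraction of reliable read k-mers that also occur in the assembly.

    A k-mer is treated as reliable if it is seen at least `min_count`
    times in the reads (a hypothetical stand-in for a cutoff chosen
    from the k-mer spectrum).
    """
    reliable = {kmer for kmer, c in read_counts.items() if c >= min_count}
    if not reliable:
        return 0.0
    found = sum(1 for kmer in reliable if kmer in assembly_kmers)
    return found / len(reliable)

# Toy data (made up for this example)
k = 4
reads = ["ACGTAC", "ACGTAC", "CGTACG", "ACGTTC"]
assembly = ["ACGTA"]

read_counts = count_kmers(reads, k)
assembly_kmers = set(count_kmers(assembly, k))
completeness = kmer_completeness(read_counts, assembly_kmers, min_count=2)
print(f"k-mer completeness: {completeness:.3f}")
```

In this toy case three k-mers pass the cutoff and two of them are in the assembly, so one reliable k-mer is missing from the assembly. On real data you would count k-mers with a dedicated tool rather than a Python dictionary.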
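Merqury also computes what they call a "consensus quality value" (QV), which is a Phred-scaled (-10 log10) estimate of the probability that a base in the assembly is erroneous. The estimate is based on k-mers that are found in the assembly but never in the reads, since these most likely come from errors in the assembly.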
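As a rough illustration, the quality value can be sketched in a few lines of Python. This follows the formula as I understand it from the Merqury paper (the probability that a base is correct is estimated as the k-th root of the fraction of assembly k-mers that are also found in the reads); the function name and the example numbers are made up, so treat it as a sketch rather than a replacement for the tool.

```python
import math

def kmer_based_qv(asm_only_kmers, asm_total_kmers, k):
    """Phred-scaled consensus quality estimated from k-mer counts.

    asm_only_kmers:  k-mers found in the assembly but not in the reads
    asm_total_kmers: all k-mers found in the assembly
    """
    if asm_total_kmers <= 0:
        raise ValueError("the assembly must contain at least one k-mer")
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers, so no error estimate
    # Probability that a single base is correct
    p_correct = (1 - asm_only_kmers / asm_total_kmers) ** (1 / k)
    error_rate = 1 - p_correct
    return -10 * math.log10(error_rate)

# Hypothetical numbers: 1,000 unsupported k-mers out of 1 billion, k = 21
qv = kmer_based_qv(asm_only_kmers=1_000, asm_total_kmers=1_000_000_000, k=21)
print(f"QV = {qv:.1f}")
```

Because the scale is logarithmic, every 10 points of QV correspond to a tenfold drop in the estimated error rate: QV 30 is roughly one error per 1,000 bases, QV 40 one per 10,000.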