Concept of k-mers
An introductory lecture is available on YouTube
In a genomic context, k-mers are substrings of nucleotides of length k contained within a biological sequence. Any biological sequence can therefore be decomposed into k-mers, and the number of k-mers depends on both the length of the sequence (L) and the k-mer length (k). For example, the sequence AAGTCCAT (L=8) contains 7 k-mers of length 2 (2-mers), 6 3-mers, 5 4-mers, 4 5-mers, 3 6-mers and 2 7-mers. The number of k-mers in a sequence is always L - k + 1.
Decomposition into k-mers can be done on an assembly, a read set, or any other sequence or set of sequences. K-mers are used for many purposes, so every rule has an exception; usually, however, decomposing a sequence into k-mers yields a set of k-mers and their respective frequencies. In practice, this means we lose the information about genomic context. This cost is compensated by the statistical power we gain to learn about the genome.
The power of k-mers is most obvious in the context of whole-genome sequencing.
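The counting rule can be checked with a few lines of Python. This is only a minimal sketch: the helper name `kmers` is made up for illustration, and it assumes a plain string of nucleotides with no special handling of ambiguous bases.

```python
def kmers(sequence, k):
    """Return all overlapping k-mers of `sequence`, in order of position."""
    if k < 1 or k > len(sequence):
        return []
    # A window of length k can start at positions 0 .. L - k,
    # which gives L - k + 1 windows in total.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


sequence = "AAGTCCAT"  # the example sequence from the text, L = 8
for k in range(2, 8):
    found = kmers(sequence, k)
    # The number of k-mers always equals L - k + 1.
    assert len(found) == len(sequence) - k + 1
    print(k, len(found), found)
```

The list keeps every occurrence, so a k-mer that appears at two positions is listed twice; the formula L - k + 1 counts occurrences, not distinct k-mers.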
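As a rough illustration of what "a set of k-mers and their respective frequencies" looks like, the sketch below counts k-mers across a small set of sequences. The function name `count_kmers` and the two toy reads are hypothetical, and dedicated k-mer counters handle details (such as treating a k-mer and its reverse complement as one) that are ignored here.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every k-mer occurrence across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# Hypothetical toy "read set": two overlapping reads.
reads = ["AAGTCCAT", "GTCCATAA"]
counts = count_kmers(reads, 3)

# k-mer -> frequency; positions and read of origin are not stored,
# which is the loss of genomic context described above.
for kmer, freq in sorted(counts.items()):
    print(kmer, freq)

# Frequency -> number of distinct k-mers seen that many times.
spectrum = Counter(counts.values())
print(sorted(spectrum.items()))
```

Only the k-mer strings and their counts survive the decomposition; nothing in `counts` records where in a read, or in which read, a k-mer was found.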
Introduction
k-mer spectra analysis
- 📖 Introduction to K-mer spectra analysis
- 📖 Basics of genome modeling
- ⚒ manual model fitting (for better understanding of the underlying model)
- ⚒ simple diploid
- ⚒ demonstrating the effect of sequencing error rate on k-mer coverage
- 📖 Common difficulties in characterisation of diploid genomes using k-mer spectra analysis
- ⚒ low coverage (pitfall) - to be merged
- ⚒ very homozygous diploid
- ⚒ highly heterozygous diploid
- ⚒ Genome size of a repetitive genome (pitfall)
- ⚒ Wrong ploidy (pitfall)
- 📖 Characterization of polyploid genomes using k-mer spectra analysis
- ⚒ Autotetraploid
- ⚒ Allotetraploid
- ⚒ Estimating ploidy (smudgeplot)
- 📖 Genome modeling as a quality control
- ⚒ Contamination (pitfall)
- ⚒ k-mers in an assembly (Merqury/KAT)
- 📖 Analysing genome skimming data
Separation of chromosomes
- 📖 Separate sub-genomes of an allopolyploid
- 📖 Separating chromosomes by comparison of sequencing libraries
Species assignment using short k-mers
Others