Highly heterozygous diploid
Heterozygous diploid samples are usually straightforward to model. They are less likely to require the manual intervention that we sometimes see with highly inbred samples, because there are usually two well-defined peaks that GenomeScope can identify correctly. In contrast, as we saw in the previous example, when there is no heterozygous peak GenomeScope may incorrectly identify the homozygous peak as heterozygous.
The European small ermine moth is a great example of this. Here, the spectra were made with k=31 from PacBio HiFi data. We have two specimens of this species, so we can compare the two spectra.
Specimen 1:
Specimen 2:
In both cases the estimated heterozygosity is over 1%.
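To get a feel for why a heterozygosity above 1% gives such a clear heterozygous peak at k=31, here is a minimal sketch. It assumes that heterozygous sites are independent and evenly spread along the genome, which is a simplification, and the function name `het_kmer_fraction` is only illustrative. Under that assumption, the fraction of k-mer positions that overlap at least one heterozygous site is 1 - (1 - h)^k.

```python
def het_kmer_fraction(h, k):
    """Fraction of k-mer positions overlapping at least one heterozygous site.

    h: per-base heterozygosity (0.01 means 1%)
    k: k-mer length
    Assumes heterozygous sites are independent and uniformly distributed.
    """
    return 1.0 - (1.0 - h) ** k


# Compare a few heterozygosity levels at the k used in this example (k=31).
for h in (0.001, 0.005, 0.01, 0.02):
    print(f"h = {h:.3f}  ->  heterozygous k-mer fraction = {het_kmer_fraction(h, 31):.3f}")
```

At 1% heterozygosity and k=31, roughly a quarter of k-mer positions fall in the heterozygous peak, so that peak is hard to miss.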
Checking the smudgeplots, we see that this is likely a diploid species.
1. Since we have two specimens from the same species, can we merge the data to get more coverage and thus potentially a better model?
No. Or at least, it isn't likely that this would improve the model. It is much more likely to produce a histogram that is hard, if not impossible, to interpret. This is because the two individuals here have different coverages and different levels of heterozygosity and error. As a fun at-home exercise, you can try merging the reads and then re-running KMC and GenomeScope.
We hope these examples provide a useful overview of what you can find when fitting genome models, depending on the genome characteristics of your organism of interest. However, there is still a lot to explore, for example genome repetitiveness and ploidy levels.
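The effect of unequal coverage can be sketched with a toy calculation. The example below is not part of the original analysis: the coverage values are hypothetical, and the model ignores repeats, sequencing errors and peak widths. It only lists where peaks would be expected if reads from two diploid specimens were pooled, given that any k-mer can be present in 0, 1 or 2 copies in each individual.

```python
from itertools import product


def merged_peak_positions(cov_a, cov_b):
    """Expected k-mer coverage peaks after pooling reads from two diploid specimens.

    cov_a, cov_b: 1n (haploid) k-mer coverage of each specimen.
    A k-mer can occur in 0, 1 or 2 copies in each diploid genome, so its
    pooled depth is copies_a * cov_a + copies_b * cov_b.
    """
    peaks = set()
    for copies_a, copies_b in product(range(3), repeat=2):
        depth = copies_a * cov_a + copies_b * cov_b
        if depth > 0:  # k-mers absent from both specimens are never observed
            peaks.add(depth)
    return sorted(peaks)


# Hypothetical 1n coverages, chosen only for illustration.
single = merged_peak_positions(20, 0)   # one specimen on its own
pooled = merged_peak_positions(20, 13)  # two specimens with unequal coverage

print("single specimen peaks:", single)
print("pooled specimens peaks:", pooled)
```

A single specimen gives the familiar two peaks (1n and 2n), while the pooled data spreads the same k-mers over up to eight overlapping peaks, which a diploid model is not designed to fit.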
👆 Go back to Table of Contents
👉 ⚒ Let's try to figure out the genome size of a repetitive genome.
👉 📖 Read about characterization of polyploid genomes using k-mer spectra analysis.
Introduction
k-mer spectra analysis
- 📖 Introduction to K-mer spectra analysis
- 📖 Basics of genome modeling
- ⚒ manual model fitting (for better understanding of the underlying model)
- ⚒ simple diploid
- ⚒ demonstrating the effect of sequencing error rate on k-mer coverage
- 📖 Common difficulties in characterisation of diploid genomes using k-mer spectra analysis
- ⚒ low coverage (pitfall) - to be merged
- ⚒ very homozygous diploid
- ⚒ highly heterozygous diploid
- ⚒ Genome size of a repetitive genome (pitfall)
- ⚒ Wrong ploidy (pitfall)
- 📖 Characterization of polyploid genomes using k-mer spectra analysis
- ⚒ Autotetraploid
- ⚒ Allotetraploid
- ⚒ Estimating ploidy (smudgeplot)
- 📖 Genome modeling as a quality control
- ⚒ Contamination (pitfall)
- ⚒ k-mers in an assembly (Merqury/KAT)
- 📖 Analysing genome skimming data
Separation of chromosomes
- 📖 Separate sub-genomes of an allopolyploid
- 📖 Separating chromosomes by comparison of sequencing libraries
Species assignment using short k-mers
Others