Analysing genome skimming data

Kamil S. Jaron edited this page Mar 22, 2024 · 5 revisions

This section presents a suite of methods developed for working with genome skimming datasets.

The theory for the five tutorials is split across two lectures. The first lecture (and its corresponding slides) covers ⚒ Phylogenetic placement of samples (Skmer) & estimating genomic distance (APPLES). The second lecture (and its corresponding slides) covers ⚒ Double phylogenetic placement of mixed samples (MISA), ⚒ Genome size estimation of skimming data (RESPECT), and ⚒ Contamination in skimming data (CONSULT).

Install tools

Before starting the tutorials, follow the instructions in 🖥️ Installation of tools to work with skimming data.

Download data for the tutorials

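### Obtain yeast genomes as test case
wget https://github.com/balabanmetin/yeast-genomes/raw/master/yeast-genomes.tar.bz2
# Unpack the bzip2-compressed archive
tar xvfj yeast-genomes.tar.bz2

# Show the size of each genome file in megabytes
du -sm genomes/*/*

# Peek at the first lines of one of the assemblies
head genomes/Saccharomyces_kudriavzevii/GCA_900682665.1_SKCA111_genomic.fna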
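
As an optional sanity check after unpacking, the sketch below counts the sequences and total bases in each downloaded assembly. It is not part of the original tutorial: the helper name `fasta_summary` is hypothetical, and the loop assumes the archive unpacks into `genomes/<species>/` with every assembly ending in `.fna`, as the commands above suggest.

```shell
# Hypothetical helper: print "<number of sequences><TAB><total bases>" for one FASTA file.
fasta_summary() {
    awk '
        /^>/ { n++; next }                                # header line: count one sequence
        { gsub(/[ \t\r]/, ""); bases += length($0) }      # sequence line: strip whitespace, add its length
        END { printf "%d\t%d\n", n, bases }
    ' "$1"
}

# Summarise every assembly, assuming the genomes/<species>/*.fna layout.
for f in genomes/*/*.fna; do
    [ -e "$f" ] || continue    # skip quietly if the data have not been downloaded
    printf '%s\t%s\n' "$f" "$(fasta_summary "$f")"
done
```

Each output line gives the file path, the number of sequences, and the total assembly length in bases, which can be compared against the file sizes reported by `du -sm`.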

Papers

  • Skmer:
    • S. Sarmashghi, K. Bohmann, M. T. P. Gilbert, V. Bafna, and S. Mirarab. “Skmer: Assembly-Free and Alignment-Free Sample Identification Using Genome Skims.” Genome Biology Vol. 20, no. 1 (2019): pp. 34. doi:10.1186/s13059-019-1632-4.
  • APPLES:
    • M. Balaban, S. Sarmashghi, and S. Mirarab. “APPLES: Scalable Distance-Based Phylogenetic Placement with or without Alignments.” Edited by David Posada. Systematic Biology Vol. 69, no. 3 (2020): pp. 566–78. doi:10.1093/sysbio/syz063.
    • K. Bohmann, S. Mirarab, V. Bafna, and M. T. P. Gilbert. “Beyond DNA Barcoding: The Unrealized Potential of Genome Skim Data in Sample Identification.” Molecular Ecology (2020): mec.15507. doi:10.1111/mec.15507.
  • MISA:
    • M. Balaban, and S. Mirarab. “Phylogenetic Double Placement of Mixed Samples.” Bioinformatics Vol. 36, no. Supplement_1 (2020): pp. i335–43. doi:10.1093/bioinformatics/btaa489.
  • RESPECT:
    • S. Sarmashghi, M. Balaban, E. Rachtman, B. Touri, S. Mirarab, and V. Bafna. “Estimating Repeat Spectra and Genome Length from Low-Coverage Genome Skims with RESPECT.” bioRxiv (2021): 2021.01.28.428636. doi:10.1101/2021.01.28.428636.
  • Contamination:
    • E. Rachtman, M. Balaban, V. Bafna, and S. Mirarab. “The Impact of Contaminants on the Accuracy of Genome Skimming and the Effectiveness of Exclusion Read Filters.” Molecular Ecology Resources Vol. 20, no. 3 (2020): 1755-0998.13135. doi:10.1111/1755-0998.13135.
    • E. Rachtman, V. Bafna, and S. Mirarab. “CONSULT: Accurate Contamination Removal Using Locality-Sensitive Hashing.” NAR Genomics and Bioinformatics Vol. 3, no. 3 (2021). doi:10.1093/nargab/lqab071. Preprint: doi:10.1101/2021.03.18.436035.
