Population-based framework for introgression/selection/resequencing experiments

csoeder/PopPsiSeq

PopPsiSeq

Population-based framework for introgression/selection/resequencing experiments

Background

PopPsiSeq is designed as an extension of the "PsiSeq" protocol, which identifies candidate genomic loci in selection/backcross experiments (Earley & Jones 2011).

What's New

PopPsiSeq updates the original protocol in several ways:

  • consolidation into a Snakemake pipeline simplifies a somewhat unruly workflow
  • improved data QA/QC, quality & uniqueness of mapping
  • uses empirically sequenced reads rather than a fragmented reference genome to characterize species
  • incorporates software advances in variant calling (eg, freebayes rather than directly examining pileup), data processing & visualization (eg, ggplot), and other utilities (eg, vcftools, bedtools)
  • PsiSeq uses a reciprocal mapping scheme to call variants (eg, simulans reads vs sechellia reference and sechellia reads vs simulans reference), whereas PopPsiSeq currently maps both to a third, common reference (eg, simulans reads & sechellia reads vs melanogaster reference).
  • PsiSeq assumes that differences between species are fixed; PopPsiSeq examines local changes in allele frequency (of which fixation is an extreme case).
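The last point can be illustrated with a small sketch. This is not the code in scripts/freqShifter.R; the function names and the sign convention (movement toward the second parental population counts as positive) are assumptions made for illustration only.

```python
from typing import Optional


def allele_frequency(alt_count: int, total_count: int) -> Optional[float]:
    """Return the alternate-allele frequency, or None if the site has no coverage."""
    if total_count <= 0:
        return None
    return alt_count / total_count


def frequency_shift(parent_a: float, parent_b: float, offspring: float) -> float:
    """Offspring frequency relative to parent A, signed so that movement
    toward parent B is positive (a hypothetical polarization)."""
    direction = 1.0 if parent_b >= parent_a else -1.0
    return direction * (offspring - parent_a)


# A fixed difference between the parental populations is the extreme case.
fixed = frequency_shift(parent_a=0.0, parent_b=1.0, offspring=0.5)

# A site that segregates in both parental populations still carries signal.
segregating = frequency_shift(parent_a=0.2, parent_b=0.8, offspring=0.35)

print(fixed, segregating)
```

Under a fixed-difference model only sites like the first would be informative; a frequency-based approach can also use sites like the second.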

Basic Usage

The core pipeline is contained in Snakefile and expects operational information (such as the path to the reference genome files) and metadata about the samples analyzed (including their files' paths and their relationship to the backcross).

As an example, the data from (Earley & Jones 2011) can be reanalyzed with other published DNA-Seq from Drosophila simulans and Drosophila sechellia by running the snakemake command

snakemake data/ultimate/freq_shift/freebayes/all.Earley2011_with_allSim_and_allSech.vs_dm6.bwaUniq.windowed_w100000_s100000.frqShift.bed --configfile configurations/config.basicExample.yaml

This will download the reads from NCBI, map them to the dm6 reference genome with a filtered bwa algorithm, call variants with freebayes, calculate the allele frequency shift, and smooth it over bookended 100 kb genomic windows.
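As a rough sketch of the final smoothing step: "bookended" windows are adjacent and non-overlapping, which presumably corresponds to the window and step both being 100000 in the target filename (w100000_s100000). The function below is a hypothetical illustration, not the pipeline's implementation (which relies on utilities such as bedtools); the name windowed_mean and the choice of a plain mean are assumptions.

```python
from collections import defaultdict
from statistics import mean


def windowed_mean(sites, window=100_000):
    """Average per-site values within bookended (non-overlapping) windows.

    sites: iterable of (chrom, position, value) with 0-based positions.
    Returns BED-like rows: (chrom, start, end, mean_value, n_sites).
    """
    bins = defaultdict(list)
    for chrom, pos, value in sites:
        # Integer division assigns each site to exactly one window.
        start = (pos // window) * window
        bins[(chrom, start)].append(value)
    return [
        (chrom, start, start + window, mean(values), len(values))
        for (chrom, start), values in sorted(bins.items())
    ]


# Toy per-site frequency shifts; the values are made up for illustration.
sites = [("chr2L", 1_500, 0.10), ("chr2L", 99_999, 0.30), ("chr2L", 100_000, 0.50)]
rows = windowed_mean(sites)
for row in rows:
    print(*row, sep="\t")
```

The first two sites fall in the window starting at 0 and the third in the window starting at 100000, so each site contributes to exactly one output row.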

The core pipeline can be included as a module in a larger workflow. As a simple example, this command acts as a wrapper for the above data generation: it will build the data, summarize/visualize it, and also quantify/document the workflow itself:

snakemake --snakefile workflows/Snakefile.basicExample --configfile configurations/config.basicExample.yaml

Use Examples and Applications

Legacy

The original PsiSeq, as well as an intermediate rewrite (PsiSeq2), are included as unsupported legacy code. Their pipelines can be imported by including utils/modules/Snakefile.legacy. An example workflow can be run:

snakemake data/ultimate/shared_SNPs/PsiSeq/droSim1/bwaUniq/Earley2011.SNPs_shared_with.fragSimulated_dSec1.vs_droSim1.bwaUniq.genomeWindowed_w100000_s100000.bed data/ultimate/shared_SNPs/PsiSeq2/droSim1/bwaUniq/Earley2011.SNPs_shared_with.fragSimulated_dSec1.vs_droSim1.bwaUniq.genomeWindowed_w100000_s100000.bed --snakefile workflows/Snakefile.legacy --configfile configurations/config.legacyExample.yaml

or, with self-documentation:

snakemake --snakefile workflows/Snakefile.legacyExample --configfile configurations/config.legacyExample.yaml

Moehring2024

The PopPsiSeq algorithm was used to analyze a backcross & introgression experiment in (citation). These results can be generated by running:

snakemake --configfile configurations/config.Moehring2024.yaml --snakefile workflows/Snakefile.Moehring2024

This analysis was originally written as a test comparison of the PopPsiSeq algorithm with earlier versions. This comparison, including the legacy results, can be generated:

snakemake results/Moehring_PsiSeqDev.pdf --configfile configurations/config.Moehring2024.yaml --snakefile workflows/Snakefile.Moehring2024

This workflow illustrates how modules in the original pipelines can be swapped out (eg, the smrtFreeBayes variant caller and the PsiSeq2_relaxed) as well as built upon with project-specific tasks.

PsiSeq Deep Dive

A look back at the development of the PsiSeq software, comparing versions 1 and 2 with the present algorithm on a variety of data sets. The real results are the population genetics we met along the way.

snakemake --configfile configurations/config.PsiSeqDeepDive.yaml --snakefile workflows/Snakefile.PsiSeqDeepDive

In development, with unpublished data. Coming soon!

Usage Details

Dependencies

Configuration files

Structure and Files of Note

PopPsiSeq/
├── configurations  # configuration files - sample metadata, important filepaths, etc
│   ├── config.basicExample.yaml
│   └── ...
├── data
│   ├── external        # SRA downloads stored here
│   ├── intermediate    # alignments, variant calls, etc
│   ├── raw             # ie, unpublished
│   ├── summaries       # summary data eg read QC
│   └── ultimate        # windowed results are stored here
├── markdowns       # markdown files for self-summary and writeup 
│   ├── PopPsiSeq_basicExample.Rmd
│   └── ...
├── README.md
├── scripts
│   ├── freqShifter.R   # this is the script that polarizes and calculates the allele shift
│   └── legacy          # unsupported code from v1 and v2
│       ├── PsiSeq
│       └── PsiSeq2
├── Snakefile   # core pipeline
├── utils
│   ├── genelists
│   ├── genome_windows
│   ├── legacy
│   │   └── PsiSeq.zip  # the SI for Earley 2011 is not currently available so it's mirrored here
│   └── modules         # useful sub-pipelines
│       ├── Snakefile.legacy
│       └── Snakefile.popgentools
└── workflows   # example use cases 
    ├── Snakefile.basicExample
    └── ...

Algorithm Description

Comparison with versions 1 & 2

References

Earley, Eric J., and Corbin D. Jones. 2011. “Next-generation mapping of complex traits with phenotype-based selection and introgression.” Genetics 189 (4): 1203–9. doi:10.1534/genetics.111.129445.
