Reads2Map is a collection of WDL workflows designed to facilitate the construction of linkage maps from sequencing reads. You can find details of each workflow release on the Reads2Map releases page, available here.
The main workflows are EmpiricalReads2Map.wdl and SimulatedReads2Map.wdl. EmpiricalReads2Map.wdl is composed of EmpiricalSNPCalling.wdl, which performs the SNP calling, and EmpiricalMaps.wdl, which performs the genotype calling and map building for empirical reads. SimulatedReads2Map.wdl uses the RADinitio software to simulate Illumina reads for RADseq, exome, or WGS data, and then performs the SNP calling, genotype calling, and genetic map building.
The SNP calling step in Reads2Map currently includes the popular tools: GATK, Freebayes, TASSEL, and STACKs. For genotype/dosage calling, the workflow utilizes tools like updog, polyRAD, and SuperMASSA. Lastly, Reads2Map leverages OneMap, GUSMap, and MAPpoly for linkage map construction.
For diploid data, you can visualize the results using the R package and shiny app called Reads2MapApp, available here. This package supports the visualization of linkage maps built using OneMap and GUSMap.
The Reads2Map workflows perform the SNP and genotype/dosage calling for your complete data set. However, they build the linkage map for only a single chromosome (a reference genome is required) for each combination of software and parameters. The resulting maps will probably still require improvements, but their characteristics will suggest which combination of SNP and genotype calling software and parameters you should use for your data. Once the pipeline is selected, you can load the respective VCF file in R and build the complete linkage map using OneMap or MAPpoly. Use the OneMap or MAPpoly tutorials for guidance on building and improving the linkage map for the complete dataset.
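To make "each combination of software and parameters" concrete, here is a minimal sketch that enumerates candidate pipelines from the tool names listed above. Treating the candidates as a full cross-product is an assumption made for illustration; the workflows themselves decide which combinations are actually run (for example, depending on ploidy), and `candidate_pipelines` is a hypothetical helper, not part of Reads2Map.

```python
from itertools import product

# Tool names come from the text above. Combining them as a full
# cross-product is an illustrative assumption, not the workflows' logic.
SNP_CALLERS = ["GATK", "Freebayes", "TASSEL", "STACKs"]
GENOTYPE_CALLERS = ["updog", "polyRAD", "SuperMASSA"]
MAP_BUILDERS = ["OneMap", "GUSMap", "MAPpoly"]


def candidate_pipelines(snp_callers, genotype_callers, map_builders):
    """Return every (SNP caller, genotype caller, map builder) triple."""
    return list(product(snp_callers, genotype_callers, map_builders))


pipelines = candidate_pipelines(SNP_CALLERS, GENOTYPE_CALLERS, MAP_BUILDERS)
print(f"{len(pipelines)} candidate pipelines")
for snp, geno, mapper in pipelines[:3]:
    print(f"{snp} -> {geno} -> {mapper}")
```

Comparing the single-chromosome maps produced by such candidates is what guides the choice of pipeline for the full data set.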
Multiple systems are available to run WDL workflows, such as Cromwell, miniWDL, and dxWDL. See further information in the OpenWDL documentation.
In addition, we also suggest two wrappers: pumbaa and Caper. Here is a tutorial on how to set up these tools and an example of running the EmpiricalReads2Map:
To run a pipeline, first navigate to the Reads2Map releases page, search for the pipeline tag you wish to run, and download the pipeline’s assets (the WDL workflow, the JSON, and the ZIP with accompanying dependencies).
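As a rough sketch of how the three downloaded assets fit together, the snippet below assembles a Cromwell `run` command line, passing the JSON via `--inputs` and the dependencies ZIP via `--imports`. The JSON and ZIP file names, the `cromwell.jar` location, and both helper functions are assumptions for illustration; use the actual names of the assets you downloaded.

```python
import subprocess
from pathlib import Path


def build_cromwell_command(wdl, inputs_json, deps_zip, cromwell_jar="cromwell.jar"):
    """Assemble (but do not run) a Cromwell command for one workflow."""
    return [
        "java", "-jar", str(cromwell_jar), "run", str(wdl),
        "--inputs", str(inputs_json),   # JSON with the workflow inputs
        "--imports", str(deps_zip),     # ZIP with the imported sub-workflows
    ]


def run_workflow(wdl, inputs_json, deps_zip, cromwell_jar="cromwell.jar"):
    """Check that every file exists, then launch Cromwell."""
    paths = (wdl, inputs_json, deps_zip, cromwell_jar)
    missing = [str(p) for p in paths if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing files: {', '.join(missing)}")
    cmd = build_cromwell_command(wdl, inputs_json, deps_zip, cromwell_jar)
    return subprocess.run(cmd, check=True)


# Hypothetical asset names; only the WDL name appears in this README.
cmd = build_cromwell_command(
    "EmpiricalReads2Map.wdl",
    "EmpiricalReads2Map.inputs.json",
    "EmpiricalReads2Map.zip",
)
print(" ".join(cmd))
```

Calling `run_workflow(...)` with the same arguments would launch the run, provided Java, the Cromwell JAR, and a container backend are available on your system.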
Check the description of the inputs for the pipelines:
Check how to evaluate the workflow results in the Reads2MapApp Shiny app (so far only available for diploid datasets):
Check more information and examples of usage in:
Taniguti, C. H.; Taniguti, L. M.; Amadeu, R. R.; Lau, J.; de Siqueira Gesteira, G.; Oliveira, T. de P.; Ferreira, G. C.; Pereira, G. da S.; Byrne, D.; Mollinari, M.; Riera-Lizarazu, O.; Garcia, A. A. F. Developing best practices for genotyping-by-sequencing analysis in the construction of linkage maps. GigaScience, 12, giad092. https://doi.org/10.1093/gigascience/giad092
- BWA in us.gcr.io/broad-gotc-prod/genomes-in-the-cloud:2.5.7-2021-06-09_16-47-48Z: Used to align simulated reads to reference;
- cutadapt in cristaniguti/pirs-ddrad-cutadapt:0.0.1: Trims simulated reads;
- ddRADseqTools in cristaniguti/pirs-ddrad-cutadapt:0.0.1: Set of applications for the in silico design and testing of double digest RADseq (ddRADseq) experiments;
- Freebayes in cristaniguti/freebayes:0.0.1: Variant call step;
- GATK in us.gcr.io/broad-gotc-prod/genomes-in-the-cloud:2.5.7-2021-06-09_16-47-48Z: Variant call step using Haplotype Caller, GenomicsDBImport and GenotypeGVCFs;
- TASSEL in cristaniguti/java-in-the-cloud:0.0.2: Variant call step;
- STACKs in cristaniguti/stacks:0.0.1: Variant call step;
- PedigreeSim in cristaniguti/reads2map:0.0.1: Simulates progeny genotypes from parental genotypes for different types of populations;
- picard in us.gcr.io/broad-gotc-prod/genomes-in-the-cloud:2.5.7-2021-06-09_16-47-48Z: Process alignment files;
- pirs in cristaniguti/pirs-ddrad-cutadapt:0.0.1: Generates simulated paired-end reads from a reference genome;
- samtools in us.gcr.io/broad-gotc-prod/genomes-in-the-cloud:2.5.7-2021-06-09_16-47-48Z: Process alignment files;
- SimuSCoP in cristaniguti/simuscopr:0.0.1: Exome and WGS Illumina reads simulations;
- RADinitio in cristaniguti/radinitio:0.0.1: RADseq Illumina reads simulation;
- SuperMASSA in cristaniguti/reads2map:0.0.1: Efficient Exact Maximum a Posteriori Computation for Bayesian SNP Genotyping in Polyploids;
- bcftools in lifebitai/bcftools:1.10.2: Utilities for variant calling and manipulating VCFs and BCFs;
- vcftools in cristaniguti/split_markers:0.0.1: Program package designed for working with VCF files;
- MCHap in cristaniguti/mchap:0.7.0: Polyploid micro-haplotype assembly using Markov chain Monte Carlo simulation.
- OneMap in cristaniguti/reads2map:0.0.1: Software for constructing genetic maps in experimental crosses: full-sib, RILs, F2 and backcrosses;
- Reads2MapTools in cristaniguti/reads2map:0.0.1: Support package to perform mapping populations simulations and genotyping for OneMap genetic map building
- GUSMap: Genotyping Uncertainty with Sequencing data and linkage MAPping
- updog in cristaniguti/reads2map:0.0.1: Flexible Genotyping of Polyploids using Next Generation Sequencing Data
- polyRAD in cristaniguti/reads2map:0.0.1: Genotype Calling with Uncertainty from Sequencing Data in Polyploids
- Reads2MapApp in cristaniguti/reads2mapApp:0.0.1: Shiny app to evaluate the results of the Reads2Map workflows
- simuscopR in cristaniguti/reads2map:0.0.1: Wrapper R package for SimuSCoP simulations
- MAPpoly in cristaniguti/reads2map:0.0.5: Build linkage maps for autopolyploid species
Taniguti, C. H.; Taniguti, L. M.; Amadeu, R. R.; Lau, J.; de Siqueira Gesteira, G.; Oliveira, T. de P.; Ferreira, G. C.; Pereira, G. da S.; Byrne, D.; Mollinari, M.; Riera-Lizarazu, O.; Garcia, A. A. F. Developing best practices for genotyping-by-sequencing analysis in the construction of linkage maps. GigaScience, 12, giad092. https://doi.org/10.1093/gigascience/giad092
This work was partially supported by the National Council for Scientific and Technological Development (CNPq - 313269/2021-1); by USDA, National Institute of Food and Agriculture (NIFA), Specialty Crop Research Initiative (SCRI) project “Tools for Genomics Assisted Breeding in Polyploids: Development of a Community Resource” (Award No. 2020-51181-32156); and by the Bill and Melinda Gates Foundation (OPP1213329) project SweetGAINS.