parallel slivar

Brent Pedersen edited this page Mar 2, 2020 · 6 revisions

On whole-genome cohorts of many families or trios, slivar expr can take some time to run. To speed up the (iterative) analysis of large and small cohorts, we provide pslivar, which runs slivar expr in parallel across regions of the genome. Using this, we can run the rare-disease pipeline in ~5 minutes for 150 exome trios and in about 1 hour for 150 WGS trios using 32 CPUs.

To run pslivar, a user should first get a slivar expr command that runs without error. Converting that slivar command to pslivar is then as simple as changing slivar expr to pslivar, adding --fasta $reference_fasta, and capturing the VCF output from STDOUT instead of writing it with -o. ($reference_fasta is the fasta sequence associated with the genome build used for aligning and calling variants in the cohort.) By default, pslivar uses all available cores. This can be adjusted by adding, for example: --processes 12.

Here is a slivar command:

        slivar expr --vcf vcfs/$cohort.annotated.bcf --ped data-links/$cohort.ped \
            --exclude /uufs/chpc.utah.edu/common/HIPAA/u6000771/Data/LCR-hs38.bed.gz \
            --pass-only \
            --js $js \
            --trio "denovo:denovo(kid, mom, dad) && INFO.gnomad_popmax_af < 0.001" \
            -o vcfs/$cohort$name.vcf

and the corresponding pslivar command:

        pslivar expr --vcf vcfs/$cohort.annotated.bcf --ped data-links/$cohort.ped \
            --exclude /uufs/chpc.utah.edu/common/HIPAA/u6000771/Data/LCR-hs38.bed.gz \
            --pass-only \
            --js $js \
            --trio "denovo:denovo(kid, mom, dad) && INFO.gnomad_popmax_af < 0.001" \
            --fasta $reference_fasta \
            > vcfs/$cohort$name.vcf
        # NOTE: --fasta is added, and `-o` is changed to `>`. The output can also be piped to bgzip.
        # (Comments cannot follow a trailing `\`, so the notes are placed after the command.)
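
The two options mentioned above, limiting the number of cores with --processes and piping the output to bgzip, can be combined. The following is only a sketch under stated assumptions: the values of `cohort`, `name`, `js`, `reference_fasta`, and `processes` are hypothetical placeholders, the site-specific --exclude file is left out, and the script only prints the command when pslivar or bgzip is not installed.

```shell
#!/bin/sh
set -eu

# Placeholder values for illustration only; substitute your own.
cohort="${cohort:-mycohort}"
name="${name:-.denovo}"
js="${js:-slivar-functions.js}"
reference_fasta="${reference_fasta:-reference.fa}"
processes="${processes:-12}"
out="vcfs/${cohort}${name}.vcf.gz"

# Store the pslivar arguments as positional parameters so that the
# --trio expression (which contains spaces) stays a single argument.
set -- expr \
    --vcf "vcfs/${cohort}.annotated.bcf" \
    --ped "data-links/${cohort}.ped" \
    --pass-only \
    --js "$js" \
    --trio "denovo:denovo(kid, mom, dad) && INFO.gnomad_popmax_af < 0.001" \
    --fasta "$reference_fasta" \
    --processes "$processes"

if command -v pslivar >/dev/null 2>&1 && command -v bgzip >/dev/null 2>&1; then
    # Compress the VCF written to STDOUT as it is produced.
    pslivar "$@" | bgzip -c > "$out"
else
    # Tools not installed: report what would have been run.
    echo "would run: pslivar $* | bgzip -c > $out"
fi
```

Keeping the arguments in one list means the same command can be printed for inspection or executed, which is convenient while iterating on the --trio expression.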