Using Bazam to extract reads #30

Merged 38 commits on Jul 22, 2018
1738e64 modify wgs pipeline to run from BAM without FASTQ extraction (ssadedin, Jan 17, 2018)
bda7578 various tweaks, add unsorted extract option for fastest extraction (ssadedin, Mar 13, 2018)
f9b4947 add sharding capability for parallel bwa (ssadedin, Mar 19, 2018)
28b60cc add updated bpipe 0.9.9.6 (ssadedin, Mar 19, 2018)
6b779d1 estimate median coverage from whole genome rather than STR regions (ssadedin, Mar 21, 2018)
a5d5559 allow identify_locus.py to take multiple bams from the same sample (hdashnow, Mar 21, 2018)
af5ba1b fix error reporting so it's per bam file (hdashnow, Mar 21, 2018)
b205ff3 Update install to set ngs tools, include CRAM_REF option, align_bwa_b… (hdashnow, Apr 9, 2018)
b748259 fix CRAM support, convert to use official bazam jar (ssadedin, May 1, 2018)
f487dd3 fix sharding producing different results to non-sharded (ssadedin, May 7, 2018)
3d97c1d add mosdepth for median coverage estimation (hdashnow, Jun 21, 2018)
3911b33 trying to fix error where not all files used as input to the estimate… (hdashnow, Jun 26, 2018)
209fd1b update versions of some tools in install script (hdashnow, Jun 26, 2018)
4be9c27 track samples and require files explicitly in estimate size (ssadedin, Jun 26, 2018)
46f8525 fix row duplication bug (hdashnow, May 9, 2018)
135ab85 change to new pandas reindexing (hdashnow, Jun 26, 2018)
30af1ce update travis install to match general install.sh (hdashnow, Jun 26, 2018)
fc5f81c set samples in all pipelines (hdashnow, Jun 26, 2018)
2345edc wildcard matching for input files to estimate_size stage (hdashnow, Jun 26, 2018)
3b4920f broadcast series to dataframe required in latest version of pandas (hdashnow, Jun 27, 2018)
d761cf3 install mosdepth from conda and bazam from source (hdashnow, Jun 27, 2018)
58aba98 clean up bazam command (hdashnow, Jun 28, 2018)
dfdd7ab update broad bpipe.config (hdashnow, Jun 28, 2018)
4ef55e2 fix sharding error and remove no longer required from statement (hdashnow, Jun 28, 2018)
87df22b workaround for pd.to_csv bug (hdashnow, Jul 3, 2018)
4bb8399 add newline to CIGAR warning message (hdashnow, Jul 3, 2018)
bebdf83 remove str targets stage from wgs bam pipeline (hdashnow, Jul 3, 2018)
a702930 add picard to installation (hdashnow, Jul 19, 2018)
68d9fba remove bed positional argument from STRetch_wgs_bam_pipeline (hdashnow, Jul 19, 2018)
b1cbd1b remove duplicate setting Dsamjdk.reference_fasta (hdashnow, Jul 19, 2018)
bdfada2 use new control set made with Bazam (hdashnow, Jul 19, 2018)
b2e177a temporarily fix to older version of bazam (hdashnow, Jul 19, 2018)
7a2fe7c update travis install to match general install (hdashnow, Jul 19, 2018)
1f3239c merge identify_locus.py in favour of feat-faster-bam-pipeline (hdashnow, Jul 19, 2018)
dccb216 Comment out controls in travis install (hdashnow, Jul 19, 2018)
646311c fix typo for input file type (hdashnow, Jul 22, 2018)
6f93492 update meerkat config example (hdashnow, Jul 22, 2018)
27b8a62 wgs median_cov to use newly mapped bam (hdashnow, Jul 22, 2018)
38 changes: 29 additions & 9 deletions .testing/install-ci.sh
@@ -16,13 +16,13 @@ mkdir -p tools/bin
 cd tools

 #a list of which programs need to be installed
-commands="bpipe python goleft bedtools bwa samtools"
+commands="bpipe python goleft bedtools bwa samtools mosdepth bazam picard"

 #installation method
 function bpipe_install {
-wget -O bpipe-0.9.9.2.tar.gz https://github.com/ssadedin/bpipe/releases/download/0.9.9.2/bpipe-0.9.9.2.tar.gz
-tar -zxvf bpipe-0.9.9.2.tar.gz ; rm bpipe-0.9.9.2.tar.gz
-ln -s $PWD/bpipe-0.9.9.2/bin/* $PWD/bin/
+wget -O bpipe-0.9.9.5.tar.gz https://github.com/ssadedin/bpipe/releases/download/0.9.9.5/bpipe-0.9.9.5.tar.gz
+tar -zxvf bpipe-0.9.9.5.tar.gz ; rm bpipe-0.9.9.5.tar.gz
+ln -s $PWD/bpipe-0.9.9.5/bin/* $PWD/bin/
 }

 # Installs miniconda, Python 3 + required packages, BedTools and goleft
@@ -44,10 +44,24 @@ function bwa_install {
 }

 function samtools_install {
-wget --no-check-certificate https://sourceforge.net/projects/samtools/files/samtools/1.3.1/samtools-1.3.1.tar.bz2
-tar -jxvf samtools-1.3.1.tar.bz2
-rm samtools-1.3.1.tar.bz2
-make prefix=$PWD install -C samtools-1.3.1/
+wget --no-check-certificate https://github.com/samtools/samtools/releases/download/1.8/samtools-1.8.tar.bz2
+tar -jxvf samtools-1.8.tar.bz2
+rm samtools-1.8.tar.bz2
+make prefix=$PWD install -C samtools-1.8/
 }

+function bazam_install {
+git clone git@github.com:ssadedin/bazam.git
+cd bazam
+git reset --hard 72b0e90be18bf8341a4b0368d4a7abf806c631bc
+./gradlew clean jar
+cd ..
+ln -s $PWD/bazam/build/libs/bazam.jar $PWD/bin/bazam
+}
+
+function picard_install {
+wget https://github.com/broadinstitute/picard/releases/download/2.18.9/picard.jar
+ln -s $PWD/picard.jar $PWD/bin/picard
+}
+
 function download {
@@ -71,8 +85,13 @@ echo "// Number of threads to use for BWA" >> $toolspec
 echo "threads=8" >> $toolspec
 echo >> $toolspec
 echo "// For exome pipeline only ***Edit before running the exome pipeline***" >> $toolspec
 echo "EXOME_TARGET=\"path/to/exome_target_regions.bed\"" >> $toolspec
+echo "// Uncomment the line below to run the STRetch installation test, or specify your own" >> $toolspec
+echo "EXOME_TARGET=\"SCA8_region.bed\"" >> $toolspec
+echo >> $toolspec
+echo "// For bam pipeline only ***Edit before running if using CRAM input format***" >> $toolspec
+echo "CRAM_REF=\"path/to/reference_genome_used_to_create_cram.fasta\"" >> $toolspec
 echo >> $toolspec

 #set STRetch base directory
 echo "// STRetch installation location" >> $toolspec
@@ -103,13 +122,14 @@ echo "refdir=\"$refdir\"" >> $toolspec

echo >> $toolspec
echo "// Decoy reference assumed to have matching .genome file in the same directory" >> $toolspec
echo "REF=\"$refdir/hg19.STRdecoys.sorted.fasta\"" >> $toolspec
echo "REF=\"$refdir/hg19.chr13.STRdecoys.sorted.fasta\"" >> $toolspec
echo "STR_BED=\"$refdir/hg19.simpleRepeat_period1-6_dedup.sorted.bed\"" >> $toolspec
echo "DECOY_BED=\"$refdir/STRdecoys.sorted.bed\"" >> $toolspec
echo "// By default, uses other samples in the same batch as a control" >> $toolspec
echo "CONTROL=\"\"" >> $toolspec
echo "// Uncomment the line below to use a set of WGS samples as controls, or specify your own" >> $toolspec
echo "CONTROL=\"$refdir/PCRfreeWGS.controls.tsv\"" >> $toolspec
echo "//CONTROL=\"$refdir/PCRfreeWGS_143_controls.tsv\"" >> $toolspec
echo >> $toolspec


1 change: 1 addition & 0 deletions environment.yml
@@ -16,3 +16,4 @@ dependencies:
 - scikit-learn
 - scipy
 - numpy
+- mosdepth
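
mosdepth is added here for median coverage estimation (commit 3d97c1d). As a rough illustration of how a median can be read off a cumulative depth distribution, here is a minimal shell sketch. It assumes the distribution file has three tab-separated columns (chromosome or `total`, depth, proportion of bases covered at or above that depth) with rows ordered from highest depth to lowest. The function name `median_from_dist` and the sample data are hypothetical, and this is not necessarily how the pipeline's `mosdepth_median` stage computes the value.

```shell
# Hypothetical helper: print the median depth from a mosdepth-style
# cumulative distribution file (assumed columns: chrom, depth, proportion >= depth).
median_from_dist() {
    # Rows are assumed sorted from high depth to low depth, so the first
    # "total" row whose cumulative proportion reaches 0.5 gives the median.
    awk -F '\t' '$1 == "total" && $3 >= 0.5 { print $2; exit }' "$1"
}

# Made-up example data, not real pipeline output
dist_file=$(mktemp)
printf 'total\t40\t0.01\ntotal\t31\t0.30\ntotal\t30\t0.52\ntotal\t29\t0.70\ntotal\t0\t1.00\n' > "$dist_file"
median_from_dist "$dist_file"
rm -f "$dist_file"
```

With the made-up data above, the first `total` row at or above 0.5 is the one for depth 30, so that depth is printed.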
37 changes: 28 additions & 9 deletions install.sh
@@ -16,13 +16,13 @@ mkdir -p tools/bin
 cd tools

 #a list of which programs need to be installed
-commands="bpipe python goleft bedtools bwa samtools"
+commands="bpipe python goleft bedtools bwa samtools mosdepth bazam picard"

 #installation method
 function bpipe_install {
-wget -O bpipe-0.9.9.2.tar.gz https://github.com/ssadedin/bpipe/releases/download/0.9.9.2/bpipe-0.9.9.2.tar.gz
-tar -zxvf bpipe-0.9.9.2.tar.gz ; rm bpipe-0.9.9.2.tar.gz
-ln -s $PWD/bpipe-0.9.9.2/bin/* $PWD/bin/
+wget -O bpipe-0.9.9.5.tar.gz https://github.com/ssadedin/bpipe/releases/download/0.9.9.5/bpipe-0.9.9.5.tar.gz
+tar -zxvf bpipe-0.9.9.5.tar.gz ; rm bpipe-0.9.9.5.tar.gz
+ln -s $PWD/bpipe-0.9.9.5/bin/* $PWD/bin/
 }

 # Installs miniconda, Python 3 + required packages, BedTools and goleft
@@ -44,10 +44,24 @@ function bwa_install {
 }

 function samtools_install {
-wget --no-check-certificate https://sourceforge.net/projects/samtools/files/samtools/1.3.1/samtools-1.3.1.tar.bz2
-tar -jxvf samtools-1.3.1.tar.bz2
-rm samtools-1.3.1.tar.bz2
-make prefix=$PWD install -C samtools-1.3.1/
+wget --no-check-certificate https://github.com/samtools/samtools/releases/download/1.8/samtools-1.8.tar.bz2
+tar -jxvf samtools-1.8.tar.bz2
+rm samtools-1.8.tar.bz2
+make prefix=$PWD install -C samtools-1.8/
 }

+function bazam_install {
+git clone git@github.com:ssadedin/bazam.git
+cd bazam
+git reset --hard 72b0e90be18bf8341a4b0368d4a7abf806c631bc
+./gradlew clean jar
+cd ..
+ln -s $PWD/bazam/build/libs/bazam.jar $PWD/bin/bazam
+}
+
+function picard_install {
+wget https://github.com/broadinstitute/picard/releases/download/2.18.9/picard.jar
+ln -s $PWD/picard.jar $PWD/bin/picard
+}
+
 function download_hg19 {
@@ -69,6 +83,11 @@ echo "threads=8" >> $toolspec
 echo >> $toolspec
 echo "// For exome pipeline only ***Edit before running the exome pipeline***" >> $toolspec
 echo "EXOME_TARGET=\"path/to/exome_target_regions.bed\"" >> $toolspec
+echo "// Uncomment the line below to run the STRetch installation test, or specify your own" >> $toolspec
+echo "//EXOME_TARGET=\"SCA8_region.bed\"" >> $toolspec
+echo >> $toolspec
+echo "// For bam pipeline only ***Edit before running if using CRAM input format***" >> $toolspec
+echo "CRAM_REF=\"path/to/reference_genome_used_to_create_cram.fasta\"" >> $toolspec
 echo >> $toolspec

 #set STRetch base directory
@@ -106,7 +125,7 @@ echo "DECOY_BED=\"$refdir/STRdecoys.sorted.bed\"" >> $toolspec
 echo "// By default, uses other samples in the same batch as a control" >> $toolspec
 echo "CONTROL=\"\"" >> $toolspec
 echo "// Uncomment the line below to use a set of WGS samples as controls, or specify your own" >> $toolspec
-echo "//CONTROL=\"$refdir/PCRfreeWGS.controls.tsv\"" >> $toolspec
+echo "//CONTROL=\"$refdir/PCRfreeWGS_143_controls.tsv\"" >> $toolspec
 echo >> $toolspec
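
The new `bazam_install` and `picard_install` functions each finish by linking a jar into `tools/bin` under the tool's name, as `bpipe_install` and `samtools_install` already do for their binaries. A quick way to see which names have no entry there is sketched below. It assumes only the `tools/bin` layout visible in this diff (tools installed by other means may live elsewhere), and the function name `missing_tools` is hypothetical rather than part of `install.sh`.

```shell
# Hypothetical helper: print each tool name that has no entry in the bin directory.
# Usage: missing_tools BIN_DIR TOOL...
missing_tools() {
    bin_dir=$1
    shift
    for tool in "$@"; do
        # -e follows symlinks, so a dangling link (for example to a jar
        # that was never built) is also reported as missing
        [ -e "$bin_dir/$tool" ] || printf '%s\n' "$tool"
    done
}

# Example run against a throwaway directory standing in for tools/bin
demo_bin=$(mktemp -d)
touch "$demo_bin/bwa" "$demo_bin/samtools"
missing_tools "$demo_bin" bwa samtools bazam picard
rm -rf "$demo_bin"
```

In the example run only `bwa` and `samtools` exist in the throwaway directory, so `bazam` and `picard` are listed as missing.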


32 changes: 26 additions & 6 deletions pipelines/STRetch_wgs_bam_pipeline.groovy
@@ -7,13 +7,33 @@ load 'pipeline_config.groovy'
 // Load Bpipe pipeline stages
 load 'pipeline_stages.groovy'

+if(args.any { it.endsWith('.cram') })
+input_type = 'cram'
+else
+input_type='bam'
+
+inputs "$input_type" : "Please supply one or more $input_type files to process"
+
+bwa_parallelism = 1
+
+shards = 1..bwa_parallelism
+
+if(input_type == "cram")
+requires CRAM_REF: "To use CRAM format, please set the CRAM_REF parameter in pipeline_config.groovy to specify the reference used to compress the CRAM file"
+
+init_shard = {
+branch.shard = branch.name
+}
+
 run {
-str_targets +
-'%.bam' * [
-set_sample_info +
-extract_reads_region +
-align_bwa + index_bam +
-median_cov_target +
+"%.${input_type}" * [
+set_sample_info +
+[
+mosdepth_dist + mosdepth_median,
+shards * [
+init_shard + align_bwa_bam + index_bam
+] + merge_bams
+] +
 STR_coverage +
 STR_locus_counts
 ] +
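
The `shards * [ init_shard + align_bwa_bam + index_bam ] + merge_bams` branch runs alignment as `bwa_parallelism` parallel pieces and then merges the results. To illustrate the partitioning idea only, the sketch below selects shard i of n from interleaved FASTQ by read-pair index. It assumes 4-line FASTQ records with mates adjacent (8 lines per pair); the function names `select_shard` and `make_pairs` are hypothetical, and Bazam may well partition reads differently.

```shell
# Hypothetical helper: keep only the read pairs belonging to shard $1 of $2
# from interleaved FASTQ on stdin (assumes 4 lines per read, 8 per pair).
select_shard() {
    awk -v shard="$1" -v total="$2" '
        # int((NR - 1) / 8) is the zero-based index of the current read pair
        int((NR - 1) / 8) % total == shard - 1
    '
}

# Made-up input: three read pairs named p1, p2, p3
make_pairs() {
    for p in p1 p2 p3; do
        printf '@%s/1\nACGT\n+\nIIII\n@%s/2\nTGCA\n+\nIIII\n' "$p" "$p"
    done
}

# Each pair lands in exactly one shard; together the shards cover all reads
make_pairs | select_shard 1 2 | grep -c '^@p'   # pairs p1 and p3: 4 reads
make_pairs | select_shard 2 2 | grep -c '^@p'   # pair p2: 2 reads
```

Because both mates of a pair always fall in the same shard, each shard can be aligned as paired-end input on its own, which is the property a sharded alignment needs before the per-shard BAMs are merged.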
4 changes: 3 additions & 1 deletion pipelines/config-examples/bpipe.config_broad
@@ -1,8 +1,10 @@
use="R-3.3 Samtools BWA BEDTools"
use="Java-1.8"

commands {
bwamem {
memory="32g"
threads=8
walltime="168:00:00"
}


8 changes: 4 additions & 4 deletions pipelines/config-examples/bpipe.config_meerkat
@@ -1,15 +1,15 @@
-walltime="02:00:00"
-modules="R bwa samtools bedtools"
+walltime="04:00:00"
+modules="java"

 commands {
 bwamem {
-walltime="24:00:00"
+walltime="48:00:00"
 procs=8
 memory=32
 }
-
+
 bedtools {
 walltime="04:00:00"
 memory=8
 }
 }