smvplot is a command-line Python tool for generating IGV-like screenshots.
Install the package via pip
pip install smvplot
$ smvplot --help
usage: [-h] --bam_paths STR --bam_names STR --ref FILE [--exclude_flag INT] [--map_quality INT] [--base_quality INT] [--max_depth_plot INT] [--vaf] [--for_gSmVs] [--vcf FILE] [--bed FILE] [--annotations FILE] [--annotation_names STR] [--prefix PREFIX]
[--window N] [--samtoolsbin N] [--tabixbin N] [--plot_dir DIR] [--out_format STR] [--ref_base STR] [--alt_base STR]
This script generates a plot file (PDF by default, see --out_format) for each entry in a VCF file, a BED file, or a manually specified region.
positional arguments:
region syntax either 'chr:start-end' or 'chr:center', use --vcf or --bed for more convenience
optional arguments:
-h, --help show this help message and exit
--bam_paths STR input list of bam files separated by comma. Maximum 3 BAM files
--bam_names STR input list of names separated by comma. Same length as BAM files
--ref FILE input reference genome file (FASTA format)
--exclude_flag INT Exclude the reads with corresponding SAM flags, [default = 3840]
--map_quality INT Minimum mapping quality for the reads, [default = 20]
--base_quality INT Minimum base quality for the variant, [default = 13]
--max_depth_plot INT Maximum read depth used to plot the high coverage region, [default = 500]
--vaf Include the VAF of the central position in the plot title. Requires reference genome
--for_gSmVs For the gSmVs workflow used internally at the DKFZ; the VAFs are sourced directly from the input VCF.
--vcf FILE input vcf file ( as an alternative use --bed )
--bed FILE input bed file ( as an alternative use --vcf )
--annotations FILE Annotation track in BED format, indexed with tabix. The fourth column may contain the annotation text for each segment. Multiple files can be separated by commas.
--annotation_names STR
annotation names separated by comma. Same length as annotation files
--prefix PREFIX target directory and file name prefix for generated output files, [default = smvplot]
--window N the output file for position X will show the region [X-window,X+window], [default = 100]
--samtoolsbin N the path to the samtools binary, [default = samtools]
--tabixbin N the path to the tabix binary, [default = tabix]
--plot_dir DIR subfolder for the plots
--out_format STR Output format of the plot, [default = pdf]
--ref_base STR Reference base for the variant entry, [default = ]
--alt_base STR Alternate base for the variant entry, [default = ]
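The positional region argument and --window work together to define what each plot shows. The sketch below is a hypothetical Python helper (region_to_interval is not part of smvplot's API) that mirrors the documented syntax: for 'chr:center' it returns [center-window, center+window]. Two behaviours are assumptions not stated in the help text: an explicit 'chr:start-end' range is returned unchanged, and the left edge is clamped at position 1.

```python
import re
from typing import Tuple

# Hypothetical parser for the documented region syntax:
# 'chr:start-end' or 'chr:center'. The greedy '.+' lets contig names
# that themselves contain ':' (e.g. HLA alt contigs) still parse.
REGION_RE = re.compile(r"^(?P<chrom>.+):(?P<start>\d+)(?:-(?P<end>\d+))?$")


def region_to_interval(region: str, window: int = 100) -> Tuple[str, int, int]:
    """Return (chrom, start, end) for the region to plot (sketch only)."""
    match = REGION_RE.match(region)
    if match is None:
        raise ValueError(f"invalid region: {region!r}")
    chrom = match["chrom"]
    start = int(match["start"])
    if match["end"] is None:
        # 'chr:center' form: show [X - window, X + window],
        # clamped at 1 (assumption: 1-based coordinates).
        return chrom, max(start - window, 1), start + window
    end = int(match["end"])
    if end < start:
        raise ValueError(f"end before start in region: {region!r}")
    # 'chr:start-end' form: assumed to be plotted as given.
    return chrom, start, end


print(region_to_interval("chr1:1000"))       # ('chr1', 900, 1100)
print(region_to_interval("chr1:1000-1200"))  # ('chr1', 1000, 1200)
```

With the default window of 100, a variant at chr1:1000 therefore yields a plot spanning chr1:900-1100.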
Examples on the GIAB samples
- For a single variant from a single BAM
smvplot \
--bam_paths HG001_merged.mdup.bam \
--bam_names HG001 \
--ref GRCh38_decoy_ebv_phiX_alt_hla_chr.fa \
--plot_dir ~/smvplot_test \
--prefix giab_HG001 \
--out_format png
- For a single variant from a TRIO (3 BAMs)
smvplot \
--bam_paths HG002_merged.mdup.bam,HG003_merged.mdup.bam,HG004_merged.mdup.bam \
--bam_names HG002_Son,HG003_Father,HG004_Mother \
--ref GRCh38_decoy_ebv_phiX_alt_hla_chr.fa \
--plot_dir ~/smvplot_test \
--prefix giab_HG00234 \
--out_format png \
- For multiple variants from a VCF/BED file
smvplot \
--bam_paths HG002_merged.mdup.bam,HG003_merged.mdup.bam,HG004_merged.mdup.bam \
--bam_names HG002_Son,HG003_Father,HG004_Mother \
--ref GRCh38_decoy_ebv_phiX_alt_hla_chr.fa \
--plot_dir ~/smvplot_test \
--prefix giab_HG00234 \
--out_format png \
--vcf giab_benchmark_variants.vcf # or: --bed giab_benchmark_variants.bed
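The examples above rely on the default --exclude_flag of 3840. That value is a bit mask over standard SAM FLAG bits. The Python sketch below decodes it and shows how a read would be filtered, assuming the option follows the same semantics as samtools view -F, where a read is dropped if any of the mask's bits are set. The helpers decode_flag and is_excluded are hypothetical illustrations, not smvplot functions; `samtools flags 3840` gives the same decoding from the shell.

```python
# Standard SAM FLAG bits with the names samtools uses.
SAM_FLAGS = {
    0x1: "PAIRED",
    0x2: "PROPER_PAIR",
    0x4: "UNMAP",
    0x8: "MUNMAP",
    0x10: "REVERSE",
    0x20: "MREVERSE",
    0x40: "READ1",
    0x80: "READ2",
    0x100: "SECONDARY",
    0x200: "QCFAIL",
    0x400: "DUP",
    0x800: "SUPPLEMENTARY",
}


def decode_flag(mask: int) -> list:
    """List the names of all SAM flag bits set in mask."""
    return [name for bit, name in SAM_FLAGS.items() if mask & bit]


def is_excluded(read_flag: int, exclude_flag: int = 3840) -> bool:
    """Assumed -F semantics: exclude if any masked bit is set."""
    return bool(read_flag & exclude_flag)


print(decode_flag(3840))  # ['SECONDARY', 'QCFAIL', 'DUP', 'SUPPLEMENTARY']
print(is_excluded(99))    # False: paired, proper pair, mate reverse, read 1
print(is_excluded(1123))  # True: same read marked as duplicate (99 + 1024)
```

In other words, the default hides secondary, QC-failed, duplicate and supplementary alignments, so only primary, non-duplicate reads appear in the plots.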
- Minor: Read sorting based on the input ref/alt bases via command-line parameters (--ref_base, --alt_base)
- Bug fix in plot_region argument
- Patch: Update VAF calculation
- Container generated via GitHub Actions and pushed to Docker Hub
- Added container
- Minor bug fixes
- Removed the underhand issue in the RNAseq histograms
- Limit the number of decimal places in VAF values
- In a multi-BAM setting, ignore BAMs whose path does not exist
- Add VAF to the title via pysamstats
- Initial version uploaded to PyPI
The original script was written for the DKFZ somatic indel workflow by Philip Ginsbach and Ivo Buchhalter. Here, I have turned the script into a Python package, added support for a third BAM file and an RNA-seq BAM file, and generalized the BAM inputs.