HybPiper version 2.3.0
- Add option `--compress_sample_folder` to command `hybpiper assemble`. Tarball and compress the sample folder after assembly has completed, i.e. `<sample_name>.tar.gz`.
  - This is useful when running HybPiper on HPC clusters with file number limits.
  - If both an uncompressed and compressed folder exist for a sample, a warning is shown and HybPiper exits.
  - All HybPiper subcommands (`stats`, `recovery_heatmap`, `retrieve_sequences`, `paralog_retriever`, `filter_by_length`) work with either compressed or uncompressed sample files/folders, or a combination of both.
  - If a `<sample_name>.tar.gz` file already exists for a sample, it will be extracted and used for the current run of `hybpiper assemble`, and the `<sample_name>.tar.gz` file will be deleted.
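
The folder/tarball rules above can be illustrated with a minimal Python sketch. This is not HybPiper's own code: the function name `locate_sample` and its return values are hypothetical, and it only mirrors the behaviour described in this entry (either form is accepted, and finding both forms for one sample is an error).

```python
from pathlib import Path


def locate_sample(parent_dir, sample_name):
    """Return ("folder" | "tarball", path) for a sample.

    Hypothetical helper, not part of HybPiper. It mirrors the rule that a
    sample may exist as a folder or as <sample_name>.tar.gz, but not both.
    """
    parent = Path(parent_dir)
    folder = parent / sample_name
    tarball = parent / f"{sample_name}.tar.gz"

    has_folder = folder.is_dir()
    has_tarball = tarball.is_file()

    if has_folder and has_tarball:
        # Ambiguous state: the changelog says HybPiper warns and exits here.
        raise ValueError(
            f"Both {folder.name}/ and {tarball.name} exist for sample {sample_name}"
        )
    if has_folder:
        return "folder", folder
    if has_tarball:
        return "tarball", tarball
    raise FileNotFoundError(f"No folder or tarball found for sample {sample_name}")
```

A check like this could be run over a list of sample names before calling a subcommand such as `hybpiper stats`, to catch ambiguous samples early.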
- When using BWA for read mapping, the command `samtools flagstat` is now run during the `hybpiper assemble` step, rather than during `hybpiper stats`, and the results are written to a `<sample_name>_bam_flagstat.tsv` / `<sample_name>_unpaired_bam_flagstat.tsv` file(s).
  - If the `<sample_name>_bam_flagstat.tsv` / `<sample_name>_unpaired_bam_flagstat.tsv` file(s) are not present in a sample directory (i.e. the sample was assembled with HybPiper version <2.3.0), `samtools flagstat` will be run during `hybpiper stats`. If the sample is a `*.tar.gz` file, the `*.bam` file(s) will first be extracted to disk to a temporary directory called `temp_bam_files`, within your current working directory. This temporary directory will be deleted after `samtools flagstat` has been run.
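
To see which samples would trigger the `samtools flagstat` fallback during `hybpiper stats`, a check along the following lines may help. It is a sketch under assumptions: the function name and the `has_unpaired` parameter are hypothetical, and it assumes the report files sit directly inside the (uncompressed) sample directory, which the text above does not state.

```python
from pathlib import Path


def missing_flagstat_reports(sample_dir, sample_name, has_unpaired=False):
    """List the expected flagstat report files that are absent.

    Hypothetical helper, not part of HybPiper. Assumes the reports are
    stored at the top level of the sample directory.
    """
    sample_dir = Path(sample_dir)
    expected = [f"{sample_name}_bam_flagstat.tsv"]
    if has_unpaired:
        # Only expected when an unpaired read file was also mapped (assumption).
        expected.append(f"{sample_name}_unpaired_bam_flagstat.tsv")
    return [name for name in expected if not (sample_dir / name).is_file()]
```

An empty list means the reports from `hybpiper assemble` are already present; a non-empty list suggests the sample was assembled with an earlier version and the fallback would apply.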
- Add option `--not_protein_coding` to `hybpiper assemble`. When this option is provided, sequences matching your target file references will be extracted from SPAdes contigs using BLASTn, rather than Exonerate. This should improve recovery when using a target file with non-protein-coding sequences. Note that this feature is new and might have bugs; please report any issues.
  - Only nucleotide `*.FNA` sequences will be produced (i.e. no amino-acid sequences).
  - Intronerate will not be run; intron and supercontig sequences will not be produced.
  - If BLASTx or DIAMOND is selected for read mapping (i.e. protein vs translated-nucleotide searches), a warning will be displayed and read mapping will switch to BWA.
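
The read-mapping switch described in the last point can be sketched as follows. This is an illustration only, not HybPiper's implementation: `resolve_read_mapper` and the lowercase mapper names are hypothetical.

```python
import warnings

# Mappers that perform protein vs translated-nucleotide searches.
PROTEIN_MAPPERS = {"blastx", "diamond"}


def resolve_read_mapper(mapper, not_protein_coding):
    """Return the read mapper to use (hypothetical helper).

    With non-protein-coding targets, protein-based mappers make no sense,
    so the sketch warns and falls back to BWA, as the entry above describes.
    """
    if not_protein_coding and mapper.lower() in PROTEIN_MAPPERS:
        warnings.warn(
            f"{mapper} cannot be used with --not_protein_coding; switching to BWA"
        )
        return "bwa"
    return mapper
```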
- Add the following options to control BLASTn searches of SPAdes contigs when option `--not_protein_coding` is used:
  - `--extract_contigs_blast_task`. Task to use for blastn searches (blastn, blastn-short, megablast, dc-megablast). Default is blastn.
  - `--extract_contigs_blast_evalue`. Expectation value (E) threshold for saving hits. Default is 10.
  - `--extract_contigs_blast_word_size`. Word size for wordfinder algorithm (length of best perfect match).
  - `--extract_contigs_blast_gapopen`. Cost to open a gap.
  - `--extract_contigs_blast_gapextend`. Cost to extend a gap.
  - `--extract_contigs_blast_penalty`. Penalty for a nucleotide mismatch.
  - `--extract_contigs_blast_reward`. Reward for a nucleotide match.
  - `--extract_contigs_blast_perc_identity`. Percent identity.
  - `--extract_contigs_blast_max_target_seqs`. Maximum number of aligned sequences to keep (value of 5 or more is recommended). Default is 500.
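
Each of these options corresponds by name to a standard `blastn` command-line parameter (`-task`, `-evalue`, `-word_size`, `-gapopen`, `-gapextend`, `-penalty`, `-reward`, `-perc_identity`, `-max_target_seqs`). The sketch below shows how such settings might be assembled into a `blastn` call. It is an assumption about the mapping, not HybPiper's code: the function name, the choice of query and subject files, and the tabular output format are all hypothetical.

```python
def build_blastn_command(query_fasta, subject_fasta, task="blastn", evalue=10,
                         max_target_seqs=500, word_size=None, gapopen=None,
                         gapextend=None, penalty=None, reward=None,
                         perc_identity=None):
    """Build a blastn argument list (hypothetical helper).

    Defaults for task, evalue and max_target_seqs follow the values listed
    above; options left as None are omitted so blastn uses its own defaults.
    """
    cmd = [
        "blastn",
        "-query", str(query_fasta),
        "-subject", str(subject_fasta),
        "-outfmt", "6",  # tabular output; an assumption for this sketch
        "-task", task,
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_target_seqs),
    ]
    optional = {
        "-word_size": word_size,
        "-gapopen": gapopen,
        "-gapextend": gapextend,
        "-penalty": penalty,
        "-reward": reward,
        "-perc_identity": perc_identity,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd
```

The returned list can be passed to `subprocess.run` on a system where BLAST+ is installed.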
- The final step of the `hybpiper assemble` pipeline has been renamed from `exonerate_contigs` to `extract_contigs` (as either Exonerate or BLASTn can now be used).
- Reorganised grouping of help options when running `hybpiper assemble --help` to improve clarity.
- Changed option `--timeout_assemble` for `hybpiper assemble` to `--timeout_assemble_reads` to match the step name.
- Changed option `--timeout_exonerate_contigs` for `hybpiper assemble` to `--timeout_extract_contigs` to match the step name.
- Changed option `--exonerate_hit_sliding_window_size` for `hybpiper assemble` to `--trim_hit_sliding_window_size`. This option now applies to either Exonerate hits (and is measured in amino-acids) or BLASTn (measured in nucleotides). Defaults are 5 amino-acids (Exonerate; changed from previous default of 3) or 15 nucleotides (BLASTn).
- Changed option `--exonerate_hit_sliding_window_thresh` for `hybpiper assemble` to `--trim_hit_sliding_window_thresh`. This option now applies to either Exonerate hits (and is measured via amino-acid similarity) or BLASTn (measured via nucleotide similarity). Defaults are 75 for amino-acids (Exonerate; changed from previous default of 55) or 65 for nucleotides (BLASTn).
- Fixed a bug in `fix_targetfile.py` - `MAFFT` is now called via `subprocess` rather than `Bio.Align.Applications.MafftCommandline` when checking for best match translations (see issue #156).
- Added a more informative error message if running `hybpiper retrieve_sequences` or `hybpiper paralog_retriever` from HybPiper version >=2.2.0 on sample folders from HybPiper version <2.2.0. This error occurs because the sample folders do not contain a `<prefix>_chimera_check_performed.txt` file (see issue #155).
- When extracting coding sequences from SPAdes contigs using Exonerate, changed the initial Exonerate run to not use the option `--refine full` (see Exonerate docs), unless the option `--exonerate_refine_full` is provided to `hybpiper assemble`. Although the Exonerate option `--refine full` should improve output alignments, in some cases it can result in spurious alignment regions (e.g. an intron/non-coding region being included as an "exon" alignment) that can get incorporated into the HybPiper output sequence.
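
Existing scripts that call `hybpiper assemble` with the four renamed options listed above will need updating. A small helper along these lines could rewrite an old argument list; it is a hypothetical convenience, not something shipped with HybPiper, and it only covers the renames named in this changelog.

```python
# Old option name -> new option name, as listed in this changelog.
RENAMED_OPTIONS = {
    "--timeout_assemble": "--timeout_assemble_reads",
    "--timeout_exonerate_contigs": "--timeout_extract_contigs",
    "--exonerate_hit_sliding_window_size": "--trim_hit_sliding_window_size",
    "--exonerate_hit_sliding_window_thresh": "--trim_hit_sliding_window_thresh",
}


def migrate_assemble_args(args):
    """Rewrite pre-2.3.0 option names in an argument list (hypothetical helper).

    Handles both "--option value" and "--option=value" forms; anything that
    is not a renamed option is passed through unchanged.
    """
    migrated = []
    for arg in args:
        name, sep, value = arg.partition("=")
        migrated.append(RENAMED_OPTIONS.get(name, name) + sep + value)
    return migrated
```

Renaming alone does not restore the earlier behaviour: the Exonerate defaults for the sliding window size and threshold changed (3 to 5, and 55 to 75), so scripts that relied on the old defaults would need to pass those values explicitly.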