Quick and scalable software to deconvolve read clouds from linked-read sequencing experiments without a reference genome. When several DNA fragments have been sequenced with the same barcode, QuickDeconvolution provides enhanced barcodes that distinguish the reads coming from the different fragments.
You can install QuickDeconvolution through Bioconda:
conda install -c bioconda quickdeconvolution
Alternatively, QuickDeconvolution is straightforward to compile from source. You will need make and CMake >= 2.8. In the desired folder, run:
git clone https://github.com/RolandFaure/QuickDeconvolution.git
cd QuickDeconvolution/
cmake ./
make
An executable named QuickDeconvolution should appear in the folder. To check that it works, run it on the small test file "test.fastq" provided in the folder "test_data", which comes from a simulated sequencing experiment on a small synthetic genome:
./QuickDeconvolution -i test_data/test.fastq -o test_data/test_out.fastq
The program should run in less than a minute and write the reads to test_out.fastq with barcode extensions (-0, -1, -2, ...). This run only checks that QuickDeconvolution (QD) works: the deconvolution itself is expected to be poor, because the synthetic genome is so short that fragments sharing a barcode overlap with high probability.
SYNOPSIS
./QuickDeconvolution -i <input-file> -o <output-file> [-k <k>] [-w <w>] [-d <d>] [-t <t>] [-a <a>]
OPTIONS
-k, --kmers-length
size of kmers [default:21]
-w, --window-size
size of window guaranteed to contain at least one minimizing kmer [default:40]
-d, --density
on average 1/2^d kmers are indexed [default:3]
-t, --threads
number of threads [default:1]
-a, --dropout
QD does not try to deconvolve clouds containing fewer reads than this value [default:0]
QuickDeconvolution takes as input (-i) a FASTA or FASTQ file containing barcoded reads, with the tag BX:Z designating the barcode (this is the default output of longranger basic). For example:
@read_456 cov:23.45 BX:Z:AAAACTGTAT
If the reads are paired, provide QuickDeconvolution with an interleaved file in which both ends of each pair have the same name; QuickDeconvolution will recognize the pairs. To interleave two files, you can use this command line:
paste -d '\n' <(awk '{if (NR%4==1) {if (NR>1) printf "\n"; printf "%s", $0} else printf "((()))%s", $0}' reads_forward.fq) <(awk '{if (NR%4==1) {if (NR>1) printf "\n"; printf "%s", $0} else printf "((()))%s", $0}' reads_reverse.fq) | sed 's/((()))/\n/g' > sequencing_reads_interleaved.fastq
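If the shell one-liner is inconvenient (for instance with a `sed` that does not turn `\n` in the replacement into a newline, as with some non-GNU implementations), the same interleaving can be sketched in Python. This is a minimal sketch assuming uncompressed FASTQ with exactly four lines per record and mates listed in the same order in both files; like the one-liner, it does not rename reads, so both ends must already share the same name. `fastq_records` and `interleave_fastq` are hypothetical helpers, not part of QuickDeconvolution.

```python
from itertools import islice, zip_longest
from typing import Iterator, List, TextIO


def fastq_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of 4 lines (assumes no line wrapping)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield [line.rstrip("\n") for line in record]


def interleave_fastq(forward: TextIO, reverse: TextIO, out: TextIO) -> None:
    """Write forward and reverse records alternately: R1, R2, R1, R2, ..."""
    for fwd, rev in zip_longest(fastq_records(forward), fastq_records(reverse)):
        # Mates are paired by position, so both files must have the same length.
        if fwd is None or rev is None:
            raise ValueError("forward and reverse files have different numbers of records")
        out.write("\n".join(fwd) + "\n")
        out.write("\n".join(rev) + "\n")
```

For example, open reads_forward.fq and reads_reverse.fq for reading and sequencing_reads_interleaved.fastq for writing, then call interleave_fastq on the three handles. Reading four lines at a time, instead of looking for lines starting with "@", avoids misreading quality strings that happen to begin with "@".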
QuickDeconvolution outputs the FASTA/FASTQ file given as input, with an additional tag (-0, -1, -2, ...) appended to the barcode, so that the deconvolved reads look like:
@read_456 cov:23.45 BX:Z:AAAACTGTAT-1
Within each barcode, reads sharing the same tag are inferred to come from the same fragment. WARNING: -0 is a special tag marking reads that the program could not deconvolve. If the barcode already carries a tag, QuickDeconvolution still appends a new one at the end of the barcode:
@read_456 cov:23.45 BX:Z:AAAACTGTAT-1-3
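To inspect the result, for example how many fragments were found per barcode and how many reads were left with -0, the output headers can be summarized in a few lines of Python. This is a minimal sketch assuming the barcode is the whitespace-delimited token after `BX:Z:` and that QuickDeconvolution's tag is its last "-<integer>" suffix, as in the examples above; `split_tag` and `summarize` are hypothetical names. Pass only header lines (for FASTQ, every fourth line starting with the first).

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Assumption: the barcode is written as "BX:Z:<barcode>" and ends at whitespace.
BX_TAG = re.compile(r"(?:^|\s)BX:Z:(\S+)")


def split_tag(header: str) -> Optional[Tuple[str, int]]:
    """Split '... BX:Z:AAAACTGTAT-1-3' into ('AAAACTGTAT-1', 3)."""
    match = BX_TAG.search(header)
    if match is None:
        return None
    barcode, sep, tag = match.group(1).rpartition("-")
    if not sep or not tag.isdigit():
        return None  # no deconvolution tag found
    return barcode, int(tag)


def summarize(headers: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Per input barcode: number of fragments (tags > 0) and reads tagged -0."""
    tags_per_barcode: Dict[str, Counter] = defaultdict(Counter)
    for header in headers:
        parsed = split_tag(header)
        if parsed is not None:
            barcode, tag = parsed
            tags_per_barcode[barcode][tag] += 1
    return {
        barcode: {
            "fragments": sum(1 for tag in tags if tag != 0),
            "undeconvolved_reads": tags[0],  # Counter returns 0 if absent
        }
        for barcode, tags in tags_per_barcode.items()
    }
```

Splitting on the last "-" only is what keeps a pre-existing tag such as "-1" inside the input barcode, matching the "-1-3" example.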
Option -a is the dropout option: the program does not try to deconvolve clouds containing fewer reads than this value. You may want to use it if you know that your downstream analyses need clouds of a certain size, in which case deconvolving the smallest clouds would be a waste of time.
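To choose a value for -a, it may help to look at the distribution of cloud sizes in the input first. A minimal Python sketch, assuming each header carries a `BX:Z:<barcode>` token and counting every record as one read, so both mates of an interleaved pair are counted; whether QuickDeconvolution counts paired reads the same way is an assumption. `cloud_sizes` and `clouds_below` are hypothetical helpers. As above, pass only header lines (every fourth line of a FASTQ file).

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumption: the barcode is written as "BX:Z:<barcode>" and ends at whitespace.
BX_TAG = re.compile(r"(?:^|\s)BX:Z:(\S+)")


def cloud_sizes(headers: Iterable[str]) -> Counter:
    """Count reads per barcode (read cloud); headers without BX:Z are ignored."""
    sizes: Counter = Counter()
    for header in headers:
        match = BX_TAG.search(header)
        if match:
            sizes[match.group(1)] += 1
    return sizes


def clouds_below(sizes: Counter, dropout: int) -> Tuple[int, int]:
    """Return (clouds, reads) in clouds with fewer reads than `dropout`."""
    small = [size for size in sizes.values() if size < dropout]
    return len(small), sum(small)
```

Comparing clouds_below for a few candidate values shows how much work a given -a would skip before running the full deconvolution.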
Option -t sets the number of threads to run in parallel. Increasing it reduces wall-clock time but increases RAM usage.
Options -k, -w and -d are parameters of the alignment step within QuickDeconvolution; the deconvolution should not be very sensitive to their values. k is the length of the k-mers; avoid decreasing k below 15. d controls the density of sparse k-mers: on average, 1/2^d k-mers are selected as sparse. When selecting sparse k-mers, the program guarantees that at least one k-mer is chosen in every window of size w. Decreasing w and/or d may in some cases increase precision at the expense of run-time. Keep w within [10, 50] and d within [1, 5].
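To make the roles of k, w and d concrete, here is a minimal Python sketch of one way to satisfy both constraints described above: a k-mer is kept when the lowest d bits of its hash are zero (about 1 in 2^d k-mers), and any window of w consecutive k-mers left without a kept k-mer gets its smallest-hash k-mer added. This is an illustrative assumption, not QuickDeconvolution's actual implementation: the BLAKE2b hash, measuring w in k-mers rather than bases, and ignoring reverse complements are choices made for the example, and `kmer_hash` and `sparse_kmers` are hypothetical names.

```python
import hashlib
from typing import List


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sparse_kmers(seq: str, k: int = 21, w: int = 40, d: int = 3) -> List[int]:
    """Return start positions of selected k-mers (hypothetical scheme)."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    n = len(hashes)
    mask = (1 << d) - 1
    # Density rule: keep k-mers whose d lowest hash bits are all zero.
    selected = {i for i, h in enumerate(hashes) if h & mask == 0}
    # Window rule: every run of w consecutive k-mers (or the whole read if it
    # has fewer than w k-mers) must contain at least one selected k-mer.
    for start in range(max(n - w, 0) + 1):
        window = range(start, min(start + w, n))
        if window and not any(i in selected for i in window):
            selected.add(min(window, key=lambda i: hashes[i]))
    return sorted(selected)
```

In this sketch, lowering d keeps more k-mers and lowering w forces a kept k-mer more often; both make the index denser, which is consistent with the precision versus run-time trade-off described above. The window rule slightly raises the density above 1/2^d.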
QuickDeconvolution is distributed under the GNU General Public License v3 (GPL-3.0).
QuickDeconvolution is published in Bioinformatics Advances. Please cite it as: Faure, Roland, and Dominique Lavenier. “QuickDeconvolution: Fast and Scalable Deconvolution of Linked-Read Sequencing Data.” Bioinformatics Advances, September 26, 2022, vbac068. https://doi.org/10.1093/bioadv/vbac068.