kmtricks pipeline
kmtricks pipeline runs the kmtricks modules in sequence. It can stop after a specific step with --until <step>.
Examples are provided here.
kmtricks pipeline v1.4.0
DESCRIPTION
kmtricks pipeline (run all the steps, repart -> superk -> count -> merge -> format)
USAGE
kmtricks pipeline --file <FILE> --run-dir <DIR> [--kmer-size <INT>] [--hard-min <INT>]
[--mode <MODE:FORMAT:OUT>] [--repart-from <STR>]
[--soft-min <INT/STR/FLOAT>] [--recurrence-min <INT>]
[--share-min <INT>] [--until <STR>] [--minimizer-size <INT>]
[--minimizer-type <INT>] [--repartition-type <INT>]
[--nb-partitions <INT>] [--restrict-to <FLOAT>]
[--restrict-to-list <STR>] [--focus <FLOAT>] [--bloom-size <INT>]
[--bf-format <STR>] [--bitw <INT>] [-t/--threads <INT>]
[-v/--verbose <STR>] [--hist] [--kff-output] [--keep-tmp] [--skip-merge]
[--cpr] [-h/--help] [--version]
OPTIONS
[global]
--file - kmtricks input file, see README.md.
--run-dir - kmtricks runtime directory.
--kmer-size - size of a k-mer. [8, 127]. {31}
--hard-min - min abundance to keep a k-mer. {2}
--mode - matrix mode <mode:format:out>, see README {kmer:count:bin}
--hist - compute k-mer histograms. [⚑]
--kff-output - output counted k-mers in kff format (only with --until count). [⚑]
--keep-tmp - keep tmp files. [⚑]
--repart-from - use repartition from another kmtricks run.
[merge options]
--soft-min - during merge, min abundance to keep a k-mer, see README. {1}
--recurrence-min - min recurrence to keep a k-mer. {1}
--share-min - save a non-solid k-mer if it is solid in N other samples. {0}
[pipeline control]
--until - run until [all|repart|superk|count|merge|format] {all}
--skip-merge - skip merge step, only with --mode hash:bft:bin. [⚑]
[advanced performance tweaks]
--minimizer-size - size of minimizers. [4, 15] {10}
--minimizer-type - minimizer type (0=lexi, 1=freq). {0}
--repartition-type - minimizer repartition (0=unordered, 1=ordered). {0}
--nb-partitions - number of partitions (0=auto). {0}
--restrict-to - Process only a fraction of partitions. [0.05, 1.0] {1.0}
--restrict-to-list - Process only some partitions, comma separated.
--focus - 0: focus on disk usage, 1: focus on speed. [0.0, 1.0] {0.5}
--cpr - compression for kmtricks's tmp files. [⚑]
[hash mode configuration]
--bloom-size - bloom filter size {10000000}
--bf-format - bloom filter format. [howdesbt|sdsl] {howdesbt}
--bitw - entry width of cbf, with --mode hash:bfc:bin {2}
[common]
-t --threads - number of threads. {12}
-h --help - show this message and exit. [⚑]
--version - show version and exit. [⚑]
-v --verbose - verbosity level [debug|info|warning|error]. {info}
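As a minimal sketch of how the usage line above translates into a call, the helper below assembles an argument list and checks `--until` and `--kmer-size` against the values shown in the help text. The function name `pipeline_command` and the paths `samples.fof` and `km_run` are hypothetical placeholders, and only a handful of the documented options are covered.

```python
import shlex

# Step names accepted by --until, copied from the help text.
UNTIL_STEPS = ("all", "repart", "superk", "count", "merge", "format")


def pipeline_command(fof, run_dir, kmer_size=31, hard_min=2, until="all", threads=12):
    """Return the argument list for one `kmtricks pipeline` call (hypothetical helper)."""
    if until not in UNTIL_STEPS:
        raise ValueError(f"--until must be one of {UNTIL_STEPS}, got {until!r}")
    if not 8 <= kmer_size <= 127:
        raise ValueError("--kmer-size must be in [8, 127]")
    return [
        "kmtricks", "pipeline",
        "--file", str(fof),
        "--run-dir", str(run_dir),
        "--kmer-size", str(kmer_size),
        "--hard-min", str(hard_min),
        "--until", until,
        "--threads", str(threads),
    ]


# Placeholder input file and run directory; stop after the count step.
cmd = pipeline_command("samples.fof", "km_run", until="count")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where kmtricks is installed; the sketch itself only builds and prints the command.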
- --mode <mode:format:out>:
  - kmer:count:bin, kmer:count:text -> k-mer count matrix
  - kmer:pa:bin, kmer:pa:text -> k-mer presence/absence matrix
  - hash:count:bin, hash:count:text -> hash count matrix
  - hash:pa:bin, hash:pa:text -> hash presence/absence matrix
  - hash:bf:bin -> Bloom filter matrix (column-major)
  - hash:bft:bin -> Bloom filter matrix (row-major)
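The mode strings listed above can be checked before launching a run. The following is a rough sketch, not part of kmtricks: `VALID_MODES` and `parse_mode` are hypothetical names, and the table is transcribed by hand from this page (the `hash:bfc:bin` entry appears only in the `--bitw` help line, so its inclusion is an assumption).

```python
# Valid <mode:format:out> combinations, transcribed from this page.
VALID_MODES = {
    ("kmer", "count"): {"bin", "text"},
    ("kmer", "pa"): {"bin", "text"},
    ("hash", "count"): {"bin", "text"},
    ("hash", "pa"): {"bin", "text"},
    ("hash", "bf"): {"bin"},   # Bloom filter matrix, column-major
    ("hash", "bft"): {"bin"},  # Bloom filter matrix, row-major
    ("hash", "bfc"): {"bin"},  # assumption: mentioned only under --bitw
}


def parse_mode(value):
    """Split a --mode string and reject combinations not listed on this page."""
    parts = value.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected <mode:format:out>, got {value!r}")
    mode, fmt, out = parts
    if out not in VALID_MODES.get((mode, fmt), ()):
        raise ValueError(f"unsupported --mode combination: {value!r}")
    return mode, fmt, out


print(parse_mode("kmer:count:bin"))
```

A text output for a Bloom filter mode, such as `hash:bf:text`, is rejected by this sketch because the page lists only `bin` for it.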
- --soft-min <INT/STR/FLOAT>:
  - All k-mers with an abundance between hard-min and soft-min are considered rescue-able. See kmtricks rescue.
  - <STR>: path to a file containing one threshold per line, in the same order as the samples in the input fof.
  - <FLOAT>: one threshold T is computed per sample such that the number of k-mers occurring T times is smaller than VALUE x nb_kmers, where VALUE is the given float.
  - <INT>: the same threshold for all samples.
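To make the `<FLOAT>` form of `--soft-min` more concrete, here is one possible reading of the rule as a short sketch. It is not the kmtricks implementation. The assumptions are: the input is a per-sample abundance histogram, `nb_kmers` is the number of distinct k-mers in that sample, and the smallest T satisfying the condition is chosen. The name `soft_min_from_ratio` is hypothetical.

```python
def soft_min_from_ratio(hist, value):
    """Pick a per-sample threshold T from an abundance histogram.

    hist maps abundance -> number of distinct k-mers with that abundance.
    Assumption: T is the smallest abundance whose bin holds fewer than
    value * nb_kmers k-mers.
    """
    if value <= 0:
        raise ValueError("value must be positive")
    nb_kmers = sum(hist.values())
    if nb_kmers == 0:
        raise ValueError("empty histogram")
    limit = value * nb_kmers
    highest = max(hist)
    for t in range(1, highest + 1):
        if hist.get(t, 0) < limit:
            return t
    # Every observed bin is at or above the limit; the next abundance is empty.
    return highest + 1


# Toy histogram: 1000 distinct k-mers, most of them seen once.
hist = {1: 700, 2: 150, 3: 60, 4: 40, 5: 50}
print(soft_min_from_ratio(hist, 0.1))  # prints 3 (60 < 0.1 * 1000)
```

With this toy histogram, abundances 1 and 2 each account for more than 10% of the k-mers, so the threshold lands at 3.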
- --recurrence-min <INT>: All k-mers/hashes that do not occur in at least recurrence-min sample(s) are discarded.
- --share-min <INT>: If a k-mer/hash is rescue-able, it is kept if it is solid (with an abundance greater than soft-min) in at least share-min other sample(s).
- --kff-output: Supported only with --until count in k-mer mode.
- --skip-merge: Skips the merge step when using --mode hash:bft:bin and rescue is not needed.
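The merge filters described above (hard-min, soft-min, the rescue rule that `--share-min` controls in the help output, and recurrence-min) can be illustrated on a single k-mer row. This is a simplified sketch under stated assumptions, not the kmtricks code: thresholds are treated as inclusive, a share-min of 0 is taken to disable rescue, and a k-mer "occurs" in a sample if its kept count is non-zero. The name `merge_row` is hypothetical.

```python
def merge_row(counts, hard_min, soft_min, share_min, recurrence_min):
    """Filter one k-mer's per-sample counts; return the kept row or None.

    counts and soft_min hold one value per sample, in the same order.
    """
    # Assumption: "solid" means count >= the sample's soft-min.
    solid = [c >= s for c, s in zip(counts, soft_min)]
    n_solid = sum(solid)
    kept = []
    for c, is_solid in zip(counts, solid):
        if is_solid:
            kept.append(c)
        elif share_min > 0 and c >= hard_min and n_solid >= share_min:
            # Rescue-able count: this sample is not solid, so n_solid
            # already counts only the *other* samples.
            kept.append(c)
        else:
            kept.append(0)
    # Assumption: recurrence is the number of samples with a kept count.
    recurrence = sum(1 for c in kept if c > 0)
    return kept if recurrence >= recurrence_min else None


row = [9, 2, 0, 6]        # counts of one k-mer in four samples
soft = [4, 4, 4, 4]       # same soft-min for every sample
print(merge_row(row, hard_min=2, soft_min=soft, share_min=2, recurrence_min=1))
print(merge_row(row, hard_min=2, soft_min=soft, share_min=3, recurrence_min=1))
print(merge_row(row, hard_min=2, soft_min=soft, share_min=3, recurrence_min=3))
```

In the first call the count of 2 is rescued because the k-mer is solid in two other samples; in the second it is zeroed because three solid samples are required; in the third the whole row is dropped by the recurrence filter.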
Depending on the parameters, kmtricks can output many different files; a complete description is provided here. To work with kmtricks's output files, see kmtricks dump, kmtricks aggregate and API.