User Manual
CMAPLE executables for different platforms such as Linux, OSX, and Windows are provided at https://github.com/iqtree/cmaple/releases.
In the release package, we provide two executables, cmaple and cmaple-aa, tailored for DNA and amino acid data, respectively. For simplicity, we use cmaple in the following examples.
As a command-line program, CMAPLE can be run by executing cmaple ... from a terminal/console (or a command prompt under Windows).
Together with the executables, the release package contains an example alignment, example.maple. One can reconstruct a phylogenetic tree from that alignment (assuming that you are in the same folder as example.maple):
cmaple -aln example.maple
In the above command, -aln specifies the input alignment, which can be in FASTA, PHYLIP, or MAPLE format.
CMAPLE uses the default model, which is the General Time Reversible (GTR) model for the DNA data in this example. At the end of the run, CMAPLE outputs the following files.
- example.maple.treefile: the inferred phylogenetic tree, which can be visualized with many tree viewer programs (e.g., FigTree).
- example.maple.log: a log file that captures all messages printed on the screen during the run. To report bugs, please send this log file and the input alignment (if possible) to the authors.
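For scripted runs, it can help to know in advance which files a run should produce. The following Python sketch builds the argument list for the basic command and derives the two output paths described above. The helper names `basic_command` and `expected_outputs` are hypothetical, and the naming used with `--prefix` (`<prefix>.treefile` and `<prefix>.log`) is an assumption inferred from the default behaviour, not something this manual states explicitly.

```python
from pathlib import Path
from typing import Optional


def basic_command(alignment: str, prefix: Optional[str] = None) -> list:
    """Build the argument list for a default tree reconstruction."""
    cmd = ["cmaple", "-aln", alignment]
    if prefix is not None:
        cmd += ["--prefix", prefix]
    return cmd


def expected_outputs(alignment: str, prefix: Optional[str] = None) -> dict:
    """Return the paths of the tree file and log file a run should write."""
    # By default the prefix is the alignment file name.
    # Assumption: with --prefix, files are named <prefix>.treefile / <prefix>.log.
    base = prefix if prefix is not None else alignment
    return {"tree": Path(base + ".treefile"), "log": Path(base + ".log")}


if __name__ == "__main__":
    print(basic_command("example.maple"))
    print(expected_outputs("example.maple"))
```

The argument list can be passed to `subprocess.run(cmd, check=True)`, after which a script can check that the expected files exist.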
CMAPLE supports various common DNA and empirical amino-acid models (replicated from the IQ-TREE software), as shown in Supported substitution models. For example, one can specify the Jukes-Cantor model for the inference with the -m option.
cmaple -aln example.maple -m JC
One can specify an input tree (e.g., tree.nwk) in the NEWICK format using the -t option.
cmaple -aln example.maple -t tree.nwk
If the input tree is incomplete (i.e., it does not contain all the taxa in the alignment), CMAPLE will:
- Firstly, perform placements (i.e., adding missing taxa from the alignment to the tree);
- Secondly, apply a NORMAL tree search (which does SPR moves on newly-added nodes only);
- Finally, optimize all branch lengths.
If the input tree is complete (i.e., it contains all the taxa in the alignment), CMAPLE will, by default, perform neither placement nor tree search, but it will still optimize all branch lengths. To keep the branch lengths fixed, one can add --blfix to the command.
cmaple -aln example.maple -t tree.nwk --blfix
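Whether an input tree counts as complete or incomplete depends on which taxa it shares with the alignment. As a rough pre-flight check, the following Python sketch lists the alignment taxa that are missing from a tree. It is not CMAPLE's own parser: it assumes a FASTA alignment whose taxon name is the first word of each header line, and a plain Newick string without quoted labels or bracketed annotations. The function names are hypothetical.

```python
import re


def fasta_taxa(fasta_text: str) -> set:
    """Collect sequence names (first word after '>') from FASTA text."""
    return {
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    }


def newick_leaves(newick: str) -> set:
    """Collect leaf names: labels that directly follow '(' or ','.

    Internal node labels follow ')' and are therefore skipped.
    """
    return set(re.findall(r"[(,]\s*([^\s():,;]+)", newick))


def missing_taxa(newick: str, fasta_text: str) -> set:
    """Taxa present in the alignment but absent from the tree."""
    return fasta_taxa(fasta_text) - newick_leaves(newick)


if __name__ == "__main__":
    tree = "((A:0.1,B:0.2)90:0.05,C:0.3);"
    aln = ">A\nACGT\n>B\nACGA\n>C\nACGG\n>D\nACTT\n"
    missing = missing_taxa(tree, aln)
    print("incomplete tree" if missing else "complete tree", sorted(missing))
```

An empty result means the tree is complete with respect to the alignment, so by default only branch lengths would be optimized.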
To use the input tree (complete or incomplete) as a starting tree, i.e., to perform placements (for an incomplete tree), then consider SPR moves on all nodes, and finally optimize branch lengths, one can add --search EXHAUSTIVE to the command.
cmaple -aln example.maple -t tree.nwk --search EXHAUSTIVE
We implemented three types of tree search:
Tree search type | Explanation |
---|---|
FAST | No tree search (placement only) |
NORMAL | Consider pruning branches only at newly-added nodes when seeking SPR moves. This is the default setting of CMAPLE. |
EXHAUSTIVE | Consider all nodes when seeking SPR moves |
If users don't specify an input tree, NORMAL and EXHAUSTIVE behave identically, since all taxa from the alignment are first added to the tree and SPR moves are then considered at all those (newly-added) nodes during the tree search.
If users input a complete tree, both FAST and NORMAL leave the topology unchanged, since no new taxa are added to the tree.
If users input an incomplete tree, the three tree search types behave differently, as described in the table above. In general, both the runtime and the accuracy increase when moving from FAST to NORMAL to EXHAUSTIVE.
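The interaction between the input tree and the search type can be summarized as a small decision function. The Python sketch below only encodes the rules stated in this section; the function name `search_plan` and the returned labels are hypothetical, and it does not model the SPRTA case, where a NORMAL search behaves like an EXHAUSTIVE one.

```python
def search_plan(tree_status: str, search: str = "NORMAL") -> dict:
    """Summarize what a run does, given the input tree and search type.

    tree_status: 'none' (no input tree), 'incomplete', or 'complete'.
    search: 'FAST', 'NORMAL', or 'EXHAUSTIVE'.
    """
    if tree_status not in ("none", "incomplete", "complete"):
        raise ValueError(f"unknown tree status: {tree_status}")
    if search not in ("FAST", "NORMAL", "EXHAUSTIVE"):
        raise ValueError(f"unknown search type: {search}")

    # Placement happens whenever some taxa are not yet in the tree.
    placement = tree_status != "complete"

    if search == "FAST":
        spr_scope = "none"  # placement only
    elif search == "EXHAUSTIVE":
        spr_scope = "all nodes"
    elif tree_status == "none":
        spr_scope = "all nodes"  # every node is newly added
    elif tree_status == "incomplete":
        spr_scope = "newly-added nodes"
    else:
        spr_scope = "none"  # complete tree: no newly-added nodes
    return {"placement": placement, "spr_scope": spr_scope}


if __name__ == "__main__":
    for status in ("none", "incomplete", "complete"):
        for kind in ("FAST", "NORMAL", "EXHAUSTIVE"):
            print(status, kind, search_plan(status, kind))
```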
By default, CMAPLE applies the NORMAL tree search. One can change it to, e.g., the FAST tree search by using the --search option.
cmaple -aln example.maple -t tree.nwk --search FAST
CMAPLE implements the SH-like approximate likelihood ratio test (SH-aLRT; Guindon et al., 2010). To perform this test, one can run:
cmaple -aln example.maple --alrt
To speed up the assessment, one can employ multithreading by adding the -nt option:
cmaple -aln example.maple --alrt -nt 8
In the above example, CMAPLE uses 8 threads for computing branch supports. One can use -nt AUTO to employ all CPU cores available on the current machine.
Additionally, one can specify the number of replicates (default: 1000) and the epsilon value (default: 0.1; see Guindon et al., 2010) by using the --replicates and --epsilon options.
cmaple -aln example.maple --alrt --replicates 5000 --epsilon 0.05
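When SH-aLRT runs are launched from a script, the options above can be assembled programmatically. The following Python sketch uses only the flags documented in this section; the helper name `alrt_command` and its validation rules are hypothetical, and the defaults are omitted from the command line on the assumption that CMAPLE then applies them itself.

```python
def alrt_command(alignment, replicates=1000, epsilon=0.1, threads=None):
    """Build a cmaple argument list for an SH-aLRT branch support run.

    threads may be a positive integer, the string 'AUTO', or None
    (None leaves -nt out, so CMAPLE uses its default of 1).
    """
    if replicates <= 0:
        raise ValueError("replicates must be positive")
    if threads is not None and threads != "AUTO":
        if not isinstance(threads, int) or threads <= 0:
            raise ValueError("threads must be a positive integer or 'AUTO'")

    cmd = ["cmaple", "-aln", alignment, "--alrt"]
    # Only pass options that differ from the documented defaults.
    if replicates != 1000:
        cmd += ["--replicates", str(replicates)]
    if epsilon != 0.1:
        cmd += ["--epsilon", str(epsilon)]
    if threads is not None:
        cmd += ["-nt", str(threads)]
    return cmd


if __name__ == "__main__":
    print(" ".join(alrt_command("example.maple", 5000, 0.05, threads=8)))
    # To actually run it: subprocess.run(cmd, check=True)
```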
Since version 1.1.0, CMAPLE also implements another branch support method called SPRTA (DeMaio et al., 2024). To perform this branch assessment, one can run:
cmaple -aln example.maple --sprta
The SPRTA supports are written to example.maple.treefile.nexus.
Note that when computing SPRTA, a NORMAL tree search acts as an EXHAUSTIVE tree search, i.e., it considers SPRs at all nodes in the tree (see Tree search types). To keep the topology unchanged, use a FAST tree search.
By default, CMAPLE computes support values only for branches with non-zero lengths. However, one can assess all branches, including those with zero length, by using the --zero-branch-supp option:
cmaple -aln example.maple --sprta --zero-branch-supp
Additionally, to have CMAPLE output a list of alternative SPRs and their supports for each branch, include the --out-alternative-spr option:
cmaple -aln example.maple --sprta --out-alternative-spr
One can use the --out-aln option to write the input alignment to a file in a specific format: MAPLE (default), PHYLIP, or FASTA. For example, the following command
cmaple -aln aln.phy --out-aln aln.maple
writes the input alignment aln.phy to the file aln.maple in the MAPLE format.
To specify the output alignment format, one can use --out-format. For example, the following command
cmaple -aln aln.phy --out-aln aln.fa --out-format fasta
writes the input alignment aln.phy to the file aln.fa in the FASTA format.
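For converting several alignments at once, the two commands above can be generated from the input file name and the target format. The Python sketch below is illustrative only: the helper name `conversion_command` and the extension mapping (taken from the file names used in the examples above) are assumptions, and the format is passed in lowercase simply because the example above does so.

```python
from pathlib import Path

# Extensions follow the file names used in the examples (assumption).
EXTENSIONS = {"MAPLE": ".maple", "PHYLIP": ".phy", "FASTA": ".fa"}


def conversion_command(src: str, fmt: str = "MAPLE") -> list:
    """Build a cmaple argument list that rewrites src in the given format."""
    fmt = fmt.upper()
    if fmt not in EXTENSIONS:
        raise ValueError(f"unsupported format: {fmt}")

    out = str(Path(src).with_suffix(EXTENSIONS[fmt]))
    if out == src:
        raise ValueError("output file would overwrite the input file")

    cmd = ["cmaple", "-aln", src, "--out-aln", out]
    # MAPLE is the default output format, so --out-format is only
    # added for the other formats.
    if fmt != "MAPLE":
        cmd += ["--out-format", fmt.lower()]
    return cmd


if __name__ == "__main__":
    for name in ("aln.phy", "other.phy"):
        print(" ".join(conversion_command(name, "fasta")))
```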
CMAPLE is optimized for speed. To make it even faster, you can swap in a high-performance memory allocator such as jemalloc. This can give another 5-10% speedup, depending on workload and hardware.
git clone https://github.com/jemalloc/jemalloc.git
cd jemalloc
mkdir build
export je_build=`pwd`/build
./autogen.sh
./configure --prefix=${je_build}
make -j20
make install
## remember this path!
echo "remember to put '${je_build}/bin' in your PATH to make 'jemalloc-config' known"
## we will do it here once, but you need to make this permanent:
export PATH="${je_build}/bin:${PATH}"
## run CMAPLE with preloaded jemalloc
LD_PRELOAD=`jemalloc-config --libdir`/libjemalloc.so.`jemalloc-config --revision` ./cmaple <more args here>
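If CMAPLE is launched from a Python script rather than a shell, the same preloading can be done by passing a modified environment to the child process. The sketch below is a minimal illustration under stated assumptions: the helper name `jemalloc_env` is hypothetical, `libdir` and `revision` are expected to be the outputs of `jemalloc-config --libdir` and `jemalloc-config --revision`, and `LD_PRELOAD` only has an effect on Linux and similar systems.

```python
import os


def jemalloc_env(libdir: str, revision: str, base_env=None) -> dict:
    """Return a copy of the environment with jemalloc added to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    lib = f"{libdir}/libjemalloc.so.{revision}"
    existing = env.get("LD_PRELOAD")
    # Keep any library that was already preloaded (colon-separated list).
    env["LD_PRELOAD"] = f"{lib}:{existing}" if existing else lib
    return env


if __name__ == "__main__":
    # Hypothetical values; query jemalloc-config for the real ones.
    env = jemalloc_env("/path/to/jemalloc/build/lib", "2")
    print(env["LD_PRELOAD"])
    # To run CMAPLE with it:
    # subprocess.run(["./cmaple", "-aln", "example.maple"], env=env, check=True)
```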
All the substitution models supported by CMAPLE are listed below.
Model | Explanation |
---|---|
JC or JC69 | Equal substitution rates and equal base frequencies (Jukes and Cantor, 1969). |
GTR | General time reversible model with unequal rates and unequal base frequencies (Tavare, 1986). |
UNREST | Unrestricted, non-reversible model with unequal rates and unequal base frequencies. |
CMAPLE supports all non-mixture amino acid models in IQ-TREE, which are all listed at IQ-TREE DOC.
All the options available in CMAPLE are shown below:
Option | Usage and meaning |
---|---|
-h or -? | Print help usage. |
-aln <ALIGNMENT> | Specify an input alignment file in PHYLIP, FASTA, or MAPLE format. |
-m <MODEL> | Specify a model name. See DNA Models and Protein Models for the list of supported models. DEFAULT: GTR for DNA and LG for protein data |
-st <SEQ_TYPE> | Specify the sequence type as either DNA or AA, for DNA or amino-acid sequences, respectively. DEFAULT: auto-detect from the alignment or model |
--format <FORMAT> | Set the alignment format to PHYLIP, FASTA, MAPLE, or AUTO. DEFAULT: auto-detect from the alignment |
-t <TREE_FILE> | Specify a file containing a starting tree for tree search. Note: the starting tree does not need to contain all taxa in the input alignment. |
--no-reroot | Do not reroot the input tree. |
--blfix | Keep the branch lengths of the input tree unchanged (only applicable if the input tree contains all the taxa in the alignment). |
--ignore-annotation | Ignore annotations from the input tree. |
--search <TYPE> | Specify the tree search type as FAST, NORMAL, or EXHAUSTIVE. DEFAULT: NORMAL |
--shallow-search | Enable a shallow tree search before a deeper tree search. DEFAULT: No shallow search |
--alrt | Compute branch supports (SH-aLRT) of the tree. |
--replicates <NUM> | Set the number of replicates for computing branch supports (SH-aLRT). DEFAULT: 1000 |
--epsilon <NUM> | Set the epsilon value (see Guindon et al., 2010) for computing branch supports (SH-aLRT). DEFAULT: 0.1 |
-nt <NUM_THREADS> | Set the number of CPU cores used for computing branch supports. One can use -nt AUTO to use all CPU cores available on the current machine. DEFAULT: 1 |
--sprta | Compute SPRTA (DeMaio et al., 2024) branch supports. |
--thresh-opt-diff-fac <NUM> | A factor (relative to the log of the sequence length) used to determine whether SPRs are close to the optimal one. DEFAULT: 1 |
--zero-branch-supp | Compute supports for zero-length branches. |
--out-alternative-spr | Output alternative SPRs and their supports. |
--min-sup-alt <MIN> | The minimum support for an SPR to be output as an alternative SPR. DEFAULT: 0.01 |
--prefix <PREFIX> | Specify a prefix for all output files. DEFAULT: the alignment file name (-aln) |
--replace-intree | Allow CMAPLE to replace the input tree if a tree with a better likelihood is found when computing branch supports. |
--out-mul-tree | Output the tree in multifurcating format. DEFAULT: bifurcating tree |
--out-internal | Output IDs of internal nodes. |
--overwrite | Overwrite output files if they already exist. |
-ref <FILENAME>,<SEQNAME> | Specify the reference genome as the sequence named <SEQNAME> from the alignment file <FILENAME>. |
--out-aln <NAME> | Write the input alignment to a file named <NAME> in MAPLE (default), PHYLIP, or FASTA format. |
--out-format <FORMAT> | Specify the format (MAPLE/PHYLIP/FASTA) of the alignment written with --out-aln. |
--min-blength <NUM> | Set the minimum branch length. DEFAULT: 0.2 x <one mutation per site> |
--thresh-prob <NUM> | Specify a relative probability threshold, which is used to ignore possible states with very low probabilities. DEFAULT: 1e-8 |
--mut-update <NUM> | Specify the period (in terms of the number of sample placements) for updating the substitution rate matrix. DEFAULT: 25 |
--max-subs <NUM> | Specify the maximum number of substitutions per site for which CMAPLE is effective. DEFAULT: 0.067 |
--mean-subs <NUM> | Specify the mean number of substitutions per site for which CMAPLE is effective. DEFAULT: 0.02 |
--seed <NUM> | Set a random number seed to reproduce a previous run. DEFAULT: the CPU clock |
-v <MODE> | Set the verbose mode (QUIET, MIN, MED, MAX, or DEBUG) to control the amount of messages printed to the screen, which is useful for debugging purposes. DEFAULT: MED |