
User Manual

Nhan Ly-Trong edited this page Dec 6, 2024 · 1 revision

How to use CMAPLE?

Installation

Precompiled CMAPLE executables for Linux, macOS (OSX), and Windows are available at https://github.com/iqtree/cmaple/releases.

The release package provides two executables, cmaple and cmaple-aa, tailored for DNA and amino-acid data, respectively. For simplicity, the following examples use cmaple.

Usage examples

As a command-line program, CMAPLE can be run by executing cmaple ... from a terminal/console (or a command prompt under Windows).

1. Infer a phylogenetic tree from an alignment

The release package also includes an example alignment, example.maple. Assuming you are in the same folder as example.maple, one can reconstruct a phylogenetic tree from it with:

cmaple -aln example.maple

In the above command,

  • -aln specifies the input alignment, which can be in FASTA, PHYLIP, or MAPLE format.

Since no model is specified, CMAPLE uses the default model, which for DNA data such as this example is the General Time Reversible (GTR) model. At the end of the run, CMAPLE writes the following files:

  • example.maple.treefile: the inferred phylogenetic tree, which can be visualized by many tree viewer programs (e.g., FigTree).

  • example.maple.log: the log file, which captures all messages printed to the screen during the run. To report a bug, please send this log file and, if possible, the input alignment to the authors.

2. Specify a substitution model

CMAPLE supports various common DNA models and empirical amino-acid models (replicated from the IQ-TREE software), as listed in Supported substitution models. For example, one can specify the Jukes-Cantor model with the -m option:

cmaple -aln example.maple -m JC

3. Specify an input tree

One can specify an input tree (e.g., tree.nwk) in NEWICK format using the -t option:

    cmaple -aln example.maple -t tree.nwk

If the input tree is incomplete (i.e., it does not contain all taxa in the alignment), CMAPLE will:

  • First, perform placements (i.e., add the missing taxa from the alignment to the tree);
  • Then, apply a NORMAL tree search (which considers SPR moves at the newly added nodes only);
  • Finally, optimize all branch lengths.

If the input tree is complete (i.e., it contains all taxa in the alignment), by default CMAPLE performs neither placement nor tree search; it only optimizes all branch lengths. To keep the branch lengths fixed, add --blfix to the command:

    cmaple -aln example.maple -t tree.nwk --blfix
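Whether an input tree counts as complete can be checked before running CMAPLE. The following is a minimal sketch. It is a hypothetical helper that is not part of CMAPLE, and missing_taxa and run_with_tree are illustrative names. It assumes a FASTA alignment whose header lines start with the taxon name, and a Newick tree with unquoted tip names. A MAPLE alignment could first be converted to FASTA with --out-aln and --out-format fasta. The helper lists the alignment taxa that are absent from the tree and adds --blfix only when none are missing, because --blfix applies only to complete trees.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CMAPLE). Assumes a FASTA alignment and a
# Newick tree with unquoted tip names.

# Print alignment taxa that do not appear as tips of the tree, one per line.
missing_taxa() {
  local aln=$1 tree=$2
  # Left input: FASTA names (first word of each '>' header), CRLF-safe.
  # Right input: Newick tip names, i.e. labels right after '(' or ',';
  # internal-node labels follow ')' and are therefore not matched.
  comm -23 \
    <(tr -d '\r' < "$aln" | grep '^>' | sed 's/^>//' | awk '{print $1}' | sort -u) \
    <(tr -d ' \t\r\n' < "$tree" | grep -oE '[(,][^(),:;]+' | cut -c2- | sort -u)
}

# Run cmaple with the tree; fix branch lengths only if the tree is complete.
run_with_tree() {
  local aln=$1 tree=$2
  if [ -z "$(missing_taxa "$aln" "$tree")" ]; then
    cmaple -aln "$aln" -t "$tree" --blfix
  else
    cmaple -aln "$aln" -t "$tree"
  fi
}
# Usage: missing_taxa aln.fa tree.nwk    (no output means the tree is complete)
```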

To use the input tree (complete or incomplete) as a starting tree, i.e., to place any missing taxa, then consider SPR moves at all nodes, and finally optimize branch lengths, add --search EXHAUSTIVE to the command:

    cmaple -aln example.maple -t tree.nwk --search EXHAUSTIVE

4. Set the tree search type

CMAPLE implements three types of tree search:

| Tree search type | Explanation |
|---|---|
| FAST | No tree search (placement only) |
| NORMAL | Consider pruning branches only at newly added nodes when seeking SPR moves. This is the default setting of CMAPLE. |
| EXHAUSTIVE | Consider all nodes when seeking SPR moves |

If no input tree is specified, NORMAL and EXHAUSTIVE behave identically. All taxa from the alignment are first added to the tree, so every node is newly added and SPR moves are considered at all of them during the tree search.

If a complete input tree is given, neither FAST nor NORMAL performs a tree search, since no new taxa are added to the tree.

If an incomplete input tree is given, the three search types behave differently, as described in the table above. In general, both runtime and accuracy increase from FAST to NORMAL to EXHAUSTIVE.

By default, CMAPLE applies the NORMAL tree search. One can change it, e.g., to the FAST tree search, with the --search option:

cmaple -aln example.maple -t tree.nwk --search FAST
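A simple way to see the runtime/accuracy trade-off on your own data is to run all three search types on the same input. The following is a minimal sketch. The function name compare_search_types and the prefixes search_FAST, search_NORMAL, and search_EXHAUSTIVE are illustrative. It relies only on the documented -aln, -t, --search, and --prefix options. Based on the default naming (example.maple.treefile), each run is assumed to write its tree to <PREFIX>.treefile.

```shell
#!/usr/bin/env bash
# Illustrative wrapper: run FAST, NORMAL and EXHAUSTIVE on the same input,
# each with its own --prefix so the outputs do not overwrite each other.
compare_search_types() {
  local aln=$1 tree=$2 type
  for type in FAST NORMAL EXHAUSTIVE; do
    echo "== ${type} =="
    # The bash 'time' keyword reports the wall-clock runtime of each run.
    time cmaple -aln "$aln" -t "$tree" --search "$type" --prefix "search_${type}"
  done
}
# Usage: compare_search_types example.maple tree.nwk
```

The resulting trees (and their log files) can then be compared side by side.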

5. Assess branch supports with SH-aLRT

CMAPLE implements the SH-like approximate likelihood ratio test (Guindon et al., 2010). To perform this test, one can run:

cmaple -aln example.maple --alrt

To speed up the assessment, one can use multithreading by adding the -nt option:

cmaple -aln example.maple --alrt -nt 8

In the above example, CMAPLE uses 8 threads for computing branch supports. One can use -nt AUTO to employ all CPU cores available on the current machine.

Additionally, one can set the number of replicates (default: 1000) and the epsilon value (default: 0.1; see Guindon et al., 2010) with the --replicates and --epsilon options:

cmaple -aln example.maple --alrt --replicates 5000 --epsilon 0.05
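For orientation, the SH-like test of Guindon et al. (2010) builds on the approximate likelihood ratio statistic for a branch. This statistic compares the log-likelihood of the best topology around that branch, written here as \ell_1, with the log-likelihood \ell_2 of the better of the two alternative NNI configurations. The formula below is the general definition from that literature, not taken from CMAPLE's source, so details of CMAPLE's implementation may differ.

```latex
\mathrm{aLRT} = 2\,\left(\ell_1 - \ell_2\right), \qquad \ell_1 \ge \ell_2
```

As described in that paper, the SH-like support is then estimated by resampling site log-likelihoods (RELL) over the number of replicates set with --replicates.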

6. Assess branch supports with SPRTA

Since version 1.1.0, CMAPLE also implements another branch support method, SPRTA (DeMaio et al., 2024). To perform this branch assessment, one can run:

cmaple -aln example.maple --sprta

The SPRTA supports are written to example.maple.treefile.nexus.

Note that when computing SPRTA, a NORMAL tree search behaves like an EXHAUSTIVE one, i.e., SPR moves are considered at all nodes in the tree (see Set the tree search type). To keep the topology unchanged, use a FAST tree search.
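Combining the options above, the following minimal sketch computes SPRTA supports on a given tree while avoiding SPR-based topology changes. The function name sprta_fixed_topology is illustrative. It uses only the documented -aln, -t, --search FAST, and --sprta options, and it assumes the default output prefix, so the supports end up in <alignment>.treefile.nexus. With an incomplete tree, FAST still places the missing taxa.

```shell
#!/usr/bin/env bash
# Illustrative wrapper: SPRTA on an input tree with a FAST search, so no SPR
# moves are applied to the topology (missing taxa, if any, are still placed).
sprta_fixed_topology() {
  local aln=$1 tree=$2
  if [ ! -f "$tree" ]; then
    echo "tree file not found: $tree" >&2
    return 1
  fi
  # Propagate cmaple's exit status if the run fails.
  cmaple -aln "$aln" -t "$tree" --search FAST --sprta || return
  echo "SPRTA supports: ${aln}.treefile.nexus"
}
# Usage: sprta_fixed_topology example.maple tree.nwk
```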

By default, CMAPLE computes support values only for branches with non-zero lengths. However, one can assess all branches, including those with zero length, by using the --zero-branch-supp option:

cmaple -aln example.maple --sprta --zero-branch-supp

Additionally, if one wants CMAPLE to output a list of alternative SPRs and their supports for each branch, include the --out-alternative-spr option:

cmaple -aln example.maple --sprta --out-alternative-spr

7. Convert an alignment to a different format

The --out-aln option writes the input alignment to a file in MAPLE (default), PHYLIP, or FASTA format. For example, the following command

cmaple -aln aln.phy --out-aln aln.maple

writes the input alignment aln.phy to the file aln.maple in MAPLE format.

To specify the output alignment format, use --out-format. For example, the following command

cmaple -aln aln.phy --out-aln aln.fa --out-format fasta

writes the input alignment aln.phy to the file aln.fa in FASTA format.
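To convert many alignments at once, --out-aln can be called in a loop. The following is a minimal sketch. The function name convert_all_phy_to_maple is illustrative, and the sketch assumes the inputs are the *.phy files in the current folder. Each file is written to <name>.maple using only the documented -aln and --out-aln options.

```shell
#!/usr/bin/env bash
# Illustrative batch conversion: every *.phy file in the current folder is
# written to a MAPLE file with the same base name (x.phy -> x.maple).
convert_all_phy_to_maple() {
  local f
  for f in *.phy; do
    # Skip the unexpanded pattern when no *.phy file exists.
    [ -e "$f" ] || continue
    cmaple -aln "$f" --out-aln "${f%.phy}.maple" || echo "failed: $f" >&2
  done
}
# Usage: convert_all_phy_to_maple
```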

Improving runtime even further

CMAPLE is optimized for speed. To make it even faster, you can swap in a high-performance memory allocator such as jemalloc, which can give a further 5-10% speedup, depending on workload and hardware.

Install jemalloc either via a package manager or manually (Linux example):

git clone https://github.com/jemalloc/jemalloc.git
cd jemalloc
mkdir build
export je_build="$(pwd)/build"
./autogen.sh
./configure --prefix="${je_build}"
make -j"$(nproc)"
make install
## remember this path!
echo "Remember to put '${je_build}/bin' in your PATH to make 'jemalloc-config' known"
## we do it here once, but you need to make this permanent (e.g., in your shell startup file):
export PATH="${je_build}/bin:${PATH}"

Run CMAPLE with jemalloc

## run CMAPLE with preloaded jemalloc
LD_PRELOAD="$(jemalloc-config --libdir)/libjemalloc.so.$(jemalloc-config --revision)" ./cmaple <more args here>
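If the same scripts should also run on machines without jemalloc, the preload can be made conditional. The following is a minimal sketch. The function name cmaple_jemalloc is illustrative, and the sketch assumes cmaple is on your PATH rather than invoked as ./cmaple. It builds the same library path as the command above and preloads it only if jemalloc-config is available and that file exists. Otherwise it runs cmaple with the default allocator.

```shell
#!/usr/bin/env bash
# Illustrative wrapper: preload jemalloc when available, else run cmaple as-is.
cmaple_jemalloc() {
  local lib
  if command -v jemalloc-config >/dev/null 2>&1; then
    lib="$(jemalloc-config --libdir)/libjemalloc.so.$(jemalloc-config --revision)"
    if [ -f "$lib" ]; then
      LD_PRELOAD="$lib" cmaple "$@"
      return
    fi
  fi
  echo "jemalloc not found; running with the default allocator" >&2
  cmaple "$@"
}
# Usage: cmaple_jemalloc -aln example.maple
```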

Supported substitution models

All substitution models supported by CMAPLE are listed below.

DNA models

| Model | Explanation |
|---|---|
| JC or JC69 | Equal substitution rates and equal base frequencies (Jukes and Cantor, 1969). |
| GTR | General time reversible model with unequal rates and unequal base frequencies (Tavaré, 1986). |
| UNREST | Unrestricted model with non-reversible, unequal rates and unequal base frequencies. |

Amino-acid models

CMAPLE supports all non-mixture amino-acid models available in IQ-TREE; see the IQ-TREE documentation for the full list.

Command reference

All options available in CMAPLE are listed below:

| Option | Usage and meaning |
|---|---|
| -h or -? | Print help usage. |
| -aln <ALIGNMENT> | Specify an input alignment file in PHYLIP, FASTA, or MAPLE format. |
| -m <MODEL> | Specify a model name. See DNA models and Amino-acid models for the list of supported models. DEFAULT: GTR for DNA and LG for amino-acid data |
| -st <SEQ_TYPE> | Specify the sequence type: DNA or AA (for DNA or amino-acid sequences). DEFAULT: auto-detect from the alignment or model |
| --format <FORMAT> | Set the alignment format: PHYLIP, FASTA, MAPLE, or AUTO. DEFAULT: auto-detect from the alignment |
| -t <TREE_FILE> | Specify a file containing a starting tree for tree search. Note: the starting tree does not need to contain all taxa in the input alignment. |
| --no-reroot | Do not reroot the input tree. |
| --blfix | Keep the branch lengths of the input tree unchanged (only applicable if the input tree contains all taxa in the alignment). |
| --ignore-annotation | Ignore annotations in the input tree. |
| --search <TYPE> | Specify the tree search type: FAST, NORMAL, or EXHAUSTIVE. DEFAULT: NORMAL |
| --shallow-search | Enable a shallow tree search before a deeper tree search. DEFAULT: no shallow search |
| --alrt | Compute branch supports (SH-aLRT) of the tree. |
| --replicates <NUM> | Set the number of replicates for computing branch supports (SH-aLRT). DEFAULT: 1000 |
| --epsilon <NUM> | Set the epsilon value (see Guindon et al., 2010) for computing branch supports (SH-aLRT). DEFAULT: 0.1 |
| -nt <NUM_THREADS> | Set the number of CPU cores used for computing branch supports. Use -nt AUTO to use all CPU cores available on the current machine. DEFAULT: 1 |
| --sprta | Compute SPRTA (DeMaio et al., 2024) branch supports. |
| --thresh-opt-diff-fac <NUM> | A factor (relative to the log of the sequence length) used to determine whether SPRs are close to the optimal one. DEFAULT: 1 |
| --zero-branch-supp | Compute supports for zero-length branches. |
| --out-alternative-spr | Output alternative SPRs and their supports. |
| --min-sup-alt <MIN> | The minimum support for an alternative SPR to be output. DEFAULT: 0.01 |
| --prefix <PREFIX> | Specify a prefix for all output files. DEFAULT: the alignment file name (-aln) |
| --replace-intree | Allow CMAPLE to replace the input tree if a tree with a higher likelihood is found when computing branch supports. |
| --out-mul-tree | Output the tree in multifurcating format. DEFAULT: bifurcating tree |
| --out-internal | Output IDs of internal nodes. |
| --overwrite | Overwrite existing output files. |
| -ref <FILENAME>,<SEQNAME> | Specify the reference genome as the sequence named <SEQNAME> in the alignment file <FILENAME>. |
| --out-aln <NAME> | Write the input alignment to a file named <NAME> in MAPLE (default), PHYLIP, or FASTA format. |
| --out-format <FORMAT> | Specify the format (MAPLE/PHYLIP/FASTA) of the alignment written with --out-aln. |
| --min-blength <NUM> | Set the minimum branch length. DEFAULT: 0.2 x <one mutation per site> |
| --thresh-prob <NUM> | Specify a relative probability threshold used to ignore possible states with very low probabilities. DEFAULT: 1e-8 |
| --mut-update <NUM> | Specify the period (in terms of the number of sample placements) for updating the substitution rate matrix. DEFAULT: 25 |
| --max-subs <NUM> | Specify the maximum number of substitutions per site for which CMAPLE is effective. DEFAULT: 0.067 |
| --mean-subs <NUM> | Specify the mean number of substitutions per site for which CMAPLE is effective. DEFAULT: 0.02 |
| --seed <NUM> | Set a random number seed to reproduce a previous run. DEFAULT: the CPU clock |
| -v <MODE> | Set the verbose mode (QUIET, MIN, MED, MAX, or DEBUG) to control the amount of messages printed to the screen, which is useful for debugging. DEFAULT: MED |
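As an example of combining --seed with --prefix and --overwrite, the following minimal sketch checks that two runs with the same seed produce the same tree. The function name check_reproducible and the prefixes run_a and run_b are illustrative. Based on the default naming, the tree is assumed to be written to <PREFIX>.treefile. The sketch uses single-threaded runs only, and whether multithreaded runs are also reproducible is not covered here.

```shell
#!/usr/bin/env bash
# Illustrative check: run twice with the same seed and compare the trees.
check_reproducible() {
  local aln=$1 seed=$2
  cmaple -aln "$aln" --seed "$seed" --prefix run_a --overwrite || return
  cmaple -aln "$aln" --seed "$seed" --prefix run_b --overwrite || return
  # cmp -s is silent and returns 0 only if the two files are byte-identical.
  if cmp -s run_a.treefile run_b.treefile; then
    echo "identical trees"
  else
    echo "trees differ"
    return 1
  fi
}
# Usage: check_reproducible example.maple 42
```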