Setting model parameters
We describe how to set up model parameters for a partition instance in libpll. This page covers the following parameters:
- Setting CLV vectors at tips from sequences and maps
- Setting CLV vectors manually
- Setting base frequencies
- Setting rate categories
- Setting substitution rates
Setting CLV vectors at tips from sequences and maps
Associated API reference: pll_set_tip_states()
The function call for setting a tip's CLVs given the sequence is
```c
int pll_set_tip_states(pll_partition_t * partition,
                       int tip_index,
                       const unsigned int * map,
                       const char * sequence);
```
The string `sequence` is translated using the lookup table `map`, a 256-element array of type `unsigned int` that maps each ASCII character to an integer code. Valid characters map to positive codes and invalid ones map to 0; the code determines how the CLV entries for that character are set. libpll provides several predefined maps, but the user may also allocate an arbitrary map and pass it to `pll_set_tip_states()`.
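As a sketch of a user-defined map, the following builds a 256-entry table for hypothetical two-state (binary) data, where the characters `0` and `1` are the two states and `-` and `?` are fully ambiguous. The helper name `make_binary_map` is illustrative and not part of libpll; the code only assumes the encoding rule stated above (valid characters get a positive bit code, everything else gets 0).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not a libpll function): fill a 256-entry lookup
   table for two-state data. Bit 0 stands for state "0", bit 1 for "1". */
void make_binary_map(unsigned int map[256])
{
  /* every character not listed below is invalid and maps to 0 */
  memset(map, 0, 256 * sizeof(unsigned int));

  map['0'] = 1;       /* binary 01: only state 0 */
  map['1'] = 2;       /* binary 10: only state 1 */
  map['-'] = 1 | 2;   /* gap: either state */
  map['?'] = 1 | 2;   /* missing data: either state */

  /* the two states must occupy disjoint bits */
  assert((map['0'] & map['1']) == 0);
}
```

Under these assumptions, a partition created with 2 states could then use it as `make_binary_map(map); pll_set_tip_states(partition, 0, map, "01?1");`.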
To illustrate the function, assume we are dealing with nucleotide data (4 states) and use the predefined map `pll_map_nt` to translate the bases of a sequence into CLVs. `pll_map_nt` is defined as the following array:
unsigned int pll_map_nt[256] =
{
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15,
0, 1, 14, 2, 13, 0, 0, 4, 11, 0, 0, 12, 0, 3, 15, 15,
0, 0, 5, 6, 8, 8, 7, 9, 15, 10, 0, 0, 0, 0, 0, 0,
0, 1, 14, 2, 13, 0, 0, 4, 11, 0, 0, 12, 0, 3, 15, 15,
0, 0, 5, 6, 8, 8, 7, 9, 15, 10, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
};
This map translates bases according to the following table.
All translatable characters map to a positive value whose binary representation indicates which entries of the CLV are set. For nucleotide data there are 38 valid characters: the 4 nucleotides (`A`, `C`, `G`, `T`/`U`), 11 characters that represent ambiguities (`W`, `S`, `M`, `K`, `R`, `Y`, `B`, `D`, `H`, `V`, `N`), four additional characters (`-`, `?`, `O`, `X`) that have the same meaning as `N` (i.e. any nucleotide), and the lower-case forms of all these letters. All other characters map to 0 to indicate an invalid base. The four nucleotides are encoded as powers of two, so the bitwise AND of the codes of any two distinct nucleotides is always zero, and each ambiguity is encoded as the bitwise OR of the codes of the nucleotides it stands for. For instance, purine (`R`) is encoded as `0101`, the bitwise OR of `0001` (adenine) and `0100` (guanine). Read from the least significant bit (LSB) to the most significant bit (MSB), bit i of the code determines whether CLV entry i is set.
Now assume that we use four rate categories. The CLVs then have the following form:
Note that, for each node, the CLV entries of all sites and all rate categories are stored consecutively in memory in a single array of type `double *`, as shown in the figure. All CLVs of a partition are stored in its array `clv` of type `double **`. For a partition instance with n tip (leaf) nodes, libpll reserves entries 0 to n-1 (i.e. `clv[0]` to `clv[n-1]`) for the tip CLVs. The user may access the `clv` array directly; however, the preferred way of accessing it is through the library functions:
```c
int pll_set_tip_states(pll_partition_t * partition, int tip_index, const unsigned int * map, const char * sequence);
void pll_set_tip_clv(pll_partition_t * partition, int tip_index, const double * clv);
void pll_show_clv(pll_partition_t * partition, int index, int float_precision);
```
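A sketch of the index arithmetic implied by this layout, assuming that each site stores its `rate_cats` blocks of `states` values back to back. The helper `clv_offset` is hypothetical; libpll may internally pad the state dimension (e.g. for SIMD alignment), in which case the padded state count would take the place of `states`.

```c
#include <assert.h>
#include <stddef.h>

/* Position of (site, category, state) inside one node's CLV array,
   assuming the nesting site -> rate category -> state, without padding. */
size_t clv_offset(unsigned int site, unsigned int cat, unsigned int state,
                  unsigned int rate_cats, unsigned int states)
{
  assert(cat < rate_cats && state < states);
  return ((size_t)site * rate_cats + cat) * states + state;
}
```

With 4 states and 4 rate categories, each site occupies 16 doubles, so the first entry of site 1 is at offset `clv_offset(1, 0, 0, 4, 4)`, i.e. 16.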
As an example, assume that we have a sequence of length two, `char * sequence = "AM"`, and we use the map `pll_map_nt` on the partition instance `partition` (which uses 4 rate categories) to set the CLV with index 0 using the call:
```c
pll_set_tip_states(partition, 0, pll_map_nt, sequence);
```
The CLV at index 0 is set as shown in the following diagram:
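The values in this CLV follow directly from the encoding: site 0 (`A`, code `0001`) yields (1, 0, 0, 0) in each of the four rate categories, and site 1 (`M`, code `0011`) yields (1, 1, 0, 0). The sketch below reproduces that fill for illustration only; `fill_tip_clv` is a hypothetical simplification (no padding, no tip-pattern optimizations) and not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for filling one tip CLV: each site's
   state code is expanded into `states` entries (1.0 for a set bit, 0.0
   otherwise) and the block is repeated for every rate category. */
void fill_tip_clv(const unsigned int * map, const char * sequence,
                  unsigned int states, unsigned int rate_cats, double * clv)
{
  for (size_t site = 0; sequence[site] != '\0'; ++site)
  {
    unsigned int code = map[(unsigned char)sequence[site]];
    assert(code != 0);   /* 0 marks an invalid character */
    for (unsigned int cat = 0; cat < rate_cats; ++cat)
      for (unsigned int s = 0; s < states; ++s)
        clv[(site * rate_cats + cat) * states + s] =
          ((code >> s) & 1u) ? 1.0 : 0.0;
  }
}
```

A minimal map with `map['A'] = 1` and `map['M'] = 3` (the same values as in `pll_map_nt`) is enough to reproduce the `"AM"` example, which fills 2 × 4 × 4 = 32 doubles.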
For DNA data, the CLV represents the states in alphabetical order: (`A`, `C`, `G`, `T`/`U`). Amino acid data is also represented in alphabetical order, according to the three-letter abbreviation of each amino acid (not its 1-letter symbol): (`A`, `R`, `N`, `D`, `C`, `Q`, `E`, `G`, `H`, `I`, `L`, `K`, `M`, `F`, `P`, `S`, `T`, `W`, `Y`, `V`).
Name | 3-letter | 1-letter | Name | 3-letter | 1-letter |
---|---|---|---|---|---|
Alanine | Ala | A | Leucine | Leu | L |
Arginine | Arg | R | Lysine | Lys | K |
Asparagine | Asn | N | Methionine | Met | M |
Aspartic Acid | Asp | D | Phenylalanine | Phe | F |
Cysteine | Cys | C | Proline | Pro | P |
Glutamic Acid | Glu | E | Serine | Ser | S |
Glutamine | Gln | Q | Threonine | Thr | T |
Glycine | Gly | G | Tryptophan | Trp | W |
Histidine | His | H | Tyrosine | Tyr | Y |
Isoleucine | Ile | I | Valine | Val | V |
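Because CLV positions follow a fixed state order, finding the position of an unambiguous state symbol amounts to a lookup in an order string, which can be handy when building a CLV by hand for `pll_set_tip_clv()`. The helper `state_index` below is a hypothetical sketch; the order string passed to it must match the state order of the partition (for example `"ACGT"` for DNA).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: position of `symbol` in the state order string
   `order`, or -1 if the symbol is not one of the listed states. */
int state_index(const char * order, char symbol)
{
  /* strchr would match the terminating '\0', so reject it explicitly */
  const char * p = (symbol != '\0') ? strchr(order, symbol) : NULL;
  return p ? (int)(p - order) : -1;
}
```

Ambiguity codes such as `N` are deliberately not handled here; they are covered by the bit-code maps described earlier.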
Setting CLV vectors manually
Associated API reference: pll_set_tip_clv()
The function call for setting a tip's CLVs manually is
```c
void pll_set_tip_clv(pll_partition_t * partition,
                     int tip_index,
                     const double * clv);
```
where `partition` is a pointer to the partition instance, `tip_index` is the CLV index of the tip to set, and `clv` is an array of `states` × `sites` elements of type `double`. This array is copied into the tip CLV such that each group of `states` elements (one site) is replicated `rate_cats` times, where `rate_cats` is the number of rate categories specified when creating the partition.
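The replication rule can be sketched as follows. `expand_tip_clv` is a hypothetical helper that assumes the unpadded site → rate category → state layout described earlier; it only illustrates how the `states` × `sites` input relates to the stored CLV and is not the library's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: `in` holds `states` values per site for `sites`
   sites; each per-site block is copied once per rate category into `out`,
   which must have room for sites * rate_cats * states doubles. */
void expand_tip_clv(const double * in, unsigned int sites,
                    unsigned int states, unsigned int rate_cats,
                    double * out)
{
  assert(states > 0 && rate_cats > 0);
  for (size_t site = 0; site < sites; ++site)
    for (unsigned int cat = 0; cat < rate_cats; ++cat)
      for (unsigned int s = 0; s < states; ++s)
        out[(site * rate_cats + cat) * states + s] = in[site * states + s];
}
```

In this picture, `in` is what would be passed as the `clv` argument of `pll_set_tip_clv()`: for 2 sites, 4 states and 2 rate categories it holds 8 values and expands to 16.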
Setting base frequencies
Associated API reference: pll_set_frequencies()
The function call for setting the frequencies is
```c
void pll_set_frequencies(pll_partition_t * partition,
                         unsigned int params_index,
                         const double * frequencies);
```
The call sets the frequencies of the substitution model with index `params_index` of partition `partition` to `frequencies`. The elements are copied in the order provided.
For example, if the CLVs were set using PLL maps, the frequencies should follow the same state order as the CLVs: (`A`, `C`, `G`, `T`/`U`) for nucleotides, and the amino acid order listed above for amino acids.
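As an illustration of the required order, the sketch below computes empirical base frequencies from a single sequence in (`A`, `C`, `G`, `T`/`U`) order, so that the resulting array could be passed to `pll_set_frequencies()`. The helper `empirical_nt_freqs` is hypothetical, and skipping ambiguity codes and gaps is a simplification chosen here, not libpll behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: empirical nucleotide frequencies in the order
   (A, C, G, T/U), counting only unambiguous bases. Returns the number of
   bases counted; `freqs` is left untouched if none were found. */
size_t empirical_nt_freqs(const char * seq, double freqs[4])
{
  size_t counts[4] = {0, 0, 0, 0};
  size_t total = 0;
  assert(seq != NULL);
  for (; *seq; ++seq)
  {
    int idx = -1;
    switch (*seq)
    {
      case 'A': case 'a': idx = 0; break;
      case 'C': case 'c': idx = 1; break;
      case 'G': case 'g': idx = 2; break;
      case 'T': case 't': case 'U': case 'u': idx = 3; break;
      default: break;   /* ambiguity codes and gaps are skipped */
    }
    if (idx >= 0) { counts[idx]++; total++; }
  }
  if (total > 0)
    for (int i = 0; i < 4; ++i)
      freqs[i] = (double)counts[i] / (double)total;
  return total;
}
```

A call such as `pll_set_frequencies(partition, 0, freqs);` would then set these values for the model with index 0, assuming the CLVs were set with `pll_map_nt`.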
Setting substitution rates
Associated API reference: pll_set_subst_params()
The function call for setting the substitution parameters is
```c
void pll_set_subst_params(pll_partition_t * partition,
                          unsigned int params_index,
                          const double * params);
```
The call sets the substitution rates of the substitution model with index `params_index` of partition `partition` to `params`. The `params` array must contain s(s-1)/2 elements, where s is the number of states.
If the CLVs were set using PLL maps, the substitution rates should follow the same state order, listing the upper triangle of the rate matrix row by row: (`A<->C`, `A<->G`, `A<->T`, `C<->G`, `C<->T`, `G<->T`) for nucleotides, and (`A<->R`, `A<->N`, `A<->D`, `A<->C`, ...) for amino acids.
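The nucleotide ordering above is the upper triangle of the symmetric rate matrix read row by row. Assuming that same reading for any number of states, the size of `params` and the position of a given pair can be computed directly; `subst_param_count` and `subst_param_index` are hypothetical helpers, not libpll functions.

```c
#include <assert.h>

/* Number of exchangeability parameters for s states: s(s-1)/2. */
unsigned int subst_param_count(unsigned int s)
{
  return s * (s - 1) / 2;
}

/* Position of the rate between states i and j (i != j) when the upper
   triangle is listed row by row: (0,1), (0,2), ..., (0,s-1), (1,2), ... */
unsigned int subst_param_index(unsigned int i, unsigned int j, unsigned int s)
{
  assert(i != j && i < s && j < s);
  if (i > j) { unsigned int t = i; i = j; j = t; }   /* rates are symmetric */
  /* rows 0..i-1 contribute (s-1) + (s-2) + ... + (s-i) = i(2s-i-1)/2 */
  return i * (2 * s - i - 1) / 2 + (j - i - 1);
}
```

For nucleotides (s = 4) this gives 6 parameters, and `C<->G` (states 1 and 2) lands at position 3, matching the list above; for amino acids (s = 20) there are 190 parameters.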