Analyse PAFTOL tree and label nodes

Method

The program has two main tasks: first, to identify the nodes corresponding to higher taxa, such as families or orders; second, to confirm or reject samples at these taxonomic ranks, based on their placement within or outside these nodes, respectively.

For every node in the tree, two metrics were calculated: (i) the proportion of all samples belonging to a taxon (e.g. a family) that descended from the node, and (ii) the proportion of samples descending from the node that belonged to that taxon. The two metrics were multiplied to produce a combined score. For each taxon, the highest-scoring node was considered to best represent the taxon in the tree. Each node corresponding to a taxon was categorized as (i) well-resolved if its score was 1, (ii) resolved if its score was <1 but >0.5, or (iii) poorly resolved if its score was ≤0.5.
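
The scoring described above can be sketched in Python as follows. This is only an illustration, not the code in pp.pl: the function names, the representation of the tree as a mapping from node ID to the set of samples under that node, and the toy data are all assumptions made for the example. The toy tree lists internal nodes only, and a score of exactly 0.5 is treated as poorly resolved, in line with the ≤0.5 cutoff used for inconclusive samples.

```python
from typing import Dict, Set, Tuple


def score_node(leaves_under_node: Set[str], taxon_members: Set[str]) -> float:
    """Combined score of one node for one taxon (product of the two metrics)."""
    if not leaves_under_node or not taxon_members:
        return 0.0
    shared = len(leaves_under_node & taxon_members)
    completeness = shared / len(taxon_members)   # metric (i): share of the taxon under the node
    purity = shared / len(leaves_under_node)     # metric (ii): share of the node that is the taxon
    return completeness * purity


def best_node(nodes: Dict[str, Set[str]], taxon_members: Set[str]) -> Tuple[str, float]:
    """Return (node ID, score) of the highest-scoring node for the taxon."""
    scored = ((node_id, score_node(leaves, taxon_members)) for node_id, leaves in nodes.items())
    return max(scored, key=lambda pair: pair[1])


def resolution(score: float) -> str:
    """Map a combined score to a resolution category."""
    if score == 1.0:
        return "well-resolved"
    if score > 0.5:
        return "resolved"
    return "poorly resolved"  # assumption: exactly 0.5 falls here


# Hypothetical toy tree (((a1,a2)n2,b1)n3,(a3,b2)n4)n1, internal nodes only.
toy_nodes = {
    "n1": {"a1", "a2", "a3", "b1", "b2"},
    "n2": {"a1", "a2"},
    "n3": {"a1", "a2", "b1"},
    "n4": {"a3", "b2"},
}
toy_families = {"A": {"a1", "a2", "a3"}, "B": {"b1", "b2"}}

for family, members in toy_families.items():
    node_id, score = best_node(toy_nodes, members)
    print(family, node_id, round(score, 3), resolution(score))
```

In this toy tree, family A is best represented by node n2 (score 2/3, resolved), because a3 falls outside it; family B has no node scoring above 0.5 and is therefore poorly resolved.
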
The phylogenetic validation of the family identification of each sample was determined as follows:

a. Confirmed: if identified as belonging to a family whose best scoring node had a value >0.5 and found as a descendant of this node in the tree.

b. Rejected: if identified as belonging to a family whose best scoring node had a value >0.5 but not found as a descendant of this node.

c. Inconclusive: if belonging to a family whose best scoring node had a value ≤0.5.

Note that for families represented in the tree by a single sample, the validation was performed with respect to their orders. If the order was represented by a single sample, the sample was considered untestable and coded as inconclusive.
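
The validation rules above, including the fallback from family to order for single-sample families, can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic of pp.pl itself: the `TaxonNode` container, the function names and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class TaxonNode:
    """Best-scoring node for one taxon (hypothetical container for this sketch)."""
    n_samples: int               # samples in the tree identified as this taxon
    score: float                 # combined score of the best-scoring node
    descendants: FrozenSet[str]  # samples found under that node


def validate_against(sample: str, taxon: TaxonNode) -> str:
    """Apply rules a-c to one sample and one taxon."""
    if taxon.score <= 0.5:
        return "inconclusive"    # rule c: best node is poorly resolved
    if sample in taxon.descendants:
        return "confirmed"       # rule a: sample sits under the best node
    return "rejected"            # rule b: sample sits elsewhere in the tree


def validate_sample(sample: str, family: TaxonNode, order: TaxonNode) -> str:
    """Validate at family level, falling back to the order for singleton families."""
    if family.n_samples > 1:
        return validate_against(sample, family)
    if order.n_samples > 1:
        return validate_against(sample, order)
    return "inconclusive"        # untestable: singleton family in a singleton order


# Hypothetical example values.
family_a = TaxonNode(n_samples=3, score=2 / 3, descendants=frozenset({"a1", "a2"}))
order_x = TaxonNode(n_samples=5, score=1.0, descendants=frozenset({"a1", "a2", "a3", "c1", "d1"}))
singleton_family = TaxonNode(n_samples=1, score=1.0, descendants=frozenset({"c1"}))
singleton_order = TaxonNode(n_samples=1, score=1.0, descendants=frozenset({"e1"}))

print(validate_sample("a1", family_a, order_x))
print(validate_sample("a3", family_a, order_x))
print(validate_sample("c1", singleton_family, order_x))
print(validate_sample("e1", singleton_family, singleton_order))
```

Here a1 is confirmed and a3 is rejected at family level, c1 is confirmed through its order because its family has a single sample, and e1 is inconclusive because neither its family nor its order can be tested.
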

Usage

Inputs:

opt_tree: input tree (required)
opt_dup: optional list of PAFTOL entries to disregard
opt_root: node ID of the true root; this must be identified manually from the content of the tree

perl pp.pl -tree treefile.nwk [-dup dup.txt] -good g.txt -alien a.txt -outlier o.txt -specimen s.txt -tree2 new_treefile.nwk [-root 100000] [-order] [-well] [-help] > output.txt

Options:

tree=s Input (unrooted) tree in Newick format
dup=s Optional list of nodes to ignore
good=s List of all specimens not needed for manual review (if running on a pre-tree)
bad=s List of badly resolving higher taxa
alien=s List of all specimens intruding in well-defined families
outlier=s List of all taxa outlying an ancestral taxon
specimen=s List of scores for how well each specimen matches its family
tree2=s File to which a simplified (rooted) family-level tree will be written
root=i Node to be used to root the new tree
order Specify this to write a simplified order-level tree instead of a family level tree
well Specify this and specimens will only be considered outliers to well-defined parents
help Prints out usage

Outputs:

The program produces:

A list of outliers (i.e. samples not found under the node best describing their family) and aliens (i.e. samples found in a resolved family but not belonging to it). Note that by default the program produces a summary of the tree for families, but it can be used for other taxonomic levels (e.g. order, genus).
Taxonomic clashes (inconsistent families and orders)
The proportion of monophyletic nodes at each level of the taxonomy
Mean % consistency of each taxonomic head node at each maximum taxonomic group size
Family match score for each species
Summary, rooted species- and family-level trees
