A codeml (PAML package) wrapper to make life easier. Input: an unaligned multi-species FASTA file (a single gene). Output: the codeml result.
- Codeml (PAML version 4.10.6)
- MACSE (.jar form)
- MUSCLE
- RAxML
- biopython (v1.81, python package)
- newick (v1.9.0, python package)

All of the above must be installed beforehand.
Simply add the ./script directory to your PATH.
- A single-gene FASTA sequence file (multi-species, not aligned).
- A text file that lists the foreground species, one species per line.
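Before a long run it can help to confirm that every foreground species actually appears in the FASTA file. The following is a minimal sketch, not part of Fasta2Codeml itself; it assumes that each FASTA record ID (the first word after `>`) is the species name used in the foreground file, and the function names are hypothetical.

```python
from pathlib import Path


def read_foreground(path):
    """Return foreground species names, one per non-empty line."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def fasta_ids(path):
    """Return the record IDs (first word after '>') of a FASTA file."""
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:  # skip headers with no ID at all
                    ids.append(parts[0])
    return ids


def missing_foreground(foreground_path, fasta_path):
    """List foreground species with no matching FASTA record (assumed naming)."""
    present = set(fasta_ids(fasta_path))
    return [sp for sp in read_foreground(foreground_path) if sp not in present]
```

An empty list from `missing_foreground("foreground.txt", "CLOCK.fasta")` would suggest that the two input files agree on species names, under the naming assumption above.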
cd to example/test_space/.
Change the absolute paths in the commands below to your own paths.
Then type:
Fasta2Codeml.py \
--out_dir /beegfs/store4/chenyangkang/DEV/Fasta2Codeml/example/test_space_single_gene \
--project_name Simple_test \
--foreground_file /beegfs/store4/chenyangkang/DEV/Fasta2Codeml/example/foreground.txt \
--fasta /beegfs/store4/chenyangkang/DEV/Fasta2Codeml/example/single_gene/CLOCK.fasta \
--muscle /beegfs/store4/chenyangkang/software/ParaAT2.0/muscle \
--macse /beegfs/store4/chenyangkang/software/macse_v2.07.jar \
--raxml /beegfs/store4/chenyangkang/software/standard-RAxML/raxml \
--codeml /beegfs/store4/chenyangkang/miniconda3/bin/codeml \
--boostrap 10 \
--codon_frac 0.5 \
--sp_frac 0.5
For multi-file mode, where the input files are listed in the file passed to --multi_file_list, type:
Fasta2Codeml.py \
--out_dir /beegfs/store4/chenyangkang/DEV/Fasta2Codeml/example/test_space_multi_cds \
--project_name Simple_multi_test \
--foreground_file /beegfs/store4/chenyangkang/DEV/Fasta2Codeml/example/foreground.txt \
--multi_file \
--multi_file_list cds_list.txt \
--muscle /beegfs/store4/chenyangkang/software/ParaAT2.0/muscle \
--macse /beegfs/store4/chenyangkang/software/macse_v2.07.jar \
--raxml /beegfs/store4/chenyangkang/software/standard-RAxML/raxml \
--codeml /beegfs/store4/chenyangkang/miniconda3/bin/codeml \
--boostrap 10 \
--codon_frac 0.5 \
--sp_frac 0.5
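The commands above pass 0.5 to `--codon_frac` and `--sp_frac`, which matches the 50% filtering thresholds in the pipeline steps listed below. The sketch that follows illustrates that kind of filtering; it is not the code used by Fasta2Codeml. It assumes the two options are the column and species thresholds, that columns are filtered before species, and that only `NNN` and `---` count as missing. All names in it are hypothetical.

```python
MISSING = {"NNN", "---"}


def split_codons(seq):
    """Split an aligned sequence into codons, ignoring a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3].upper() for i in range(0, usable, 3)]


def filter_alignment(aln, codon_frac=0.5, sp_frac=0.5):
    """Filter a codon alignment given as {species: aligned sequence}.

    Columns missing in more than codon_frac of species are dropped first,
    then species missing more than sp_frac of the remaining codons.
    """
    codons = {sp: split_codons(seq) for sp, seq in aln.items()}
    n_sp = len(codons)
    n_col = min((len(c) for c in codons.values()), default=0)

    # Keep a column when at most codon_frac of the species are missing there.
    keep = [
        j for j in range(n_col)
        if sum(c[j] in MISSING for c in codons.values()) <= codon_frac * n_sp
    ]

    filtered = {}
    for sp, c in codons.items():
        kept = [c[j] for j in keep]
        n_missing = sum(codon in MISSING for codon in kept)
        # Keep a species when at most sp_frac of its remaining codons are missing.
        if kept and n_missing <= sp_frac * len(kept):
            filtered[sp] = "".join(kept)
    return filtered
```

Lower fractions keep less data but leave fewer gaps for codeml to handle; the 0.5 used in the examples is a middle ground.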
- Remove species that contain only "N"s.
- Run muscle alignment with 5 iterations.
- Refine alignment using MACSE.
- Replace frameshifts (!) and stop codons with NNN using MACSE.
- Concatenate files (if in multi-file mode).
- Remove codon columns that are missing in more than 50% of species, and remove species in which more than 50% of codons are "NNN" or "---".
- Build tree with raxml (`-f a -x 42 -p 42 -m GTRGAMMA`).
- Co-filter fasta file and tree file. Trim and annotate tree with the foreground information provided. Output alignment in PHYLIP format.
- Generate codeml configuration files for both the branch-site null model (omega=1) and the alternative model.
- Run both codeml models.
- Generate p values and other statistics using scipy.
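The last step compares the two codeml runs with a likelihood ratio test. The sketch below shows the arithmetic under the assumption of a chi-square reference distribution with 1 degree of freedom; the exact statistics Fasta2Codeml reports may differ. To stay dependency-free it uses the closed form for 1 degree of freedom from the standard library, which should agree with `scipy.stats.chi2.sf(stat, 1)`. The function name and the example log-likelihood values are hypothetical.

```python
import math


def branch_site_lrt(lnl_null, lnl_alt):
    """Likelihood ratio test between the null and alternative models.

    Returns (statistic, p_value), assuming a chi-square distribution
    with 1 degree of freedom for the statistic.
    """
    # The alternative model nests the null, so a negative difference can
    # only come from optimisation noise; clamp it to zero.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of chi-square with df=1: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods, as would be read from the two codeml outputs.
stat, p = branch_site_lrt(lnl_null=-1002.5, lnl_alt=-1000.0)
print(f"2*deltaLnL = {stat:.3f}, p = {p:.4g}")
```

A small p-value suggests that the alternative model, which allows positive selection on the foreground branches, fits better than the null model with omega fixed at 1.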