High throughput tool for tall and wide multiple sequence alignment.

TurakhiaLab/TWILIGHT

TWILIGHT: Tall and Wide Alignments at High Throughput

Introduction

TWILIGHT (Tall and Wide Alignments at High Throughput) is a tool designed for ultrafast and ultralarge multiple sequence alignment. It is able to scale to millions of long nucleotide sequences (>10000 bases). TWILIGHT can run on CPU-only platforms (Linux/Mac) or take advantage of CUDA-capable GPUs for further acceleration.

By default, TWILIGHT requires an unaligned sequence file in FASTA format and an input guide tree in Newick format to generate the output alignment in FASTA format (Fig. 1a). When a guide tree is unavailable, users can use the iterative mode, which provides a Snakemake workflow to estimate guide trees using external tools (Fig. 1b).

TWILIGHT adopts the progressive alignment algorithm (Fig. 1c) and employs tiling strategies to band alignments (Fig. 1e). It combines these with a divide-and-conquer technique (Fig. 1a), a novel heuristic for handling gappy columns (Fig. 1d), and support for GPU acceleration (Fig. 1f) to achieve high speed and memory efficiency.

Figure 1: Overview of the TWILIGHT algorithm

Installation

Using the installation script (requires sudo access; Ubuntu only)

The installation script has been tested only on Ubuntu. Users on other platforms should refer to the next section to install TWILIGHT using Docker.

Step 0: Dependencies

  • Git: sudo apt install -y git
  • Conda: Optional, for iterative mode

Step 1: Clone the repository

git clone https://github.com/TurakhiaLab/TWILIGHT.git
cd TWILIGHT

Step 2: Install dependencies (requires sudo access)

Skip this step if the libraries below are already installed.

- wget
- build-essential 
- cmake 
- libboost-all-dev 
- libtbb2 
- protobuf-compiler

Otherwise, run:

bash ./install/installDependencies.sh
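
To decide whether Step 2 can be skipped, the following sketch lists which of the packages above are not yet installed. It is only an illustration for Debian/Ubuntu systems: `missing_packages` is a hypothetical helper name, not part of TWILIGHT, and it assumes that a package counts as installed when `dpkg-query` reports the status `install ok installed`.

```shell
# Sketch only: missing_packages is a hypothetical helper, not part of TWILIGHT.
# Prints every package name that dpkg does not report as fully installed.
missing_packages() {
    for pkg in "$@"; do
        # Empty status if the package is unknown or dpkg-query is unavailable.
        status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null)
        if [ "$status" != "install ok installed" ]; then
            echo "$pkg"
        fi
    done
}

# Package names taken from the Step 2 list.
missing_packages wget build-essential cmake libboost-all-dev libtbb2 protobuf-compiler
```

If nothing is printed, every listed package is already present and the dependency script should not be needed.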

Step 3: Install TWILIGHT

If CUDA-capable GPUs are detected, the GPU version will be built; otherwise, the CPU version will be built.

bash ./install/installTWILIGHT.sh
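
To get a rough idea of which version to expect, one can check for an NVIDIA GPU beforehand. The sketch below is an assumption about how such a check might look, using `nvidia-smi -L`; it is not taken from the install script, which may detect GPUs differently (for example, by looking for the CUDA toolkit). `has_nvidia_gpu` is a hypothetical helper name.

```shell
# Sketch only: has_nvidia_gpu is a hypothetical helper, and this check is
# an assumption, not necessarily what installTWILIGHT.sh does.
has_nvidia_gpu() {
    # nvidia-smi -L prints one line per GPU, each starting with "GPU ".
    command -v nvidia-smi >/dev/null 2>&1 &&
        nvidia-smi -L 2>/dev/null | grep -q '^GPU '
}

if has_nvidia_gpu; then
    echo "NVIDIA GPU found: the GPU version is likely to be built"
else
    echo "No NVIDIA GPU found: the CPU version is likely to be built"
fi
```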

Step 4: Enter build directory and run TWILIGHT

cd build
./twilight --help

Step 5 (optional): Install TWILIGHT iterative mode

Step 5.1 Create and activate a Conda environment (ensure Conda is installed first)

cd ../ # Return to TWILIGHT home directory
conda create -n twilight -y
conda activate twilight

Step 5.2 Install Snakemake and tree inference tools

bash ./install/installIterative.sh

Using Dockerfile

The Dockerfile installs all the dependencies and tools for TWILIGHT default and iterative modes.

Step 0: Dependencies

  • Git: sudo apt install -y git
  • Docker

Step 1: Clone the repository

git clone https://github.com/TurakhiaLab/TWILIGHT.git
cd TWILIGHT

Step 2: Build a docker image

CPU version

cd docker/cpu
docker build -t twilight .

GPU version (using nvidia/cuda as base image)

cd docker/gpu
docker build -t twilight .

Step 3: Build and run docker container

docker run --platform=linux/amd64 -it twilight

Step 4: Enter build directory and run TWILIGHT

cd build
./twilight -h

Run TWILIGHT

Default Mode

For more information about TWILIGHT's options and instructions, see the wiki or the built-in help:

cd build
./twilight -h

Default Configuration

Usage syntax

./twilight -t <path to tree file> -i <path to sequence file> -o <path to output file>

Example

./twilight -t ../dataset/RNASim.nwk -i ../dataset/RNASim.fa -o RNASim.aln
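
As a minimal sketch of a safer invocation, the wrapper below checks that both input files are readable before calling TWILIGHT, then prints the number of FASTA records in the input and in the output so the two can be compared. `run_twilight` and `count_records` are hypothetical helper names, not part of TWILIGHT, and the sketch assumes it is run from the build directory.

```shell
# Sketch only: run_twilight and count_records are hypothetical helpers,
# not part of TWILIGHT. Run from the build directory.

# Count FASTA records (header lines start with '>').
count_records() {
    grep -c '^>' "$1"
}

# Check that both inputs are readable, run the alignment, then report
# the record counts of the input file and the output alignment.
run_twilight() {
    tree=$1
    seqs=$2
    out=$3
    [ -r "$tree" ] || { echo "cannot read tree file: $tree" >&2; return 1; }
    [ -r "$seqs" ] || { echo "cannot read sequence file: $seqs" >&2; return 1; }
    ./twilight -t "$tree" -i "$seqs" -o "$out" || return 1
    echo "input records:  $(count_records "$seqs")"
    echo "output records: $(count_records "$out")"
}

# Example (paths from the example above):
# run_twilight ../dataset/RNASim.nwk ../dataset/RNASim.fa RNASim.aln
```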

Divide-and-Conquer Method

TWILIGHT divides the tree into subtrees with at most m leaves, where m is specified by the user, and aligns the subtrees sequentially to reduce the CPU's main memory usage.

Usage syntax

./twilight -t <path to tree file> -i <path to sequence file> -o <path to output file> -m <maximum subtree size>

Example

./twilight -t ../dataset/RNASim.nwk -i ../dataset/RNASim.fa -o RNASim.aln -m 200
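
When choosing a maximum subtree size, it can help to know how many subtrees to expect. Because each subtree holds at most m leaves, a tree with N leaves is split into at least ⌈N/m⌉ subtrees; the actual number depends on the tree shape and may be higher. The sketch below computes this lower bound, assuming that every sequence in the FASTA file is a leaf of the guide tree. `min_subtrees` is a hypothetical helper name, not part of TWILIGHT.

```shell
# Sketch only: min_subtrees is a hypothetical helper, not part of TWILIGHT.
# Usage: min_subtrees <sequence file> <maximum subtree size>
min_subtrees() {
    n=$(grep -c '^>' "$1")       # number of sequences, assumed equal to the number of leaves
    m=$2
    echo $(( (n + m - 1) / m ))  # integer ceiling of n / m
}

# Example (paths from the example above):
# min_subtrees ../dataset/RNASim.fa 200
```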

Merge Multiple MSA Files

To merge multiple MSAs, place the MSA files in a single folder.

Usage syntax

./twilight -f <path to the folder> -o <path to output file>

Example

./twilight -f ../dataset/RNASim_subalignments/ -o RNASim.aln
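
If the MSA files are scattered across several locations, a small helper can gather them first. This is only a sketch: `collect_msas` is a hypothetical helper name, and the file and folder names in the example are placeholders rather than files shipped with TWILIGHT.

```shell
# Sketch only: collect_msas is a hypothetical helper, not part of TWILIGHT.
# Usage: collect_msas <destination folder> <MSA file>...
collect_msas() {
    dest=$1
    shift
    # Create the folder if needed, then copy every remaining argument into it.
    mkdir -p "$dest" && cp -- "$@" "$dest"/
}

# Example with placeholder file names:
# collect_msas subalignments part1.aln part2.aln part3.aln
# ./twilight -f subalignments/ -o merged.aln
```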

Iterative Mode

TWILIGHT iterative mode provides a Snakemake workflow to estimate guide trees using external tools.

Options for tree inference tools:

  • Initial guide tree: parttree, maffttree, mashtree
  • Subsequent iterations: fasttree, iqtree, raxml

Step 1: Enter workflow directory

cd workflow

Step 2: See the wiki for details on the configuration options

Step 3: Run TWILIGHT iterative mode

Usage syntax

snakemake --cores [num threads] --config SEQ=[sequence] OUT=[output] DIR=[directory] ITER=[iterations] INITTREE=[tree method] ITERTREE=[tree method] OUTTREE=[tree] GETTREE=[yes/no]

Example

  • Using default configurations
snakemake --cores 8 --config SEQ=../dataset/RNASim.fa OUT=RNASim.aln DIR=tempDir
  • Specifying all command line options
snakemake --cores 8 --config SEQ=../dataset/RNASim.fa OUT=RNASim.aln DIR=tempDir ITER=2 INITTREE=maffttree ITERTREE=raxml OUTTREE=RNASim.tree GETTREE=yes
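
To compare the initial guide tree options listed above, one could run the workflow once per option. The sketch below only prints the commands, one per option, so they can be inspected before being run. `print_inittree_runs` is a hypothetical helper name, and the per-option output and directory names are placeholders; whether separate working directories are actually required is an assumption.

```shell
# Sketch only: print_inittree_runs is a hypothetical helper, not part of TWILIGHT.
# Usage: print_inittree_runs <sequence file> <num threads>
print_inittree_runs() {
    seq=$1
    cores=$2
    # One command per initial guide tree option, each with its own
    # (placeholder) output file and working directory.
    for init in parttree maffttree mashtree; do
        echo "snakemake --cores $cores --config SEQ=$seq OUT=RNASim.$init.aln DIR=tempDir_$init INITTREE=$init"
    done
}

print_inittree_runs ../dataset/RNASim.fa 8
```

Each printed line can be pasted into the shell from the workflow directory.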

Contributions

We welcome contributions from the community to enhance the capabilities of TWILIGHT. If you encounter any issues or have suggestions for improvement, please open an issue on the TWILIGHT GitHub page. For general inquiries and support, reach out to our team.

Citing TWILIGHT

TBA.