Installation
We recommend using virtualenv when possible, especially when dealing with distributed systems such as SLURM.
So if you are using conda, make sure to exit your environment via conda deactivate. While you could install DeepBLAST / TMvec within a conda environment, we have run into problems when running distributed training, so use at your own risk.
You can create a new virtual environment via
python3 -m venv tmvec
This will create a folder called tmvec, and the environment's scripts will be installed there. The environment can be placed anywhere on your system (which is very useful on distributed systems). You can activate your environment via source tmvec/bin/activate
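Putting these steps together, a minimal session looks like the following sketch (the tmvec folder name follows the example above, but any path works):

```shell
# Create a fresh virtual environment (run `conda deactivate` first
# if you are inside a conda environment)
python3 -m venv tmvec

# Activate it; installs now go into the tmvec folder
source tmvec/bin/activate

# When finished, leave the environment with:
# deactivate
```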
Once the virtualenv is created, you can activate your environment and install everything as follows.
For more details on pytorch versions, see the pytorch instructions
You will then need to install faiss via pip install faiss-cpu
Then, the latest versions of DeepBLAST and TMvec can be installed as follows.
pip install tm-vec
To install the development version, run the following command
pip install git+https://github.com/tymor22/tm-vec.git
Because DeepBLAST is a dependency of TM-vec, installing TM-vec will automatically install DeepBLAST.
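After installing, a quick sanity check is to confirm the relevant modules are importable. Note that tmvec and deepblast are assumed module names in this sketch; the importable module names may differ from the PyPI package names, so check the package documentation if a module shows as missing:

```python
import importlib.util

# Check whether each module can be found without actually importing it.
# "tmvec" and "deepblast" are assumed module names here.
status = {}
for mod in ("tmvec", "deepblast", "torch", "faiss"):
    status[mod] = importlib.util.find_spec(mod) is not None

for mod, ok in status.items():
    print(f"{mod}: {'OK' if ok else 'missing'}")
```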
If you have a GPU available, you can take advantage of accelerated database building, search and alignment.
This can be done as follows (here with CUDA 11.8). Change the URL below to reflect your CUDA toolkit version (cu118 for CUDA 11.8, cu121 for CUDA 12.1). Don't supply a version greater than your installed CUDA toolkit version, though!
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118
pip3 install faiss-gpu
Then DeepBLAST / TM-vec can both be installed via pip install tm-vec
For more information on other cuda versions, see the pytorch installation documentation.
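To confirm that the GPU build of PyTorch actually sees your card, a small check like the following can help; it degrades gracefully if torch is missing or CPU-only:

```python
import importlib.util

# Only query CUDA if torch is installed; on a CPU-only build,
# torch.cuda.is_available() simply returns False.
cuda_ok = False
if importlib.util.find_spec("torch") is not None:
    import torch
    cuda_ok = torch.cuda.is_available()

print("CUDA available:", cuda_ok)
```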
DeepBLAST uses Numba to compute alignments on the GPU, and Numba can be finicky about the GPU setup.
Sometimes, it is sufficient to use the locally installed cudatoolkit. If your compute cluster has it installed, it may just be a matter of loading the modules. The command we used on our SLURM cluster was module load gcc cudnn cuda, but this may vary depending on the cluster.
If you are installing this on your local machine, you may need to set up some paths yourself, namely by manually installing the NVIDIA drivers and the CUDA toolkit. On Ubuntu, the CUDA toolkit can be installed via
sudo apt-get install nvidia-cuda-toolkit
However, this isn't enough on its own: Numba searches a set of default paths for the CUDA toolkit, so you may need to override CUDA_HOME.
For instance, on my Ubuntu machine, I ran
export CUDA_HOME=/usr/local/cuda-11.3
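As a sketch, the override is just an environment variable export; the path shown here is the one from the example above and must match the toolkit version actually installed on your machine:

```shell
# Tell Numba where the CUDA toolkit lives; adjust the version in the
# path to match your installation (e.g. /usr/local/cuda-11.8)
export CUDA_HOME=/usr/local/cuda-11.3
```

Adding this line to your shell profile (e.g. ~/.bashrc) makes the override persistent across sessions.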
We have only tested DeepBLAST / TM-vec on Linux machines, so there is no guarantee that it will work on Windows / Mac. Furthermore, these models are large and require more than 12GB of GPU memory (we've tested training and inference with 24GB-80GB of GPU RAM). If you don't have access to such GPUs, you can still run DeepBLAST / TM-vec on the CPU, but expect a >10x increase in runtime.