STRPsearch is a specialized tool designed for rapid and precise identification and mapping of structured tandem repeats in proteins (STRPs).
To get started, first extract the contents of data/databases.zip by running the following command:
cd data && unzip databases.zip && cd ..
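On systems where the unzip utility is not available, the same extraction step can be sketched with Python's standard zipfile module. The helper name extract_databases is hypothetical and not part of STRPsearch; the default archive path comes from the command above, and the sketch assumes it is called from the main directory of the project.

```python
import zipfile
from pathlib import Path


def extract_databases(archive="data/databases.zip", dest=None):
    """Extract the archive next to itself, mirroring `cd data && unzip databases.zip`.

    Hypothetical helper for illustration; returns the names of the archive members.
    """
    archive = Path(archive)
    # By default, extract into the directory that contains the archive (data/).
    dest = Path(dest) if dest is not None else archive.parent
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()
```

Calling extract_databases() with no arguments should leave the databases under data/, just as the shell command does.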
Then you can choose one of the following methods to set up the software:
- Install all the dependencies listed in the requirements.txt file:
pip install -r requirements.txt
Note: Inside the requirements.txt file, you'll find a commented section that includes dependencies which cannot be installed with pip. To install these dependencies, you can use Conda by running the following commands:
conda install -c conda-forge -c bioconda foldseek
conda install -c bioconda tmalign
conda install -c conda-forge pymol-open-source
- Navigate to the main directory of the project and run the software with the following command:
python3 ./bin/strpsearch.py [OPTIONS] COMMAND [ARGS]...
- Import and activate the Conda environment from the environment.yml file:
conda env create -f environment.yml
conda activate strpsearch_env
- Navigate to the main directory of the project and run the software with the following command:
python3 ./bin/strpsearch.py [OPTIONS] COMMAND [ARGS]...
- Build the Docker image using the provided Dockerfile:
docker build -t strpsearch .
- To run the container in interactive mode, use the following command:
docker run -it --entrypoint /bin/bash -v /mount/directory/:/app strpsearch
Note that the -v /mount/directory/:/app option mounts the specified host directory (/mount/directory/) to the working directory of the container (/app). This enables the container to read and write files on the host machine.
- Navigate to the main directory of the project and run the software with the following command:
python3 ./bin/strpsearch.py [OPTIONS] COMMAND [ARGS]...
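With the pip-based setup, Foldseek, TM-align, and PyMOL have to be installed separately, so it can help to confirm that they are reachable before the first run. The following is a minimal sketch, assuming the executables are named foldseek, TMalign, and pymol on your PATH; those names are assumptions and may differ between installations. The helper missing_tools is hypothetical and not part of STRPsearch.

```python
import shutil

# Assumed executable names for the non-pip dependencies; adjust if yours differ.
REQUIRED_TOOLS = ("foldseek", "TMalign", "pymol")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the subset of `tools` that cannot be located on PATH."""
    return [tool for tool in tools if which(tool) is None]
```

An empty list from missing_tools() suggests that all three executables were found; any names it returns point to a dependency that still needs to be installed or added to PATH.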
The tool has three query commands, each with its own positional arguments and options, plus a version command.
To list the available commands, run:
python3 bin/strpsearch.py --help
This returns the following commands:
Command | Description |
---|---|
query-file | Query an existing PDB/CIF formatted structure file by providing the file path |
download-pdb | Download and query a structure from PDB by providing the PDB ID and the specific Chain of interest |
download-model | Download and query an AlphaFold model by providing the UniProt ID and the AlphaFold version of interest |
version | Show the version and exit |
Arguments and options for the query-file command:
- input_file (TEXT): Path to the input structure file to query (PDB/mmCIF). This argument is required. Default: None
- out_dir (TEXT): Path to the output directory. This argument is required. Default: None
- --chain (TEXT): Specific chain to query from the structures. Default: all
- --temp-dir (TEXT): Path to the temporary directory. Default: /tmp
- --max-eval (FLOAT): Maximum E-value of the targets to prefilter. Default: 0.01
- --min-height (FLOAT): Minimum height of TM-score signals to be processed. Default: 0.4
- --keep-temp / --no-keep-temp: Whether to keep the temporary directory and files. Default: no-keep-temp
- --help: Show this message and exit
Arguments and options for the download-pdb command:
- pdb_id (TEXT): PDB ID of the experimental structure to download and query. This argument is required. Default: None
- out_dir (TEXT): Path to the output directory. This argument is required. Default: None
- --chain (TEXT): Specific chain to query from the structures. Default: all
- --temp-dir (TEXT): Path to the temporary directory. Default: /tmp
- --max-eval (FLOAT): Maximum E-value of the targets to prefilter. Default: 0.01
- --min-height (FLOAT): Minimum height of TM-score signals to be processed. Default: 0.4
- --keep-temp / --no-keep-temp: Whether to keep the temporary directory and files. Default: no-keep-temp
- --help: Show this message and exit
Arguments and options for the download-model command:
- uniprot_id (TEXT): UniProt ID of the AlphaFold-predicted model to download and query. This argument is required. Default: None
- af_version (TEXT): Version of AlphaFold to download predicted models from. This argument is required. Default: None
- out_dir (TEXT): Path to the output directory. This argument is required. Default: None
- --temp-dir (TEXT): Path to the temporary directory. Default: /tmp
- --max-eval (FLOAT): Maximum E-value of the targets to prefilter. Default: 0.01
- --min-height (FLOAT): Minimum height of TM-score signals to be processed. Default: 0.4
- --keep-temp / --no-keep-temp: Whether to keep the temporary directory and files. Default: no-keep-temp
- --help: Show this message and exit
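When calling the tool from a script, the arguments and options listed above can be assembled into a command line programmatically. The following is a minimal sketch: build_command is a hypothetical helper and not part of STRPsearch, and it only uses the command names and flags documented above. Options left as None are omitted so that the tool's own defaults apply.

```python
import sys


def build_command(command, positionals, chain=None, temp_dir=None,
                  max_eval=None, min_height=None, keep_temp=False,
                  script="./bin/strpsearch.py"):
    """Build an argv list for one strpsearch.py command (hypothetical helper)."""
    if command == "download-model" and chain is not None:
        # The option list for download-model does not include --chain.
        raise ValueError("--chain is not listed for download-model")
    argv = [sys.executable, script, command, *(str(p) for p in positionals)]
    options = {
        "--chain": chain,
        "--temp-dir": temp_dir,
        "--max-eval": max_eval,
        "--min-height": min_height,
    }
    for flag, value in options.items():
        if value is not None:  # skip unset options and keep the tool defaults
            argv += [flag, str(value)]
    if keep_temp:
        argv.append("--keep-temp")
    return argv
```

For instance, build_command("query-file", ["/input/file", "/output/directory"], max_eval=0.001, keep_temp=True) yields a list that can be passed directly to subprocess.run.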
If you already have a PDB/CIF formatted structure file and want to query all the chains in the structure while keeping the temporary directory and files:
python3 ./bin/strpsearch.py query-file /input/file /output/directory --keep-temp
If you want to automatically download and query a specific experimental structure from PDB (e.g. chain B of PDB structure 1A0R) without keeping the temporary directory and files:
python3 ./bin/strpsearch.py download-pdb 1a0r /output/directory --chain B
If you want to automatically download and query a predicted model from AlphaFold (e.g. UniProt ID: Q9HXJ7):
python3 ./bin/strpsearch.py download-model Q9HXJ7 AF_VERSION /output/directory
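The single-structure examples above can be extended to several chains with a short loop. The following is a minimal sketch, assuming it is run from the main directory of the project; query_pdb_chains is a hypothetical helper and not part of STRPsearch. It writes each result to its own subdirectory, creates that directory beforehand (whether the tool requires this is an assumption), and records the exit code of each run.

```python
import subprocess
import sys
from pathlib import Path


def query_pdb_chains(targets, out_root, script="./bin/strpsearch.py",
                     run=subprocess.run):
    """Run download-pdb once per (pdb_id, chain) pair and collect exit codes."""
    results = {}
    for pdb_id, chain in targets:
        # One output directory per target, e.g. <out_root>/1a0r_B
        out_dir = Path(out_root) / f"{pdb_id}_{chain}"
        out_dir.mkdir(parents=True, exist_ok=True)
        cmd = [sys.executable, script, "download-pdb", pdb_id,
               str(out_dir), "--chain", chain]
        results[(pdb_id, chain)] = run(cmd).returncode
    return results
```

A call such as query_pdb_chains([("1a0r", "B")], "/output/directory") runs the same command as the download-pdb example; a non-zero value in the returned dictionary indicates that the corresponding run failed.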