Merge pull request #150 from wells-wood-research/improving_install
Improving install
ChrisWellsWood authored Jun 11, 2024
2 parents 4b5d840 + e824f71 commit ddfc26d
Showing 9 changed files with 287 additions and 63 deletions.
2 changes: 1 addition & 1 deletion .env-headless → .env-headless-testing
@@ -5,7 +5,7 @@ APP_PORT=8181

EVOEF2_BINARY_PATH=/dependencies_for_de-stress/EvoEF2/EvoEF2
DFIRE2_FOLDER_PATH=/dependencies_for_de-stress/DFIRE2-pair/
ROSETTA_BINARY_PATH=/dependencies_for_de-stress/rosetta_src_2020.08.61146_bundle/main/source/bin/score_jd2.linuxgccrelease
ROSETTA_BINARY_PATH=/dependencies_for_de-stress/rosetta/source/bin/score_jd2.linuxgccrelease
AGGRESCAN3D_SCRIPT_PATH=/dependencies_for_de-stress/Aggrescan3D/aggrescan3D_cli_run.py

RQ_DASHBOARD_REDIS_URL=redis://redis:6379
36 changes: 20 additions & 16 deletions .github/workflows/big-structure-run-tests.yml
@@ -21,22 +21,26 @@ jobs:
- name: Checkout de-stress
uses: actions/checkout@v2

- name: Checkout dependencies_for_de-stress
uses: actions/checkout@v2
with:
repository: wells-wood-research/dependencies_for_de-stress.git
token: ${{ secrets.DEPENDENCIES_ACCESS_TOKEN }}
ref: master
path: dependencies_for_de-stress

- name: Build and test `big-structure`
- name: Download dependencies
run: |
chmod -R 755 dependencies_for_de-stress/
cd big-structure/
docker build -t big-structure .
cd ..
docker run --rm -v $(pwd)/dependencies_for_de-stress/:/dependencies_for_de-stress/ \
--env-file .env-testing \
big-structure \
python -m venv headless_destress
source headless_destress/bin/activate
cp .env-headless-testing .env-headless
pip install -r requirements.txt
cd dependencies_for_de-stress/
git clone --branch v2024.18-dev62107 https://github.com/RosettaCommons/rosetta.git
echo "Rosetta has been successfully downloaded."
cd ../
docker compose -f headless-compose.yml build
current_dir=$(pwd)
echo "Current dir: $current_dir"
echo "Building DE-STRESS dependencies. Rosetta will take a few hours to compile."
docker run --rm -v /home/runner/work/de-stress/de-stress/dependencies_for_de-stress/:/dependencies_for_de-stress de-stress-big-structure:latest sh build_dependencies_tests.sh
- name: Test `big-structure`
run: |
docker run --rm -v /home/runner/work/de-stress/de-stress/dependencies_for_de-stress/:/dependencies_for_de-stress/ \
--env-file .env-headless \
de-stress-big-structure:latest \
poetry run pytest -m "not rosetta"
76 changes: 41 additions & 35 deletions README.md
@@ -30,74 +30,80 @@ For more information about our research group, check out our

## Local Deployment

Make sure you have all the relevant dependencies in
`de-stress/dependencies_for_de-stress/`. Currently, these are:
DE-STRESS can be installed locally as a web server (https://pragmaticproteindesign.bio.ed.ac.uk/de-stress/) or as a command line tool (headless DE-STRESS).

The DE-STRESS webserver has a few limitations which are there to ensure the stability of the webserver. These limitations are listed below.

* Only proteins with 500 residues or less can be uploaded.
* Only 30 files can be uploaded at a time.
* There is a max run time of 20 seconds for all the DE-STRESS metrics.

The headless version of DE-STRESS can be run from the command line, and the user can change the settings to run DE-STRESS on a larger set of PDB files. The code has been written to allow multiprocessing so that large numbers of files can be processed in a reasonable amount of time. The `.env-headless` file can be used to update the MAX_RUN_TIME, HEADLESS_DESTRESS_BATCH_SIZE and HEADLESS_DESTRESS_WORKERS variables, which set the number of seconds the DE-STRESS metrics are allowed to run, how many PDB files are in a batch, and how many CPUs should be used, respectively.
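
As a rough illustration of how these three settings could be read, here is a minimal sketch. It assumes the `.env-headless` file uses plain `KEY=VALUE` lines and that all three values are integers. The helper names `parse_env_text` and `headless_settings` and the example values are hypothetical and are not part of DE-STRESS.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def headless_settings(text):
    """Pick out the three headless settings (assumed here to be integers)."""
    env = parse_env_text(text)
    return {
        "max_run_time": int(env["MAX_RUN_TIME"]),
        "batch_size": int(env["HEADLESS_DESTRESS_BATCH_SIZE"]),
        "workers": int(env["HEADLESS_DESTRESS_WORKERS"]),
    }


# Hypothetical values, for illustration only.
EXAMPLE_ENV = """\
# illustrative values only
MAX_RUN_TIME=60
HEADLESS_DESTRESS_BATCH_SIZE=50
HEADLESS_DESTRESS_WORKERS=4
"""

settings = headless_settings(EXAMPLE_ENV)
print(settings)
```

Raising MAX_RUN_TIME gives each structure longer to finish, while the batch size and worker count control how the work is split across CPUs.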

Before installing either of these versions of DE-STRESS, make sure you have all the relevant licenses for the dependencies in
`de-stress/dependencies_for_de-stress/`. The current dependencies used by DE-STRESS are shown below.

* Aggrescan3D
* DFIRE 2 pair
* DSSP
* EvoEF2 (source)
* Rosetta (source)

Create a `.env` file in the top level `de-stress` folder. You can copy
`de-stress/.env-testing` and update that. This
Rosetta requires a commercial licence to install. In the future, we will offer a version of DE-STRESS without Rosetta but that is not available yet.

Download `big_structure.dump` and place it in `de-stress/database`.
Also, make sure you have up-to-date versions of Docker and Docker Compose installed.

Next, from within `de-stress/`, build all the containers:
## Local install of headless DE-STRESS

First, create a virtual environment for running headless DE-STRESS.

```bash
# use production-compose.yml if you're deploying in a production environment
docker-compose -f development-compose.yml build
python -m venv headless_destress && source headless_destress/bin/activate && pip install -r requirements.txt

```

Compile the dependencies in the container:
After this, copy the `.env-headless-testing` file to `.env-headless`. You can then customise the settings for running headless DE-STRESS.

```bash
docker run \
-it \
--rm \
-v /absolute/path/to/de-stress/dependencies_for_de-stress/:/dependencies_for_de-stress \
de-stress_big-structure:latest \
sh build_dependencies.sh
cp .env-headless-testing .env-headless
```

This will compile the software, but the output will be stored on the host machine as a
volume is used. This means that you cannot move or delete this folder while the
application is being served or it will break.
Next, run the `setup.sh` bash script to install a local version of headless DE-STRESS. The script first asks which version of DE-STRESS you want; after you select headless DE-STRESS, it begins the installation. It then asks whether you want to install Rosetta and whether you have a licence for it. If you answer yes, it automatically downloads Rosetta from the git repository https://github.com/RosettaCommons/rosetta. Once the download has finished, the dependencies that are built from source (EvoEF2 and Rosetta) are compiled. Rosetta can take a long time to compile, so the script asks how many CPUs to use for the compilation (with 2 CPUs, compiling Rosetta can take around 3 hours).

Launch the application:

```bash
# Change rq-worker to however many processes you want to use for analysis
docker-compose -f development-compose.yml --env-file .env up -d --scale rq-worker=4
./setup.sh
```

Navigate to `de-stress/database` and run `import_db_dump.sh`.
Once this script has finished running, the installation of headless DE-STRESS is complete and you can run DE-STRESS on a set of PDB files using the Python command below. Replace the example path with the absolute path to the folder containing the PDB files.

## Headless DE-STRESS
```bash
python3 run_destress_headless.py --i /absolute/path/to/input/pdbs/
```

The DE-STRESS webserver has a few limitations which are there to ensure the stability of the webserver. These limitations are listed below.
You can edit the `.env-headless` file to change the max run time, the number of CPUs used and the batch size for the runs. Once this command has finished running, a CSV file called `design_data.csv`, which contains all of the DE-STRESS metrics for the set of PDB files, will be saved in the input path. A `logging.txt` file is saved in the same folder.
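
The results file can then be loaded with the Python standard library. The sketch below is only an illustration: the helper name `load_design_data` is hypothetical, and the column names in the demo file are made up, since the real columns of `design_data.csv` are not listed here.

```python
import csv
import os
import tempfile


def load_design_data(input_path):
    """Read design_data.csv from the input folder; return (columns, rows)."""
    csv_path = os.path.join(input_path, "design_data.csv")
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        columns = list(reader.fieldnames or [])
    return columns, rows


# Demo with a throwaway file; the column names are hypothetical placeholders.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_csv = os.path.join(demo_dir, "design_data.csv")
    with open(demo_csv, "w", newline="") as handle:
        handle.write("design_name,example_metric\nd1,1.5\nd2,2.5\n")
    columns, rows = load_design_data(demo_dir)

print(columns, len(rows))
```

Every value comes back as a string, so numeric metrics need converting (for example with `float`) before any analysis.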

* Only proteins with 500 residues or less can be uploaded.
* Only 30 files can be uploaded at a time.
* There is a max run time of 20 seconds for all the DE-STRESS metrics.
## Local install of the DE-STRESS web server

The headless version of DE-STRESS can be run locally and the user can change the settings to run a larger set of PDB files. The code has been written to allow multiprocessing so that large numbers of files can be processed in a reasonable amount of time. The `.env-headless` file can be used to update the MAX_RUN_TIME, HEADLESS_DESTRESS_BATCH_SIZE and HEADLESS_DESTRESS_WORKERS variables, which set the number of seconds the DE-STRESS metrics are allowed to run, how many PDB files are in a batch, and how many processors should be used, respectively.
Firstly, download `big_structure.dump` and place it in `de-stress/database`. This is a .dump file of a PostgreSQL database that contains the pre-calculated DE-STRESS metrics for a set of structures from the Protein Data Bank (PDB). This database is used for the reference set functionality in DE-STRESS, which allows users to compare their designed proteins against a set of known proteins.

Firstly the docker image needs to be built. There is a different docker compose file called `headless-compose.yml` that needs to be used instead of the `development-compose.yml` file.
Next, copy the `.env-testing` file to `.env`. You can then customise the settings for running the webserver version of DE-STRESS.

```bash
docker compose -f headless-compose.yml build
```bash
cp .env-testing .env
```

After this, make sure the dependencies have been built. The path `/absolute/path/to/de-stress/dependencies_for_de-stress/` needs to be replaced with the user's local path to the DE-STRESS dependencies.
After this, run the `setup.sh` bash script to install a local version of the DE-STRESS webserver and follow the same steps as described above. The script will also ask whether you want to install the webserver in a development or production environment. The settings for the DE-STRESS webserver can be changed in the `.env` file.

```bash
docker run -it --rm -v /absolute/path/to/de-stress/dependencies_for_de-stress/:/dependencies_for_de-stress de-stress-big-structure:latest sh build_dependencies.sh
./setup.sh
```

Finally, run headless DE-STRESS with the following command, replacing `/absolute/path/to/` with the local file path to these folders.
Next, navigate to `/de-stress/front-end` and run the command below to launch the user interface for the web server. **Note that npm needs to be installed locally to be able to do this.**

```bash
docker run -it --rm --env-file .env-headless -v /absolute/path/to/de-stress/dependencies_for_de-stress/:/dependencies_for_de-stress -v /absolute/path/to/input_path/:/input_path de-stress-big-structure:latest poetry run headless_destress /input_path
npm start
```

Finally, after this command has finished running, a URL will be displayed that you can open to view the user interface for the DE-STRESS web server.

11 changes: 7 additions & 4 deletions big-structure/build_dependencies.sh
@@ -1,4 +1,7 @@
cd /dependencies_for_de-stress/EvoEF2/ &&\
g++ -O3 --fast-math -o EvoEF2 src/*.cpp
cd /dependencies_for_de-stress/rosetta_src_2020.08.61146_bundle/main/source/ &&\
./scons.py -j2 mode=release bin/score_jd2.linuxgccrelease
cd /dependencies_for_de-stress/EvoEF2/ &&\
g++ -O3 --fast-math -o EvoEF2 src/*.cpp

echo "How many jobs do you want to run in order to compile Rosetta?"
read numofjobs
cd /dependencies_for_de-stress/rosetta/source/ &&\
./scons.py -j$numofjobs mode=release bin
2 changes: 2 additions & 0 deletions big-structure/build_dependencies_tests.sh
@@ -0,0 +1,2 @@
cd /dependencies_for_de-stress/EvoEF2/ &&\
g++ -O3 --fast-math -o EvoEF2 src/*.cpp
23 changes: 16 additions & 7 deletions big-structure/src/destress_big_structure/console.py
@@ -651,22 +651,31 @@ def headless_destress_batch(input_path: str) -> None:
+ str(num_pdb_files)
+ " PDB files in "
+ str(int(math.ceil(num_pdb_files / NUM_HEADLESS_DESTRESS_BATCH_SIZE)))
+ " batches."
+ " batch/batches."
)

logging.info(
"DE-STRESS will run on "
+ str(num_pdb_files)
+ " PDB files in "
+ str(int(math.ceil(num_pdb_files / NUM_HEADLESS_DESTRESS_BATCH_SIZE)))
+ " batches."
+ " batch/batches."
)

print(
"The estimated run time with >= 20 cores will be roughly "
+ str(round(num_pdb_files / 60))
+ " minutes. So relax, get a coffee and the results will be ready for you soon!"
)
if round(num_pdb_files / 60) < 120:

print(
"The estimated run time with >= 20 CPUs will be roughly "
+ str(round(num_pdb_files / 60))
+ " minutes. So relax, get a coffee and the results will be ready for you soon!"
)
elif round(num_pdb_files / 60) >= 120:

print(
"Wow that's a lot of PDB files!!! The estimated run time with >= 20 CPUs will be roughly "
+ str(round(num_pdb_files / 60))
+ " minutes. Come back later and headless DE-STRESS will have the results for you. "
)

# Initialising the mprocess pool and number of workers
with mp.Pool(processes=NUM_HEADLESS_DESTRESS_WORKERS) as process_pool:
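
Note on the hunk above: the batch count and run-time estimate can be checked in isolation. The following is a minimal sketch that mirrors the two expressions in the diff, `math.ceil(num_pdb_files / NUM_HEADLESS_DESTRESS_BATCH_SIZE)` and `round(num_pdb_files / 60)`. The function name `describe_run` and the example numbers are hypothetical.

```python
import math


def describe_run(num_pdb_files, batch_size):
    """Return (number of batches, estimated minutes, long-run flag)."""
    num_batches = int(math.ceil(num_pdb_files / batch_size))
    estimated_minutes = round(num_pdb_files / 60)
    # The diff switches to the "come back later" message at 120 minutes or more.
    long_run = estimated_minutes >= 120
    return num_batches, estimated_minutes, long_run


print(describe_run(600, 50))    # (12, 10, False)
print(describe_run(9000, 100))  # (90, 150, True)
```

Because Python's `round` rounds exact halves to the nearest even integer, 30 files gives `round(0.5) == 0`, so very small runs report an estimate of 0 minutes.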
1 change: 1 addition & 0 deletions requirements.txt
@@ -0,0 +1 @@
argpass==0.0.2
63 changes: 63 additions & 0 deletions run_destress_headless.py
@@ -0,0 +1,63 @@
import os
import os.path as p
import argpass
from subprocess import call
##############################################################################
# get inputs
def read_inputs():
    ## create an argpass parser, read config file,
    parser = argpass.ArgumentParser()
    parser.add_argument("--i")
    args = parser.parse_args()

    inputPath=args.i

    return inputPath
##############################################################################
def main():
    inputPath = read_inputs()

    if not p.isabs(inputPath):
        ascii_splash("ERROR")
        print("Input Path must be Absolute!")
        return

    ascii_splash("DESTRESS")
    deStressDir = os.getcwd()
    dependanciesDir = p.join(deStressDir,"dependencies_for_de-stress")
    call(["docker", "run", "-it", "--rm",
          "--env-file", ".env-headless",
          "-v", f"{dependanciesDir}:/dependencies_for_de-stress",
          "-v", f"{inputPath}:/input_path",
          "de-stress-big-structure:latest",
          "poetry", "run", "headless_destress", "/input_path"])
##############################################################################
def ascii_splash(id):
    splashDict = {"DESTRESS":
"""
██████╗ ███████╗ ███████╗████████╗██████╗ ███████╗███████╗███████╗
██╔══██╗██╔════╝ ██╔════╝╚══██╔══╝██╔══██╗██╔════╝██╔════╝██╔════╝
██║ ██║█████╗█████╗███████╗ ██║ ██████╔╝█████╗ ███████╗███████╗
██║ ██║██╔══╝╚════╝╚════██║ ██║ ██╔══██╗██╔══╝ ╚════██║╚════██║
██████╔╝███████╗ ███████║ ██║ ██║ ██║███████╗███████║███████║
╚═════╝ ╚══════╝ ╚══════╝ ╚═╝ ╚═╝ ╚═╝╚══════╝╚══════╝╚══════╝
""",

"ERROR":
"""
▓█████ ██▀███ ██▀███ ▒█████ ██▀███
▓█ ▀ ▓██ ▒ ██▒▓██ ▒ ██▒▒██▒ ██▒▓██ ▒ ██▒
▒███ ▓██ ░▄█ ▒▓██ ░▄█ ▒▒██░ ██▒▓██ ░▄█ ▒
▒▓█ ▄ ▒██▀▀█▄ ▒██▀▀█▄ ▒██ ██░▒██▀▀█▄
░▒████▒░██▓ ▒██▒░██▓ ▒██▒░ ████▓▒░░██▓ ▒██▒
░░ ▒░ ░░ ▒▓ ░▒▓░░ ▒▓ ░▒▓░░ ▒░▒░▒░ ░ ▒▓ ░▒▓░
░ ░ ░ ░▒ ░ ▒░ ░▒ ░ ▒░ ░ ▒ ▒░ ░▒ ░ ▒░
░ ░░ ░ ░░ ░ ░ ░ ░ ▒ ░░ ░
░ ░ ░ ░ ░ ░ ░
"""
    }
    print(splashDict[id])

##############################################################################
main()
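
Note on `run_destress_headless.py`: the argument list passed to `docker run` can be factored into a pure function, which makes the absolute-path check and the two volume mounts easy to test without starting a container. This is a sketch under assumptions, not the committed code. The function name `build_docker_command`, the choice to raise `ValueError` instead of printing, and the example paths are hypothetical, while the Docker arguments are copied from the script above. POSIX-style paths are assumed.

```python
import os.path as p


def build_docker_command(input_path, destress_dir):
    """Build the docker argument list used to run headless DE-STRESS."""
    if not p.isabs(input_path):
        raise ValueError("Input path must be absolute: " + repr(input_path))
    dependencies_dir = p.join(destress_dir, "dependencies_for_de-stress")
    return [
        "docker", "run", "-it", "--rm",
        "--env-file", ".env-headless",
        # Mount the compiled dependencies and the folder of PDB files.
        "-v", f"{dependencies_dir}:/dependencies_for_de-stress",
        "-v", f"{input_path}:/input_path",
        "de-stress-big-structure:latest",
        "poetry", "run", "headless_destress", "/input_path",
    ]


# Hypothetical paths, for illustration only.
command = build_docker_command("/data/pdbs", "/opt/de-stress")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.call` exactly as the script does.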