Merge pull request #1 from chill868686/chill868686-patch-1
Update README.md
chill868686 authored Aug 25, 2022
2 parents f1a99a1 + 8227ced commit fd780af
Showing 1 changed file with 116 additions and 2 deletions.
# Adaptive Coder

Adaptive Coder uses a neural network to transform digital data into ATCG sequences
for DNA storage at high logical density, while keeping the output sequences
compliant with arbitrary user-defined constraints.
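For reference, the density baseline that any DNA coder is measured against is a fixed mapping of 2 bits per base. The sketch below shows that naive baseline only; it is purely illustrative and is not the neural adaptive coder implemented in this repository:

```python
# Illustrative baseline only: a fixed 2-bits-per-base mapping,
# NOT the neural adaptive coder implemented in this repository.
BASE_FOR_BITS = {0b00: "A", 0b01: "T", 0b10: "C", 0b11: "G"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}

def encode(data: bytes) -> str:
    """Map each byte to four bases (2 bits per base, MSB first)."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASE_FOR_BITS[(byte >> shift) & 0b11])
    return "".join(out)

def decode(seq: str) -> bytes:
    """Invert encode(): turn each run of four bases back into one byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BITS_FOR_BASE[base]
        out.append(byte)
    return bytes(out)
```

A fixed mapping like this cannot honor sequence constraints (e.g. avoiding homopolymers); handling such constraints adaptively while staying near 2 bits per base is exactly what the learned coder is for.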


## First time setup

The following steps are required in order to run Adaptive Coder:

1. Install [Docker](https://www.docker.com/).
* Install
[NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)
for GPU support.
    * Set up
[Docker as a non-root user](https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user).
1. Check that GPUs are available by running:

```bash
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```

The output of this command should show a list of your GPUs.

## Running Adaptive Coder

**The simplest way to run Adaptive Coder is using the provided Docker script.** This
was tested with 20 vCPUs, 64 GB of RAM, and an NVIDIA GeForce RTX 3090 GPU.

1. Clone this repository and `cd` into it.

```bash
git clone https://github.com/chill868686/adaptive-coder.git
```

1. Build the Docker image:

```bash
docker build -f docker/Dockerfile -t adacoder .
```

1. Install the `run_docker.py` dependencies. Note: You may optionally wish to
create a
[Python Virtual Environment](https://docs.python.org/3/tutorial/venv.html)
to prevent conflicts with your system's Python environment.
```bash
pip3 install -r docker/requirements.txt
```
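The optional virtual environment mentioned above can be created with Python's built-in `venv` module (the environment name `adacoder-venv` is just an example):

```bash
# Create and activate an isolated environment for the run script's dependencies
python3 -m venv adacoder-venv
. adacoder-venv/bin/activate
```

Run the `pip3 install` command above inside the activated environment, and use the same environment for the steps that follow.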
1. Make sure that the output directory exists (the default is `/tmp/alphafold`)
and that you have sufficient permissions to write into it. You can make sure
that is the case by manually running `mkdir /tmp/alphafold` and
`chmod 770 /tmp/alphafold`.
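Written out as commands, using the default output path shown above:

```bash
# Create the default output directory and make it group-writable
mkdir -p /tmp/alphafold
chmod 770 /tmp/alphafold
```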
1. Run `run_docker.py` pointing to a FASTA file containing the protein
sequence(s) for which you wish to predict the structure. If you are
predicting the structure of a protein that is already in PDB and you wish to
avoid using it as a template, then `max_template_date` must be set to be
before the release date of the structure. You must also provide the path to
the directory containing the downloaded databases. For example, for the
T1050 CASP14 target:
```bash
python3 docker/run_docker.py \
--fasta_paths=T1050.fasta \
--max_template_date=2020-05-14 \
--data_dir=$DOWNLOAD_DIR
```
By default, AlphaFold will attempt to use all visible GPU devices. To use a
subset, specify a comma-separated list of GPU UUID(s) or index(es) using the
`--gpu_devices` flag. See
[GPU enumeration](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html#gpu-enumeration)
for more details.
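For example, to restrict the run above to the first GPU only (the value `0` is just an example; any comma-separated list of indices or UUIDs works):

```bash
python3 docker/run_docker.py \
  --fasta_paths=T1050.fasta \
  --max_template_date=2020-05-14 \
  --data_dir=$DOWNLOAD_DIR \
  --gpu_devices=0
```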
1. You can control which AlphaFold model to run by adding the
`--model_preset=` flag. We provide the following models:
* **monomer**: This is the original model used at CASP14 with no ensembling.
* **monomer\_casp14**: This is the original model used at CASP14 with
`num_ensemble=8`, matching our CASP14 configuration. This is largely
provided for reproducibility as it is 8x more computationally
expensive for limited accuracy gain (+0.1 average GDT gain on CASP14
domains).
    * **monomer\_ptm**: This is the original CASP14 model fine-tuned with the
pTM head, providing a pairwise confidence measure. It is slightly less
accurate than the normal monomer model.
* **multimer**: This is the [AlphaFold-Multimer](#citing-this-work) model.
To use this model, provide a multi-sequence FASTA file. In addition, the
UniProt database should have been downloaded.
1. You can control the MSA speed/quality tradeoff by adding
`--db_preset=reduced_dbs` or `--db_preset=full_dbs` to the run command. We
provide the following presets:
* **reduced\_dbs**: This preset is optimized for speed and lower hardware
requirements. It runs with a reduced version of the BFD database.
It requires 8 CPU cores (vCPUs), 8 GB of RAM, and 600 GB of disk space.
* **full\_dbs**: This runs with all genetic databases used at CASP14.
Running the command above with the `monomer` model preset and the
`reduced_dbs` data preset would look like this:
```bash
python3 docker/run_docker.py \
--fasta_paths=T1050.fasta \
--max_template_date=2020-05-14 \
--model_preset=monomer \
--db_preset=reduced_dbs \
--data_dir=$DOWNLOAD_DIR
```
