Merge pull request #155 from PacificBiosciences/develop-v2

v2.0.3

Showing 22 changed files with 821 additions and 19 deletions.

# TBD

# Configuring Cromwell on Azure

Workflows can be run in Azure by setting up [Cromwell on Azure (CoA)](https://github.com/microsoft/CromwellOnAzure). Documentation on deploying and configuring an instance of CoA can be found [here](https://github.com/microsoft/CromwellOnAzure/wiki/Deploy-your-instance-of-Cromwell-on-Azure).

## Requirements

- [Cromwell on Azure](https://github.com/microsoft/CromwellOnAzure) version 3.2+; version 4.0+ is recommended

## Configuring and running the workflow

### Filling out workflow inputs

Fill out any information missing in [the inputs file](https://github.com/PacificBiosciences/HiFi-human-WGS-WDL/blob/main/backends/azure/singleton.azure.inputs.json).

See [the inputs section of the singleton README](./singleton#inputs) for more information on the structure of the inputs.json file.

### Running via Cromwell on Azure

Instructions for running a workflow from Cromwell on Azure are described in [the Cromwell on Azure documentation](https://github.com/microsoft/CromwellOnAzure/wiki/Running-Workflows).

## Reference data hosted in Azure

To use Azure reference data, add the following line to your `containers-to-mount` file in your Cromwell on Azure installation ([more info here](https://github.com/microsoft/CromwellOnAzure/blob/develop/docs/troubleshooting-guide.md#use-input-data-files-from-an-existing-azure-storage-account-that-my-lab-or-team-is-currently-using)):

`https://datasetpbrarediseases.blob.core.windows.net/dataset?si=public&spr=https&sv=2021-06-08&sr=c&sig=o6OkcqWWlGcGOOr8I8gCA%2BJwlpA%2FYsRz0DMB8CCtCJk%3D`
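
As a sketch of that edit, assuming you maintain a local copy of `containers-to-mount` and upload it back to your CoA configuration container afterwards (the file's exact location depends on your deployment):

```bash
# Append the public PacBio dataset SAS URL to a local copy of the file;
# single quotes keep the shell from interpreting the '&' characters in the query string.
echo 'https://datasetpbrarediseases.blob.core.windows.net/dataset?si=public&spr=https&sv=2021-06-08&sr=c&sig=o6OkcqWWlGcGOOr8I8gCA%2BJwlpA%2FYsRz0DMB8CCtCJk%3D' >> containers-to-mount
```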

The [Azure input file template](https://github.com/PacificBiosciences/HiFi-human-WGS-WDL/blob/main/backends/azure/singleton.azure.inputs.json) has paths to the reference files in this blob storage prefilled.

# Configuring Cromwell on GCP

[Cromwell's documentation](https://cromwell.readthedocs.io/en/stable/tutorials/PipelinesApi101/) on getting started with Google's genomics Pipelines API can be used to set up the resources needed to run the workflow.

## Configuring and running the workflow

### Filling out workflow inputs

Fill out any information missing in [the inputs file](https://github.com/PacificBiosciences/HiFi-human-WGS-WDL/blob/main/backends/gcp/singleton.gcp.inputs.json).

See [the inputs section of the singleton README](./singleton#inputs) for more information on the structure of the inputs.json file.

#### Determining available zones

To determine available zones in GCP, run the following; available zones within a region can be found in the first column of the output:

```bash
gcloud compute zones list | grep <region>
```

For example, the zones in region `us-central1` are `us-central1-a`, `us-central1-b`, `us-central1-c`, and `us-central1-f`.
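
A concrete invocation for that region might look like this (output illustrative and abridged; zone availability can change):

```bash
gcloud compute zones list | grep us-central1
# us-central1-a  us-central1  UP
# us-central1-b  us-central1  UP
# us-central1-c  us-central1  UP
# us-central1-f  us-central1  UP
```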

## Running the workflow via Google's genomics Pipelines API

[Cromwell's documentation](https://cromwell.readthedocs.io/en/stable/tutorials/PipelinesApi101/) on getting started with Google's genomics Pipelines API can be used as an example for how to run the workflow.

## Reference data hosted in GCP

GCP reference data is hosted in the `us-west1` region in the bucket `gs://pacbio-wdl`. This bucket is requester-pays, meaning that users will need to [provide a billing project in their Cromwell configuration](https://cromwell.readthedocs.io/en/stable/filesystems/GoogleCloudStorage/) in order to use files located in this bucket.
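
To confirm that billing-project access works outside of Cromwell, you can list the bucket with `gsutil`, billing the request to a project of your own (`my-billing-project` below is a placeholder):

```bash
# -u supplies the billing project for requester-pays buckets
gsutil -u my-billing-project ls gs://pacbio-wdl/
```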

To avoid egress charges, Cromwell should be set up to spin up compute resources in the same region in which the data is located. If possible, add cohort data to the same region as the reference dataset, or consider mirroring this dataset in the region where your data is located. See [Google's pricing documentation](https://cloud.google.com/storage/pricing) for more information about data storage and egress charges.

# Installing and configuring for HPC backends

Either `miniwdl` or `Cromwell` can be used to run workflows on the HPC.

## Installing and configuring `miniwdl`

### Requirements

- [`miniwdl`](https://github.com/chanzuckerberg/miniwdl) >= 1.9.0
- [`miniwdl-slurm`](https://github.com/miniwdl-ext/miniwdl-slurm)

### Configuration

An [example miniwdl.cfg file](https://github.com/PacificBiosciences/HiFi-human-WGS-WDL/blob/main/backends/hpc/miniwdl.cfg) is provided here. It should be placed at `~/.config/miniwdl.cfg` and edited to match your SLURM configuration; this allows running workflows using a basic SLURM setup.
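
For example, starting from a clone of this repository (paths assumed; adjust to where you cloned the workflow):

```bash
# install the example config where miniwdl looks for it
mkdir -p ~/.config
cp backends/hpc/miniwdl.cfg ~/.config/miniwdl.cfg
# then edit ~/.config/miniwdl.cfg to match your SLURM partition, account, etc.
```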

## Installing and configuring `Cromwell`

Cromwell supports a number of different HPC backends; see [Cromwell's documentation](https://cromwell.readthedocs.io/en/stable/backends/HPC/) for more information on configuring each of the backends. Cromwell can be used in a standalone "run" mode, or in "server" mode to allow multiple users to submit workflows. The example below shows commands for running Cromwell in "run" mode.
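
As a sketch of the distinction between the two modes (assuming `cromwell.jar` is in the current directory):

```bash
# "run" mode: execute a single workflow, then exit
java -jar cromwell.jar run workflows/singleton.wdl --inputs <inputs_json_file>

# "server" mode: start a REST service that accepts submissions from multiple users
java -jar cromwell.jar server
```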

## Running the workflow

### Filling out workflow inputs

Fill out any information missing in [the inputs file](https://github.com/PacificBiosciences/HiFi-human-WGS-WDL/blob/main/backends/hpc/singleton.hpc.inputs.json). Once you have downloaded the reference data bundle, ensure that you have replaced the `<local_path_prefix>` in the input template file with the local path to the reference datasets on your HPC.

See [the inputs section of the singleton README](./singleton#inputs) for more information on the structure of the inputs.json file.

#### Running via miniwdl

```bash
miniwdl run workflows/singleton.wdl --input <inputs_json_file>
```

#### Running via Cromwell

```bash
cromwell run workflows/singleton.wdl --inputs <inputs_json_file>
```

## Reference data bundle

[<img src="https://zenodo.org/badge/DOI/10.5281/zenodo.14027047.svg" alt="10.5281/zenodo.14027047">](https://zenodo.org/records/14027047)

Reference data is hosted on Zenodo at [10.5281/zenodo.14027047](https://zenodo.org/records/14027047). Download the reference data bundle and extract it to a location on your HPC, then update the input template file with the path to the reference data.

```bash
## download the reference data bundle
wget https://zenodo.org/records/14027047/files/hifi-wdl-resources-v2.0.0.tar

## extract the reference data bundle
tar -xvf hifi-wdl-resources-v2.0.0.tar
```
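
One way to fill in the template afterwards (a sketch; the extraction directory name is assumed to match the tarball name, so check what `tar` actually created):

```bash
# replace the placeholder with the absolute path to the extracted bundle
sed 's|<local_path_prefix>|/path/to/hifi-wdl-resources-v2.0.0|g' \
  backends/hpc/singleton.hpc.inputs.json > singleton.inputs.json
```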

- [hpc](./backend-hpc)
- [azure](./backend-azure)
- [gcp](./backend-gcp)

# bam_stats outputs

```wdl
{sample}.{movie}.read_length_and_quality.tsv.gz - per read length and quality metrics
```

## `{sample}.{movie}.read_length_and_quality.tsv.gz` - per read length and quality metrics

Base metrics are extracted for each read from the uBAM and stored in these 4 columns (a quick way to inspect the file is sketched below the list):

- movie
- read name
- read length: length of query sequence
- read quality: transformation of the `rq` tag into Phred (log) space, e.g., `rq:f:0.99` (99% accuracy, 1 error in 100 bases) is Phred 20 ($-10 \times \log_{10}(1 - 0.99)$); this value is capped at Phred 60 for `rq:f:1.0`
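
For instance, with `$sample` and `$movie` set to match your output file name (shell variables assumed for illustration):

```bash
# peek at the first few records (columns: movie, read name, length, quality)
zcat "${sample}.${movie}.read_length_and_quality.tsv.gz" | head -5

# mean read length across all reads (column 3)
zcat "${sample}.${movie}.read_length_and_quality.tsv.gz" \
  | awk -F'\t' '{ sum += $3 } END { print sum / NR }'
```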

# DeepVariant subworkflow

```mermaid
flowchart TD
aBAM[/"HiFi aBAM"/] --> make_examples["DeepVariant make_examples"]
make_examples --> gpu{"gpu?"}
gpu -- yes --> call_variants_gpu["DeepVariant call_variants_gpu"]
gpu -- no --> call_variants_cpu["DeepVariant call_variants_cpu"]
call_variants_gpu --> postprocess_variants["DeepVariant postprocess_variants"]
call_variants_cpu --> postprocess_variants
postprocess_variants --> vcf[/"small variant VCF"/]
postprocess_variants --> gvcf[/"small variant gVCF"/]
```

This subworkflow runs the three steps of DeepVariant individually in order to make the best use of resources. If a GPU is available and `gpu == true`, the `call_variants` step will run on 1 GPU and 8 CPU threads; otherwise it will run on 64 CPU threads. The `make_examples` and `postprocess_variants` steps always run on the CPU.