style: run pre-commit on all files #100

Merged: 5 commits, Jan 24, 2024
2 changes: 1 addition & 1 deletion .gitignore
@@ -159,4 +159,4 @@ site/

# other

**/.koparde*
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,7 @@
## RENEE development version

- Minor documentation improvements. (#100, @kelly-sovacool)

## RENEE 2.5.11

- Create a citation file to describe how to cite RENEE. (#86, @kelly-sovacool)
131 changes: 68 additions & 63 deletions README.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion config/genomes/biowulf/hg19_19.json
@@ -22,4 +22,4 @@
"FUSIONPROTDOMAIN": "s3://nciccbr/Resources/RNA-seq/arriba/protein_domains_hg19_hs37d5_GRCh37_v2.0.0.gff3"
}
}
}
2 changes: 1 addition & 1 deletion config/genomes/biowulf/hg38_30.json
@@ -22,4 +22,4 @@
"FUSIONPROTDOMAIN": "s3://nciccbr/Resources/RNA-seq/arriba/protein_domains_hg38_GRCh38_v2.0.0.gff3"
}
}
}
2 changes: 1 addition & 1 deletion config/genomes/biowulf/hg38_34.json
@@ -22,4 +22,4 @@
"FUSIONPROTDOMAIN": "s3://nciccbr/Resources/RNA-seq/arriba/protein_domains_hg38_GRCh38_v2.0.0.gff3"
}
}
}
2 changes: 1 addition & 1 deletion config/genomes/biowulf/hg38_36.json
@@ -22,4 +22,4 @@
"FUSIONPROTDOMAIN": "s3://nciccbr/Resources/RNA-seq/arriba/protein_domains_hg38_GRCh38_v2.0.0.gff3"
}
}
}
2 changes: 1 addition & 1 deletion config/genomes/biowulf/hg38_38.json
@@ -22,4 +22,4 @@
"FUSIONPROTDOMAIN": "s3://nciccbr/Resources/RNA-seq/arriba/protein_domains_hg38_GRCh38_v2.0.0.gff3"
}
}
}
2 changes: 1 addition & 1 deletion config/genomes/biowulf/mm10_M23.json
@@ -22,4 +22,4 @@
"FUSIONPROTDOMAIN": "s3://nciccbr/Resources/RNA-seq/arriba/protein_domains_mm10_GRCm38_v2.0.0.gff3"
}
}
}
2 changes: 1 addition & 1 deletion config/templates/tools.json
@@ -61,4 +61,4 @@
"WIGTYPE": "None"
}
}
}
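Each of these JSON hunks, like the `.gitignore` hunk, reports 1 addition and 1 deletion on a line whose visible text does not change. Given the PR title, this most likely reflects pre-commit adding a missing final newline; that is an assumption, because this diff view does not show whitespace. A minimal sketch for spotting such files with standard tools follows; the helper name `missing_final_newline` is hypothetical and not part of RENEE or pre-commit.

```bash
# Hypothetical helper (not part of RENEE or pre-commit): print each
# non-empty file argument whose last byte is not a newline.
missing_final_newline() {
    for f in "$@"; do
        # Skip empty files; they have no last byte to check.
        [ -s "$f" ] || continue
        # Command substitution strips trailing newlines, so the result
        # is empty exactly when the file's final byte is "\n".
        if [ -n "$(tail -c 1 "$f")" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, `missing_final_newline config/genomes/biowulf/*.json config/templates/tools.json` would list any of these configs that still lack a final newline.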
16 changes: 8 additions & 8 deletions docker/multiqc/Dockerfile
@@ -13,19 +13,19 @@ LABEL maintainer=kuhnsa@nih.gov
# - matplotlib (pypi)
# - XlsxWriter (pypi)

# Create Container filesystem specific
# working directory and opt directories
# to avoid collisions with host filesystem
RUN mkdir -p /opt2 && mkdir -p /data2
WORKDIR /opt2

# Set time zone to US east coast
ENV TZ=America/New_York
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime \
&& echo $TZ > /etc/timezone

# This section installs system packages
# required for your project. If you need
# extra system packages add them here.
# Installs python/3.8.10
RUN apt-get update \
@@ -55,6 +55,6 @@ RUN pip3 install --upgrade pip \

# Add Dockerfile and export env variables
ADD Dockerfile /opt2/Dockerfile
RUN chmod -R a+rX /opt2
ENV PATH="/opt2:$PATH"
WORKDIR /data2
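This Dockerfile hunk likewise reports 8 additions and 8 deletions on comment and instruction lines whose visible text is essentially unchanged. The PR title suggests, as an assumption since whitespace is hidden here, that pre-commit stripped trailing whitespace. A sketch of listing such lines with `grep` follows; the function name `trailing_whitespace` is hypothetical.

```bash
# Hypothetical helper: report lines that end in a space or tab,
# formatted as file:line:text. Returns non-zero when nothing matches.
trailing_whitespace() {
    # -n prints line numbers; -H prints the file name even for one file.
    # [[:blank:]]$ matches a space or tab immediately before end of line.
    grep -nH '[[:blank:]]$' "$@"
}
```

Running `trailing_whitespace docker/multiqc/Dockerfile` before and after the hooks would show which lines they touched.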
120 changes: 60 additions & 60 deletions docs/RNA-seq/Resources.md

Large diffs are not rendered by default.

55 changes: 29 additions & 26 deletions docs/RNA-seq/TLDR-RNA-seq.md
@@ -1,50 +1,53 @@
## 1. Introduction

When processing RNA-sequencing data, there are often many steps that we must repeat. These are usually steps like removing adapter sequences, aligning reads against a reference genome, checking the quality of the data, and quantifying counts. RENEE is composed of several sub commands, or convenience functions, that automate these repetitive steps.

With RENEE, you can run your samples through our highly-reproducible pipeline, build resources for new reference genomes, and more!

Here is a list of available renee `sub commands`:

- [**`run`**](../run): run the rna-seq pipeline
- [**`build`**](../build): build reference files
- [**`cache`**](../cache): cache remote resources locally
- [**`unlock`**](../unlock): unlock a working directory

> This page contains information for building reference files and running the RENEE pipeline. For more information about each of the available sub commands, please see the [usage section](./run.md).

## 2. Setup RENEE

_Estimated Reading Time: 3 Minutes_

RENEE has two dependencies: `singularity` and `snakemake`. These dependencies can be installed by a sysadmin; however, snakemake is also readily available through conda. Before running the pipeline or any of the commands below, please ensure singularity and snakemake are in your `$PATH`. Please follow the instructions below to get started with the RENEE pipeline.
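One quick way to confirm that both executables are on your `$PATH` is the POSIX shell builtin `command -v`. The sketch below wraps it in a hypothetical `check_renee_deps` function (not a RENEE sub command); by default it checks only `singularity` and `snakemake`, the two dependencies named above.

```bash
# Hypothetical helper: verify that each named executable is on $PATH.
# With no arguments, checks the two RENEE dependencies.
# Returns non-zero if any executable is missing.
check_renee_deps() {
    [ "$#" -gt 0 ] || set -- singularity snakemake
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd -> $(command -v "$cmd")"
        else
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Run `check_renee_deps` after loading the modules in step 2.3; a non-zero exit status means one of the dependencies is still not on your `$PATH`.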

### 2.1 Login to cluster

```bash
# Setup Step 0.) ssh into cluster's head node
# example below for Biowulf cluster
ssh -Y $USER@biowulf.nih.gov
```


### 2.2 Grab an interactive node

```bash
# Setup Step 1.) Please do not run RENEE on the head node!
# Grab an interactive node first
srun -N 1 -n 1 --time=12:00:00 -p interactive --mem=8gb --cpus-per-task=4 --pty bash
```
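Since RENEE should not run on the head node, a script can refuse to continue outside a SLURM allocation. SLURM sets `SLURM_JOB_ID` inside jobs started with `srun` or `sbatch`; treating its absence as "probably on a login node" is a heuristic, and the guard name `require_slurm_job` is hypothetical.

```bash
# Hypothetical guard: succeed only inside a SLURM job allocation.
# SLURM sets SLURM_JOB_ID for srun/sbatch jobs; login nodes normally lack it.
require_slurm_job() {
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "Not inside a SLURM job; grab an interactive node with srun first." >&2
        return 1
    fi
    echo "Running inside SLURM job ${SLURM_JOB_ID}"
}
```

Placing `require_slurm_job || exit 1` at the top of a wrapper script stops it early if it was accidentally launched on the head node.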

### 2.3 Load dependencies

```bash
# Setup Step 2.) Add singularity and snakemake executables to $PATH
module purge
module load ccbrpipeliner
```

## 3. Building Reference files

In this example, we will start off by building reference files downloaded from [GENCODE](https://www.gencodegenes.org/). We recommend downloading the `PRI` Genome FASTA file and annotation from [GENCODE](https://www.gencodegenes.org/). These `PRI` reference files contain the primary chromosomes and scaffolds. We **do not** recommend downloading the `CHR` reference files!

Check out [this](./Resources.md) list of currently available resources on Biowulf. Only build your own reference files if your required **genome + annotation combination** is NOT available. Also, if you think that your **genome + annotation combination** may benefit other Biowulf users of RENEE, please request that it be added to RENEE's default resources by [opening an issue on GitHub](https://github.com/CCBR/RENEE/issues).

### 3.1 Download References from GENCODE

@@ -63,7 +66,8 @@ wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_36/gencode.
gzip -d gencode.v36.primary_assembly.annotation.gtf.gz
```
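An interrupted `wget` can leave a truncated `.gz` file, so it may be worth testing each archive before decompressing it. `gzip -t` checks integrity without writing any output; the wrapper name `gunzip_checked` is hypothetical.

```bash
# Hypothetical helper: test each .gz archive, then decompress it in place.
# Stops at the first corrupt or truncated file instead of decompressing it.
gunzip_checked() {
    for gz in "$@"; do
        if ! gzip -t "$gz"; then
            echo "Corrupt or truncated archive: $gz (re-download it)" >&2
            return 1
        fi
        gzip -d "$gz"
    done
}
```

For example, `gunzip_checked gencode.v36.primary_assembly.annotation.gtf.gz` can stand in for the plain `gzip -d` call above.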

### 3.2 Run Build pipeline

```bash
# Build Step 3.) Load dependencies
module purge
@@ -80,19 +84,18 @@ renee build --ref-fa GRCh38.primary_assembly.genome.fa \
renee build --ref-fa GRCh38.primary_assembly.genome.fa \
--ref-name hg38 \
--ref-gtf gencode.v36.primary_assembly.annotation.gtf \
--gtf-ver 36 --output /data/$USER/hg38_36
```
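The `--output` directory above follows the same `<ref-name>_<gtf-ver>` pattern used by the Biowulf genome configs in this PR (for example `hg38_36.json` and `mm10_M23.json`). A sketch that derives the directory from those two values, so they cannot drift apart, follows. It only prints the command, uses only the flags shown above, and the function name `print_build_cmd` is hypothetical.

```bash
# Hypothetical helper: print a `renee build` command whose --output
# directory is named <ref-name>_<gtf-ver>, e.g. /data/$USER/hg38_36.
# Usage: print_build_cmd REF_FA REF_NAME REF_GTF GTF_VER
print_build_cmd() {
    ref_fa=$1 ref_name=$2 ref_gtf=$3 gtf_ver=$4
    out="/data/${USER}/${ref_name}_${gtf_ver}"
    printf 'renee build --ref-fa %s --ref-name %s --ref-gtf %s --gtf-ver %s --output %s\n' \
        "$ref_fa" "$ref_name" "$ref_gtf" "$gtf_ver" "$out"
}
```

Printing first lets you review the command before running it.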

An email notification will be sent out when the pipeline starts and ends. Once the build pipeline completes, you can run RENEE with the provided test dataset. Please see the instructions below for more information.

## 4. Running RENEE

Run RENEE with the reference files we built above using the hg38 (GRCh38.p13) Genome FASTA file and the GENCODE release 36 annotation (GTF). For more information about how we generated the reference files, please see the instructions above. You can use those instructions as a guide for building any new reference genomes in the future.

### 4.1 Dry-run pipeline

Dry-run the pipeline before submitting the pipeline's master job. If you wish to run RENEE with a new dataset, you only need to update the values provided to the `--input` and `--output` arguments (and possibly `--genome`). The `--input` argument supports globbing. If this is the first time running RENEE for a given dataset, the `--output` directory should _**not**_ exist on your local filesystem; it will be created automatically at runtime.

```bash
# Run Step 0.) Please do not run RENEE on the head node!
@@ -106,7 +109,7 @@ module load ccbrpipeliner

# Run Step 2.) Dry-run the pipeline with test dataset
# And reference genome generated in the steps above
# Test data consists of sub sampled FastQ files
renee run \
--input ${RENEE_HOME}/.tests/*.R?.fastq.gz \
--output /data/${USER}/runner_hg38_36/ \
@@ -116,13 +119,13 @@ renee run \
--dry-run
```
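Two constraints described before the dry-run command can be checked up front: the `--input` glob must actually match your FastQ files, and a fresh `--output` directory must not already exist. The POSIX-shell sketch below uses a hypothetical `preflight` function (not a RENEE sub command) and assumes paired files named `*.R1.fastq.gz` and `*.R2.fastq.gz`, as the test-data glob `*.R?.fastq.gz` suggests.

```bash
# Hypothetical pre-flight check (not a RENEE sub command).
# Usage: preflight OUTPUT_DIR FASTQ...
# Fails if OUTPUT_DIR already exists, if no FastQ files matched,
# or if any *.R1.fastq.gz file lacks its *.R2.fastq.gz mate.
preflight() {
    outdir=$1
    shift
    if [ -e "$outdir" ]; then
        echo "Output directory already exists: $outdir" >&2
        return 1
    fi
    # An unmatched glob is passed through literally, so test existence.
    if [ "$#" -eq 0 ] || [ ! -e "$1" ]; then
        echo "No input FastQ files matched" >&2
        return 1
    fi
    for fq in "$@"; do
        case $fq in
            *.R1.fastq.gz)
                mate="${fq%.R1.fastq.gz}.R2.fastq.gz"
                if [ ! -e "$mate" ]; then
                    echo "Missing mate for $fq" >&2
                    return 1
                fi
                ;;
        esac
    done
    echo "OK: $# FastQ file(s); $outdir will be created by RENEE"
}
```

With the test data this would be `preflight /data/${USER}/runner_hg38_36/ ${RENEE_HOME}/.tests/*.R?.fastq.gz`, run just before the dry-run.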

### 4.2 Run pipeline

Kick off the pipeline by submitting the master job to the cluster. This is essentially the same command as above, without the `--dry-run` flag.

```bash
# Run Step 3.) Submit the master job
# Runs the RENEE pipeline with the
# reference genome generated in the steps above
# and with the test dataset
renee run \
@@ -134,4 +137,4 @@ renee run \
--dry-run
```
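Because the real submission should differ from the dry-run only by the `--dry-run` flag, one way to keep the two invocations in sync is to route both through a single wrapper. A sketch follows, with a hypothetical function `renee_run` and a hypothetical `RENEE_BIN` override (defaulting to `renee`) that makes the wrapper easy to test.

```bash
# Hypothetical wrapper: run the same `renee run` arguments either as a
# dry-run or as the real submission; only --dry-run differs between them.
# RENEE_BIN is a hypothetical override that defaults to `renee`.
# Usage: renee_run dry|submit [renee run options...]
renee_run() {
    mode=$1
    shift
    case $mode in
        dry)    "${RENEE_BIN:-renee}" run "$@" --dry-run ;;
        submit) "${RENEE_BIN:-renee}" run "$@" ;;
        *)      echo "mode must be 'dry' or 'submit'" >&2; return 2 ;;
    esac
}
```

Call `renee_run dry` with your usual `--input`, `--output` and remaining options, then repeat the identical line with `submit` once the dry-run looks right.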

An email notification will be sent out when the pipeline starts and ends.