Avi Srivastava edited this page Aug 23, 2016 · 4 revisions

Example Pipeline

Requirements:

To run the RapClust pipeline, we need the following beforehand:

**1.** RNA-seq reads from the experiment, covering two different conditions and possibly multiple replicates per condition.


**2.** A *de novo* assembly (set of contigs) of the RNA-seq reads. The assembly can be performed using Trinity, which can be found [here](https://github.com/trinityrnaseq/trinityrnaseq/wiki).  
**Note**: The input assembly can come from any standard assembler; Trinity is used here only as an example.

**3.** Quantification of the RNA-seq reads, done separately for each of the two conditions, using the above set of contigs as the reference.  
**Note**: Currently we only support [Sailfish](https://github.com/kingsfordgroup/sailfish)/[Salmon](https://github.com/COMBINE-lab/salmon).

**4.** The RapClust source code/binary, which can be found [here](https://github.com/COMBINE-lab/RapClust).


## Pipeline:

### 1. *de novo* assembly:
`Trinity --seqType fq --left reads_1.fq --right reads_2.fq --CPU 6 --max_memory 20G`
* The output, i.e. the set of contigs, is a FASTA file (`Trinity.fasta`).
* If you run into problems in this step, some tips are available [here](https://github.com/Oshlack/Corset/wiki/Example#perform-the-de-novo-assembly), or you can raise an issue [here](https://github.com/trinityrnaseq/trinityrnaseq).
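With several conditions and replicates, the reads of all samples are usually pooled into one assembly. Trinity accepts comma-separated file lists for `--left` and `--right`, so the lists can be built from the sample names. The following is only a sketch: the helper name `join_reads` is hypothetical, and the `reads/<sample>_1.fq.gz` layout is assumed from the quantification step of this page.

```shell
# join_reads SUFFIX SAMPLE...
# Hypothetical helper: prints a comma-separated list of read files,
# assuming files are named reads/<sample><SUFFIX>.
join_reads() {
    suffix=$1
    shift
    list=""
    for samp in "$@"; do
        # Append with a comma only when the list is already non-empty
        list="${list:+$list,}reads/${samp}${suffix}"
    done
    echo "$list"
}

# Example invocation (left commented out; adjust names to your data):
# Trinity --seqType fq \
#     --left  "$(join_reads _1.fq.gz A1 A2 A3 B1 B2 B3)" \
#     --right "$(join_reads _2.fq.gz A1 A2 A3 B1 B2 B3)" \
#     --CPU 6 --max_memory 20G
```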

### 2. Quantification:
Here we can use either Sailfish or Salmon; the example below uses Sailfish.

* Clone and build Sailfish:  
`git clone https://github.com/kingsfordgroup/sailfish.git`  
`cd sailfish && mkdir build && cd build`  
`cmake .. && make`

* Make an index for the reference (i.e. the set of contigs in our case); `<kmer_len>` is the k-mer length, e.g. 31:  
`sailfish index -t <ref_transcripts>/Trinity.fasta -o <out_dir>/index -k <kmer_len>`

* Quantify reads:  
Sailfish has to be run once per sample, so the number of runs depends on the number of replicates in each condition. Our example assumes two conditions (**A** and **B**) with three replicates (**1**, **2**, **3**) in each:  
`parallel -j 6 "sailfish quant -i index -l IU -1 <(gunzip -c reads/{}_1.fq.gz) -2 <(gunzip -c reads/{}_2.fq.gz) -o {}_quant --dumpEq -p 4" ::: A1 A2 A3 B1 B2 B3`
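Before moving on to clustering, it can help to confirm that every quantification run finished and left its output behind. The sketch below is an assumption-laden check, not part of RapClust: the function name `check_quant_dirs` is hypothetical, and it assumes each output directory contains a `quant.sf` file plus an `eq_classes.txt` file (written because of `--dumpEq`) somewhere beneath it.

```shell
# check_quant_dirs DIR...
# Hypothetical helper: reports quantification directories that lack
# quant.sf or an equivalence-class file, and returns 1 if any are incomplete.
check_quant_dirs() {
    missing=0
    for dir in "$@"; do
        # quant.sf must exist and be non-empty
        if [ ! -s "$dir/quant.sf" ]; then
            echo "missing quant.sf in $dir"
            missing=1
        fi
        # The exact subdirectory of eq_classes.txt is assumed to vary,
        # so search for it instead of hard-coding the path
        if [ -z "$(find "$dir" -name 'eq_classes.txt' 2>/dev/null | head -n 1)" ]; then
            echo "missing eq_classes.txt in $dir"
            missing=1
        fi
    done
    return "$missing"
}
```

For the example above it would be called as `check_quant_dirs A1_quant A2_quant A3_quant B1_quant B2_quant B3_quant`.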

### 3. Clustering:
**Note**: A detailed explanation of this step can also be found [here](https://github.com/keyavi/RapClust/tree/master#using-rapclust).

* If you have conda, then RapClust can be installed directly with it, without having to manage the dependencies yourself.  
`conda create --name rapclust_env python=3`  
`source activate rapclust_env`  
`conda install rapclust`
* [optional] If conda is not available, the commands below can be used to install Miniconda.  
`wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh`  
`bash Miniconda3-latest-Linux-x86_64.sh`

* Make a configuration file:  
Create a file with the extension **.yaml** containing the following mandatory fields:  
```
conditions:
    - A
    - B
samples:
    A:
        - A1_quant
        - A2_quant
        - A3_quant
    B:
        - B1_quant
        - B2_quant
        - B3_quant
outdir: <output_dir>/human_rapclust
```
* Run RapClust  
`RapClust --config <Name_of_file>.yaml`
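For experiments with many samples, the configuration file shown above can be generated instead of typed by hand. This is a minimal sketch under stated assumptions: the function name `write_rapclust_config` is hypothetical, and it assumes the quantification directories follow the `<condition><replicate>_quant` naming used earlier on this page.

```shell
# write_rapclust_config OUTDIR REPLICATES CONDITION...
# Hypothetical helper: prints a RapClust-style YAML config to stdout,
# assuming sample directories are named <condition><replicate>_quant.
write_rapclust_config() {
    outdir=$1
    reps=$2
    shift 2
    echo "conditions:"
    for cond in "$@"; do
        echo "    - $cond"
    done
    echo "samples:"
    for cond in "$@"; do
        echo "    $cond:"
        i=1
        # One entry per replicate, numbered from 1
        while [ "$i" -le "$reps" ]; do
            echo "        - ${cond}${i}_quant"
            i=$((i + 1))
        done
    done
    echo "outdir: $outdir"
}
```

For example, `write_rapclust_config <output_dir>/human_rapclust 3 A B > config.yaml` reproduces the two-condition, three-replicate layout used in this example.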