We want a consistent method of comparison between gene, transcript and exon concatenation per TF for stage 2 training. We therefore chose RSEM (as recommended by ENCODE) to generate the expression levels per gene and transcript. However, the RSEM output does not include explicit exon-level quantification for the RNA experiment. RSEM outputs a genome-based BAM file and a transcript-based BAM file. The idea is to use the transcript-based BAM file to compute the weighted counts per exon.
We use the transcript BAM file output by RSEM (rsem.transcript.bam) to generate exon-level quantifications.
- Bioinformatics:
Using the GTF file and the transcript BAM file generated by RSEM for each RNA experiment, we generate the weighted exon counts ("expected counts per exon").
-
Keep only mapped reads on the forward strand (-F 20 excludes reads with flag 0x4, unmapped, or 0x10, reverse strand):
samtools view -b -F 20 rsem.transcript.bam > rsem.transcript.forward.bam
-
Convert the BAM file to a BED file. We used BEDOPS for this, in the script bam2bed.sh:
bam2bed < rsem.transcript.forward.bam > rsem.transcript.forward.bed
-
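Intersect the exons with the reads (-sorted requires both inputs to be sorted by chromosome, then start position):
bedtools intersect -a $exonFile -b $bedFile -wb -wa -sorted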
-
Compute the weighted count per exon with quantifyExons.py
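
The weighting step can be sketched roughly as follows. quantifyExons.py itself is not shown here, so this is only an illustration under stated assumptions, not its actual implementation. It assumes that each line of the bedtools intersect output starts with the exon record (with the exon identifier in the fourth column), that the RSEM posterior probability of each alignment survives the BAM to BED conversion as a ZW:f:<value> field somewhere on the line, and that a read without such a field counts as 1.0. The function name, the column index, and the toy records are all hypothetical.

```python
from collections import defaultdict


def weighted_exon_counts(lines, exon_id_col=3, weight_tag="ZW:f:"):
    """Sum one weight per overlapping read for each exon.

    `lines` are tab-separated records as produced by
    `bedtools intersect -wa -wb` (exon columns first, read columns after).
    ASSUMPTION: the exon identifier is in column `exon_id_col` (0-based) and
    the read weight appears as a field starting with `weight_tag`.
    """
    counts = defaultdict(float)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= exon_id_col:
            continue  # skip blank or malformed lines
        exon_id = fields[exon_id_col]
        weight = 1.0  # ASSUMPTION: fallback when no weight field is present
        for field in fields:
            if field.startswith(weight_tag):
                weight = float(field[len(weight_tag):])
                break
        counts[exon_id] += weight
    return dict(counts)


# Made-up toy records, for illustration only (not real data).
example = [
    "tx1\t0\t100\texon1\ttx1\t10\t60\tread1\t100\t+\tZW:f:0.25",
    "tx1\t0\t100\texon1\ttx1\t20\t70\tread2\t100\t+\tZW:f:0.75",
    "tx1\t100\t250\texon2\ttx1\t120\t170\tread3\t100\t+\tZW:f:0.5",
    "tx2\t0\t80\texon3\ttx2\t5\t55\tread4\t100\t+",
]

if __name__ == "__main__":
    for exon, count in sorted(weighted_exon_counts(example).items()):
        print(exon, count)
```

In practice the lines would be read from the file that the bedtools intersect step writes. The column index and the tag prefix should be checked against the actual exon file and BED output before relying on the result.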