Lab: Quality Assessment
All raw data will be located in /pickett_shared/teaching/EPP575_Jan2022/raw_data/.
To confirm, run:
ls /pickett_shared/teaching/EPP575_Jan2022/raw_data/
You should see 8 files. Does that mean we have 8 samples?
These files are big, and copying each file would use too much disk space on our system. Rather than copying files to your directory, I recommend creating a symbolic link.
Navigate to /pickett_shared/teaching/EPP575_Jan2022/analysis, and create a directory with your UTK user name; this is where you will store your output files.
mkdir <your_username>
cd <your_username>
Within this new directory, create a sub-directory named raw_data. Inside raw_data, run the command:
ln -s /pickett_shared/teaching/EPP575_Jan2022/raw_data/SRR17062759_1.fastq
This creates a symbolic link to the file; rather than creating a hard duplicate, this command creates a different type of file that points to the original file.
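To see what a symbolic link looks like without touching the course data, here is a minimal sketch that can be run anywhere. The scratch directory and the file name sample.fastq are made up for the demonstration.

```shell
# Hypothetical scratch directory and file names, used only for this demo.
demo=$(mktemp -d)
mkdir "$demo/shared" "$demo/mine"
printf '@read1\nACGT\n+\nIIII\n' > "$demo/shared/sample.fastq"

# With only a target given, ln -s names the link after the target
# and places it in the current directory.
cd "$demo/mine"
ln -s "$demo/shared/sample.fastq"

# The long listing marks the entry as a link and shows where it points.
ls -l sample.fastq

# readlink prints the stored target; reading the link reads the original file.
link_target=$(readlink sample.fastq)
first_line=$(head -n 1 sample.fastq)
echo "link points to: $link_target"
echo "first line via link: $first_line"
```

Deleting the link with rm removes only the pointer; the original file stays where it is.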
Navigate back up to your user directory, and create a new sub-directory named analysis. Inside analysis, create a sub-directory to hold the first step of our analysis:
mkdir 1_fastqcRaw
cd 1_fastqcRaw
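The directory layout built so far can be sketched as follows, which also shows why a path starting with ../../raw_data reaches the linked reads from inside 1_fastqcRaw. This is only an illustration: a scratch directory stands in for the course analysis directory, jdoe is a made-up user name, and the FASTQ file is an empty placeholder.

```shell
# Hypothetical stand-ins: a scratch directory replaces the course analysis
# directory and "jdoe" replaces your user name.
base=$(mktemp -d)
user_dir="$base/jdoe"

# Same layout as in the lab: raw_data and analysis/1_fastqcRaw under the user directory.
mkdir -p "$user_dir/raw_data" "$user_dir/analysis/1_fastqcRaw"
: > "$user_dir/raw_data/SRR17062759_1.fastq"   # empty placeholder file

cd "$user_dir/analysis/1_fastqcRaw"

# Two levels up from 1_fastqcRaw is the user directory, which holds raw_data.
listing=$(ls ../../raw_data)
echo "$listing"
```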
FastQC is not available by default on Sphinx; load it with the following command:
spack load fastqc@0.11.9%gcc@8.4.1
This is an alternative to the way Meg discussed loading Spack packages on Friday; using that method, the command would be spack load /wrz2q7j. Use whichever method you prefer.
Test that fastqc loaded properly for you. What message pops up if you just run fastqc? How about fastqc -h?
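Another quick way to check whether a program is on your PATH is the shell built-in command -v. The helper below is a small sketch; check_tool is a made-up name and is not part of Spack or FastQC.

```shell
# check_tool is a hypothetical helper name, not part of Spack or FastQC.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found; try loading it with spack first"
        return 1
    fi
}

# "|| true" keeps the script going even if fastqc has not been loaded yet.
check_tool fastqc || true
```

The same helper can be reused later in the lab, for example with check_tool multiqc.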
To run fastqc on your data, run the following:
mkdir SRR17062759_1.fastQC
fastqc -o SRR17062759_1.fastQC ../../raw_data/SRR17062759_1.fastq >& SRR17062759_1.fastQC.out
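The >& in the command above tells bash to send both standard output and standard error to the named file, so FastQC's progress messages and any errors end up in SRR17062759_1.fastQC.out. The sketch below shows the effect with a stand-in function (noisy is a made-up name) and uses the portable spelling > file 2>&1, which behaves the same way.

```shell
demo=$(mktemp -d)

# noisy is a hypothetical stand-in for a program that writes to both streams.
noisy() {
    echo "progress message"        # goes to standard output
    echo "warning message" >&2     # goes to standard error
}

# Portable spelling of what ">& file" does in bash: both streams go to one file.
noisy > "$demo/both.out" 2>&1

# For comparison, a plain ">" captures standard output only.
noisy > "$demo/stdout_only.out" 2>/dev/null

both_lines=$(grep -c "message" "$demo/both.out")
stdout_lines=$(grep -c "message" "$demo/stdout_only.out")
echo "lines captured with both streams: $both_lines"
echo "lines captured with stdout only:  $stdout_lines"
```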
This creates an HTML file that cannot be viewed in the terminal. Using the scp command from a terminal on your personal computer, copy this file to your computer and open it in a web browser.
scp <your_username>@sphinx.ag.utk.edu:/pickett_shared/teaching/EPP575_Jan2022/analysis/<your_username>/analysis/1_fastqcRaw/SRR17062759_1.fastQC/SRR17062759_1_fastqc.html .
We have now performed quality assessment on one of the two paired read files for sample SRR17062759. Repeat these steps for the second read file of the pair.
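If you prefer not to type the commands twice, a loop is one option. The sketch below is a dry run on empty placeholder files in a scratch directory: it prints the fastqc commands instead of running them. The name of the second file, SRR17062759_2.fastq, is an assumption based on the usual _1/_2 naming of paired reads; check the raw data directory for the actual name, and remember that the second file also needs a symbolic link in your raw_data directory.

```shell
# Scratch directory with empty placeholder files. The _2 file name is an
# assumption based on the usual _1/_2 naming of paired reads.
demo=$(mktemp -d)
mkdir -p "$demo/raw_data" "$demo/analysis/1_fastqcRaw"
: > "$demo/raw_data/SRR17062759_1.fastq"
: > "$demo/raw_data/SRR17062759_2.fastq"
cd "$demo/analysis/1_fastqcRaw"

for fq in ../../raw_data/SRR17062759_*.fastq; do
    name=$(basename "$fq" .fastq)      # strips the directory and the .fastq suffix
    mkdir -p "$name.fastQC"            # one output directory per read file
    # Dry run: echo prints the command instead of running FastQC.
    echo fastqc -o "$name.fastQC" "$fq"
done > planned_commands.txt

cat planned_commands.txt
```

To run FastQC for real in your own 1_fastqcRaw directory, replace the echo line with the full fastqc command from the earlier step, including its output redirect.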
Once you have both FastQC HTML files, we can run MultiQC to aggregate our results. Load it with the following command:
spack load py-multiqc@1.7%gcc@8.4.1
In the same directory you ran FastQC, run the following command:
multiqc .
What is the importance of the . in this command?
Once it has finished running, you will have a file in your 1_fastqcRaw directory named multiqc_report.html. This is the default file name for every MultiQC run; to avoid overwriting older MultiQC reports, I recommend renaming the file:
mv multiqc_report.html EPP575_raw_multiqc_report.html
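Note that mv itself will replace an existing file with the new name without asking. If you rerun the lab and want to protect an earlier renamed report, the -n (no-clobber) option of mv is one safeguard; it is available in both GNU and BSD versions of mv. The sketch below uses placeholder text files in a scratch directory rather than real MultiQC reports.

```shell
demo=$(mktemp -d)
cd "$demo"
echo "older report" > EPP575_raw_multiqc_report.html   # placeholder from an earlier run
echo "new report"   > multiqc_report.html              # placeholder for the latest run

# -n (no-clobber) makes mv skip the rename if the destination already exists.
# Some mv versions report the skip with a non-zero exit status, hence "|| true".
mv -n multiqc_report.html EPP575_raw_multiqc_report.html || true

kept=$(cat EPP575_raw_multiqc_report.html)
echo "renamed report still contains: $kept"
```

Because the rename was skipped, multiqc_report.html is still present and can be given a different name.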
Send the file named EPP575_raw_multiqc_report.html to mstaton1@utk.edu and mhuff10@utk.edu. This file must contain quality assessment information for both read files of sample SRR17062759.