miseq report #92

necrolyte2 · 2016-03-02T18:43:16Z

This is a bit retroactive at this point.
It was just a small side project, but it seems to be turning out quite nicely.

So essentially we need an easy way to visualize an entire MiSeq run.
That is, a way to easily point out samples that have issues such as:

Too many reads assigned
Too few reads assigned
Low quality
Forward/reverse read counts that don't match well
FastQC run on all samples for other info
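As a rough illustration of the forward/reverse check, a mismatch can be flagged by comparing the two counts relative to the larger one. This is only a sketch: the function name `fr_mismatch` and the 5% `tolerance` are hypothetical choices, not values taken from the report.

```python
def fr_mismatch(f_reads: int, r_reads: int, tolerance: float = 0.05) -> bool:
    """Return True if the F and R read counts differ by more than
    `tolerance`, expressed as a fraction of the larger count.

    `tolerance` is a hypothetical threshold chosen for illustration.
    """
    larger = max(f_reads, r_reads)
    if larger == 0:
        # No reads in either direction: nothing to compare here
        return False
    return abs(f_reads - r_reads) / larger > tolerance


print(fr_mismatch(10000, 9990))  # 0.1% difference -> False
print(fr_mismatch(10000, 8000))  # 20% difference -> True
```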

So my initial idea was to grab the following data for each sample:

Samplename
Total Reads (F+R)
F Reads
Avg F Qual
Avg F Length
F Bases
R Reads
Avg R Qual
Avg R Length
R Bases
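A minimal sketch of collecting those columns for one paired sample, assuming plain or gzipped 4-line FASTQ files with Phred+33 quality encoding (the encoding MiSeq output uses). The helper names are hypothetical, and "Avg Qual" is computed here as the mean over all bases, which may differ from how the original script averages.

```python
import gzip
from typing import Dict, Iterator, Tuple


def read_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a 4-line FASTQ file.

    Files ending in .gz are opened with gzip; no line wrapping is assumed.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        while True:
            header = fh.readline().rstrip("\n")
            if not header:
                break  # end of file
            seq = fh.readline().rstrip("\n")
            fh.readline()  # '+' separator line
            qual = fh.readline().rstrip("\n")
            yield header, seq, qual


def fastq_stats(path: str, phred_offset: int = 33) -> Dict[str, float]:
    """Read count, mean base quality, mean read length and total bases."""
    reads = bases = qual_sum = 0
    for _, seq, qual in read_fastq(path):
        reads += 1
        bases += len(seq)
        qual_sum += sum(ord(c) - phred_offset for c in qual)
    return {
        "reads": reads,
        "avg_qual": qual_sum / bases if bases else 0.0,
        "avg_len": bases / reads if reads else 0.0,
        "bases": bases,
    }


def sample_row(name: str, r1: str, r2: str) -> Dict[str, object]:
    """One row of the matrix, using the column names listed above."""
    f, r = fastq_stats(r1), fastq_stats(r2)
    return {
        "Samplename": name,
        "Total Reads (F+R)": f["reads"] + r["reads"],
        "F Reads": f["reads"],
        "Avg F Qual": f["avg_qual"],
        "Avg F Length": f["avg_len"],
        "F Bases": f["bases"],
        "R Reads": r["reads"],
        "Avg R Qual": r["avg_qual"],
        "Avg R Length": r["avg_len"],
        "R Bases": r["bases"],
    }
```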

Once those stats were generated, I looked at them in Excel and noticed it would be really nice to color the cells in the matrix that fall more than one standard deviation (STDEV) from the column mean

So I colored them based on 6 criteria:

+1, +2, +3 and -1, -2, -3 standard deviations from the mean of each column
Each additional standard deviation gets a slightly bolder color (green for above the mean, red for below)
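The six-bucket coloring above could be sketched like this, assuming the sample standard deviation (the same quantity Excel's STDEV computes). The hex colors and function names are made up for illustration; a value within one stddev of the mean gets no color.

```python
import statistics
from typing import List, Optional

# Hypothetical color ramps; index 0 is the +/-1 stddev shade, index 2 the +/-3 shade
GREENS = ["#e5f5e0", "#a1d99b", "#31a354"]  # above the mean
REDS = ["#fee0d2", "#fc9272", "#de2d26"]    # below the mean


def stdev_bucket(value: float, mean: float, stdev: float) -> int:
    """Signed number of whole stddevs between value and mean, capped at 3.

    0 means the value is within one stddev of the mean (no color).
    """
    if stdev == 0:
        return 0
    z = (value - mean) / stdev
    k = min(int(abs(z)), 3)  # int() drops the fractional part
    return k if z > 0 else -k


def cell_color(bucket: int) -> Optional[str]:
    """Map a bucket to a background color: green above the mean, red below."""
    if bucket == 0:
        return None
    ramp = GREENS if bucket > 0 else REDS
    return ramp[abs(bucket) - 1]


def column_colors(values: List[float]) -> List[Optional[str]]:
    """Color each cell of one column against that column's mean and stddev."""
    mean = statistics.mean(values)
    # statistics.stdev is the sample stddev, like Excel's STDEV
    stdev = statistics.stdev(values) if len(values) > 1 else 0.0
    return [cell_color(stdev_bucket(v, mean, stdev)) for v in values]
```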

The end result will be:

A single CSV file with the base stats listed above
A single HTML file that contains the same colored matrix as the prototype Excel file
- The HTML file would contain links to the FastQC reports for the R1 and R2 reads
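A stdlib-only sketch of the two outputs. The shape of the `colors` and `fastqc` arguments and the FastQC report file names used with it are hypothetical, and the sample name is assumed to be the first column of each row.

```python
import csv
import html
from typing import Dict, List, Tuple


def write_csv(rows: List[Dict[str, object]], path: str) -> None:
    """Write the per-sample stats to a single CSV file (assumes rows is non-empty)."""
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def render_html(
    rows: List[Dict[str, object]],
    colors: Dict[Tuple[int, str], str],
    fastqc: Dict[str, Tuple[str, str]],
) -> str:
    """Render the stats matrix as an HTML table.

    `colors` maps (row index, column name) to a CSS color for highlighted
    cells; `fastqc` maps a sample name to its (R1, R2) FastQC report paths.
    """
    columns = list(rows[0].keys())
    out = ["<table>", "<tr>"]
    out += [f"<th>{html.escape(c)}</th>" for c in columns]
    out.append("<th>FastQC</th></tr>")
    for i, row in enumerate(rows):
        out.append("<tr>")
        for col in columns:
            color = colors.get((i, col))
            style = f' style="background-color:{color}"' if color else ""
            out.append(f"<td{style}>{html.escape(str(row[col]))}</td>")
        # Sample name is assumed to be the first column
        r1, r2 = fastqc.get(str(row[columns[0]]), ("", ""))
        links = " ".join(
            f'<a href="{html.escape(href)}">{label}</a>'
            for label, href in (("R1", r1), ("R2", r2))
            if href
        )
        out.append(f"<td>{links}</td></tr>")
    out.append("</table>")
    return "\n".join(out)
```

The returned string can be wrapped in a minimal HTML page and written next to the CSV; rows without a `fastqc` entry simply get an empty link cell.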

necrolyte2 · 2016-03-02T18:45:05Z

Improvements:

Utilize numpy/pandas more for faster computation
Utilize jQuery/D3 to make the HTML report look even better and more interactive
Show mean/stddev at the top of each column for reference
Missing legend for colors
Does not detect if all samples have 0 reads
Logging level is set to debug, which spits out all debug output from the sh module
Very similar data yields a small stddev, which means data gets highlighted that probably should not be
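Two of the items above (the all-zero run and over-highlighting from a tiny stddev) could be handled roughly as follows. This is a sketch under assumptions: the function names and the `min_cv` floor (the stddev is never allowed below 5% of the mean) are hypothetical, and the same logic would vectorize naturally with numpy/pandas.

```python
import statistics
from typing import List


def all_zero_reads(total_reads: List[int]) -> bool:
    """True if there is at least one sample and none of them got any reads."""
    return bool(total_reads) and all(n == 0 for n in total_reads)


def effective_stdev(values: List[float], min_cv: float = 0.05) -> float:
    """Sample stddev of `values`, floored at min_cv * |mean|.

    When every sample has nearly the same value the raw stddev is tiny,
    so even trivial differences get colored. Never letting the stddev
    drop below a fraction of the mean (a coefficient-of-variation floor)
    leaves those small differences uncolored.
    """
    if len(values) < 2:
        return 0.0
    mean = statistics.mean(values)
    return max(statistics.stdev(values), min_cv * abs(mean))
```

For example, four samples with 1000, 1001, 999 and 1000 reads have a raw stddev under 1, but the floor raises it to 50, so none of them lands outside one stddev.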

averagehat · 2016-03-14T18:04:58Z

If you do refactor, you might want to look into this: http://pandas.pydata.org/pandas-docs/version/0.17.1/whatsnew.html#conditional-html-formatting

necrolyte2 · 2016-03-23T22:26:03Z

What do you think about not including undetermined reads when calculating stats?
I feel like they skew the mean/stddev.

averagehat · 2016-03-23T22:50:56Z

"[undetermined reads are] Reads that the MiSeq index did not match to anything. Essentially each sample is defined by 2 adapter indexes. If a read doesn't match any, then it goes to undetermined"
Yes, I would just drop those

necrolyte2 · 2016-03-23T23:06:20Z

Just to be clear, I think they are good to have in the report, but not part of the calculation to determine mean/stddev.

Then we can color them the same as the rest of the samples. The reasoning is that this way people can see if Undetermined ended up with an abnormal number of reads (like 99% of reads, or something else weird showing that the run failed)
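A sketch of that idea: Undetermined stays in the rows and gets a z-score (and therefore a color) like everything else, but it is left out when computing the column mean/stddev. The literal sample name `"Undetermined"` and the `Samplename` key are assumptions about how the rows are stored.

```python
import statistics
from typing import Dict, List

# Assumed name of the row holding reads that matched no index pair
UNDETERMINED = "Undetermined"


def column_zscores(rows: List[Dict[str, object]], column: str,
                   name_col: str = "Samplename") -> List[float]:
    """Return a z-score for every row in `column`, Undetermined included.

    The mean/stddev come from the real samples only, so a huge Undetermined
    count cannot skew them, yet Undetermined is still scored (and colored)
    so a failed run stands out.
    """
    ref = [float(r[column]) for r in rows if r[name_col] != UNDETERMINED]
    if len(ref) < 2:
        # Not enough real samples to estimate a stddev
        return [0.0] * len(rows)
    mean = statistics.mean(ref)
    stdev = statistics.stdev(ref)
    if stdev == 0:
        return [0.0] * len(rows)
    return [(float(r[column]) - mean) / stdev for r in rows]
```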

averagehat · 2016-03-24T00:13:31Z

Any thoughts on what kind of interactivity you would want?

necrolyte2 · 2016-03-24T00:15:17Z

I think sorting on columns is probably it. Let's just leave it non-interactive at first; users can request more later

necrolyte2 self-assigned this Mar 2, 2016

necrolyte2 modified the milestone: MiSeq Report Mar 2, 2016

necrolyte2 added in progress ready and removed in progress ready labels Apr 5, 2016
