Data for "Per base sequence quality" section differ from FastQC #25
Comments
I made a small FASTQ file to demonstrate the per-base quality difference. Sample: test_quality.fastq.gz Falco result: FastQC result: See the >>Per base sequence quality section, group 10-14. In falco, I see that Median, Lower Quartile, Upper Quartile, 10th Percentile and 90th Percentile are always ints, not doubles.
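One way whole numbers like these can arise is integer arithmetic when per-position statistics are averaged over a base group such as 10-14. The sketch below is not falco's or FastQC's actual code; `group_mean_int` and `group_mean_double` are hypothetical names, and the idea that the group value is an average of per-position medians is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: averages per-position medians over a base group
// with integer arithmetic, so the fractional part is dropped.
int group_mean_int(const std::vector<int> &medians) {
  assert(!medians.empty());
  int sum = 0;
  for (int m : medians) sum += m;
  return sum / static_cast<int>(medians.size());
}

// Same average, but the sum is converted to double before dividing,
// which keeps the fractional part.
double group_mean_double(const std::vector<int> &medians) {
  assert(!medians.empty());
  int sum = 0;
  for (int m : medians) sum += m;
  return static_cast<double>(sum) / static_cast<double>(medians.size());
}
```

For the medians 36, 36, 37, 37, 37 the integer version returns 36, while the double version returns 36.6.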
Hello, Thank you so much for looking into the differences and my sincere apologies for not reaching out sooner! We can merge your fix to the main branch if you submit a pull request. The only thing I notice is you are using C notation to cast as double, whereas we use In other words the line
would be written as
I can also fix this myself if you submit the PR as-is. Thank you again for looking into the issue!
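The snippets from the comment above are not preserved here. As a generic illustration of the two cast notations in C++, and assuming (not confirmed by the thread) that the project style is a named cast such as `static_cast<double>`, the difference looks like this. Both function names are hypothetical.

```cpp
#include <cassert>

// C-style cast: valid C++, but harder to search for and able to perform
// more kinds of conversion than a plain numeric one.
double ratio_c_style(int numerator, int denominator) {
  assert(denominator != 0);
  return (double)numerator / denominator;
}

// Named C++ cast: states explicitly that a value conversion is intended.
double ratio_named_cast(int numerator, int denominator) {
  assert(denominator != 0);
  return static_cast<double>(numerator) / denominator;
}
```

Both functions return the same value; the change is purely stylistic.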
… to project style
Hello, @guilhermesena1. I fixed the cast. Thank you!
Hi, thank you for your effort to fix this issue, @Shelestova-Anastasia @guilhermesena1. I am using Nanopore sequencing data and getting the same error in Falco v1.2.1 in the per base sequence quality section.
Hello, Thanks for reporting the different output. I need to look a little further into FastQC's code, but intuitively it doesn't make sense to me that any of the quantiles would be NaNs. If there is at least one read of a given size (or within a size range), then the mean, median, and all quantiles would be well-defined. For example, if there is a single read quality, then all values would be identical. I don't see a case where "NaN" would be the desired outcome. I'll look into it though.
Yeah, it looks like FastQC only considers a position to be "valid" if it has at least 100 reads covering that position. I guess we'll have to emulate that functionality as well? This is the
|
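The FastQC code referenced above is not preserved here. As a rough sketch of the behaviour described (a position counts only if at least 100 reads cover it, and a group with no valid positions yields NaN), something like the following could be used. `PositionStat`, `group_quantile` and `kMinReadsPerPosition` are hypothetical names, and averaging over the valid positions of a group is an assumption, not a statement about either tool's implementation.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Assumed threshold, taken from the comment above ("at least 100 reads").
constexpr std::size_t kMinReadsPerPosition = 100;

// Hypothetical per-position summary: one quantile value plus the number
// of reads that cover the position.
struct PositionStat {
  double quantile;
  std::size_t num_reads;
};

// Averages the quantile over the positions of a group, skipping positions
// with too few reads. Returns NaN when no position in the group qualifies.
double group_quantile(const std::vector<PositionStat> &group) {
  double sum = 0.0;
  std::size_t valid = 0;
  for (const PositionStat &p : group) {
    if (p.num_reads >= kMinReadsPerPosition) {
      sum += p.quantile;
      ++valid;
    }
  }
  if (valid == 0) return std::numeric_limits<double>::quiet_NaN();
  return sum / static_cast<double>(valid);
}
```

Under this sketch, a group covered by a single long read produces NaN, which would explain NaN rows for sparse length ranges in ONT data.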
Thank you for the quick response.
It may be a bug, but this is also the correct output if there is only one read within a certain group (very possible in ONT data with high variability of read lengths). If there is only one read of size 50,000, then all quantiles of quality would be the quality value of the 50,000th base of that read, right?
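The single-read case can be checked with a small sketch. The `quantile` function below is a hypothetical, simplified index-truncating quantile over sorted values, not the method used by falco or FastQC; it only illustrates that with one observation every quantile collapses to that observation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified quantile: sort the values and pick the element at the
// truncated index q * (n - 1), with q expected in [0, 1].
int quantile(std::vector<int> values, double q) {
  assert(!values.empty());
  assert(q >= 0.0 && q <= 1.0);
  std::sort(values.begin(), values.end());
  const std::size_t idx =
      static_cast<std::size_t>(q * static_cast<double>(values.size() - 1));
  return values[idx];
}
```

With a single quality value, say 27, the index is 0 for every `q`, so the 10th percentile, median and 90th percentile are all 27.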
Yes, it should be; you are right. Thanks for the response, and for such a great tool.
falco version 0.3.0
FastQC and falco results are different for the "Per base sequence quality" section.
Per base sequence quality section to compare in the results:
pbsq_falco.txt
pbsq_fastqc.txt
I could not attach the sample (25.3 Mb). Let me know if I can provide more information.