assemble_refbased aligns 0 reads #509
Comments
Thanks for posting screenshots. If you can share a copy of the That said, if the fastq reads shown in the first screenshot are representative, the simulated basecall quality scores may be part of the story, depending on the aligner specified for the workflow and options passed to it; the Q scores shown are all quite low (ASCII-to-Q-score table). What's the average basecall score reported for the uBam if you grep for Tangentially related, but what tool was used to create the simulated reads? (mason2 is what Heng Li used to simulate short reads in the minimap2 paper, and what the minimap2 docs suggest for evaluating mapping quality).
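For reference, the ASCII-to-Q-score conversion mentioned above can be sketched in a few lines of Python. This is a minimal illustration that assumes phred+33 encoding (ASCII code minus 33); the function names are hypothetical and are not part of the pipeline or of any tool discussed in this thread.

```python
def phred33_scores(quality_string):
    """Convert a FASTQ quality string to Q scores, assuming phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def mean_quality(quality_string):
    """Arithmetic mean of the Q scores in one quality string."""
    scores = phred33_scores(quality_string)
    return sum(scores) / len(scores)


def error_probability(q):
    """Per-base error probability implied by a Phred score Q: 10^(-Q/10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    qual = "II5!"  # illustrative quality string, not taken from the posted reads
    print(phred33_scores(qual))
    print(mean_quality(qual))
    print(error_probability(20))
```

Note that the plain mean of Q scores is only a rough summary; averaging the error probabilities instead gives a different (usually lower) effective quality.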
Hello, apologies for the late response, and thank you for the pointers. I intentionally simulated some low quality reads, but I am still having the issue with more normal qualities too. Here is another case with better quality reads: These also fail to align using These are the stderr and stdout for I also attached two fastq's which are a minimal example -- they are just the first reads from the files in the screenshots (they still map with short_5x_cov.simulated_215_read1.txt My fastq's are generated using a custom perl script, and run through the pipeline with default commands, i.e.:
The reference I am using is the Wuhan-Hu-1 genome for SARS-CoV-2: https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2?report=fasta Thanks in advance for continuing to help with this issue!
Hi again, I think I figured out the issue. Looking at the I ran Picard again but this time manually specified -V=Standard (for phred+33 scaling) and after that, the output looks correct. However, I don't see that as a parameter for the
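To see why automatic detection of the quality encoding can go wrong, here is a minimal sketch of a common range-based heuristic. It is an assumption-laden illustration, not the logic Picard or htsjdk actually uses: characters below `;` (ASCII 59) can only occur in phred+33 data, while data whose characters all sit at `@` (ASCII 64) or above and extend past `J` (ASCII 74) looks like phred+64. Anything in between is ambiguous, which is when specifying the format explicitly (as with `-V=Standard` above) is the safer route. The function name is hypothetical.

```python
def guess_quality_encoding(quality_strings):
    """Guess the FASTQ quality offset from the range of characters observed.

    Heuristic only: returns "phred+33", "phred+64", or "ambiguous".
    """
    codes = [ord(ch) for qual in quality_strings for ch in qual]
    if not codes:
        return "ambiguous"  # nothing to inspect

    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Characters this low cannot be produced by a +64 encoding.
        return "phred+33"
    if lowest >= 64 and highest > 74:
        # Entire range sits where +64 data usually lives.
        return "phred+64"
    return "ambiguous"


if __name__ == "__main__":
    print(guess_quality_encoding(["!!5II", "IIIII"]))
    print(guess_quality_encoding(["hhhhh"]))
    print(guess_quality_encoding(["@@AB"]))
```

A file with a narrow range of quality characters can fall into the ambiguous band or be assigned the wrong offset by a heuristic like this, which would shift every Q score by 31.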
Since we shell out to Picard, the quality encoding detection of is ultimately determined by htsjdk,
As for where these reads are getting lost, I converted the fastqs you posted and the one read pair within to a bam file and ran it through This means they were lost upstream in the initial The In this case, the one with all reads does contain mapped reads; however, the filtered bam does not. The filtered bam is what is passed downstream to tasks in the rest of the workflow.
The filtering function ultimately filters based on a few criteria, depending on the arguments specified and whether the reads are single-end or paired-end. After the initial mapping of the two read mates you posted above, the mapped reads have SAM flag values of Our filtering excludes all reads not marked as being mapped in a proper pair, to avoid including the split mappings, chimeric reads, etc. that we've occasionally encountered during past projects working with data from various library prep protocols.
As for why the example reads do not have the "proper pair" bit set after initial alignment, and what the requirements are for having a "read mapped in [a] proper pair": as samtools says, that bit indicates "each segment [is] properly aligned according to aligner," so it's up to each aligner to decide. The default aligner used in I did find a comment where Heng Li mentions "Minimap2 won't work well at 15% error rate on ~150bp reads. Probably stampy or bwa-mem will work better."
Most users of these pipelines generally do not want to keep low-quality reads or marginal mappings, so I'd suggest that you consider modifying the workflow and task WDLs on your end to suit your needs. Two potential changes would be to modify In the
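The "proper pair" check described above comes down to testing one bit of the SAM FLAG field. The sketch below decodes a flag using the bit values from the SAM specification; the helper names are hypothetical, the filter shown is a simplified stand-in rather than the pipeline's actual filtering function, and the flag values 99 and 65 are illustrative examples, not the values observed for the posted reads.

```python
# Bit values of the SAM FLAG field, as defined in the SAM specification.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM flag, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def is_proper_pair(flag):
    """True if the read is paired, mapped, and marked as a proper pair."""
    return bool(flag & 0x1) and bool(flag & 0x2) and not (flag & 0x4)


if __name__ == "__main__":
    for flag in (99, 65):  # illustrative values only
        print(flag, decode_flag(flag), is_proper_pair(flag))
```

A read with flag 65 is paired and mapped but lacks the 0x2 bit, so a proper-pair filter of this kind would drop it even though it appears in the unfiltered alignment.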
Thank you so much for the detailed response! I was able to get it to work by changing the Picard options when calling
Hello, I am having an issue where I convert paired fastq's to ubam, then run the assemble_refbased pipeline, but 0 reads align even though neither step reports an error. These are fastq's simulated from SARS-CoV-2, but I have validated them with an external tool that shows no clear errors, and they also align with bwa-mem (screenshots below of the fastq's, the validation, and the alignment stats).
Any idea why this is happening or where to look for more clues? Parameters I should adjust? Thanks!