Support for running everything in PE mode without merging #64
Comments
Keep in mind that if you want to run datasets without merging, i.e. keeping PE reads as separate mates, you would also need to re-implement the extraction of the (un-)mapped reads. Currently, the BWA output is filtered on the SAM flag "is_unmapped". However, it happens regularly that BWA aligns one mate of a pair but cannot align the other. Simply filtering on whether a read is unmapped would rip the two mates apart in this case. To avoid that, you would need to filter differently to extract mapped reads, e.g. requiring paired reads to be properly paired and single reads to be neither paired nor unmapped. This kind of edge case doesn't happen too often, but still regularly enough (approx. 0.01% of the reads, at least in the recent experiment that made me aware of this issue). I personally do the more complex filtering with the powerful tool bam-mangle (https://bitbucket.org/ustenzel/biohazard-tools), which lets you filter reads in a DSL-like way by combining boolean expressions, but it is written in Haskell and might be a pain to get into a bioconda recipe. |
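For illustration, the filtering rule described above can be sketched directly on the SAM flag field. This is only a minimal sketch, not the pipeline's code and not how bam-mangle works: the function names `keep_as_mapped` and `naive_keep_as_mapped` are hypothetical, and the only things taken as given are the flag bits from the SAM specification (0x1 paired, 0x2 properly paired, 0x4 unmapped).

```python
# SAM flag bits as defined in the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4


def naive_keep_as_mapped(flag: int) -> bool:
    """Filter on the unmapped bit only (the behaviour described as current)."""
    return not (flag & UNMAPPED)


def keep_as_mapped(flag: int) -> bool:
    """Hypothetical pair-aware rule: proper pairs, or mapped single reads."""
    if flag & PAIRED:
        # Paired read: keep only if the aligner flagged the pair as proper,
        # so both mates of a pair always receive the same decision.
        return bool(flag & PROPER_PAIR)
    # Single (e.g. merged) read: keep if it is not unmapped.
    return not (flag & UNMAPPED)


# Example flags: a pair in which only mate 1 aligned.
mate1_mapped = 73     # 0x1 paired + 0x8 mate unmapped + 0x40 first in pair
mate2_unmapped = 133  # 0x1 paired + 0x4 unmapped + 0x80 second in pair

# The naive filter separates the two mates ...
print(naive_keep_as_mapped(mate1_mapped), naive_keep_as_mapped(mate2_unmapped))
# ... while the pair-aware rule treats them identically.
print(keep_as_mapped(mate1_mapped), keep_as_mapped(mate2_unmapped))
```

Note that under this rule, pairs in which both mates align but are not flagged as properly paired would also be excluded from the mapped output; whether that is desired is a design decision. With samtools, the two branches would presumably correspond to `samtools view -f 2` (properly paired) and `samtools view -F 5` (neither paired nor unmapped).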
Thanks @alexhbnr for bringing this up! I think we might not even need to do this, as the use case is quite rare in general, or isn't it? |
The case @alexhbnr was referring to is when someone is running modern
data, i.e. long molecules sequenced with short reads. This can happen quite
often when trying to run modern reference data at the same time (e.g. with
UDG data).
On Thu, 14 Feb 2019 at 14:42, Alexander Peltzer wrote:
Thanks @alexhbnr for bringing this up! I think we might not even need to do this, as the use case is quite rare in general, or isn't it?
|
I.e. it's not a priority, but a minor feature request for downstream use. |
Fixed in #159 |
We should be able to run datasets without merging reads, applying only clipping and similar preprocessing steps.