zero length reads #106

Merged

Conversation

christopher-schroeder
Contributor

Allow reads in a BAM file that have been trimmed to zero length.

@hdashnow
Collaborator

hdashnow commented May 6, 2022

Thanks, @christopher-schroeder!

I wonder if it would be simpler to exclude these reads completely rather than allowing them through? @brentp what do you think?

@christopher-schroeder
Contributor Author

Do you mean removing them from the BAM? That's not that easy, because to get a valid BAM you would have to remove or modify the mate as well. I also have about 300 whole genomes already processed into BAM files. strling needs indexed data, so you cannot stream. That would mean writing many terabytes just for a couple of removed reads. Also, I think a tool should be able to process input files as long as they are valid according to the format specification.

Or do you mean ignoring them in strling? I am not that deep into the source code and don't know what happens if you see a read whose mate has previously been ignored. But if this is not a problem, then ignoring the read would be totally fine!

@hdashnow
Collaborator

Sorry, came to check on another PR and realized we left this one hanging! I'm thinking of removing the assert statement and instead skipping over these reads, as they are not informative.

@brentp
Member

brentp commented Oct 25, 2022

Yes, I think we can skip them, but we must make sure that the mate is still added to and removed from the cache, or memory might grow quickly.
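
The cache concern can be illustrated with a minimal sketch. It is written in Python for illustration only and is not strling's code; the `Read` type, the `pair_reads` function, and the dictionary cache are all hypothetical. The idea, under those assumptions, is that a zero-length read is still cached when it arrives first, so its mate finds and releases the entry, and the pair is skipped only once both mates have been seen.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical stand-in for an alignment record."""
    name: str      # query name shared by both mates
    sequence: str  # may be empty after trimming


def pair_reads(reads):
    """Pair mates by name and skip pairs that contain a zero-length read.

    Returns the informative pairs and the number of entries left in the cache.
    """
    cache = {}  # query name -> first-seen mate, waiting for its partner
    pairs = []
    for read in reads:
        mate = cache.pop(read.name, None)
        if mate is None:
            # First mate seen: cache it even if it is empty, so that the
            # partner can find it and the entry gets released later.
            cache[read.name] = read
            continue
        # Second mate seen: pop() above has already removed the cache entry.
        if len(read.sequence) == 0 or len(mate.sequence) == 0:
            continue  # uninformative pair, skip it
        pairs.append((mate, read))
    return pairs, len(cache)
```

If the empty read were dropped before being cached, its mate would be stored as a "first" read and never released, which is the memory growth described above.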

@hdashnow hdashnow merged commit c40f66c into quinlan-lab:master Mar 3, 2023
@hdashnow
Collaborator

hdashnow commented Mar 3, 2023

I'm going to allow 0-len alignments, but report on them in debug mode. I don't have a good data set to test this on, but if this comes up again, at least we can count the occurrence in the debug output.
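
Counting such reads in debug output could look roughly like the following sketch. It is in Python rather than strling's own language, and the logger name and the `count_zero_length` function are assumptions for illustration; the actual debug output of the merged change is not shown in this thread.

```python
import logging

# Hypothetical logger name; enable with logging.basicConfig(level=logging.DEBUG).
logger = logging.getLogger("zero_len_sketch")


def count_zero_length(sequences):
    """Return how many sequences are empty, logging the total at DEBUG level."""
    n_zero = sum(1 for seq in sequences if len(seq) == 0)
    logger.debug("zero-length alignments seen: %d", n_zero)
    return n_zero
```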
