v1.1.5 kicking out one of pair of discordant reads ; v1.1.1 not showing same behaviour #664
Comments
I can see you now have a 1.1.6 tagged release, which I'll use for testing from now on to avoid any additional variables. I can see some documentation clarification regarding unpaired/unaligned reads, but no obvious code or functionality change relating to this issue.
The only conceivably relevant change I can see between 1.1.1 and 1.1.6 is that we changed the pairwriter to hold two separate file handles open on the BAM file, rather than multiple iterators referencing the same BAM file. See PR #543. I really don't think this should be an issue though. Can you check the log file for two things:
The other possibility, I guess, is that pysam has changed something in the way it handles BAM files. Can you report the pysam versions in each case?
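One way to collect the version information asked for here, as a minimal sketch: it uses only the standard library's `importlib.metadata`, and it assumes the packages were installed under the distribution names `umi_tools` and `pysam` (an assumption; adjust the names if your environment differs). The helper name `report_versions` is purely illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages=("umi_tools", "pysam")):
    """Return {distribution name: installed version, or 'not installed'}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not present in the active environment.
            found[name] = "not installed"
    return found


if __name__ == "__main__":
    for name, ver in report_versions().items():
        print(f"{name}: {ver}")
```

Running this once in each environment (the 1.1.1 one and the 1.1.6 one) gives a like-for-like record of which pysam version each UMI-tools install is paired with.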
Hiya. I just ran a comparison on a much smaller dataset (it is what I have at hand locally) and can see that v1.1.6 is 'not finding' some mates, whereas v1.1.1 does not have the same issue. UMI-tools version: 1.1.6:
UMI-tools version: 1.1.1
Note: I have run your updated pytest code on the 1.1.6 installation above, and it completes with no errors!
I don't suppose there is any way you could either:
OR
However, the fact that the final search for mates in 1.1.6 takes less than a millisecond doesn't fill me with hope.
Yeah, I managed to install 1.1.1 with pysam 0.22.1 (the other way around is not quickly working for me; compatibility errors):
I may be able to send some control data, but that may need to wait until next week now.
Are you comfortable installing a custom version? If so, could you pull the branch at: install into the relevant environment (python setup.py install) and try that? I've reverted the only line that I think could possibly be connected. It's not a long-term fix, as this will slow umi_tools down several hundred fold in some situations, but at least we might be able to pinpoint the problem.
Hi Ian, sorry for the late response. Yes, I can have a look as soon as I can.
Based on the UMI-tools output logs only (I've not had time to look at the actual reads in question), this seems to have fixed the issue, i.e. under the branch above (
Okay, how about the latest commit on that branch?
Describe the bug
Following an upgrade of UMI-tools to v1.1.5 (still testing under commit bcce5e6 for Python 3.12 compatibility) in our fusion calling pipeline, I have seen specific reads being kicked out of the dedup process where they weren't under version 1.1.1. These are discordant/chimeric reads, where each read of a pair is aligned to a different chromosome. As this is a fusion calling pipeline, such reads are important. Note also that I remove any unaligned/improper pairs prior to the dedup process, as I know UMI-tools doesn't handle these well, or in the way I want it to.
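To make the read type concrete, here is a minimal sketch of how such an inter-chromosomal pair can be recognised from a single SAM record. It relies only on the standard SAM columns (FLAG, RNAME, RNEXT) and flag bits; the function name and the two example records are hypothetical and are not taken from the data in this issue.

```python
def is_interchromosomal(sam_line):
    """True if a SAM record and its mate are mapped to different references."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rname, rnext = fields[2], fields[6]

    paired = bool(flag & 0x1)         # read is part of a pair
    read_unmapped = bool(flag & 0x4)  # this read is unmapped
    mate_unmapped = bool(flag & 0x8)  # its mate is unmapped
    if not paired or read_unmapped or mate_unmapped:
        return False

    # RNEXT is "=" when the mate sits on the same reference as this read.
    return rname != "*" and rnext not in ("=", "*") and rnext != rname


# Hypothetical records for illustration only.
SEQ, QUAL = "A" * 50, "I" * 50
DISCORDANT = f"read1\t65\tchr1\t1000\t60\t50M\tchr5\t2000\t0\t{SEQ}\t{QUAL}"
CONCORDANT = f"read2\t99\tchr1\t1000\t60\t50M\t=\t1200\t250\t{SEQ}\t{QUAL}"

if __name__ == "__main__":
    print(is_interchromosomal(DISCORDANT))
    print(is_interchromosomal(CONCORDANT))
```

Both mates of a pair like `read1` should survive dedup together; the behaviour reported here is that only one of them does.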
To Reproduce
Shown below is a pair of reads in a cleaned BAM file going into the dedup process, and a grep of the same read ID following the dedup process (dedup.bam)
The UMI-tools options are the same for both versions/examples seen above, including:
Expected behaviour
Output both reads of the pair. From the downstream pipeline stage that converts BAM to FASTQ, I can see there are now significant numbers of single reads (as per the samtools fastq -s option) following v1.1.5 dedup, which was not seen before under v1.1.1. These will include singletons such as the one seen above.
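A rough way to quantify this without going through the FASTQ conversion is sketched below. It assumes the read names have already been extracted from the BAM files before and after dedup (for example, the first column of `samtools view` output) and that each read contributes one primary record, with secondary and supplementary alignments filtered out beforehand. The function name `lost_mates` is hypothetical.

```python
from collections import Counter


def lost_mates(names_before, names_after):
    """Read names that were a complete pair before dedup but a single read after.

    Pairs removed entirely by dedup (both mates gone) are not reported,
    since that is the expected outcome for duplicates.
    """
    before = Counter(names_before)
    after = Counter(names_after)
    return sorted(
        name
        for name, count in after.items()
        if count == 1 and before.get(name, 0) == 2
    )


if __name__ == "__main__":
    # Toy read names for illustration only.
    before = ["a", "a", "b", "b", "c", "c"]
    after = ["a", "a", "b"]
    print(lost_mates(before, after))
```

Comparing the length of this list between the v1.1.1 and v1.1.5 outputs for the same input would show directly whether the extra singletons come from the dedup step.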
Environment
download/clone then python3 -m pip install -r requirements.txt && python3 setup.py install
Additional context
Add any other context about the problem here. Was UMI-tools being run within another pipeline, etc.