Getting splitters BAM from long reads data? #42

Closed
cmdcolin opened this issue Nov 14, 2019 · 7 comments

Comments

@cmdcolin

Hi there,
I was curious whether this tool works for extracting splitters from a long-read BAM file. I am currently running the steps to try it out, but I was wondering whether it is something I could recommend to other people. (I have a visualization tool that would ideally take a BAM file containing only the splitters, with everything else filtered out.)

@GregoryFaust
Owner

Yes, samblaster will work to output splitters from single-end reads if you use the --ignoreUnmated option. You may also want to read #37
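For readers who want to try this, here is a minimal sketch of how such a run might be assembled from Python. It only builds and prints the command line and does not run samblaster. The file names are hypothetical, and the sketch assumes the input is SAM text (not BAM) with a header and with alignments grouped by read name, which is what samblaster expects as far as I know. Check `samblaster --help` for your installed version before relying on the flags.

```python
import shlex


def samblaster_splitter_cmd(in_sam, splitter_sam, out_sam="/dev/null"):
    """Build a samblaster command that writes split reads to splitter_sam.

    --ignoreUnmated lets samblaster accept single-end (e.g. long-read) input.
    The main output is sent to out_sam (discarded by default), since only
    the splitter file is of interest here.
    """
    return [
        "samblaster",
        "--ignoreUnmated",
        "-i", in_sam,        # input SAM, grouped by read name (assumption)
        "-s", splitter_sam,  # file that receives the split-read alignments
        "-o", out_sam,       # regular output stream
    ]


# Hypothetical file names, for illustration only.
cmd = samblaster_splitter_cmd("longreads.sam", "splitters.sam")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.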

@cmdcolin
Author

Super thank you! I figured it'd do the trick

@cmdcolin
Author

If you get a chance maybe add a note in the readme 👍 I'll close for now

@GregoryFaust
Owner

Release 0.1.25 includes sample scenarios in both the README.md and in the program help text.

@cmdcolin
Author

cmdcolin commented Mar 17, 2020

This is a somewhat weird postmortem, but after asking this question I found that the BAM parser I wrote wasn't parsing the SA tag, so I was operating on the assumption that there were split reads that lacked an SA tag. Since my parser was at fault, though, it seems there will generally be an SA tag. Would it be fair to say that I could rely on the SA tag in most cases, and then filter splitters from a coordinate-sorted BAM by just grepping for the SA tag?
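To make the "grep for the SA tag" idea concrete, here is a rough sketch that works on SAM text lines, such as the output of `samtools view -h`. It is slightly stricter than a plain grep because it only looks at the optional fields (column 12 onward), so a read name or sequence that happens to contain "SA:Z:" is not matched. The function names and the example records are made up for illustration.

```python
def has_sa_tag(sam_line):
    """Return True if a SAM alignment line carries an SA:Z: optional field."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional tags start at index 11.
    return any(field.startswith("SA:Z:") for field in fields[11:])


def filter_sa_reads(lines):
    """Yield header lines unchanged and only those alignments with an SA tag."""
    for line in lines:
        if line.startswith("@") or has_sa_tag(line):
            yield line


# Made-up records: one split alignment and one ordinary alignment.
example = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "read1\t0\tchr1\t100\t60\t60M40S\t*\t0\t0\t*\t*\tSA:Z:chr1,500,+,60S40M,60,0;\n",
    "read2\t0\tchr1\t200\t60\t100M\t*\t0\t0\t*\t*\tNM:i:0\n",
]
kept = list(filter_sa_reads(example))
print(len(kept))
```

This keeps every alignment that has an SA tag, with no filtering on how many pieces the read was split into or how they overlap.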

@GregoryFaust
Owner

In our experience, you rarely want to look at all chimeric alignments. That is why samblaster has no fewer than four parameters that control which split reads are output in the splitter file: --maxSplitCount, --maxUnmappedBases, --minIndelSize, and --minNonOverlap. These parameters and their default values were carefully selected to report the split reads most likely to be relevant for detecting structural variants, without a lot of false positives or false negatives. We developed these ideas in Ira Hall's Lab at UVA (now at Wash. U. St. Louis) while doing research that led to several tools/pipelines for SV detection such as SpeedSeq, Lumpy, Hydra, YAHA, SVsim and others.
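As a loose illustration of why filtering beyond "has an SA tag" matters, the sketch below counts how many alignments a read was split into and drops reads that exceed a limit. This is not samblaster's implementation and does not reproduce its parameters or defaults. The `max_alignments` argument is a hypothetical stand-in loosely inspired by --maxSplitCount, and the SA layout (semicolon-separated entries of rname,pos,strand,CIGAR,mapQ,NM) follows the SAM optional-fields specification.

```python
def sa_entries(sam_line):
    """Parse the SA:Z: tag of a SAM line into a list of per-alignment fields."""
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("SA:Z:"):
            # Entries are ';'-terminated, so drop the empty trailing piece.
            return [entry.split(",") for entry in field[5:].split(";") if entry]
    return []


def within_split_limit(sam_line, max_alignments):
    """True if the read is split and has at most max_alignments pieces.

    The piece count is this alignment plus every alignment listed in SA.
    max_alignments is a hypothetical parameter, not a samblaster option.
    """
    others = sa_entries(sam_line)
    return bool(others) and 1 + len(others) <= max_alignments


# Made-up records: a two-piece split and a three-piece split.
two_piece = "r1\t0\tchr1\t100\t60\t60M40S\t*\t0\t0\t*\t*\tSA:Z:chr2,500,+,60S40M,60,0;"
three_piece = (
    "r2\t0\tchr1\t100\t60\t40M60S\t*\t0\t0\t*\t*\t"
    "SA:Z:chr2,500,+,40S30M30S,60,0;chr3,900,-,70S30M,60,0;"
)
print(within_split_limit(two_piece, 2), within_split_limit(three_piece, 2))
```

A real filter would also need to consider unaligned bases and the overlap between pieces, which is what the other three samblaster parameters address.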

@cmdcolin
Author

Thank you for the detailed response. This is quite helpful. My angle is that I am developing tools to help visualize split/paired reads for structural variation, and I will definitely look into these tools as sources of the data (I have already used Lumpy).
