epiread.c #28

Closed
vpbrendel opened this issue Jul 26, 2022 · 2 comments

Comments

@vpbrendel

Line 706 of epiread.c:
// reference is a G
if (bsstrand && rb == 'G' && rpos+j-1 >= rs->beg) {

I think you also need "&& j>0" to prevent a core dump on some inputs.

Volker

@jamorrison

I looked through the code block associated with that statement, and I think the code is okay as written. rpos is 1-based, so its smallest value is 1, and the rpos+j-1 >= rs->beg check ensures that no base is fetched from outside the cached reference window, so that fetch cannot cause a core dump.

However, if you find a case where it actually does produce a core dump, feel free to reopen this issue with the test case and I'll look into it more then.
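A minimal sketch of the window arithmetic described above, under stated assumptions: only rs->beg, rpos, j, and bsstrand come from the issue; the struct ref_window_t (with seq[0] holding the base at 1-based position beg) and the helpers ref_base and is_cpg_g are hypothetical names for illustration, not the epiread.c code. The guard is evaluated before the preceding base is fetched, so a G at the first cached position is skipped instead of being read out of bounds.

```c
#include <assert.h>

/* Hypothetical stand-in for the cached reference window: seq[0] holds
 * the base at 1-based position beg. Only the name "beg" mirrors rs->beg
 * from the issue; the rest of the layout is assumed. */
typedef struct {
    const char *seq;
    long beg; /* 1-based position of seq[0] */
    long len; /* number of cached bases */
} ref_window_t;

/* Fetch the reference base at 1-based position pos. There is no
 * fallback: the caller must keep pos inside the window, which is what
 * the rpos+j-1 >= rs->beg guard is for. The assert documents that. */
static char ref_base(const ref_window_t *rs, long pos) {
    assert(pos >= rs->beg && pos < rs->beg + rs->len);
    return rs->seq[pos - rs->beg];
}

/* Return 1 if, on the bottom strand (bsstrand != 0, an assumed flag),
 * the base at rpos+j is a reference G preceded by a reference C, i.e.
 * the G of a CpG. Short-circuit evaluation means the preceding base is
 * only fetched once it is known to lie inside the window. */
static int is_cpg_g(const ref_window_t *rs, int bsstrand, long rpos, long j) {
    char rb = ref_base(rs, rpos + j);
    return bsstrand && rb == 'G' && rpos + j - 1 >= rs->beg &&
           ref_base(rs, rpos + j - 1) == 'C';
}
```

For example, with the window { "GCGTA", 100, 5 } and rpos = 100 on the bottom strand, is_cpg_g returns 0 for j = 0 (position 99 is outside the window) and 1 for j = 2. Note that this only protects the reference fetch; it says nothing about indices into per-read arrays.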

jamorrison added a commit that referenced this issue Aug 17, 2022
@jamorrison

I looked into this further and found a specific case where the core dump occurs: when the read is from the OB/CTOB strand and its first base is a G in a CG, the code as written would produce a core dump. The implemented fix isn't ideal (see below for a copy of the in-code comment), but none of the fixes I thought through were ideal.

/* This is a tough case that can only happen when calling '-5 0' in
 * the invocation. But it's also not a rare edge case.
 *
 * When the first base of an OB/CTOB read is a G in a CG, then you
 * can get the methylation status from that base. However, all CGs
 * have the methylation status placed on the C for consistency
 * across strands and ease for downstream analysis. Because the C
 * doesn't occur in the read itself, this can cause issues for how
 * to handle this methylation status. You can't put it on the C,
 * because it's not there, and you can't put it on the G, since that
 * doesn't fit how all the other reads are handled.
 *
 * For now, the least worst option (or the one I'm going with) is to
 * filter this base when it occurs. The occurrences will be counted
 * and printed out at the end so that the user knows it occurs. If
 * running in verbose mode, then a message will be printed to the
 * screen as well.
 *
 * This is an issue for both NOMe- and BS-seq.
 */
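The filter-and-count approach in that comment can be sketched as follows. This is a hypothetical illustration, not the code from the commit: status, filter_stats_t, and record_cpg_status are invented names, and j here is a 0-based offset from the first aligned base of the read (in epiread.c, j may instead index within a single CIGAR operation, in which case the equivalent check would need the read offset rather than j alone). Because a CpG status is always stored on the C at offset j - 1, a G at offset 0 has no slot in the read, so it is counted and dropped instead of being written to status[-1].

```c
#include <stdio.h>

/* Hypothetical run-wide counters: the fix described above counts the
 * filtered bases and reports the total at the end of the run. */
typedef struct {
    long n_filtered_first_g; /* G-in-CG bases dropped at read offset 0 */
    int verbose;             /* if nonzero, also print a message per event */
} filter_stats_t;

/* Record the methylation call seen on a bottom-strand CpG G at read
 * offset j (0-based from the first aligned base). The call is stored on
 * the C at offset j-1, so a G at j == 0 is filtered and counted instead
 * of writing to status[-1]. Returns 1 if recorded, 0 if not. */
static int record_cpg_status(char *status, long read_len, long j, char call,
                             filter_stats_t *stats) {
    if (j <= 0) {
        stats->n_filtered_first_g++;
        if (stats->verbose)
            fprintf(stderr, "filtered CpG G at first base of OB/CTOB read\n");
        return 0;
    }
    if (j >= read_len)
        return 0; /* defensive: offset lies past the end of the read */
    status[j - 1] = call;
    return 1;
}
```

At the end of a run, n_filtered_first_g would hold the count reported to the user.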
