lowercase reference sequence does not translate correctly in 'Get Sequence' output #1944

Closed
mpoelchau opened this issue Oct 4, 2018 · 7 comments

@mpoelchau

  • Search mailing list or email list apollo@lists.lbl.gov if a general setup question.

    • haven't seen this in the mailing list.
  • Provide what you were doing and what you expected to see. Screenshots, directory, config files are a plus if relevant.

    • What I'm doing: getting the protein sequence from a user-created annotation using the 'Get Sequence' function
    • What I expected to see: the same translation that is visible in the 'Reference Sequence' track
    • What I see, instead: a different translation. Specifically, X's. See the blue outlines in the screenshot below. (model is on - strand)
    • What I suspect: Apollo2 doesn't like lowercase reference sequence. (See also Fix splice site detection #1881)

[Screenshot: protein-sequence]

  • Provide the javascript console log output generated from the action.

    • no action, so no relevant console log output for it
  • Provide the server log output generated from the action (typically catalina.out).

    • no action, so no relevant server log output
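
To make the suspicion above concrete, here is a minimal sketch of how a case-sensitive codon lookup could produce X's for a lowercase reference. This is not Apollo's code: `CODON_TABLE`, `translate`, `reverse_complement`, and the example sequence are all hypothetical names and data chosen for illustration, and the actual cause in Apollo may differ.

```python
from itertools import product

# Standard genetic code, keyed by UPPERCASE codons only (illustrative table).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Reverse-complement a DNA string, preserving the case of each base."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def translate(seq, normalize_case):
    """Translate in frame 0; codons missing from the table become 'X'."""
    if normalize_case:
        seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Hypothetical soft-masked reference; the model sits on the minus strand.
genomic_plus = "TCAtttagcCAT"
cds = reverse_complement(genomic_plus)   # "ATGgctaaaTGA"

print(translate(cds, normalize_case=False))  # MXX*  (lowercase codons not found)
print(translate(cds, normalize_case=True))   # MAK*  (case ignored)
```

If the real behavior follows this pattern, uppercasing the sequence before the codon lookup would be enough to make the two views agree.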

Thanks for looking into this!

@nathandunn nathandunn added this to the 2.1.1 milestone Oct 4, 2018
@nathandunn
Contributor

@deepakunni3 I'm wondering if it is related to this: #1879

@nathandunn
Contributor

@monicacecilia I'm having trouble replicating this. 1) What version of Apollo are you running, and 2) is there a way I could get your reference sequence and GFF3 for the data in question? (Or, alternatively, are you able to reproduce it on the Honeybee instance?)

Thanks.

@monicacecilia
Member

Err... 😳 @nathandunn - it's not me, it's the other Monica.

M'dear @mpoelchau, it seems we will continue to be one and the same forever and ever, and ever, and ever... fine by me! :bowtie: I miss you! 😍

p.s: since I'm here... I recall that lowercase residues appeared in nucleotide sequences when the region of the read was of low quality. Apollo was (must have been!) directed not to export these low-quality regions, instead treating them as gaps in the sequence; hence the 'X' letters in place of amino acid residues. I would look into the source of the reference sequence (or its metadata) to find out why the sequence included lowercase nucleotide residues. Sequences with the "NW_" prefix come from NCBI assemblies, right? So perhaps this has something to do with how the sequence itself was annotated there. At any rate, good luck!
p.s2: @mpoelchau, yay, fun with homeobox-containing proteins! 😉

@mpoelchau
Author

Thanks for chiming in, @monicacecilia! We haven't had 'the other Monica' problem in a while, eh! Hope life is treating you well.

As far as the lowercase FASTA goes: this usually represents soft-masking of the reference sequence due to repetitive elements. Since the sequences in our case come from NCBI, it's safe to assume that the lowercase is due to repeat masking via WindowMasker (cf. https://www.ncbi.nlm.nih.gov/genome/doc/ftpfaq/#masking), although I recognize that Apollo serves more use cases than our own :). While repeat masking may indicate that the sequence region is non-coding, some of our annotators have found protein-coding genes within sequence that had been erroneously masked out. Therefore, I don't think it would be useful to the annotator to have Apollo not translate these regions properly in the 'Get Sequence' view, particularly while still translating them properly in the reference sequence view.
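
As a small illustration of what soft-masking means in practice, the sketch below lists the lowercase (masked) intervals in a sequence and shows that the bases themselves are unchanged once case is ignored. The function name `soft_masked_intervals` and the example sequence are made up for this illustration; they do not come from Apollo or NCBI tooling.

```python
import re


def soft_masked_intervals(seq):
    """Return 0-based, half-open (start, end) spans of lowercase bases."""
    return [match.span() for match in re.finditer(r"[a-z]+", seq)]


# Hypothetical soft-masked sequence: a repeat flagged in lowercase.
seq = "ATGGCC" + "aaatttttt" + "GGGTAA"

print(soft_masked_intervals(seq))  # [(6, 15)]
print(seq.upper())                 # ATGGCCAAATTTTTTGGGTAA
```

Because soft-masking only changes the case of the letters, a tool can keep the mask for display purposes and still translate the underlying bases normally.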

It's good to know the initial rationale, though, and where to start looking for the fix! And yeah, our homeobox specialist found this bug :).

FYI, I sent @nathandunn some example files via email, and he's issued a PR (#1951) that we still need to test on our end.

@nathandunn
Contributor

Oh shoot, sorry @monicacecilia! Hope all is going well.

@mpoelchau
Author

@nathandunn This PR works for us! #1951

@nathandunn
Contributor

fixed by #1951

@nathandunn nathandunn modified the milestones: 2.1.1, 2.2.0 Dec 5, 2018

3 participants