lowercase reference sequence does not translate correctly in 'Get Sequence' output #1944

mpoelchau · 2018-10-04T15:21:09Z

Search mailing list or email list apollo@lists.lbl.gov if a general setup question.
- haven't seen this in the mailing list.
Provide what you were doing and what you expected to see. Screenshots, directory, config files are a plus if relevant.
- What I'm doing: getting the protein sequence from a user-created annotation using the 'Get Sequence' function.
- What I expected to see: the same translation that is visible in the 'Reference Sequence' track.
- What I see instead: a different translation, specifically X's. See the blue outlines in the screenshot below. (The model is on the minus strand.)
- What I suspect: Apollo2 doesn't handle lowercase reference sequence correctly. (See also "Fix splice site detection" #1881.)

Provide the javascript console log output generated from the action.
- no action, so no relevant console log output for it
Provide the server log output generated from the action (typically catalina.out).
- no action, so no relevant server log output

Thanks for looking into this!

nathandunn · 2018-10-04T17:02:45Z

@deepakunni3 I'm wondering if it is related to this: #1879

nathandunn · 2018-10-19T23:14:50Z

@monicacecilia I'm having trouble replicating this. 1) What version of Apollo are you running? 2) Is there a way I could get your reference sequence and GFF3 for the data in question? (Or, alternatively, are you able to reproduce it on the Honeybee instance?)

Thanks.

monicacecilia · 2018-10-23T08:00:48Z

Err... 😳 @nathandunn - it's not me, it's the other Monica.

M'dear @mpoelchau, it seems we will continue to be one and the same forever and ever, and ever, and ever... fine by me! I miss you! 😍

p.s.: since I'm here... I recall that lowercase residues appeared in nucleotide sequences when that region of the read was of low quality. Apollo was (must have been!) directed not to export these low-quality regions, and to treat them as gaps in the sequence instead; hence the 'X' letters in place of amino acid residues. I would look into the source of the reference sequence (or its metadata) to find out why the sequence includes lowercase nucleotide residues. Sequences with the "NW_" prefix come from NCBI assemblies, right? So perhaps this has something to do with how the sequence itself was annotated there. At any rate, good luck!
p.s2: @mpoelchau, yay, fun with homeobox-containing proteins! 😉

mpoelchau · 2018-10-23T13:04:58Z

Thanks for chiming in, @monicacecilia! We haven't had 'the other Monica' problem in a while, eh! Hope life is treating you well.

As for the lowercase FASTA: this usually represents soft-masking of the reference sequence due to repetitive elements. Since the sequences in our case come from NCBI, it's safe to assume that the lowercase is due to repeat masking via WindowMasker (cf. https://www.ncbi.nlm.nih.gov/genome/doc/ftpfaq/#masking), although I recognize that Apollo serves more use cases than our own :). While repeat masking may suggest that a sequence region is non-coding, some of our annotators have found protein-coding genes within sequence that had been erroneously masked. Therefore, I don't think it is useful to the annotator for Apollo to mistranslate these regions in the 'Get Sequence' view, particularly while it still translates them properly in the reference sequence view.
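To illustrate the symptom (this is not Apollo's code, and I'm only guessing at the mechanism): a minimal Python sketch in which a codon lookup against an uppercase-only table returns 'X' for soft-masked codons, while normalizing case first gives the expected residues. The `translate` function, its `ignore_case` flag, and the example sequence are hypothetical; the codon table is the standard genetic code (NCBI translation table 1).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), with codons ordered
# TTT, TTC, TTA, TTG, TCT, ... (first base varies slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, ignore_case):
    """Translate a forward-strand coding sequence, frame 0.

    Hypothetical helper: codons missing from the table become 'X'.
    With ignore_case=False, lowercase (soft-masked) codons are not found
    in the uppercase-only table and therefore also become 'X'.
    """
    if ignore_case:
        seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Made-up example: the middle two codons are soft-masked (lowercase).
masked = "ATGgctaaaTGA"
print(translate(masked, ignore_case=False))  # MXX*
print(translate(masked, ignore_case=True))   # MAK*
```

The second call is the behavior I'd expect from 'Get Sequence': soft-masking is only a hint about repeats and shouldn't change the translation.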

It's good to know the initial rationale, though, and where to start looking for the fix! And yeah, our homeobox specialist found this bug :).

FYI, I sent @nathandunn some example files via email, and he's issued a PR (#1951) that we still need to test on our end.

nathandunn · 2018-10-23T14:23:10Z

Oh shoot, sorry @monicacecilia! Hope all is going well.

mpoelchau · 2018-10-25T14:09:28Z

@nathandunn This PR works for us! #1951

nathandunn · 2018-10-25T14:19:44Z

fixed by #1951

nathandunn added this to the 2.1.1 milestone Oct 4, 2018

nathandunn mentioned this issue Oct 23, 2018

fixed masking problem #1951

Merged

nathandunn closed this as completed Oct 25, 2018

nathandunn modified the milestones: 2.1.1, 2.2.0 Dec 5, 2018
