lowercase reference sequence does not translate correctly in 'Get Sequence' output #1944
@deepakunni3 I'm wondering if it is related to this: #1879
@monicacecilia I'm having trouble replicating this. 1) What version of Apollo are you running, and 2) is there a way I could get your reference sequence and GFF3 for the data in question? (Alternatively, can you reproduce this on the Honeybee instance?) Thanks.
Err... 😳 @nathandunn, it's not me, it's the other Monica. M'dear @mpoelchau, it seems we will continue to be one and the same forever and ever, and ever, and ever... fine by me! I miss you! 😍 P.S. Since I'm here: I recall that lowercase residues appeared in nucleotide sequences when that region of the read was of low quality. Apollo was (must have been!) directed not to export these low-quality regions, instead treating them as gaps in the sequence; hence the 'X' letters in place of amino acid residues. I would look into the source of the reference sequence (or its metadata) to find out why the sequence included lowercase nucleotide residues. Sequences with the "NW_" prefix come from NCBI assemblies, right? So perhaps this has something to do with how the sequence itself was annotated there. At any rate, good luck!
Thanks for chiming in, @monicacecilia! We haven't had 'the other Monica' problem in a while, eh! Hope life is treating you well. As for the lowercase FASTA: this usually represents soft-masking of the reference sequence due to repetitive elements. Since the sequences in our case come from NCBI, it's safe to assume that the lowercase is due to repeat-masking via WindowMasker (cf. https://www.ncbi.nlm.nih.gov/genome/doc/ftpfaq/#masking), although I recognize that Apollo serves more use cases than our own :). While repeat-masking may indicate that a sequence region is non-coding, some of our annotators have found protein-coding genes within sequence that had been erroneously masked out. Therefore, I don't think it helps annotators to have Apollo mistranslate these regions in the 'Select Sequence' view, particularly while it still translates them correctly in the reference sequence view. It's good to know the initial rationale, though, and where to start looking for the fix! And yeah, our homeobox specialist found this bug :). FYI, I sent @nathandunn some example files via email, and he's issued a PR (#1951) that we still need to test on our end.
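To make the symptom concrete, here is a minimal Python sketch, assuming the 'X' residues come from a case-sensitive codon lookup that only recognizes uppercase codons. This is not Apollo's code and not necessarily what PR #1951 changed; `CODON_TABLE`, `translate`, and `normalize_case` are hypothetical names. It uses the standard genetic code (NCBI translation table 1) and shows that upper-casing soft-masked bases before the lookup lets repeat-masked regions translate the same way as unmasked ones.

```python
# Hypothetical sketch, not Apollo's implementation: shows how a case-sensitive
# codon lookup turns soft-masked (lowercase) bases into 'X'.
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Only uppercase codons are keys, so a lowercase codon is an unknown lookup.
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, normalize_case: bool = True) -> str:
    """Translate a nucleotide sequence codon by codon (frame 0).

    normalize_case=True upper-cases soft-masked bases before lookup;
    False mimics the case-sensitive behaviour that yields 'X'.
    """
    if normalize_case:
        seq = seq.upper()
    # Step through complete codons only; unknown codons become 'X'.
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


print(translate("ATGaaaTGG", normalize_case=False))  # MXW
print(translate("ATGaaaTGG"))                         # MKW
```

With `normalize_case=False`, the soft-masked codon `aaa` falls through to `X`; with normalization it translates to lysine. Upper-casing only at translation time leaves the stored sequence, and its masking information, untouched.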
Oh shoot, sorry, @monicacecilia! Hope all is going well.
@nathandunn This PR works for us! #1951
Fixed by #1951.