Are there any hints to work with non-plant species? #91
Comments
Hi, yes. You don't need to tweak the code; just provide a mosquito CDS file to the program (--cds) to filter protein-coding sequences out of the TE annotation. You may also want to use --sensitive 1 to identify non-LTR retrotransposons (via RepeatModeler). If you have a manually curated set of TEs, give it to the program via --curatedlib. The set does not have to be complete or comprehensive, but please make sure the provided elements are authentic. There are many non-plant applications of this program; you can find them in #15. Best,
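As a rough illustration of how these options combine, here is a minimal Python sketch that only assembles an EDTA.pl command line and prints it; nothing is executed. The file names are hypothetical placeholders. The --genome and --species others arguments are not mentioned in this thread and are assumptions about the usual EDTA invocation, so please check them against `EDTA.pl -h` for your installed version.

```python
import shlex


def build_edta_command(genome, cds=None, curatedlib=None, sensitive=False):
    """Assemble an EDTA.pl argument list; the command is not run here."""
    # --genome and --species are assumptions (not from this thread);
    # verify them with `EDTA.pl -h`.
    cmd = ["EDTA.pl", "--genome", genome, "--species", "others"]
    if cds:
        # CDS from the target or a sister species, used to filter
        # protein-coding sequences out of the TE annotation.
        cmd += ["--cds", cds]
    if curatedlib:
        # Optional manually curated TE library (may be incomplete).
        cmd += ["--curatedlib", curatedlib]
    if sensitive:
        # Extra RepeatModeler round, e.g. for non-LTR retrotransposons.
        cmd += ["--sensitive", "1"]
    return cmd


# Hypothetical file names for a mosquito assembly.
cmd = build_edta_command(
    "mosquito_genome.fa",
    cds="sister_species_cds.fa",
    sensitive=True,
)
print(shlex.join(cmd))
```

To actually launch the run, the list could be passed to `subprocess.run(cmd, check=True)` once the paths and flags have been verified.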
My genome is novel and only just assembled, so gene annotation is not available yet. I would therefore prefer to use $protlib with proteins from a related species.
You may use the sister species' CDS to do the job. I don't recommend
changing $protlib because that only does low-level cleaning.
The RM2 paper shows that EDTA identified fewer sequences in Drosophila
while RM2 identified more, which doesn't necessarily mean that RM2 was
more sensitive. Imagine a program that identifies 100% of the sequence;
such a program has certainly done something wrong. To benefit from the
extra sensitivity RM2 may contribute, you can use the --sensitive 1
parameter, which recruits RM2 to do an extra round of searching.
beta2 is under development, unmaintained, and untested. Please don't use
it for now.
Shujun
…On Thu, Jun 18, 2020 at 10:52 AM Sergei Ryazansky ***@***.***> wrote:
My genome is novel, just assembled, and gene annotation is not available
yet. So, I prefer to use $protlib with proteins from the related species.
My question arose after reading the RM2 paper
<https://www.pnas.org/content/117/17/9451>, in which they show that EDTA
outperforms RM2 in terms of sensitivity only for plants, not for
Drosophila. So I'm wondering which settings can be tuned in EDTA to
increase its sensitivity.
Also, there are a lot of if ($beta2==1) subroutines inside the code that
add some additional cleaning of the predicted sequences. Did you test this
functionality with the reference species?
Thanks.
The final TElib could have some level of redundancy, but the highly redundant part should have been removed. Some sequences may share considerable similarity with others yet not meet the clustering threshold, so they are kept as separate sequences. You may use other clustering methods to perform extra rounds of clustering.
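To make the idea of an extra clustering pass concrete, here is a toy Python sketch of greedy clustering by k-mer Jaccard similarity. This is not what EDTA does internally; it is only a stand-in for a dedicated tool (CD-HIT is one example) and it ignores reverse-complement matches. The function names, the threshold of 0.8, k=8, and the toy sequences are all assumptions chosen for illustration.

```python
def kmers(seq, k=8):
    """Return the set of k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def greedy_cluster(seqs, threshold=0.8, k=8):
    """Greedily cluster sequences; seqs maps name -> sequence.

    Sequences are visited longest first. Each one joins the first
    representative it matches at >= threshold, else it becomes a new
    representative. Returns representative name -> list of member names.
    """
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    rep_kmers = {}
    clusters = {}
    for name in order:
        ks = kmers(seqs[name], k)
        for rep, rks in rep_kmers.items():
            if jaccard(ks, rks) >= threshold:
                clusters[rep].append(name)
                break
        else:
            rep_kmers[name] = ks
            clusters[name] = [name]
    return clusters


# Toy library: TE_b is TE_a without its last base; TE_c is unrelated.
toy = {
    "TE_a": "ACGTTGCAAGGCTTAACCGGTTAGCATCGATCGGATCCA",
    "TE_b": "ACGTTGCAAGGCTTAACCGGTTAGCATCGATCGGATCC",
    "TE_c": "TTTTGGGGCCCCAAAATTTTGGGGCCCCAAAACCGGTTAA",
}
for rep, members in greedy_cluster(toy).items():
    print(rep, members)
```

Lowering the threshold merges more of the near-duplicates that survived the pipeline's own clustering, at the risk of collapsing distinct families, so any cutoff should be checked on a few known elements first.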
Hi,
I'm trying to explore the TE content of a mosquito genome. As far as I understand, the EDTA pipeline was developed to work with plant species and was inspired by this guide. I've found an undocumented option in EDTA that can help to filter out protein-coding genes from the predicted TE families: $protlib (EDTA_raw.pl), which references the cleaned plant proteome. Obviously, I can change the link to a cleaned (without any traces of TE-derived proteins) mosquito-specific proteome. The question is: are there any other tweaks in EDTA that can help it work with genomes other than plants? I mean the discovery, filtering and cleaning options.