Relaxase database of MOB-typer: several truncated proteins and a transposase #170

Open
Phytobacteriology-UPNA opened this issue Aug 5, 2024 · 3 comments

Comments

@Phytobacteriology-UPNA

Phytobacteriology-UPNA commented Aug 5, 2024

First, thanks so much for this fantastic tool.

I was analyzing a collection of plasmids from pseudomonads and came across unexpected relaxase predictions: MOBscan predicted fewer relaxases for the same plasmids.

Example:

NC_019265: predicted conjugative, but appears non-mobilizable in vivo (https://doi.org/10.3389/fmicb.2022.1076710).
MOB-typer predicts a relaxase by comparison with the type accession NC_019265_00015.

After searching for the source of that accession, I found a post explaining that these accessions can be found in the database:
https://zenodo.org/record/3786915/files/data.tar.gz?download=1

I extracted the NC_019265_00015 entry from mob.proteins.faa in that database and located the corresponding sequence in NC_019265.

NC_019265_00015 is a transposase (WP_095178853.1).
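
For anyone who wants to repeat the lookup, here is a minimal sketch of how an entry can be pulled out of a FASTA file such as mob.proteins.faa by its accession. The function names `read_fasta` and `find_entries` are my own, not part of MOB-suite, and I am assuming that the accession appears somewhere in the header line.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_entries(lines: Iterable[str], query: str) -> Dict[str, str]:
    """Return every record whose header contains `query` (hypothetical helper)."""
    return {h: s for h, s in read_fasta(lines) if query in h}
```

An open file handle can be passed directly, e.g. `find_entries(open("mob.proteins.faa"), "NC_019265_00015")`; the returned sequence can then be searched with BLAST against NCBI or ISfinder.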

I looked at other sequences in the database and found several that were truncated relaxases. Could these entries cause truncated relaxases to be spuriously identified when analyzing plasmids?

Some other proteins were relaxases that MOBscan was not able to identify.
Thanks!

@Phytobacteriology-UPNA
Author

Phytobacteriology-UPNA commented Aug 6, 2024

Two other putative transposases found in the relaxase DB; you can confirm by BLAST against the ISfinder database (https://www-is.biotoul.fr/blast.php?prog_blast=blastp):

NC_007507_00032|MOBP(ISXac3)
MCRVLRVNRSGYYAWLCSPNSERAKEDDRLLGLIKHHWLASGSVYGHRKITTDLRDLGERCSRHRVHRLMRTEGLRAQVGYGRKPRFHGGMQCKAAANLLDRQFDVTEPDTAWASDFTFIRTHEGWMYLAVVIDLFSRQVVGWAMRDRADTELVVQAVLSAVWRRKPNAGCLVHSDQGSVYTSDDWRSFLASHGLVCSMSRRGNCHDNAPVESFFGLLKRERIRRLTYPTKDAARAEVFDYIEMFYNPNRRHGSTGDLSPVEFERRYAQRGS

CP026563_00069|MOBP(ISPsy4)
MLTQEQSVEIKVLARQGHGIKFIARELGISRNTVRKYLRKARSLPSDKVRPARPCKIDPFKDYLHERIEAARPHWIPATVLLREITALGYSGGVSRLKAYIRPFKRKAEEPVVRFETLPGKQIQVDFTTIRRGRQPLKAFVATLGFSRASFVRFSEREDSEAWLTGLREAFAYFGGVPEQALFDNAGMNMVAAQSRRCQDPVYHFILSWRENELPTDAQIFECAEHCIRQLGMEGHQYVTAIHQDTDNTHCHVAVNRVNPITYKAAALWNDADTLQKSCRVLERKYGFIQDNGSWQWGVNDQLVRAPFRYGSAPQGTVPLQVYSNTESLYHYAVREVREKVSELIESRAITWRQIHLALHERGLGLREQGEGLVIYDFLRPEGPVVKASSVHPTLTKFRLEAHIGAFEGPPTFEHEEWSYGIFSSYQPAFELRDKDVRFDRRQARAEARLDLKMRYKRYREGWEKPDLHVKDRYQQVAARYQAMKADVKRSQHDPLLRKLLYRVAEFDRMKAMAELRIELRDERQALAEKGLLRPLAYRPWVEQQALRGDVAAVSQLRGFVYREKRKERTPNGGFDRVIQCGQADDSAVYHLRSYTSHLHRDGTVEYLRDGRVGVIDRGDFVQVKPGFNDDDDLDNYRLAANLVSTKSGDAVKIIGDDQFVDQVLDAGCGVNHRGSQYVFQVTDPEQLARYDVIERDHRQYYGYDEPSRPQSPVRHDPVDDAPDDGYQPPRPFGG

@kbessonov1984
Collaborator

Thank you for pointing out the need to review the mob.proteins.faa database and the potential issue with the NC_019265_00015 entry. We will also take a look at the MOBscan web app (https://castillo.dicom.unican.es/mobscan_about/) and MOBfamDB (https://castillo.dicom.unican.es/mobscan_about/MOBfamDB.gz).

@Phytobacteriology-UPNA
Author

Thanks for your reply. I have found a few other issues:

Relaxases database:
Other sequences are classified as relaxases but are not. In fact, any short sequence is suspicious. For instance, this entry:

FJ696405|Col(Ye4449)
TTAACGATCACGGTGCTGCTCCAGCAGTTCACCGAGATTGCGGTCGAGGCTGTTTAACACCGCCAGCAGGGACACACGCTCAGTGGGATTCAGTCCGTCGAGCTGGTTAAGACGGCGGGCGATCTGGTTGAGGTTGTTGCCTATCCCGCTGACCTGTCGCAGCAGTTCGGGCGCCACATCGGGCAGACGGCGCT

This entry is a partial sequence. Additionally, it does not correspond to a relaxase but to a gene (generally called mobC, or relaxase accessory protein, RAP) that usually precedes the relaxase gene. A hit to this entry therefore does not detect a relaxase at all, and the actual relaxase might, for instance, be truncated.
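
The "short sequences are suspicious" heuristic can be sketched roughly as follows. This is only an illustration: the function name `flag_short_entries`, the 0.5 cutoff, and the idea of taking the text after the last `|` in the header as the group label are my own assumptions, not anything MOB-suite does.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple


def flag_short_entries(records: Dict[str, str],
                       fraction: float = 0.5) -> List[Tuple[str, int]]:
    """Flag entries much shorter than the median length of their group.

    `records` maps FASTA headers such as "AJ851089|IncFII" to sequences.
    The group label is assumed to be the text after the last "|".
    The 0.5 default is an arbitrary choice for illustration.
    """
    groups = defaultdict(list)
    for header, seq in records.items():
        label = header.rsplit("|", 1)[-1]
        groups[label].append((header, len(seq)))

    flagged = []
    for members in groups.values():
        cutoff = fraction * median(length for _, length in members)
        flagged.extend((h, n) for h, n in members if n < cutoff)
    # Shortest (most suspicious) entries first.
    return sorted(flagged, key=lambda item: item[1])
```

A group with a single member is never flagged, and a group made up mostly of truncated entries would also pass, so the output is only a list of candidates for manual inspection.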

Rep protein database
This database also contains spurious entries. Again, anything that is too short is, in my opinion, suspicious. For instance:

AJ851089|IncFII
CACACCATCCTGCACTTATGTTGCACAGAAGGAGTGAGCACAGAAAGAAGTCTTGAACTTTTCCGGTCATATAACTATACTCCCCGCATAGAGCAACAGCTTCTATGCAGTTTCTTGTTAGCCCCGGTAATCTTCTCTTAGTCGCCAAACCTGGTGAAGATTATCGGGGTTTTTGCTTTTCTGGCTCCTGTAGATCCACATCAGAACCAGTTCCCTGCCACCTTACGGCGTGGCCAGCCACAAAATTCCTTAAACGATCAG

This corresponds to a putative regulatory protein, which commonly precedes RepA, but not to the actual RepA, which is:

https://www.ncbi.nlm.nih.gov/nuccore/AJ851089.1?from=21947&to=22804&report=gbwithparts
https://www.ncbi.nlm.nih.gov/protein/62550802

The same applies to these entries:

000124__KP125893_00142|IncFII
000129__CP018340|IncFII

If you would not mind, I would also like to offer some suggestions that, I think, could make the program easier to use and more useful:

  • Use a database of Rep proteins instead of rep genes (or at least, let the user use one or the other), in order to detect remote homologs and limit the detection of truncated sequences as if they were full-length.

  • Define the Rep groups based on homology. Right now, some groups contain sequences that do not show significant homology. For instance, the translated sequence of 000188__NC_013973_00002|IncP does not show significant homology with the products of 000177__AB237782_00014|IncP, 000180__NC_006830_00001|IncP or 000182__CP002151_00001|IncP, despite all being classified as IncP. The same happens with certain IncFII entries.

  • Change the headers of the entries in the database. I think it would be much easier and more useful for the user if rep_type_accession (and relaxase_type_accession) provided protein accession numbers instead of the current codes.

  • Provide some (simple?) instructions to indicate what parameters can be changed and how to customize the search.

  • Remove partial sequences. These readily match truncated genes, which are then reported as if they were full-length.
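
As a rough illustration of the last suggestion, partial nucleotide entries could be screened with a check like the one below. It assumes that each entry is meant to be exactly one coding sequence, from start codon to stop codon, on either strand, and it uses the common bacterial start codons ATG, GTG and TTG. The function names are hypothetical, and entries failing the check would still need manual review.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # common bacterial start codons
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_complete_orf(seq: str) -> bool:
    """True if seq is start codon ... stop codon with no internal stop."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return (codons[0] in START_CODONS
            and codons[-1] in STOP_CODONS
            and not any(c in STOP_CODONS for c in codons[1:-1]))


def looks_full_length(seq: str) -> bool:
    """Check the entry as given and on the opposite strand."""
    return is_complete_orf(seq) or is_complete_orf(reverse_complement(seq))
```

Entries for which `looks_full_length` returns `False` are either partial, not a single gene, or contain ambiguous bases, so they are the ones worth checking against the source GenBank record.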

Again, thanks very much for this nice tool, and I hope that these comments are of use.
