Relaxase database of MOB-typer: several truncated proteins and a transposase #170

Phytobacteriology-UPNA · 2024-08-05T17:59:15Z

First, thanks so much for this fantastic tool.

I was analyzing a collection of plasmids from pseudomonads and came across unexpected relaxase predictions: MOBscan predicted fewer relaxases for the same plasmids.

Example:

NC_019265: predicted conjugative, but appears non-mobilizable in vivo (https://doi.org/10.3389/fmicb.2022.1076710)
MOB-typer predicts a relaxase by comparison with the type accession NC_019265_00015

After some searching to work out what that accession was, I found a post explaining that these accessions can be found in the database:
https://zenodo.org/record/3786915/files/data.tar.gz?download=1

I retrieved NC_019265_00015 from mob.proteins.faa in that database and located the corresponding sequence in NC_019265.

NC_019265_00015 is a transposase (WP_095178853.1).
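
For anyone who wants to repeat this lookup, here is a minimal sketch in plain Python that pulls one entry out of a FASTA file by its accession. It assumes the entry ID is the part of the header before the first `|` or whitespace, as in the headers quoted in this thread; `parse_fasta` and `fetch_entry` are hypothetical helper names, not part of MOB-suite.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fetch_entry(lines: Iterable[str], accession: str) -> Optional[Tuple[str, str]]:
    """Return the first (header, sequence) whose ID equals `accession`.

    Assumption: the ID is everything before the first '|' or whitespace.
    """
    for header, seq in parse_fasta(lines):
        entry_id = re.split(r"[|\s]", header, maxsplit=1)[0]
        if entry_id == accession:
            return header, seq
    return None


# Example usage (the file path is an assumption about where the archive was unpacked):
# with open("mob.proteins.faa") as handle:
#     print(fetch_entry(handle, "NC_019265_00015"))
```

The returned sequence can then be pasted into a BLAST search to check what the entry actually is.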

I looked at other sequences in the database and found several that are truncated relaxases. Could these lead to the spurious identification of truncated relaxases when analyzing plasmids?

Some other proteins were relaxases that MOBscan was not able to identify.
Thanks!

Phytobacteriology-UPNA · 2024-08-06T08:31:28Z

Two other putative transposases found in the relaxase DB; you can confirm by BLAST against the ISfinder database (https://www-is.biotoul.fr/blast.php?prog_blast=blastp):

NC_007507_00032|MOBP(ISXac3)
MCRVLRVNRSGYYAWLCSPNSERAKEDDRLLGLIKHHWLASGSVYGHRKITTDLRDLGERCSRHRVHRLMRTEGLRAQVGYGRKPRFHGGMQCKAAANLLDRQFDVTEPDTAWASDFTFIRTHEGWMYLAVVIDLFSRQVVGWAMRDRADTELVVQAVLSAVWRRKPNAGCLVHSDQGSVYTSDDWRSFLASHGLVCSMSRRGNCHDNAPVESFFGLLKRERIRRLTYPTKDAARAEVFDYIEMFYNPNRRHGSTGDLSPVEFERRYAQRGS

CP026563_00069|MOBP(ISPsy4)
MLTQEQSVEIKVLARQGHGIKFIARELGISRNTVRKYLRKARSLPSDKVRPARPCKIDPFKDYLHERIEAARPHWIPATVLLREITALGYSGGVSRLKAYIRPFKRKAEEPVVRFETLPGKQIQVDFTTIRRGRQPLKAFVATLGFSRASFVRFSEREDSEAWLTGLREAFAYFGGVPEQALFDNAGMNMVAAQSRRCQDPVYHFILSWRENELPTDAQIFECAEHCIRQLGMEGHQYVTAIHQDTDNTHCHVAVNRVNPITYKAAALWNDADTLQKSCRVLERKYGFIQDNGSWQWGVNDQLVRAPFRYGSAPQGTVPLQVYSNTESLYHYAVREVREKVSELIESRAITWRQIHLALHERGLGLREQGEGLVIYDFLRPEGPVVKASSVHPTLTKFRLEAHIGAFEGPPTFEHEEWSYGIFSSYQPAFELRDKDVRFDRRQARAEARLDLKMRYKRYREGWEKPDLHVKDRYQQVAARYQAMKADVKRSQHDPLLRKLLYRVAEFDRMKAMAELRIELRDERQALAEKGLLRPLAYRPWVEQQALRGDVAAVSQLRGFVYREKRKERTPNGGFDRVIQCGQADDSAVYHLRSYTSHLHRDGTVEYLRDGRVGVIDRGDFVQVKPGFNDDDDLDNYRLAANLVSTKSGDAVKIIGDDQFVDQVLDAGCGVNHRGSQYVFQVTDPEQLARYDVIERDHRQYYGYDEPSRPQSPVRHDPVDDAPDDGYQPPRPFGG

kbessonov1984 · 2024-10-09T18:19:19Z

Thank you for suggesting a review of the mob.proteins.faa database and for flagging the potential issue with the NC_019265_00015 entry. We will also take a look at the MOBscan web app (https://castillo.dicom.unican.es/mobscan_about/) and MOBfamDB (https://castillo.dicom.unican.es/mobscan_about/MOBfamDB.gz).

Phytobacteriology-UPNA · 2024-10-17T08:51:13Z

Thanks for your reply. I have found a few other issues:

Relaxases database:
There are other sequences classified as relaxases that are not relaxases. In fact, any short sequence is suspicious. For instance, this entry:

FJ696405|Col(Ye4449)
TTAACGATCACGGTGCTGCTCCAGCAGTTCACCGAGATTGCGGTCGAGGCTGTTTAACACCGCCAGCAGGGACACACGCTCAGTGGGATTCAGTCCGTCGAGCTGGTTAAGACGGCGGGCGATCTGGTTGAGGTTGTTGCCTATCCCGCTGACCTGTCGCAGCAGTTCGGGCGCCACATCGGGCAGACGGCGCT

This entry is a partial sequence. Moreover, it does not correspond to a relaxase but to a gene (generally called mobC, or relaxase accessory protein, RAP) that usually precedes the relaxase gene. A hit to this entry therefore does not detect a relaxase at all, and the actual relaxase might, for instance, be truncated.
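
A quick way to shortlist entries like this one for manual review is to flag anything short, and to note whether an entry is written almost entirely in nucleotide letters. The sketch below is only an illustration: the function names, the 100-residue cutoff and the 95% nucleotide fraction are arbitrary assumptions, not values used by MOB-typer.

```python
from typing import Iterable, List, Tuple

# Letters treated as nucleotide symbols (assumption: unambiguous bases plus N/U).
NUCLEOTIDE_LETTERS = set("ACGTUN")


def looks_like_nucleotide(seq: str, min_fraction: float = 0.95) -> bool:
    """True if at least `min_fraction` of the letters are nucleotide symbols."""
    seq = seq.upper()
    if not seq:
        return False
    hits = sum(1 for ch in seq if ch in NUCLEOTIDE_LETTERS)
    return hits / len(seq) >= min_fraction


def screen_entries(
    records: Iterable[Tuple[str, str]], min_length: int = 100
) -> List[Tuple[str, int, str]]:
    """Return (header, length, reasons) for entries worth a manual look.

    `min_length` is a hypothetical cutoff; choose one suited to the database.
    """
    flagged = []
    for header, seq in records:
        reasons = []
        if len(seq) < min_length:
            reasons.append("short")
        if looks_like_nucleotide(seq):
            reasons.append("nucleotide-like")
        if reasons:
            flagged.append((header, len(seq), ", ".join(reasons)))
    return flagged
```

The function takes (header, sequence) pairs, so it can be fed from any FASTA reader. A flag is only a prompt to check the entry by hand; it does not show that the entry is wrong.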

Rep protein database
This database also contains spurious entries. Again, anything that is too short is, in my opinion, suspicious. For instance:

AJ851089|IncFII
CACACCATCCTGCACTTATGTTGCACAGAAGGAGTGAGCACAGAAAGAAGTCTTGAACTTTTCCGGTCATATAACTATACTCCCCGCATAGAGCAACAGCTTCTATGCAGTTTCTTGTTAGCCCCGGTAATCTTCTCTTAGTCGCCAAACCTGGTGAAGATTATCGGGGTTTTTGCTTTTCTGGCTCCTGTAGATCCACATCAGAACCAGTTCCCTGCCACCTTACGGCGTGGCCAGCCACAAAATTCCTTAAACGATCAG

This corresponds to a putative regulatory protein, of a kind that commonly precedes RepA, and not to the actual RepA, which is:

https://www.ncbi.nlm.nih.gov/nuccore/AJ851089.1?from=21947&to=22804&report=gbwithparts
https://www.ncbi.nlm.nih.gov/protein/62550802

The same applies to these:

000124__KP125893_00142|IncFII
000129__CP018340|IncFII
000124__KP125893_00142|IncFII

If you would not mind, I would also like to offer some suggestions that, I think, could make the program easier to use and more useful:

Use a database of Rep proteins instead of rep genes (or at least, let the user use one or the other), in order to detect remote homologs and limit the detection of truncated sequences as if they were full-length.
Define the Rep groups based on homology. Right now, some groups contain sequences that do not show significant homology. For instance, the translated sequence of 000188__NC_013973_00002|IncP does not show significant homology with the products of 000177__AB237782_00014|IncP, 000180__NC_006830_00001|IncP or 000182__CP002151_00001|IncP, despite all being classified as IncP. The same happens with certain IncFII entries.
Change the headers of the entries in the database. I think it would be much easier and more useful for the user if the rep_type_accession (and relaxase_type_accession) provided protein accession numbers instead of the current codes.
Provide some (simple?) instructions to indicate what parameters can be changed and how to customize the search.
Remove partial sequences. Partial entries readily match truncated genes, which are then reported as if they were full-length.
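
As a rough illustration of the last suggestion, the sketch below tests whether a nucleotide entry reads as a complete ORF on either strand: a start codon, a stop codon, a length divisible by three and no internal stop. The function names are hypothetical, and ATG/GTG/TTG as start codons is an assumption (the common bacterial starts).

```python
START_CODONS = {"ATG", "GTG", "TTG"}  # assumption: common bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_complete_orf(seq: str) -> bool:
    """True if `seq` starts with a start codon, ends with a stop codon,
    has a length divisible by 3 and contains no in-frame internal stop."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    if seq[:3] not in START_CODONS or seq[-3:] not in STOP_CODONS:
        return False
    internal = (seq[i:i + 3] for i in range(3, len(seq) - 3, 3))
    return not any(codon in STOP_CODONS for codon in internal)


def looks_partial(seq: str) -> bool:
    """True if neither strand of `seq` reads as a complete ORF."""
    return not (is_complete_orf(seq) or is_complete_orf(reverse_complement(seq)))
```

Entries that are not meant to be protein-coding would fail this check by design, so it is a screen for manual review and not a rule for automatic removal.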

Again, thanks very much for this nice tool, and I hope that these comments are of use.
