`--proteins` external database doesn't give expected assignments #719

fconstancias · 2025-02-05T13:02:28Z

Thanks for developing all these really helpful tools for the microbial ecology community!

I would like to annotate Streptococcus pneumoniae genomes for bacteriocin-related genes (antimicrobial/immunity/regulation). I have downloaded the amino acid sequences of the genes I am interested in from UniProt, and ideally I would like to add annotations from this specific database to the gbff or gff files I downloaded from NCBI.

I am exploring Prokka's `--proteins` flag to see whether it can help me achieve this objective.

I formatted my database according to the instructions:

>A0A384ZZZ3 ~~~sliC~~~~~~
MDENKVIIDLSEKVFAKFDEQLKRYAEQPNYDLLTLSSGLPGLILLSSELTSLTSERKYS
ARTGKYVNFMVKQMRNYGVLSDSLFSGVSGIGISILHLVEEHPEYHNLLISFNEYIKYYT
LSKIENIDIKKISPTDYDIIEGVSGVLVYLLSQEQDENDYIINRIINFLSEFSLKNSTLT
GFYVESKNQMSKTESKLYPLGCLNFGLAHGLAGVGAMLSYSKLKGYSNEKSIAAIKKIIM
LYEKHELKNYMWKEGLSDIELKKTEKSNLQYEFIRDAWCYGSPGISLLYLYSSLALEDKK
LKSKACNILKASIRRSNGLEQSILCHGFSGAIEICLFFKKIYKTTDFDDCIKSLKEKLIS
DFREDMTYGFNTTAEFENIKTKDNLGYLDGIIGILLTMIELNNLKVTTNWQRALLLFDDV
IKEVK
>A0A0H2UNX0 ~~~blpB~~~~~~
MNPNLFRSVEFYQRRYHNYATVLIIPLSLLFTFILIFSLVATKEITVTSQGEIAPTSVIA
SIQSTSDNPILANHLVANQVVEKGDLLIKYSETMEESQKTALATQLQRLEKQKEGLGILK
QSLEKATDLFSGEDEFGYHNTFMNFTKQSHDIELGITKTNTEVSNQANLSNSSSSAIEQE
ITKVQQQIGEYQELRDAIINNRARLPTGNPHQSILNRYLVASQGQTQGTAEEPFLSQINQ
SIAGLESSIASLKIQQAGIGSVATYDNSLATKIEVLRTQFLQTASQQQLTVENQLTELKV
QLDQATQRLENNTLTSPSKGIVHLNSEFEGKNRIPTGTEIAQIFPVITDTREVLITYYVS
SDYLPLLDKGQTVRLKLEKIGNHGTTIIGQLQTIDQTPTRTEQGNLFKLTALAKLSNEDS
KLIQYGLQGRVTSVTTKKTYFDYFKDKILTHSD

and I am surprised to see that only 1 gene was annotated using this custom database:

[13:45:55] Running: prodigal -i prokka\/PROKKA_02052025\.fna -c -m -g 11 -p single -f sco -q
[13:45:56] Found 1984 CDS
[13:45:56] Connecting features back to sequences
[13:45:56] Not using genus-specific database. Try --usegenus to enable it.
[13:45:56] Preparing user-supplied primary BLAST annotation source: protein_sequences_prokka_ready.faa
[13:45:56] Guessed source was in fasta format.
[13:45:56] Running: makeblastdb -dbtype prot -in protein_sequences_prokka_ready\.faa -out prokka\/proteins -logfile /dev/null
[13:45:56] Using /inference source as 'protein_sequences_prokka_ready.faa'
[13:45:56] Annotating CDS, please be patient.
[13:45:56] Will use 3 CPUs for similarity searching.
[13:45:57] There are still 1984 unannotated CDS left (started with 1984)
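For anyone checking a similar custom database, here is a minimal Python sketch that splits headers like the ones above into their `~~~`-separated fields and lists the entries whose product field is empty. It assumes the four-field layout `>ID EC_number~~~gene~~~product~~~COG`. The function names are hypothetical and not part of Prokka. Whether an empty product field has anything to do with the result above is not established here; the sketch only makes the header contents easy to inspect.

```python
def parse_prokka_header(line):
    """Split a '>ID EC~~~gene~~~product~~~COG' header into its parts.

    Assumption: four '~~~'-separated fields after the sequence ID.
    Missing trailing fields are padded with empty strings.
    """
    seq_id, _, description = line[1:].strip().partition(" ")
    fields = description.split("~~~")
    fields += [""] * (4 - len(fields))  # pad short descriptions
    ec, gene, product, cog = fields[:4]
    return {"id": seq_id, "ec": ec, "gene": gene, "product": product, "cog": cog}


def headers_missing_product(fasta_text):
    """Return the IDs of all FASTA entries whose product field is empty."""
    missing = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = parse_prokka_header(line)
            if not header["product"]:
                missing.append(header["id"])
    return missing


if __name__ == "__main__":
    # Header copied from the database excerpt in this issue.
    print(parse_prokka_header(">A0A384ZZZ3 ~~~sliC~~~~~~"))
```

With the two headers shown in this issue, both entries would be reported as having a gene name but no product description.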

Example from the proteins.tmp.blast file:

Query= 259

Length=61
Score E
Sequences producing significant alignments: (Bits) Value

A0A062WQJ3 ~~~cibA 118 2e-39

A0A062WQJ3 ~~~cibA~~~~~~
Length=61

Score = 118 bits (295), Expect = 2e-39, Method: Compositional matrix adjust.
Identities = 61/61 (100%), Positives = 61/61 (100%), Gaps = 0/61 (0%)

Query 1 MTNFDILDNQFLSLSENELSDIDGGLAPLVIFGVAVSWKAIAGGTALIGSGLAAGYFLGG 60
MTNFDILDNQFLSLSENELSDIDGGLAPLVIFGVAVSWKAIAGGTALIGSGLAAGYFLGG
Sbjct 1 MTNFDILDNQFLSLSENELSDIDGGLAPLVIFGVAVSWKAIAGGTALIGSGLAAGYFLGG 60

Query 61 D 61
D
Sbjct 61 D 61

When I BLASTed the genomic.faa of that particular genome against a local BLAST database built from the same .faa file, I got quite a few significant hits.
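To compare such manual hits with what a stricter annotation filter might keep, the following Python sketch filters tabular BLAST output by e-value and approximate query coverage. Everything in it is an assumption for illustration: the column layout (a custom tabular format with `qseqid sseqid pident length qlen evalue bitscore`), the function name, and the threshold values, which are placeholders rather than confirmed Prokka settings. Coverage is approximated as alignment length divided by query length.

```python
import csv
import io

# Assumed column order of a custom tabular BLAST output (hypothetical choice).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "qlen", "evalue", "bitscore"]


def filter_hits(tsv_text, max_evalue=1e-9, min_qcov=80.0):
    """Keep hits passing both the e-value and query-coverage thresholds.

    The default thresholds are placeholders chosen for illustration.
    """
    kept = []
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        # Approximate query coverage: alignment length over query length.
        qcov = 100.0 * int(row["length"]) / int(row["qlen"])
        if float(row["evalue"]) <= max_evalue and qcov >= min_qcov:
            kept.append((row["qseqid"], row["sseqid"], round(qcov, 1)))
    return kept


if __name__ == "__main__":
    # First row mirrors the cibA hit shown above; the second row is made up.
    example = (
        "259\tA0A062WQJ3\t100.0\t61\t61\t2e-39\t118\n"
        "q2\tsX\t90.0\t30\t100\t1e-12\t60\n"
    )
    print(filter_hits(example))
```

A hit can look significant by e-value alone and still be dropped by a coverage cutoff, which is one thing worth checking when manual BLAST results and pipeline annotations disagree.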

Do you see what I am missing here?

Many thanks for your input and suggestions!


fconstancias mentioned this issue Feb 5, 2025

ERROR: User proteins file Fasta format not valid! oschwengers/bakta#361

Closed
