See here for the list of plastome annotation tools
See here for the list of mitome annotation tools
See here for the list of plasmidome annotation tools
Year | Tool name | Publication | Type | Method | Organism | Citations (PubMed, 2016) | Comments | Output format |
---|---|---|---|---|---|---|---|---|
1991 | GRAIL | E. C. Uberbacher and R. J. Mural (1991), "Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach", Proc. Natl. Acad. Sci. USA, Vol. 88, pp. 11261-11265. R. J. Mural, J. R. Einstein, X. Guan, R. C. Mann and E. C. Uberbacher (1992), "An artificial intelligence approach to DNA sequence feature recognition", Trends in Biotechnology, 10, pp. 66-69. | Ab initio (sensors + neural network) | No longer supported | ||||
1991 | NetGene | Brunak, S., Engelbrecht, J., and Knudsen, S. (1991). Prediction of human mRNA donor and acceptor sites from the DNA sequence. J. Mol. Biol. 220, 49–65. | Ab initio | |||||
1992 | GeneID | Guigo, R., Knudsen, S., Drake, N., and Smith, T. (1992), Prediction of gene structure. J. Mol. Biol. 226, 141–157. | Ab initio | WAM, HMM, PD, AD, NN | ||||
1992 | GeneID+ | Guigo, R., Knudsen, S., Drake, N., and Smith, T. (1992), Prediction of gene structure. J. Mol. Biol. 226, 141–157. | Hybrid | WAM, HMM, PD, AD, NN | Uses information from protein sequence database searches | |||
1992 | SORFIND | Hutchinson, G. B., and Hayden, M. R. (1992) Nucleic Acids Res. 20, 3453–3462. | Ab initio | |||||
1993 | GeneMark | Borodovsky and McIninch | Ab initio | |||||
1993 | Geneparser | Snyder, E.E. and Stormo, G.D. 1993. Identification of coding regions in genomic DNA sequences: an application of dynamic programming and neural networks. Nucleic Acids Res. 21: 607-613. | Ab initio | DP combined with a neural network program | ||||
1994 | GRAIL-II | Recognizing exons in genomic sequence using GRAIL II. Xu Y, Mural R, Shah M, Uberbacher E. Genet Eng (N Y). 1994; 16:241-53. | Ab initio | |||||
1994 | Xpound | Thomas, A. and Skolnick, M.H. (1994) A probabilistic model for detecting coding regions in DNA sequences. IMA J. Math. Appl. Med. Biol., 11, 149–160. | Ab initio | |||||
1994 | EcoParse | Ab initio | HMM | Prokaryote | 393 | |||
1994 | GeneLang / GenLang | Dong, S. and Searls, D.B. 1994. Gene structure prediction by linguistic methods. Genomics 23: 540-551. | Ab initio | Linguistic method HMM, PD, WAM | Eukaryote | |||
1995 | Fgeneh (Find gene in human) / GeneFinder | Solovyev VV, Salamov AA, Lawrence CB (1995) Identification of human gene structure using linear discriminant functions and dynamic programming. Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB) 3: 367–375 | Ab initio | HMM, DP, LDA | Human | Finds single exon only | ||
1995 | Geneparser2 | Snyder EE, Stormo GD. J Mol Biol. 1995 Apr 21; 248(1):1-18. | Ab initio | DP combined with a neural network program | ||||
1995 | Geneparser3 | Snyder EE, Stormo GD. J Mol Biol. 1995 Apr 21; 248(1):1-18. | Hybrid | DP combined with a neural network program | ||||
1996 | GeneHacker | Yada, T., Hirosawa, M. DNA Res., 3, 335-361 (1996). Syst. Mol. Biol. pp. 252-260 (1996). Syst. Mol. Biol. pp. 354-357 (1997). | Ab initio | Markov model | Prokaryote | |||
1996 | Genie | Kulp, D.; Haussler, D.; Reese, M. G.; and Eeckman, F. H. 1996. A generalized hidden Markov model for the recognition of human genes in DNA. In D.J. States et al., ed., Proc. Conf. on Intelligent Systems in Molecular Biology, 134–142. Menlo Park, CA: AAAI Press. | Hybrid | GHMM + neural networks | ||||
1996 | Procrustes | Gene recognition via spliced sequence alignment. Gelfand MS, Mironov AA, Pevzner PA. Proc Natl Acad Sci U S A. 1996 Aug 20; 93(17):9061-6. | Evidence based | |||||
1997 | Fgenes / GeneFinder | Solovyev | Ab initio | HMM, DP, LDA | Human | |||
1997 | GenScan | Burge, C. (1997). Identification of genes in human genomic DNA. Ph.D. thesis, Stanford University; Burge, C. & Karlin, S. (1997). Prediction of complete gene structures in genomic DNA. Journal of Molecular Biology, 268, 78–94 | Ab initio | GHMM | GENSCAN++ is a reimplementation of GENSCAN in C++ (~2001) | |||
1997 | MZEF | Identification of protein coding regions in the human genome by quadratic discriminant analysis. Zhang MQ. Proc Natl Acad Sci U S A. 1997 Jan 21; 94(2):565-8. | Quadratic discriminant analysis | |||||
1997 | HMMGene | Krogh A. Two methods for improving performance of an HMM and their application for gene finding. In: Gaasterland T, Karp P, Karplus K, Ouzounis C, Sander C, Valencia A, editors. The fifth international conference on intelligent Systems for Molecular Biology. Menlo Park, CA: AAAI Press; 1997. pp. 179–186. | Ab initio | CHMM | Vertebrate and C. elegans | No downloadable version; web server only. | ||
1997 | GeneWise (from Wise2 distribution) | Unpublished. Birney, E. and Durbin, R. 1997. Wise2. http://www.sanger.ac.uk/Software/Wise2. | Evidence based | |||||
1997 | AAT (Analysis and Annotation Tool) | Huang et al. | Evidence based | Includes two pairs of programs: DPS/NAP and DDS/GAP | ||||
1998 | Orpheus | Frishman D, Mironov A, Mewes HW, Gelfand M (1998) Combining diverse evidence for gene recognition in completely sequenced bacterial genomes. Nucleic Acids Res 26:2941–2947 | Ab initio + evidence | Seed and extend | Prokaryote / Archaea | |||
1998 | SIM4 | A computer program for aligning a cDNA sequence with a genomic DNA sequence. Florea L, Hartzell G, Zhang Z, Rubin GM, Miller W. Genome Res. 1998 Sep; 8(9):967-74. | ||||||
1998 | GIN | Y. Cai and P. Bork, “Homology-based gene prediction using neural nets,” Analytical Biochemistry, vol. 265, no. 2, pp. 269–274, 1998. | Hybrid | NN + homology | Vertebrate | |||
1998 | GAIA | GAIA: framework annotation of genomic sequence. Bailey LC Jr, Fischer S, Schug J, Crabtree J, Gibson M, Overton GC. Genome Res. 1998 Mar; 8(3):234-50. | homology-based | |||||
1998 | MORGAN (Multi-frame Optimal Rule-based Gene ANalyzer) | Salzberg S, Delcher AL, Fasman KH, Henderson J. J Comput Biol. 1998 Winter; 5(4):667-80. | Ab initio | DP algorithm in combination with a decision tree program | Hybrid tool combining decision trees with dynamic programming and signal sensor algorithms | |||
1998 | GeneMark.hmm | Lukashin, A. V. & Borodovsky, M. GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res. 26, 1107–1115 (1998). | Ab initio | HMM. Iteratively trains and improves the model in an unsupervised manner | Prokaryote / Archaea | 1334 | Self training | |
1998 | Glimmer | Salzberg, S., Delcher, A., Kasif, S., and White, O. (1998b). Microbial gene identification using interpolated Markov models. Nucleic Acids Res. 26(2), 544–548. | Ab initio | IMM | Prokaryote + Archaea | |||
1999 | Fgenesh | Solovyev and Salamov | HMM | programs that have organism-specific parameters for human, Drosophila, plants, yeast, and nematode | ||||
1999 | GlimmerM | Salzberg, S.L., Pertea, M., Delcher, A.L., Gardner, M.J. and Tettelin, H. (1999) Interpolated Markov models for eukaryotic gene finding. Genomics, 59, 24–31. | Ab initio | IMM | Small eukaryote | Developed to find genes in the malaria parasite Plasmodium falciparum. | ||
1999 | Veil (the Viterbi Exon-Intron Locator) | Finding Genes in Human DNA with a Hidden Markov Model. J. Henderson, S.L. Salzberg, and K. Fasman. Journal of Computational Biology 4:2 (1997), 127-141. | HMM | Eukaryote | ||||
1999 | CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) | Badger and Olsen. Molecular Biology and Evolution, 16(4):512-524. 1999. | Comparative | Prokaryote / Archaea | Comparative analysis is based on amino acid sequence similarity to other species | |||
2000 | Fgenesh+ | Salamov AA, Solovyev VV. Genome Res. 2000 Apr; 10(4):516-22; Solovyev V.V. (2007) Statistical approaches in Eukaryotic gene prediction. In Handbook of Statistical Genetics (eds. Balding D., Cannings C., Bishop M.), Wiley-Interscience; 3rd edition, 1616 p. | HMM plus similar protein-based gene prediction | Fgenesh+ is a variant of Fgenesh that takes into account some information about similar proteins | ||||
2000 | Rosetta | Batzoglou et al., 2000 | Comparative genomics | Two genomes. Uses pairwise genomic alignments to find regions of homology; incorporates a splice junction and exon length model. | ||||
2000 | CEM | Bafna & Huson, 2000 | Comparative genomics | Two genomes | ||||
2001 | GenomeScan | Computational inference of homologous gene structures in the human genome. Yeh RF, Lim LP, Burge CB, Genome Res. 2001 May; 11(5):803-16. | Comparative | |||||
2001 | Eugene | Hybrid | Semi-Markov Conditional Random Fields / IMM, DP | Plant | Can be seen as a combiner, because information about splice sites and ATG start codons has to be collected outside the program. | |||
2001 | Twinscan | Ian Korf, Paul Flicek, Daniel Duan, Michael R. Brent. Bioinformatics, Volume 17, Issue suppl_1, June 2001, Pages S140–S148, https://doi.org/10.1093/bioinformatics/17.suppl_1.S140 | comparative-genomics-based | Two genomes. Uses local alignments between a target genome and a reference (informant) genome to identify regions of conservation | ||||
2001 | GeneHacker Plus | Yada, T., Totoki, Y., Takagi, T. and Nakai, K. (2001) A novel bacterial gene-finding system with improved accuracy in locating start codons. DNA Res., 8, 97–106 | Ab initio | HMM | Prokaryote | 50 | ||
2001 | GeneMarkS | GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Besemer J, Lomsadze A, Borodovsky M. Nucleic Acids Res. 2001 Jun 15; 29(12):2607-18. | Ab initio | HMM | Prokaryote | 742 | Self training | |
2001 | SGP-1 (Syntenic Gene Prediction) | SGP-1: prediction and validation of homologous genes based on sequence alignments. Wiehe T, Gebauer-Jung S, Mitchell-Olds T, Guigó R. Genome Res. 2001 Sep; 11(9):1574-83. | Comparative | vertebrates and plants | Dual genomes. Uses pairwise genomic alignments to find syntenic loci; evaluates a coding and splice model in these loci. | |||
2001 | Spidey | Spidey: a tool for mRNA-to-genomic alignments. Wheelan SJ, Church DM, Ostell JM. Genome Res. 2001 Nov; 11(11):1952-7. | ||||||
2002 | DOUBLESCAN | Meyer IM, Durbin R. 2002. Comparative ab initio prediction of gene structures using pair HMMs. Bioinformatics 18:1309–18 | Comparative | PHMM | Uses a pair HMM to simultaneously predict gene structures and conservation in two aligned sequences | |||
2002 | AGenDA (Alignment-based Gene-Detection Algorithm) | Oliver Rinner and Burkhard Morgenstern. AGenDA: Gene Prediction by Comparative Sequence Analysis. In Silico Biology, 2:4673-4680, 2002. | Comparative | Eukaryote | Based on pairwise alignments created by CHAOS and DIALIGN | |||
2002 | GAZE | Howe, K. L. et al. GAZE: A Generic Framework for the Integration of Gene-Prediction Data by Dynamic Programming. Genome Res. 12, 1418–1427 (2002). doi:10.1101/gr.149502 | Comparative / combiner | |||||
2002 | BDGF | Shibuya T, Rigoutsos I (2002) Dictionary-driven prokaryotic gene finding. Nucleic Acids Res 30:2710–2725 | Evidence based | Prokaryote / Archaea | Classifications based on universal CDS-specific usage of short amino acid “seqlets” | |||
2003 | EvoGene | Pedersen JS, Hein J. 2003. Gene finding with a hidden Markov model of genome structure and evolution. Bioinformatics 19:219–27 | Comparative / evolutionary | Evolutionary Hidden Markov Model (EHMM) | Phylogenetic HMM that performs ab initio prediction of genes across a multiple-sequence alignment (more than two genomes), making use of phylogenetic information | |||
2003 | GeneMarkS (virus version) | Mills R, Rozanov M, Lomsadze A, Tatusova T, Borodovsky M. Improving gene annotation of complete viral genomes. Nucleic Acids Res. 2003;31(23):7041–7055. doi: 10.1093/nar/gkg878. | Ab initio | HMM | Virus | 742 | Self training | |
2003 | GeneComber | Shah SP, McVicker GP, Mackworth AK, Rogic S, Ouellette BF. 2003. GeneComber: combining outputs of gene prediction programs for improved results. Bioinformatics 19:1296–97 | Combiner | EUI, GI and EUI frame algorithms | It runs Genscan and HMMgene and combines results | |||
2003 | AUGUSTUS | Stanke, M. & Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19 Suppl 2, ii215–ii225 (2003). | Ab initio | HMM | Eukaryote | |||
2003 | SLAM | M. Alexandersson, S. Cawley, and L. Pachter. 2003. SLAM: Cross-species gene finding and alignment with a generalized pair hidden Markov model. Genome Res., 13:496-502. | Comparative | GPHMM (Generalized pair HMM) | Eukaryote | 187 | Dual genome. Treats two alignments in a symmetric way, predicting pairs of transcripts | |
2003 | SGP2 | G. Parra, P. Agarwal, J.F. Abril, T. Wiehe, J.W. Fickett, and R. Guigo. 2003. Comparative gene prediction in human and mouse. Genome Res., 13:108-117 | Comparative | Eukaryote | Dual genome. It integrates the sequence similarity search program TBLASTX (WU-BLAST) and the ab initio gene finder GeneID. Used by the Mouse Genome Sequencing Consortium in 2002 to annotate the mouse genome. Uses pairwise genomic alignments to find syntenic loci; evaluates a coding and splice model in these loci. | |||
2003 | PASA (Program to Assemble Spliced Alignments) | Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr., et al. 2003. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res | Pipeline - combiner - evidence-based | Uses alignments of cDNA, EST, or RNA-seq to predict gene structures, including alternative splice events. Can run GMAP and BLAT to do the alignment. Can use an external GFF3 file | ||||
2003 | EasyGene | EasyGene--a prokaryotic gene finder that ranks ORFs by statistical significance. Larsen TS, Krogh A. BMC Bioinformatics. 2003 Jun 3; 4:21. | Ab initio | HMM, H | Prokaryote / Archaea | 153 | ||
2003 | AMIGene (Annotation of MIcrobial Genes) | Bocs, S., Cruveiller, S., Vallenet, D., Nuel, G., Medigue, C. 2003. AMIGene: Annotation of MIcrobial Genes. Nucleic Acids Res. 31, 3723–3726 | Ab initio | HMM | Prokaryote | |||
2003 | ETOPE | Anton Nekrutenko, Wen-Yu Chung, Wen-Hsiung Li. Nucleic Acids Research, Volume 31, Issue 13, 1 July 2003, Pages 3564–3567, https://doi.org/10.1093/nar/gkg597 | Comparative / evolutionary | Based on the ratio of non-synonymous to synonymous substitution rates between sequences from different genomes | Eukaryote | 20 | Based on Genscan output. It does not predict exons but rather validates exons predicted by other tools. | |
2003 | CRASA | A complexity reduction algorithm for analysis and annotation of large genomic sequences. Chuang TJ, Lin WC, Lee HC, Wang CW, Hsiao KL, Wang ZH, Shieh D, Lin SC, Ch'ang LY. Genome Res. 2003 Feb; 13(2):313-22. | ||||||
2003 | YACOP | Tech M, Merkl R (2003) YACOP: enhanced gene prediction obtained by a combination of existing methods. In Silico Biol 3:441–451 | Combiner: ab initio + evidence | Utilizes Glimmer, Critica and ZCURVE | Prokaryote / Archaea | |||
2003 | ZCurve | Guo FB, Ou HY, Zhang CT (2003) ZCURVE: a new system for recognizing protein-coding genes in bacterial and archaeal genomes. Nucleic Acids Res 31:1780–1789 | Ab initio | Z curve; correlation of dinucleotides. | Prokaryote / Archaea | Uses the “Z-transform” of DNA as the information source for classification | ||
2003 | Eugene'Hom | Foissac S, Bardou P, Moisan A, Cros M, Schiex T. EuGene'Hom: a generic similarity-based gene finder using multiple homologous sequences. Nucleic Acids Res 2003; 31: 3742-3745. | Evidence-based | Eukaryote | ||||
2004 | GeneWise | GeneWise and Genomewise. Birney E, Clamp M, Durbin R. Genome Res. 2004 May; 14(5):988-95. | Hybrid | HMM-based gene prediction tool using extrinsic evidence | ||||
2004 | Ensembl | Pipeline Evidence based | Pipeline | |||||
2004 | RescueNet | Mahony S, McInerney JO, Smith TJ, Golden A (2004) Gene prediction using the Self-Organizing Map: automatic generation of multiple gene models. BMC Bioinformatics 5:23 | Ab initio, evidence | Prokaryote, Archaea | Unsupervised discovery of multiple gene classes using a self-organizing map. No exact start/stop prediction | |||
2004 | Reganor | McHardy AC, Goesmann A, Puhler A, Meyer F (2004) Development of joint application strategies for two microbial gene finders. Bioinformatics 20:1622–1631 | Combiner: ab initio + evidence | Uses Glimmer and Critica | Prokaryote / Archaea | |||
2004 | Combiner | Allen, J.E., et al. 2004. Computational gene prediction using multiple sources of evidence. Genome Res. 14, 142–148 | Combiner | Linear Combiner that uses a voting function; statistical scoring method that uses decision trees | Three different algorithms for combining evidence in the Combiner were implemented | |||
2004 | GlimmerHMM | Majoros, W.H., Pertea, M., and Salzberg, S.L. TigrScan and GlimmerHMM: two open-source ab initio eukaryotic gene-finders. Bioinformatics 2004, 2878-2879. | Ab initio | GHMM | Eukaryote | |||
2004 | GeneZilla (formerly "TIGRscan") | Majoros, W.H., Pertea, M., and Salzberg, S.L. TigrScan and GlimmerHMM: two open-source ab initio eukaryotic gene-finders. Bioinformatics 2004, 2878-2879. | Ab initio | GHMM | Eukaryote | No longer supported | ||
2004 | SNAP (Semi-HMM-based Nucleic Acid Parser) | Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004) | Ab initio | semi-HMM | ||||
2004 | Projector | Meyer IM, Durbin R. 2004. Gene structure conservation aids similarity based gene prediction. Nucleic Acids Res. 32:776–83 | comparative | PHMM | Similar to DOUBLESCAN but extends the model to make use of annotation information on one sequence to inform the other | |||
2004 | ExoniPhy | Siepel A, Haussler D. 2004. Computational identification of evolutionarily conserved exons. In Proceedings of the Eighth Annual International Conference on Research in Computational Molecular Biology, ed.Gusfield D, Bourne P, Istrail S, Pevzner P, Waterman M, pp. 177–86. New York: Assoc. Comput. Mach. | Comparative / evolutionary | phylo-HMM | Phylogenetic HMM that performs ab initio predictions across a multiple-sequence alignment | |||
2005 | ExonHunter | Broňa Brejová. Evidence Combination in Hidden Markov Models for Gene Prediction. PhD thesis, University of Waterloo, 2005. Broňa Brejová, Daniel G. Brown, Ming Li, and Tomáš Vinař. ExonHunter: a comprehensive approach to gene finding. Bioinformatics, 21 Suppl. 1:i57-i65, 2005. | Comparative + evidence driven | GHMM | Uses genomic sequences, expressed sequence tags and protein databases of related species | |||
2005 | JIGSAW | Jonathan E. Allen and Steven L. Salzberg. JIGSAW: Integration of Multiple Sources of Evidence for Gene Prediction. Bioinformatics, 21:3596-3603, 2005. | Combiner | GHMM-like algorithm | 137 | Selects the prediction whose structure best represents the consensus | ||
2005 | AIR | Florea L, Di Francesco V, Miller J, Turner R, Yao A, et al. 2005. Gene and alternative splicing annotation with AIR. Genome Res. | evidence | Integrates multiple forms of extrinsic evidence to perform alternative splice junction prediction | ||||
2005 | GeneMark-ES | Lomsadze, A. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 33, 6494–6506 (2005) ; Ter-Hovhannisyan, V., Lomsadze, A., Chernoff, Y. O. & Borodovsky, M. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res. 18, 1979–1990 (2008). | Ab initio | Eukaryote | 243 / 200 | |||
2005 | BGF (Beijing Gene Finder) | Li, H. et al. Test data sets and evaluation of gene prediction programs on the rice genome. J Comp Sci Tech 20, 446–453 (2005). | Ab initio | semi HMM | Plant (Eukaryote in general?) | |||
2005 | TWAIN | Majoros WH, Pertea M, Salzberg SL. Efficient implementation of a generalized pair hidden Markov model for comparative gene finding. Bioinformatics. 2005;21(9):1782–1788. | comparative | GPHMM | Dual genome | |||
2005 | GenomeThreader | G. Gremme, V. Brendel, M.E. Sparks, and S. Kurtz. Engineering a software tool for gene structure prediction in higher organisms. Information and Software Technology, 47(15):965-978, 2005 | Evidence based | Similarity | All | The gene structure predictions are calculated using a similarity-based approach where additional cDNA/EST and/or protein sequences are used to predict gene structures via spliced alignments | ||
2006 | MaGe | Vallenet D, Labarre L, Rouy Z, Barbe V, Bocs S, Cruveiller S, Lajus A, Pascal G, Scarpelli C, Medigue C: MaGe: a microbial genome annotation system supported by synteny results. Nucleic Acids Res. 2006, 34 (1): 53-65. doi:10.1093/nar/gkj406. | Pipeline | Bacteria | AMIGene for protein-coding genes, RBSfinder for ribosome binding sites, tRNAscan-SE for tRNA, Rfam for small RNAs and riboswitches, etc. | GFF3-like (not fully compatible: defines only gene and CDS features; genes have no ID and CDS have no Parent attribute, but they share a locus_tag attribute) | ||
2006 | DOGFISH (for ‘detection of genomic features in sequence homologies’) | Carter D, Durbin R. 2006. Vertebrate gene finding from multiple-species alignments using a two-level strategy. Genome Biol. 7(Suppl. 1):S6.1–12 | Comparative | HMM | vertebrate | Two-step program that combines a classifier that scores potential splice sites using a multiple-sequence alignment and an ab initio gene predictor that makes use of the scores from the classifier to predict gene structures. More than two genomes possible. | ||
2006 | AUGUSTUS+ | Stanke M, Schoffmann O, Morgenstern B, Waack S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics. 2006;7:62. | Hybrid | GHMM or CRF | ||||
2006 | N-SCAN (a.k.a. TWINSCAN 3.0) | Annual International Conference on Research in Computational Molecular Biology RECOMB 2005: Research in Computational Molecular Biology pp 374-388 ; Gross SS, Brent MR. 2006. Using multiple alignments to improve gene prediction. J. Comput. Biol. 13:379–93. | Comparative | Can use more than 2 genomes (Extends the TWINSCAN model to N genomes) | ||||
2006 | ZCURVE_V | ZCURVE_V: a new self-training system for recognizing protein-coding genes in viral and phage genomes. | ab initio | Z curve | Virus | self-training | ||
2006 | TWINSCAN_EST | Wei C, Brent MR: Using ESTs to improve the accuracy of de novo gene prediction. BMC Bioinformatics. 2006, 7: 327. doi:10.1186/1471-2105-7-327. | Comparative + Evidence | Two genomes | ||||
2006 | N_Scan_EST | Wei C, Brent MR: Using ESTs to improve the accuracy of de novo gene prediction. BMC Bioinformatics. 2006, 7: 327. doi:10.1186/1471-2105-7-327. | Comparative + Evidence | HMM | HMM-based gene prediction tool that makes use of EST and genomic alignments, incorporating phylogenetic information | |||
2006 | MetaGene | Noguchi, H., Park, J. & Takagi, T. MetaGene: prokaryotic gene finding from environmental genome shotgun sequences. Nucleic Acids Res. 34, 5623–5630 (2006) | Ab initio | Metagenomic | 294 | |||
2006 | TiCo | An unsupervised classification scheme for improving predictions of prokaryotic TIS. Tech M, Meinicke P. BMC Bioinformatics. 2006 Mar 9; 7:121. | Prokaryote | Clustering algorithm for completely unsupervised scoring of potential TIS, based on positionally smoothed probability matrices. The algorithm requires an initial gene prediction and the genomic sequence of the organism to perform the reannotation | ||||
2006 | FGENESH++ | Solovyev V, Kosarev P, Seledsov I, Vorobyev D. (2006) Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol. 2006;7 Suppl 1:S10.1-12. | Pipeline | Hybrid: HMM + extrinsic | Automated version of FGENESH+ | |||
2007 | Conrad | DeCaprio, D. et al. Conrad: gene prediction using conditional random fields. Genome Res. 17, 1389–1398 (2007). | Comparative | semi-Markov conditional random fields (SMCRFs) | First comparative gene predictor based on SMCRFs. Can use more than 2 genomes | |||
2007 | Contrast | Gross SS, Do CB, Sirota M, Batzoglou S. 2007. CONTRAST: a discriminative, phylogeny-free approach to multiple informant de novo gene prediction. Genome Biol. 8:R269. | Comparative | CRF,SVM. Combines local classifiers with the global gene structure model. | 90 | Can also incorporate information from EST alignment. Can use more than 2 genomes. Uses a combination of SVM and CRF predictors, providing a big boost over traditional HMMs | ||
2007 | GISMO | Krause L, McHardy AC, Nattkemper TW, Pühler A, Stoye J, Meyer F (2007) GISMO—gene identification using a support vector machine for ORF classification. Nucleic Acids Res 35:540–549 | Abinitio + evidence | SVMs | Prokaryote | Uses SVMs. Model training is based on “reliable” genes found with PFAM protein domain HMMs. | GFF2 | |
2007 | Genomix | Coghlan, A. & Durbin, R. Genomix: a method for combining gene-finders' predictions, which uses evolutionary conservation of sequence and intron–exon structure. Bioinformatics 23, 1468–1475 (2007). | Combiner | DP | Eukaryote | Uses dynamic programming to select the best conserved (top-scoring) predicted exons in the query region, and combines them into a gene structure | ||
2007 | GLEAN | Elsik, C.G. et al. Creating a honey bee consensus gene set. Genome Biol. 8, R13 (2007). | Combiner | HMM | Eukaryote | Uses an unsupervised learning method | ||
2007 | FLAN | Bao Y, Bolotov P, Dernovoy D, Kiryutin B, Tatusova T. FLAN: a web server for influenza virus genome annotation. Nucleic Acids Res. 2007. pp. W280–284. | similarity-based | Influenza virus | ||||
2007 | transMap | Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Stanke M, Diekhans M, Baertsch R, Haussler D. Bioinformatics. 2008 Mar 1; 24(5):637-44. Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D. 2007. Comparative genomics search for losses of long-established genes on the human lineage. PLOS Comput. Biol. 3:e247. | Evidence | Eukaryote | Uses whole-genome alignments to project existing annotations from one genome to one or more other genomes. First developed in conjunction with improvements to AUGUSTUS to model extrinsic information | |||
2007 | GLIMMER3 | Delcher AL, et al. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics, 2007, vol. 23 (pp. 673-679) | Ab initio | IMM | Bacteria, archaea and viruses | Integrates ribosome binding site evidence directly into the gene-finding algorithm. Distinguishes host and endosymbiont DNA. | ||
2008 | SCGPred | SCGPred: a score-based method for gene structure prediction by combining multiple sources of evidence. Li X, Ren Q, Weng Y, Cai H, Zhu Y, Zhang Y. Genomics Proteomics Bioinformatics. 2008 Dec; 6(3-4):175-85. | Combiner | Eukaryote | Automated eukaryotic gene structure annotator that computes a weighted consensus gene structure based on multiple sources of available evidence | |||
2008 | RAST | Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, et al. The RAST Server: rapid annotations using subsystems technology. BMC Genomics, 2008, vol. 9, p. 75 | Pipeline | Bacteria and archaea | Online service that identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. | |||
2008 | Maker | Cantarel, B. L. et al. Maker. Genome Res. 18, 188–96 (2008). | Combiner | 306 | It uses proteins, transcripts ... Ab initio: Augustus, Fgenesh, GeneMark, SNAP | |||
2008 | Evigan | Liu, Q., Mackey, A. J., Roos, D. S. & Pereira, F. C. N. Evigan: A hidden variable model for integrating gene evidence for eukaryotic gene prediction. Bioinformatics 24, 597–605 (2008). | Combiner | Dynamic Bayes networks (DBNs) | Eukaryote | 52 | Choose the best possible set of exons and combine them in a gene model. Weight of different sources. Unsupervised learning method | |
2008 | Y. Zhou, Y. Liang, C. Hu, L. Wang, X. Shi, An artificial neural network method for combining gene prediction based on equitable weights, NeuroComputing 71 (2007) 538–543 | combiner | RBFN | |||||
2008 | Evidence Modeler (EVM) | Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008). | Combiner | Chooses the best possible set of exons and combines them into a gene model, weighting the different sources. Evidence-based chooser. | ||||
2008 | Chemgenome2.0 | Poonam Singhal, B. Jayaram, Surjit B. Dixit and David L. Beveridge. Prokaryotic Gene Finding based on Physicochemical Characteristics of Codons Calculated from Molecular Dynamics Simulations. Biophysical Journal, 2008, Volume 94, Issue 11, 4173-4183 | Ab initio | Prokaryote | Prokaryotic Gene Finding Based on Physicochemical Characteristics of Codons Calculated from Molecular Dynamics Simulations | |||
2008 | MetaGeneAnnotator (MGA) | Noguchi H, Taniguchi T, Itoh T (2008) MetaGeneAnnotator: detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes. DNA Res 15:387–396. | Ab initio | Prokaryote | MGA is a self-training gene prediction tool for all kinds of prokaryotic genes including atypical genes such as horizontally transferred and prophage-encoded genes | |||
2009 | mGene | Schweikert, G. et al. mGene: accurate SVM-based gene finding with an application to nematode genomes. Genome Res. 19, 2133–43 (2009). | Ab initio | Structural HMM combined with discrimination training techniques similar to SVMs | 66 | No longer supported | ||
2009 | Orphelia | Ab initio | Neural network | Metagenomic | 78 | |||
2009 | MiGAP (Microbial Genome Annotation Pipeline) | Sugawara H. et al. (2009) Microbial genome annotation pipeline (MiGAP) for diverse users. In: Proceedings of the 20th International Conference on Genome Informatics, Yokohama, Japan, S–001–1–2. | Pipeline: MetaGeneAnnotator + tRNAscan-SE + rRNA db | Prokaryote | ||||
2009 | DAWGPAWS | Estill, J. C. & Bennetzen, J. L. The DAWGPAWS pipeline for the annotation of genes and transposable elements in plant genomes. Plant Methods 5, 1–11 (2009). | Eukaryote / Plant | pipeline for the annotation of genes and transposable elements in plant genomes | ||||
2010 | MetaGeneMark | Zhu, W., Lomsadze, A. & Borodovsky, M. Ab initio gene identification in metagenomic sequences. Nucleic Acids Res. 38, 1–15 (2010). | Ab initio | HMM | Metagenome | 220 | Self training | |
2010 | Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm) | Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010) | Ab initio | dynamic programming + HMM. Log-likelihood coding statistics trained from data. | Prokaryote, Metagenome | Self training | ||
2010 | GenePRIMP | Pati A., Ivanova N.N., Mikhailova N., Ovchinnikova G., Hooper S.D., Lykidis A., Kyrpides N.C. GenePRIMP: a gene prediction improvement pipeline for prokaryotic genomes. Nat. Methods 2010, 7, 455–457 | - | - | Prokaryote | Evidence-based evaluation | ||
2010 | VIGOR (Viral Genome ORF Reader) | Wang S, Sundaram JP, Spiro D. 2010. VIGOR, an annotation program for small viral genomes. BMC Bioinformatics 11:451. http://dx.doi.org/10.1186/1471-2105-11-451. | Evidence | Virus: influenza virus, rotavirus, rhinovirus and coronavirus subtypes | Web application tool | |||
2010 | FragGeneScan | Rho M, Tang H, Ye Y (2010) FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res 38:e191 | ab initio | HMM | Metagenome | HMM-based. Combines sequencing error models with codon usage | ||
2010 | MetaGeneMark | Zhu W, Lomsadze A, Borodovsky M (2010) Ab initio gene identification in metagenomic sequences. Nucleic Acids Res 38:e132 | Ab initio | Metagenome | Update of GeneMark.hmm with improved model parameters for metagenomic samples | |||
2010 | Gnomon | Souvorov, A. et al. Gnomon — the NCBI eukaryotic gene prediction tool. National Center for Biotechnology Information, (2010). | Ab initio | HMM; translational and splice signals are described using WMM and WAM models | Following the Genscan logic, Gnomon recognizes as HMM states coding exons and introns on both strands and intergenic sequence | |||
2011 | MAKER2 | Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12, 491 (2011). | Pipeline / combiner | Evidence, ab initio, or evidence-driven ab initio | Eukaryote / Prokaryote | 184 | It uses proteins, transcripts ... Ab initio: Augustus, Fgenesh, GeneMark, SNAP | |
2011 | GenSAS | Lee T, Peace C, Jung S, Zheng P, Main D, Cho I (2011) GenSAS: an online integrated genome sequence annotation pipeline. In: 4th International conference on biomedical engineering and informatics (BMEI), Shanghai, 2011, pp. 1967–1973. doi: 10.1109/BMEI.2011.6098712 | pipeline | An online integrated genome sequence annotation pipeline | ||||
2011 | VMGAP (The Viral MetaGenome Annotation Pipeline) | Lorenzi, H. A. et al. The Viral MetaGenome Annotation Pipeline (VMGAP): an automated tool for the functional annotation of viral metagenomic shotgun sequencing data. Stand. Genomic Sci. 4, 418–429 (2011). | Pipeline | Viruses | ||||
2012 | eCRAIG (ensemble CRAIG) | Bernal A, Crammer K, Pereira F: Automated gene-model curation using global discriminative learning. Bioinformatics. 2012, 28 (12): 1571-1578. 10.1093/bioinformatics/bts176. | combiner | CRF-based | 4 | |||
2012 | MOCAT | Kultima JR, Sunagawa S, Li J, Chen W, Chen H, Mende DR, Arumugam M, Pan Q, Liu B, Qin J (2012) MOCAT: a metagenomics assembly and gene prediction toolkit. PLoS One 7:e47656 | pipeline | Use Prodigal or MetaGeneMark | Metagenome | |||
2013 | GIIRA (Gene Identification Incorporating RNA-Seq data and Ambiguous reads) | Zickmann F, Lindner MS, Renard BY (2013) GIIRA–RNA-Seq driven gene finding incorporating ambiguous reads. Bioinformatics 30:606–613 | Ab initio, evidence-driven | maximum-flow approach | Eukaryote, Prokaryote | Based on the observed mapping coverage, GIIRA identifies candidate genes, which are then refined in further validation steps. | ||
2013 | Eugene-P | Next-generation Annotation of Prokaryotic Genomes with EuGene-P: Application to Sinorhizobium meliloti 2011. E. Sallet et al. DNA Res. 2013 | Prokaryote | |||||
2013 | MetaGUN | Liu Y, Guo J, Hu G, Zhu H (2013) Gene prediction in metagenomic fragments based on the SVM algorithm. BMC Bioinformatics 14:S12 | Ab initio | SVM-based. Phylogenetic binning and assignment of protein sequences to each bin | Metagenome | |||
2014 | ZUPLS | Song, K., Tong, T., and Wu, F. (2014). Predicting essential genes in prokaryotic genomes using a linear method: ZUPLS. Integr. Biol. 6, 460–469. doi: 10.1039/c3ib40241j | ab initio | Z-curve | Prokaryote | |||
2014 | OMIGA (Optimized Maker-Based Insect Genome Annotation) | Liu J., Xiao H., Huang S., Li F. OMIGA: Optimized Maker-Based Insect Genome Annotation. Mol. Genet. Genomics 2014, 289, 567–573. | pipeline (MAKER) | Augustus, SNAP, GeneMark | Insect | |||
2014 | GeneMark-ET | Lomsadze, A., Burns, P. D. & Borodovsky, M. Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm. Nucleic Acids Res. 42, 1–8 (2014). | Ab initio | HMM | Eukaryote | 10 | Self training | |
2014 | Prokka | Seemann T., Prokka: rapid prokaryotic genome annotation. Bioinformatics 2014 Jul 15;30(14):2068-9. PMID:24642063 | pipeline | Ab initio + evidence-based for functional annotation | prokaryote | https://github.com/tseemann/prokka Performs both structural and functional annotation | .gff, .gbk, .fna, .faa, .ffn, .sqn, .fsa, .tbl, .err, .log, .txt, .tsv | |
2014 | DFAST (DDBJ Fast Annotation and Submission Tool) | Seemann T. (2014) Prokka: rapid prokaryotic genome annotation. Bioinformatics, 30, 2068–2069. Tanizawa Y., Fujisawa T., Nakamura Y. (2018) DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication. Bioinformatics, 34(6), 1037–1039. | Pipeline | Prokaryote | The original version of DFAST used the lightweight command-line program Prokka as its annotation engine. DFAST now uses MetaGeneAnnotator (MGA) by default to predict CDSs and GHOSTX as its default aligner. Standalone and web versions. | Data ready to submit to DDBJ | ||
2015 | IPred | Zickmann, F. & Renard, B. Y. IPred - integrating ab initio and evidence based gene predictions to improve prediction accuracy. BMC Genomics 16, 134 (2015). | Combiner evidence-based | Chooses the best possible set of exons and combines them into a gene model. Evidence-based chooser. Can also model genes from evidence only. | ||||
2015 | GASS (Genome Annotation based on Species Similarity) | GASS: genome structural annotation for Eukaryotes based on species similarity. Wang Y, Chen L, Song N, Lei X. BMC Genomics. 2015 Mar 4; 16():150. | comparative | shortest path model and DP | ||||
2016 | BRAKER1 | Hoff, K. J., Lange, S., Lomsadze, A., Borodovsky, M. & Stanke, M. BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS. Bioinformatics 32, 767–769 (2016). | Hybrid / Pipeline | 1 | Pipeline: GeneMark-ET + Augustus | |||
2016 | Companion | Steinbiss, S. et al. Companion: a web server for annotation and analysis of parasite genomes. Nucleic Acids Res. 44, W29–W34 (2016) | Pipeline for automatic eukaryotic parasite annotation | |||||
2016 | Gmove | Dubarry, M. et al. Gmove a tool for eukaryotic gene predictions using various evidences. F1000Research 34, 2011 (2016). | Eukaryote | |||||
2016 | AugustusCGP | König S, Romoth LW, Gerischer L, Stanke M. Simultaneous gene finding in multiple genomes. Bioinformatics. 2016 Nov 15; 32(22):3388-3395. | comparative | Eukaryote | multiple genomes | |||
2016 | CAT (Comparative Annotation Toolkit) | Fiddes, I. T. et al. Comparative Annotation Toolkit (CAT) – simultaneous clade and personal genome annotation. Genome Res. https://doi.org/10.1101/gr.233460.117 (2018). | pipeline | Evidence-based, comparative ab initio (AugustusCGP) | Takes a HAL-format multiple whole-genome alignment as input. | GFF3 + many plots | ||
2016 | CESAR | Sharma V, Elghafari A, Hiller M. Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation. Nucleic Acids Res. 2016 Jun 20; 44(11):e103. | comparative | Uses an HMM to adjust splice sites in whole-genome alignments, improving transcript projections | ||||
2016 | PGAP | Tatusova T. et al. (2016) NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res., 44, 6614–6624. | pipeline | GeneMarkS+ Glimmer + extrinsic data | Prokaryote | This is the NCBI annotation service incorporated into its submission system; it is only available to GenBank submitters. | ||
2017 | GeMoMa | Keilwagen, J., Hartung, F., Paulini, M., Twardziok, S. O. & Grau, J. Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi. (2017). | homology-based gene prediction program | |||||
2017 | funannotate | doi.org/10.5281/zenodo.2576527 | Pipeline | Evidence Modeler + Augustus + GeneMark-ES/ET + evidence + PASA | built specifically for fungi, but will also work with higher eukaryotes | homology-based gene prediction program | ||
2017 | GAWN | unpublished - https://github.com/enormandeau/gawn | pipeline evidence-based only | GMAP to create gene and cufflinks and TransDecoder to add UTR | eukaryote | |||
2018 | FunGAP | Min B, Grigoriev IV, Choi IG. FunGAP: Fungal Genome Annotation Pipeline using evidence-based gene model evaluation. Bioinformatics (Oxford, England). 2017;33(18):2936–7. | pipeline | |||||
2018 | BRAKER2 | Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M., & Borodovsky, M. (2021). BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genomics and Bioinformatics, 3(1), 1–11. https://doi.org/10.1093/nargab/lqaa108 | Hybrid | eukaryote | ||||
2018 | G-OnRamp | Ab initio, web-based | Augustus, GlimmerHMM, SNAP | |||||
2018 | VIRULIGN | VIRULIGN: fast codon-correct alignment and annotation of viral genomes. Pieter J K Libin, Koen Deforche, Ana B Abecasis, Kristof Theys. Bioinformatics, Volume 35, Issue 10, 15 May 2019, Pages 1763–1765, https://doi.org/10.1093/bioinformatics/bty851 | Similarity | Virus | ||||
2019 | GAAP | Jinhwa Kong, Sun Huh, Jung-Im Won, Jeehee Yoon, Baeksop Kim, and Kiyong Kim. GAAP: A Genome Assembly + Annotation Pipeline. BioMed Research International, Volume 2019, Article ID 4767354, 12 pages | pipeline | Augustus, EVM, MAKER, PASA | Genome Assembly + Annotation Pipeline | |||
2019 | VAPiD (Viral Annotation Pipeline and iDentification) | Ryan C. Shean, Negar Makhsous, Graham D. Stoddard, Michelle J. Lin & Alexander L. Greninger. VAPiD: a lightweight cross-platform viral annotation pipeline and identification tool to facilitate virus genome submissions to NCBI GenBank. BMC Bioinformatics 20, Article number: 48 (2019) | pipeline | Virus | ||||
2019 | Vgas (Viral Genome Annotation System) | Kai-Yue Zhang, Yi-Zhou Gao, Meng-Ze Du, Shuo Liu, Chuan Dong, and Feng-Biao Guo. Vgas: A Viral Genome Annotation System. Front Microbiol. 2019; 10: 184. | Ab initio + similarity-based | ZCURVE_V + BLASTp | Virus | The authors report that combining Vgas with GeneMarkS and Prodigal gave better prediction results than any of the three programs alone. | ||
2020 | VADR | VADR: validation and annotation of virus sequence submissions to GenBank. Alejandro A. Schäffer, Eneida L. Hatcher, Linda Yankie, Lara Shonkwiler, J. Rodney Brister, Ilene Karsch-Mizrachi & Eric P. Nawrocki. BMC Bioinformatics 21, 211 (2020). https://doi.org/10.1186/s12859-020-3537-3 | HMM + similarity | Virus | ||||
2021 | MOSGA | Martin, R., Hackl, T., Hattab, G., Fischer, M. G., & Heider, D. (2021). MOSGA: Modular Open-Source Genome Annotator. Bioinformatics, 36(22–23), 5514–5515. https://doi.org/10.1093/bioinformatics/btaa1003 | Ab initio, Hybrid | Pipeline Framework | Eukaryote | Web interface, RNA-Seq/Proteins/Orthology based prediction possible + validation | GFF3 + Sequin | |
2021 | TSEBRA | Gabriel, L., Hoff, K. J., Brůna, T., Borodovsky, M., & Stanke, M. (2021). TSEBRA: transcript selector for BRAKER. BMC Bioinformatics, 22(1), 566. https://doi.org/10.1186/s12859-021-04482-0 | Hybrid | Eukaryote | RNA-Seq + Proteins | GFF3 | ||
Legend:
Hybrid = ab initio and evidence-based, i.e. an HMM-based gene prediction tool that also uses extrinsic evidence
Comparative = genome sequence comparison
CHMM: class HMM
CRF: conditional random field
DBN: Dynamic Bayes network
DP: dynamic programming
EHMM: evolutionary HMM
GHMM: generalized HMM
GPHMM: generalized pair HMM
HMM: hidden Markov model
IMM: Interpolated Markov model
LDA: Linear Discriminant Analysis
MDD: maximal dependence decomposition
ML: maximum likelihood
MM: Markov Model
NN: Neural Networks
PHMM: pair HMM
phyloHMM: phylogenetic HMM
RBFN: Radial Basis Function Network
SVM: support vector machine
WAM: weight array model
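
Since HMM is the method most often listed in the table, the sketch below illustrates what HMM-based gene finding means in its simplest form: Viterbi decoding (a DP algorithm) labels each base as coding or non-coding. It is a toy illustration only, not the model of any tool listed above. The two states, all probabilities and the function name `viterbi` are assumptions made up for this example.

```python
import math

# Toy two-state HMM: "C" = coding, "N" = non-coding.
# All probabilities are invented for illustration, not trained parameters.
STATES = ("C", "N")
START = {"C": 0.5, "N": 0.5}
TRANS = {
    "C": {"C": 0.9, "N": 0.1},
    "N": {"C": 0.1, "N": 0.9},
}
EMIT = {
    "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},  # assumed GC-rich
    "N": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},  # assumed AT-rich
}


def viterbi(seq):
    """Return the most probable state path for seq (log-space Viterbi)."""
    if not seq:
        return ""
    # score[s] = best log-probability of any path ending in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            )
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    print(viterbi("ATATATGCGCGCGCATATAT"))  # NNNNNNCCCCCCCCNNNNNN
```

Real gene finders use far richer state sets (exons, introns, both strands, splice signals) and trained parameters; the GHMM, PHMM and phyloHMM entries in the legend are extensions of this basic scheme.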
Interesting publications
Rogic, S., Mackworth, A. K., & Ouellette, F. B. (2001). Evaluation of gene-finding programs on mammalian sequences. Genome research, 11(5), 817-32.
Goodswen, S. J., Kennedy, P. J., & Ellis, J. T. (2012). Evaluating high-throughput ab initio gene finders to discover proteins encoded in eukaryotic pathogen genomes missed by laboratory techniques. PloS one, 7(11), e50609.
Chowdhury, B., Garai, A., & Garai, G. (2017). An optimized approach for annotation of large eukaryotic genomic sequences using genetic algorithm. BMC bioinformatics, 18(1), 460. doi:10.1186/s12859-017-1874-7
Joel Armstrong, Ian T. Fiddes, Mark Diekhans and Benedict Paten. Whole-Genome Alignment and Comparative Annotation. Annu Rev Anim Biosci. 2019 Feb 15; 7: 41–64.
Alice Carolyn McHardy, Andreas Kloetgen. Finding Genes in Genome Sequence. Bioinformatics pp 271-291
Bączkowski, K., Mackiewicz, K., Kowalczuk, M., Banaszak, J. and Cebrat, S., “Od sekwencji do funkcji – poszukiwanie genów i ich adnotacje” [From sequence to function: searching for genes and their annotation], Biotechnologia 3(70), 22–44 (2005)
Pirovano, W., Boetzer, M., Derks, M. F. L., & Smit, S. (2017). NCBI-compliant genome submissions: Tips and tricks to save time and money. Briefings in Bioinformatics, 18(2), 179–182. https://doi.org/10.1093/bib/bbv104
Interesting books
Principles of Gene Manipulation and Genomics. Sandy B. Primrose, Richard Twyman