List of genome annotation tools

See here for the list of plastome annotation tools
See here for the list of mitome annotation tools
See here for the list of plasmidome annotation tools
Back to the knowledge page

Year Tool name Publication Type Method Organism Number of citations (PubMed, 2016) Comments Output format
1991 GRAIL E. C. Uberbacher and R. J. Mural (1991), "Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach", Proc. Natl. Acad. Sci. USA, Vol. 88, pp. 11261-11265.
R. J. Mural, J. R. Einstein, X. Guan, R. C. Mann and E. C. Uberbacher (1992), "An Artificial Intelligence Approach to DNA Sequence Feature Recognition", Trends in Biotechnology, 10, pp. 66-69.
Ab initio (sensors + Neural network) No longer supported
1991 NetGene Brunak, S., Engelbrecht, J., and Knudsen, S. (1991). Prediction of human mRNA donor and acceptor sites from the DNA sequence. J. Mol. Biol. 220, 49–65. Ab initio
1992 GeneID Guigo, R., Knudsen, S., Drake, N., and Smith, T. (1992), Prediction of gene structure. J. Mol. Biol. 226, 141–157. Ab initio WAM, HMM, PD, AD, NN
1992 GeneID+ Guigo, R., Knudsen, S., Drake, N., and Smith, T. (1992), Prediction of gene structure. J. Mol. Biol. 226, 141–157. Hybrid WAM, HMM, PD, AD, NN Uses information from protein sequence database searches
1992 SORFIND Hutchinson, G. B., and Hayden, M. R. (1992) Nucleic Acids Res. 20, 3453–3462. Ab initio
1993 GeneMark Borodovsky and McIninch Ab initio
1993 Geneparser Snyder, E.E. and Stormo, G.D. 1993. Identification of coding regions in genomic DNA sequences: an application of dynamic programming and neural networks. Nucleic Acids Res. 21: 607-613. Ab initio DP combined with a neural network program
1994 GRAIL-II Recognizing exons in genomic sequence using GRAIL II. Xu Y, Mural R, Shah M, Uberbacher E. Genet Eng (N Y). 1994; 16:241-53. Ab initio
1994 Xpound Thomas,A. and Skolnick,M.H. (1994) A probabilistic model for detecting coding regions in DNA sequences. IMA J. Math. Appl. Med. Biol., 11, 149–160. Ab initio  
1994 EcoParse Ab initio HMM Prokaryote 393
1994 GeneLang / GenLang Dong, S. and Searls, D.B. 1994. Gene structure prediction by linguistic methods. Genomics 23: 540-551. Ab initio Linguistic method HMM, PD, WAM Eukaryote
1995 Fgeneh (Find gene in human) / GeneFinder Solovyev VV, Salamov AA, Lawrence CB (1995) Identification of human gene structure using linear discriminant functions and dynamic programming. Proceedings/International Conference on Intelligent Systems for Molecular Biology; ISMB International Conference on Intelligent Systems for Molecular Biology 3: 367–375 Ab initio HMM, DP, LDA Human Finds single exon only
1995 Geneparser2 Snyder EE, Stormo GD J Mol Biol. 1995 Apr 21; 248(1):1-18. Ab initio DP combined with a neural network program
1995 Geneparser3 Snyder EE, Stormo GD J Mol Biol. 1995 Apr 21; 248(1):1-18. hybrid DP combined with a neural network program
1996 GeneHacker Yada T., Hirosawa M. DNA Res., 3, 335-361 (1996). Syst. Mol. Biol. pp.252-260 (1996). Syst. Mol. Biol. pp.354-357 (1997). Ab initio Markov model Prokaryote
1996 Genie Kulp, D.; Haussler, D.; Reese, M. G.; and Eeckman, F. H. 1996. A generalized hidden Markov model for the recognition of human genes in DNA. In D.J. States et al., ed., Proc. Conf. on Intelligent Systems in Molecular Biology, 134–142. Menlo Park, CA: AAAI Press.  Hybrid GHMM + neural networks
1996 Procrustes Gene recognition via spliced sequence alignment. Gelfand MS, Mironov AA, Pevzner PA. Proc Natl Acad Sci U S A. 1996 Aug 20; 93(17):9061-6. Evidence based
1997 Fgenes / GeneFinder Solovyev  Ab initio HMM, DP, LDA Human
1997 GenScan Burge, C. (1997). Identification of genes in human genomic DNA. Ph.D. thesis, Stanford University. ; Burge, C. & Karlin, S. (1997). Prediction of complete gene structures in genomic DNA. Journal of Molecular Biology, 268,78–94 Ab initio GHMM GENSCAN++ is a reimplementation of GENSCAN in C++ (~2001)
1997 MZEF Identification of protein coding regions in the human genome by quadratic discriminant analysis. Zhang MQ. Proc Natl Acad Sci U S A. 1997 Jan 21; 94(2):565-8. Quadratic discriminant analysis
1997 HMMGene Krogh A. Two methods for improving performance of a HMM and their application for gene finding. In: Gaasterland T, Karp P, Karplus K, Ouzounis C, Sander C, Valencia A, editors. The Fifth International Conference on Intelligent Systems for Molecular Biology. Menlo Park, CA: AAAI Press; 1997. pp. 179–186. Ab initio CHMM Vertebrate and C. elegans No download version. Webserver.
1997 GeneWise (from Wise2 distribution) Unpublished. Birney, E. and Durbin, R. 1997. Wise2. http://www.sanger.ac.uk/Software/Wise2. Evidence based
1997 AAT (Analysis and Annotation Tool) Huang et al. Evidence based Includes two pairs of programs: DPS/NAP and DDS/GAP
1998 Orpheus Frishman D, Mironov A, Mewes HW, Gelfand M (1998) Combining diverse evidence for gene recognition in completely sequenced bacterial genomes. Nucleic Acids Res 26:2941–2947 Ab initio + evidence Seed and extend Prokaryote / Archaea
1998 SIM4 A computer program for aligning a cDNA sequence with a genomic DNA sequence. Florea L, Hartzell G, Zhang Z, Rubin GM, Miller W. Genome Res. 1998 Sep; 8(9):967-74.
1998 GIN Y. Cai and P. Bork, “Homology-based gene prediction using neural nets,” Analytical Biochemistry, vol. 265, no. 2, pp. 269–274, 1998. Hybrid NN + homology Vertebrate
1998 GAIA GAIA: framework annotation of genomic sequence. Bailey LC Jr, Fischer S, Schug J, Crabtree J, Gibson M, Overton GC. Genome Res. 1998 Mar; 8(3):234-50. homology-based
1998 MORGAN (Multi-frame Optimal Rule-based Gene ANalyzer) Salzberg S, Delcher AL, Fasman KH, Henderson J. J Comput Biol. 1998 Winter; 5(4):667-80. Ab initio DP algorithm in combination with a decision tree program Hybrid tool combining decision trees with dynamic programming and signal sensor algorithms
1998 GeneMark.hmm Lukashin, A. V & Borodovsky, M. GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res. 26, 1107–1115 (1998). Ab initio HMM. Iteratively trains and improves the model in an unsupervised manner Prokaryote / Archaea 1334 Self training
1998 Glimmer Salzberg, S., Delcher, A., Kasif, S., and White, O. (1998b). Microbial gene identification using interpolated Markov models. Nucleic Acids Res. 26(2), 544–548. Ab initio IMM Prokaryote + Archaea
1999 Fgenesh Solovyev and Salamov HMM programs that have organism-specific parameters for human, Drosophila, plants, yeast, and nematode
1999 GlimmerM Salzberg,S.L., Pertea,M., Delcher,A.L., Gardner,M.J. and Tettelin,H. (1999) Interpolated Markov models for eukaryotic gene finding. Genomics, 59, 24–31. Ab initio IMM Small eukaryote Developed to find genes in the malaria parasite Plasmodium falciparum.
1999 Veil (the Viterbi Exon-Intron Locator) Finding Genes in Human DNA with a Hidden Markov Model. J. Henderson, S.L. Salzberg, and K. Fasman. This describes the VEIL system for finding genes. Journal of Computational Biology 4:2 (1997), 127-141. HMM Eukaryote
1999 CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) Badger and Olsen. Molecular Biology and Evolution, 16(4):512-524. 1999. Comparative Prokaryote / Archaea Comparative analysis is based on amino acid sequence similarity to other species
2000 Fgenesh+ Salamov AA, Solovyev VV Genome Res. 2000 Apr; 10(4):516-22.; Solovyev V.V. (2007) Statistical approaches in Eukaryotic gene prediction. In Handbook of Statistical genetics (eds. Balding D., Cannings C., Bishop M.), Wiley-Interscience; 3rd edition, 1616 p. HMM plus similar protein-based gene prediction Fgenesh+ is a variant of Fgenesh that takes into account information about similar proteins
2000 Rosetta Batzoglou et al., 2000 Comparative genomics Two genomes. Uses pairwise genomic alignments to find regions of homology; incorporates a splice junction and exon length model.
2000 CEM Bafna & Huson, 2000 Comparative genomics Two genomes
2001 GenomeScan Computational inference of homologous gene structures in the human genome. Yeh RF, Lim LP, Burge CB, Genome Res. 2001 May; 11(5):803-16. Comparative
2001 Eugene Hybrid Semi-Markov Conditional Random Fields / IMM, DP Plant Can be seen as a combiner because information about splice sites and ATG has to be collected outside the program.
2001 Twinscan Ian Korf, Paul Flicek, Daniel Duan, Michael R. Brent. Bioinformatics, Volume 17, Issue suppl_1, June 2001, Pages S140–S148, https://doi.org/10.1093/bioinformatics/17.suppl_1.S140 comparative-genomics-based Two genomes. Uses local alignments between a target genome and a reference (informant) genome to identify regions of conservation
2001 GeneHacker Plus Yada,T., Totoki,Y., Takagi,T. and Nakai,K. ( 2001 ) A novel bacterial gene‐finding system with improved accuracy in locating start codons. DNA Res. , 8 , 97 –106 Ab initio HMM Prokaryote 50
2001 GeneMarkS GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Besemer J, Lomsadze A, Borodovsky M. Nucleic Acids Res. 2001 Jun 15; 29(12):2607-18. Ab initio HMM Prokaryote 742 Self training
2001 SGP-1 (Syntenic Gene Prediction) SGP-1: prediction and validation of homologous genes based on sequence alignments. Wiehe T, Gebauer-Jung S, Mitchell-Olds T, Guigó R. Genome Res. 2001 Sep; 11(9):1574-83. Comparative vertebrates and plants Dual genomes. Uses pairwise genomic alignments to find syntenic loci; evaluates a coding and splice model in these loci.
2001 Spidey Spidey: a tool for mRNA-to-genomic alignments. Wheelan SJ, Church DM, Ostell JM. Genome Res. 2001 Nov; 11(11):1952-7.
2002 DOUBLESCAN Meyer IMM, Durbin R. 2002. Comparative ab initio prediction of gene structures using pair HMMs. Bioinformatics 18:1309–18 comparative PHMM Uses a pair HMM to simultaneously predict gene structures and conservation in two aligned sequences
2002 AGenDA (Alignment-based Gene-Detection Algorithm) Oliver Rinner and Burkhard Morgenstern. AGenDA: Gene Prediction by Comparative Sequence Analysis. In Silico Biology, 2:4673-4680, 2002. comparative Eukaryote Based on pair-wise alignments created by CHAOS and DIALIGN
2002 GAZE Howe, K. L. et al. GAZE: A Generic Framework for the Integration of Gene-Prediction Data by Dynamic Programming. 1418–1427 (2002). doi:10.1101/gr.149502 Comparative / combiner
2002 BDGF Shibuya T, Rigoutsos I (2002) Dictionary-driven prokaryotic gene finding. Nucleic Acids Res 30:2710–2725  Evidence based Prokaryote / Archaea Classifications based on universal CDS-specific usage of short amino acid “seqlets”
2003 EvoGene Pedersen JS, Hein J. 2003. Gene finding with a hidden Markov model of genome structure and evolution. Bioinformatics 19:219–27 Comparative / evolutionary Evolutionary Hidden Markov Model (EHMM) Phylogenetic HMM that performs ab initio prediction of genes across a multiple-sequence alignment (more than two genomes), making use of phylogenetic information
2003 GeneMarkS (virus version) Mills R, Rozanov M, Lomsadze A, Tatusova T, Borodovsky M. Improving gene annotation of complete viral genomes. Nucleic Acids Res. 2003;31(23):7041–7055. doi: 10.1093/nar/gkg878. Ab initio HMM Virus 742 Self training
2003 GeneComber Shah SP, McVicker GP, Mackworth AK, Rogic S, Ouellette BF. 2003. GeneComber: combining outputs of gene prediction programs for improved results. Bioinformatics 19:1296–97 Combiner EUI, GI and EUI frame algorithms It runs Genscan and HMMgene and combines results
2003 AUGUSTUS Stanke, M. & Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19 Suppl 2, ii215–ii225 (2003). Ab initio HMM Eukaryote
2003 SLAM M. Alexandersson, S. Cawley, and L. Pachter. 2003. SLAM: Cross-species gene finding and alignment with a generalized pair hidden Markov model. Genome Res., 13:496-502. Comparative GPHMM (Generalized pair HMM) Eukaryote 187 Dual genome. Treats two alignments in a symmetric way, predicting pairs of transcripts
2003 SGP2 G. Parra, P. Agarwal, J.F. Abril, T. Wiehe, J.W. Fickett, and R. Guigo. 2003. Comparative gene prediction in human and mouse. Genome Res., 13:108-117 comparative Eukaryote Dual genome. It integrates the sequence similarity search program TBLASTX (WU-BLAST) and the ab initio gene finder GeneID. Used by the Mouse Genome Sequencing Consortium in 2002 to annotate the mouse genome. Uses pairwise genomic alignments to find syntenic loci; evaluates a coding and splice model in these loci.
2003 PASA (Program to Assemble Spliced Alignments) Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr., et al. 2003. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res pipeline - combiner - evidence-based Uses alignments of cDNA, EST, or RNA-seq to predict gene structures, including alternative splice events. Can run GMAP and BLAT to do alignment. Can use external gff3 file
2003 EasyGene EasyGene--a prokaryotic gene finder that ranks ORFs by statistical significance. Larsen TS, Krogh A. BMC Bioinformatics. 2003 Jun 3; 4:21. Ab initio HMM, H Prokaryote / Archaea 153
2003 AMIGene (Annotation of MIcrobial Genes) Bocs, S., Cruveiller, S., Vallenet, D., Nuel, G., Medigue, C. 2003 AMIGene: Annotation of MIcrobial Genes Nucleic Acids Res. 31 3723 –3726 Ab initio HMM Prokaryote
2003 ETOPE Anton Nekrutenko, Wen-Yu Chung, Wen-Hsiung Li. Nucleic Acids Research, Volume 31, Issue 13, 1 July 2003, Pages 3564–3567, https://doi.org/10.1093/nar/gkg597 Comparative / evolutionary Based on the ratio of non-synonymous to synonymous substitution rates between sequences from different genomes Eukaryote 20 Based on Genscan output. It does not predict exons but rather validates exons predicted by other tools.
2003 CRASA A complexity reduction algorithm for analysis and annotation of large genomic sequences. Chuang TJ, Lin WC, Lee HC, Wang CW, Hsiao KL, Wang ZH, Shieh D, Lin SC, Ch'ang LY. Genome Res. 2003 Feb; 13(2):313-22.
2003 YACOP Tech M, Merkl R (2003) YACOP: enhanced gene prediction obtained by a combination of existing methods. In Silico Biol 3:441–451 combiner: abinitio + evidence Utilizes Glimmer, Critica and ZCURVE Prokaryote / Archaea
2003 ZCurve Guo FB, Ou HY, Zhang CT (2003) ZCURVE: a new system for recognizing protein-coding genes in bacterial and archaeal genomes. Nucleic Acids Res 31:1780–1789 abinitio Z curve. correlation of dinucleotides. Prokaryote / Archaea Uses the “Z-transform” of DNA as the information source for classification
2003 Eugene'Hom Foissac S, Bardou P, Moisan A, Cros M, Schiex T. EuGene'Hom: a generic similarity-based gene finder using multiple homologous sequences. Nucleic Acids Res 2003; 31: 3742-3745. Evidence-based Eukaryote
2004 GeneWise GeneWise and Genomewise. Birney E, Clamp M, Durbin R. Genome Res. 2004 May; 14(5):988-95. Hybrid HMM-based gene prediction tool using extrinsic evidence
2004 Ensembl Pipeline Evidence based Pipeline
2004 RescueNet Mahony S, McInerney JO, Smith TJ, Golden A (2004) Gene prediction using the Self-Organizing Map: automatic generation of multiple gene models. BMC Bioinformatics 5:23 Ab initio, evidence Prokaryote, Archaea Unsupervised discovery of multiple gene classes using a self-organizing map. No exact start/stop prediction
2004 Reganor McHardy AC, Goesmann A, Puhler A, Meyer F (2004) Development of joint application strategies for two microbial gene finders. Bioinformatics 20:1622–1631 combiner: ab initio + evidence Uses Glimmer and Critica Prokaryote / Archaea
2004 Combiner Allen, J.E., et al. 2004. Computational gene prediction using multiple sources of evidence. Genome Res. 14:142–148 combiner Linear Combiner that uses a voting function; statistical scoring method that uses decision trees Three different algorithms for combining evidence in the Combiner were implemented
2004 GlimmerHMM Majoros, W.H., Pertea, M., and Salzberg, S.L. TigrScan and GlimmerHMM: two open-source ab initio eukaryotic gene-finders. Bioinformatics 2004 2878-2879. Ab initio GHMM eukaryote
2004 GeneZilla (formerly "TIGRscan") Majoros, W.H., Pertea, M., and Salzberg, S.L. TigrScan and GlimmerHMM: two open-source ab initio eukaryotic gene-finders. Bioinformatics 2004 2878-2879. Ab initio GHMM eukaryote No longer supported
2004 SNAP (Semi-HMM-based Nucleic Acid Parser) Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004) Ab initio semi-HMM
2004 Projector Meyer IM, Durbin R. 2004. Gene structure conservation aids similarity based gene prediction. Nucleic Acids Res. 32:776–83 comparative PHMM Similar to DOUBLESCAN but extends the model to make use of annotation information on one sequence to inform the other
2004 ExoniPhy Siepel A, Haussler D. 2004. Computational identification of evolutionarily conserved exons. In Proceedings of the Eighth Annual International Conference on Research in Computational Molecular Biology, ed.Gusfield D, Bourne P, Istrail S, Pevzner P, Waterman M, pp. 177–86. New York: Assoc. Comput. Mach. Comparative / evolutionary phylo-HMM Phylogenetic HMM that performs ab initio predictions across a multiple-sequence alignment
2005 ExonHunter Bronislava Brejová. Evidence Combination in Hidden Markov Models for Gene Prediction. PhD thesis, University of Waterloo, 2005. Broňa Brejová, Daniel G. Brown, Ming Li, and Tomáš Vinař. ExonHunter: a comprehensive approach to gene finding. Bioinformatics, 21 Suppl. 1:i57-i65, 2005. Comparative + evidence driven GHMM Uses genomic sequences, expressed sequence tags and protein databases of related species
2005 JIGSAW Jonathan E. Allen and Steven L. Salzberg. JIGSAW: Integration of Multiple Sources of Evidence for Gene Prediction. Bioinformatics, 21:3596-3603, 2005. Combiner GHMM-like algorithm 137 Selects the prediction whose structure best represents the consensus
2005 AIR Florea L, Di Francesco V, Miller J, Turner R, Yao A, et al. 2005. Gene and alternative splicing annotation with AIR. Genome Res. evidence Integrates multiple forms of extrinsic evidence to perform alternative splice junction prediction
2005 GeneMark-ES Lomsadze, A. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 33, 6494–6506 (2005) ; Ter-Hovhannisyan, V., Lomsadze, A., Chernoff, Y. O. & Borodovsky, M. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res. 18, 1979–1990 (2008). Ab initio Eukaryote 243 / 200
2005 BGF (Beijing Gene Finder) Li, H. et al. Test data sets and evaluation of gene prediction programs on the rice genome. J Comp Sci Tech 20, 446–453 (2005). Ab initio semi HMM Plant (Eukaryote in general?)
2005 TWAIN Majoros WH, Pertea M, Salzberg SL. Efficient implementation of a generalized pair hidden Markov model for comparative gene finding. Bioinformatics. 2005;21(9):1782–1788. comparative GPHMM Dual genome
2005 GenomeThreader G. Gremme, V. Brendel, M.E. Sparks, and S. Kurtz. Engineering a software tool for gene structure prediction in higher organisms. Information and Software Technology, 47(15):965-978, 2005 Evidence based Similarity All The gene structure predictions are calculated using a similarity-based approach where additional cDNA/EST and/or protein sequences are used to predict gene structures via spliced alignments
2006 MaGe Vallenet D, Labarre L, Rouy Z, Barbe V, Bocs S, Cruveiller S, Lajus A, Pascal G, Scarpelli C, Medigue C: MaGe: a microbial genome annotation system supported by synteny results. Nucleic Acids Res. 2006, 34 (1): 53-65. 10.1093/nar/gkj406. Pipeline Bacteria AMIGene for protein-coding genes, RBSfinder for ribosome binding sites, tRNAscan-SE for tRNA, Rfam for small RNAs and riboswitches, etc. GFF3-like (not fully compatible: defines only gene and CDS features; genes have no ID attribute and CDS features have no Parent attribute, but both share a locus_tag attribute)
2006 DOGFISH (for ‘detection of genomic features in sequence homologies’) Carter D, Durbin R. 2006. Vertebrate gene finding from multiple-species alignments using a two-level strategy. Genome Biol. 7(Suppl. 1):S6.1–12 Comparative HMM vertebrate Two-step program that combines a classifier that scores potential splice sites using a multiple-sequence alignment and an ab initio gene predictor that makes use of the scores from the classifier to predict gene structures. More than two genomes possible.
2006 AUGUSTUS+ Stanke M, Schoffmann O, Morgenstern B, Waack S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics. 2006;7:62. Hybrid GHMM or CRF
2006 N-SCAN (a.k.a. TWINSCAN 3.0) Annual International Conference on Research in Computational Molecular Biology RECOMB 2005: Research in Computational Molecular Biology pp 374-388 ; Gross SS, Brent MR. 2006. Using multiple alignments to improve gene prediction. J. Comput. Biol. 13:379–93. Comparative Can use more than 2 genomes (Extends the TWINSCAN model to N genomes)
2006 ZCURVE_V ZCURVE_V: a new self-training system for recognizing protein-coding genes in viral and phage genomes. ab initio Z curve Virus self-training
2006 TWINSCAN_EST Wei C, Brent MR: Using ESTs to improve the accuracy of de novo gene prediction. BMC Bioinformatics. 2006, 7: 327. doi:10.1186/1471-2105-7-327. Comparative + Evidence Two genomes
2006 N_Scan_EST Wei C, Brent MR: Using ESTs to improve the accuracy of de novo gene prediction. BMC Bioinformatics. 2006, 7: 327. doi:10.1186/1471-2105-7-327. Comparative + Evidence HMM HMM-based gene prediction tool that makes use of EST and genomic alignments, incorporating phylogenetic information
2006 Metagene Noguchi, H., Park, J. & Takagi, T. MetaGene: prokaryotic gene finding from environmental genome shotgun sequences. Nucleic Acids Res. 34, 5623–5630 (2006) ab initio Metagenomic 294
2006 TiCo An unsupervised classification scheme for improving predictions of prokaryotic TIS. Tech M, Meinicke P. BMC Bioinformatics. 2006 Mar 9; 7:121. Prokaryote Clustering algorithm for completely unsupervised scoring of potential TIS, based on positionally smoothed probability matrices. The algorithm requires an initial gene prediction and the genomic sequence of the organism to perform the reannotation
2006 FGENESH++ Solovyev V, Kosarev P, Seledsov I, Vorobyev D. (2006) Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol. 2006;7 Suppl 1:S10.1-12. Pipeline hybrid: HMM + extrinsic Automated version of FGENESH+
2007 Conrad DeCaprio, D. et al. Conrad: gene prediction using conditional random fields. Genome Res. 17, 1389–1398 (2007). comparative semi-Markov conditional random fields (SMCRFs) first comparative gene predictor based on SMCRFs. Can use more than 2 genomes
2007 Contrast Gross SS, Do CB, Sirota M, Batzoglou S. 2007. CONTRAST: a discriminative, phylogeny-free approach to multiple informant de novo gene prediction. Genome Biol. 8:R269. Comparative CRF,SVM. Combines local classifiers with the global gene structure model. 90 Can also incorporate information from EST alignment. Can use more than 2 genomes. Uses a combination of SVM and CRF predictors, providing a big boost over traditional HMMs
2007 GISMO Krause L, McHardy AC, Nattkemper TW, Pühler A, Stoye J, Meyer F (2007) GISMO—gene identification using a support vector machine for ORF classification. Nucleic Acids Res 35:540–549 Abinitio + evidence SVMs Prokaryote Uses SVMs. Model training is based on “reliable” genes found with PFAM protein domain HMMs. GFF2
2007 Genomix Coghlan, A. & Durbin, R. Genomix: a method for combining gene-finders' predictions, which uses evolutionary conservation of sequence and intron–exon structure. Bioinformatics 23, 1468–1475 (2007). combiner DP eukaryote Uses dynamic programming to select the best conserved (top-scoring) predicted exons in the query region, and combines them into a gene structure
2007 GLEAN Elsik, C.G. et al. Creating a honey bee consensus gene set. Genome Biol. 8, R13 (2007). combiner HMM Eukaryote Uses an unsupervised learning method
2007 FLAN Bao Y, Bolotov P, Dernovoy D, Kiryutin B, Tatusova T. FLAN: a web server for influenza virus genome annotation. Nucleic Acids Res. 2007. pp. W280–284. similarity-based Influenza virus
2007 transMap Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Stanke M, Diekhans M, Baertsch R, Haussler D. Bioinformatics. 2008 Mar 1; 24(5):637-44. Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D. 2007. Comparative genomics search for losses of long-established genes on the human lineage. PLOS Comput. Biol. 3:e247. Evidence Eukaryote Uses whole-genome alignments to project existing annotations from one genome to one or more other genomes. first developed in conjunction with improvements to AUGUSTUS to model extrinsic information
2007 GLIMMER3 Delcher AL, et al. Identifying bacterial genes and endosymbiont DNA with Glimmer, Bioinformatics, 2007, vol. 23 (pg. 673-679) Ab initio IMM Bacteria, archaea and viruses It integrates ribosome binding site evidence directly into the gene-finding algorithm. It distinguishes host and endosymbiont DNA.
2008 SCGPred SCGPred: a score-based method for gene structure prediction by combining multiple sources of evidence. Li X, Ren Q, Weng Y, Cai H, Zhu Y, Zhang Y Genomics Proteomics Bioinformatics. 2008 Dec; 6(3-4):175-85. Combiner Eukaryote Automated eukaryotic gene structure annotator that computes a weighted consensus gene structure based on multiple sources of available evidence
2008 RAST Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, et al. The RAST Server: rapid annotations using subsystems technology, BMC Genomics, 2008, vol. 9, pg. 75 pipeline bacterial and archaeal Online service that identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user.
2008 Maker Cantarel, B. L. et al. Maker. Genome Res. 18, 188–96 (2008). Combiner 306 It uses proteins, transcripts ... Ab initio: Augustus, Fgenesh, GeneMark, SNAP
2008 Evigan Liu, Q., Mackey, A. J., Roos, D. S. & Pereira, F. C. N. Evigan: A hidden variable model for integrating gene evidence for eukaryotic gene prediction. Bioinformatics 24, 597–605 (2008). Combiner Dynamic Bayes networks (DBNs) Eukaryote 52 Choose the best possible set of exons and combine them in a gene model. Weight of different sources. Unsupervised learning method
2008 Y. Zhou, Y. Liang, C. Hu, L. Wang, X. Shi, An artificial neural network method for combining gene prediction based on equitable weights, NeuroComputing 71 (2007) 538–543 combiner RBFN
2008 Evidence Modeler (EVM) Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008). Combiner Chooses the best possible set of exons and combines them into a gene model. Weights the different sources. Evidence-based chooser.
2008 Chemgenome2.0 Poonam Singhal, B. Jayaram, Surjit B. Dixit and David L. Beveridge. Prokaryotic Gene Finding Based on Physicochemical Characteristics of Codons Calculated from Molecular Dynamics Simulations. Biophysical Journal, 2008, Volume 94, Issue 11, 4173-4183. Ab initio Prokaryote Gene finding based on physicochemical characteristics of codons calculated from molecular dynamics simulations
2008 MetaGeneAnnotator (MGA) Noguchi H, Taniguchi T, Itoh T (2008) MetaGeneAnnotator: detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes. DNA Res 15:387–396. Ab initio Prokaryote MGA is a self-training gene prediction tool for all kinds of prokaryotic genes including atypical genes such as horizontally transferred and prophage-encoded genes
2009 mGene Schweikert, G. et al. mGene: accurate SVM-based gene finding with an application to nematode genomes. Genome Res. 19, 2133–43 (2009). Ab initio Structural HMM combined with discrimination training techniques similar to SVMs 66 No longer supported
2009 Orphelia Ab initio Neural network Metagenomic 78
2009 MiGAP (Microbial Genome Annotation Pipeline ) Sugawara H. et al. (2009) Microbial genome annotation pipeline (MiGAP) for diverse users. In: Proceedings of the 20th International Conference on Genome Informatics, Yokohama, Japan, S–001–1–2. Pipeline: MetaGeneAnnotator + tRNAscan-SE + rRNA db Prokaryote
2009 DAWGPAWS Estill, J. C. & Bennetzen, J. L. The DAWGPAWS pipeline for the annotation of genes and transposable elements in plant genomes. Plant Methods 5, 1–11 (2009). Eukaryote / Plant pipeline for the annotation of genes and transposable elements in plant genomes
2010 MetaGeneMark Zhu, W., Lomsadze, A. & Borodovsky, M. Ab initio gene identification in metagenomic sequences. Nucleic Acids Res. 38, 1–15 (2010). Ab initio HMM Metagenome 220 Self training
2010 Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm) Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010) Ab initio dynamic programming + HMM. Log-likelihood coding statistics trained from data. Prokaryote, Metagenome Self training
2010 GenePRIMP Pati A., Ivanova N.N., Mikhailova N., Ovchinnikova G., Hooper S.D., Lykidis A., Kyrpides N.C. GenePRIMP: a gene prediction improvement pipeline for prokaryotic genomes. Nat. Methods 7, 455–457 (2010). - - Prokaryote evidence-based evaluation
2010 VIGOR (Viral Genome ORF Reader) Wang S, Sundaram JP, Spiro D. 2010. VIGOR, an annotation program for small viral genomes. BMC Bioinformatics 11:451. http://dx.doi.org/10.1186/1471-2105-11-451. Evidence Virus: influenza virus, rotavirus, rhinovirus and coronavirus subtypes Web application tool
2010 FragGeneScan Rho M, Tang H, Ye Y (2010) FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res 38:e191 ab initio HMM Metagenome HMM-based. Combines sequencing error models with codon usage
2010 MetaGeneMark Zhu W, Lomsadze A, Borodovsky M (2010) Ab initio gene identification in metagenomic sequences. Nucleic Acids Res 38:e132 abinitio Metagenome Update of GeneMark.hmm with improved model parameters for metagenomic samples
2010 Gnomon Souvorov, A. et al. Gnomon — the NCBI eukaryotic gene prediction tool. National Center for Biotechnology Information, (2010). Abinitio HMM; Translational and splice signals are described using WMM and WAM models Following the Genscan logic Gnomon recognizes as HMM states coding exons and introns on both strands and intergenic sequence
2011 MAKER2 Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12, 491 (2011). Pipeline / combiner Evidence or ab initio or ab initio evidence driven Eukaryote / Prokaryote 184 It uses proteins, transcripts ... Ab initio: Augustus, Fgenesh, GeneMark, SNAP
2011 GenSAS Lee T, Peace C, Jung S, Zheng P, Main D, Cho I (2011) GenSAS: an online integrated genome sequence annotation pipeline. In: 4th International conference on biomedical engineering and informatics (BMEI), Shanghai, 2011, pp. 1967–1973. doi: 10.1109/BMEI.2011.6098712 pipeline An online integrated genome sequence annotation pipeline
2011 VMGAP (The Viral MetaGenome Annotation Pipeline) Lorenzi, H. A. et al. The Viral MetaGenome Annotation Pipeline (VMGAP): an automated tool for the functional annotation of viral metagenomic shotgun sequencing data. Stand. Genomic Sci. 4, 418–429 (2011). Pipeline Viruses
2012 eCRAIG (ensemble CRAIG) Bernal A, Crammer K, Pereira F: Automated gene-model curation using global discriminative learning. Bioinformatics. 2012, 28 (12): 1571-1578. 10.1093/bioinformatics/bts176. combiner CRF-based 4
2012 MOCAT Kultima JR, Sunagawa S, Li J, Chen W, Chen H, Mende DR, Arumugam M, Pan Q, Liu B, Qin J (2012) MOCAT: a metagenomics assembly and gene prediction toolkit. PLoS One 7:e47656 pipeline Use Prodigal or MetaGeneMark Metagenome
2013 GIIRA (Gene Identification Incorporating RNA-Seq data and Ambiguous reads) Zickmann F, Lindner MS, Renard BY (2013) GIIRA–RNA-Seq driven gene finding incorporating ambiguous reads. Bioinformatics 30:606–613 abinitio evidence driven maximum-flow approach Eukaryote, Prokaryote Based on the observed mapping coverage, GIIRA identifies candidate genes that are refined in further validating steps.
2013 Eugene-P Next-generation Annotation of Prokaryotic Genomes with EuGene-P: Application to Sinorhizobium meliloti 2011. E. Sallet et al. DNA Res. 2013 Prokaryote
2013 MetaGUN Liu Y, Guo J, Hu G, Zhu H (2013) Gene prediction in metagenomic fragments based on the SVM algorithm. BMC Bioinformatics 14:S12 abinitio SVM-based. Phylogenetic binning and assignment of protein sequences to each bin Metagenome
2014 ZUPLS Song, K., Tong, T., and Wu, F. (2014). Predicting essential genes in prokaryotic genomes using a linear method: ZUPLS. Integr. Biol. 6, 460–469. doi: 10.1039/c3ib40241j ab initio Z-curve Prokaryote
2014 OMIGA (Optimized Maker-Based Insect Genome Annotation) Liu J. Xiao H. Huang S. Li F. OMIGA: Optimized Maker-Based Insect Genome Annotation Mol. Genet. Genomics 2014 289 567 573 pipeline (MAKER) Augustus,Snap,GeneMark Insect
2014 GeneMark-ET Lomsadze, A., Burns, P. D. & Borodovsky, M. Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm. Nucleic Acids Res. 42, 1–8 (2014). Ab initio HMM Eukaryote 10 Self training
2014 Prokka Seemann T., Prokka: rapid prokaryotic genome annotation. Bioinformatics 2014 Jul 15;30(14):2068-9. PMID:24642063 pipeline Ab initio + evidence-based for functional annotation prokaryote https://github.com/tseemann/prokka Do structural and functional annotation .gff, .gbk, .fna, .faa, .ffn, .sqn, .fsa, .tbl, .err, .log, .txt, .tsv
2014 DFAST (DDBJ Fast Annotation and Submission Tool) Seemann T. (2014) Prokka: rapid prokaryotic genome annotation. Bioinformatics, 30, 2068–2069. DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication. Yasuhiro Tanizawa, Takatomo Fujisawa, Yasukazu Nakamura. Bioinformatics, Volume 34, Issue 6, 15 March 2018, Pages 1037–1039 Pipeline Prokaryote The original version of DFAST employs the lightweight command-line program Prokka as an annotation engine. Now DFAST uses MetaGeneAnnotator (MGA) by default to predict CDSs and GHOSTX as a default aligner. Standalone and web versions. Data ready to submit to DDBJ
2015 IPred Zickmann, F. & Renard, B. Y. IPred - integrating ab initio and evidence based gene predictions to improve prediction accuracy. BMC Genomics 16, 134 (2015). Combiner evidence-based Chooses the best possible set of exons and combines them into a gene model. Evidence-based chooser. Can also model genes from evidence only.
2015 GASS (Genome Annotation based on Species Similarity) GASS: genome structural annotation for Eukaryotes based on species similarity. Wang Y, Chen L, Song N, Lei X. BMC Genomics. 2015 Mar 4; 16():150. comparative shortest path model and DP
2016 BRAKER1 Hoff, K. J., Lange, S., Lomsadze, A., Borodovsky, M. & Stanke, M. BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS. Bioinformatics 32, 767–769 (2016). Hybrid / Pipeline 1 Pipeline: GeneMark-ET + Augustus
2016 Companion Steinbiss, S. et al. Companion: a web server for annotation and analysis of parasite genomes. Nucleic Acids Res. 44, W29–W34 (2016) Pipeline for automatic eukaryotic parasite annotation
2016 Gmove Dubarry, M. et al. Gmove a tool for eukaryotic gene predictions using various evidences. F1000Research 34, 2011 (2016). Eukaryote
2016 AugustusCGP König S, Romoth LW, Gerischer L, Stanke M. Bioinformatics. 2016 Nov 15; 32(22):3388-3395. comparative Eukaryote multiple genomes
2016 CAT (Comparative Annotation Toolkit) Fiddes, I. T. et al. Comparative Annotation Toolkit (CAT)—simultaneous clade and personal genome annotation. Genome Res. https://doi.org/10.1101/gr.233460.117 (2018). pipeline Evidence based, comparative-abinitio (AugustusCGP) Takes a HAL-format multiple whole-genome alignment as input. GFF3 + many plots
2016 CESAR Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation. Sharma V, Elghafari A, Hiller M. Nucleic Acids Res. 2016 Jun 20; 44(11):e103. comparative Uses a HMM to adjust splice sites in whole-genome alignments, improving transcript projections
2016 PGAP Tatusova T. et al. (2016) NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res., 44, 6614–6624. pipeline GeneMarkS+ Glimmer + extrinsic data Prokaryote This is the NCBI annotation service incorporated in its submission system, but it is only available for GenBank submitters.
2017 GeMoMa Keilwagen, J., Hartung, F., Paulini, M., Twardziok, S. O. & Grau, J. Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi. (2017). homology-based gene prediction program
2017  funannotate doi.org/10.5281/zenodo.2576527 Pipeline Evidence Modeler + Augustus + GeneMark-ES/ET + evidence + PASA built specifically for fungi, but will also work with higher eukaryotes homology-based gene prediction program
2017 GAWN unpublished - https://github.com/enormandeau/gawn pipeline evidence-based only GMAP to create gene and cufflinks and TransDecoder to add UTR eukaryote
2018 FunGAP Min B, Grigoriev IV, Choi IG. FunGAP: Fungal Genome Annotation Pipeline using evidence-based gene model evaluation. Bioinformatics (Oxford, England). 2017;33(18):2936–7. pipeline
2018 BRAKER2 Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M., & Borodovsky, M. (2021). BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genomics and Bioinformatics, 3(1), 1–11. https://doi.org/10.1093/nargab/lqaa108 Hybrid eukaryote
2018 G-OnRamp Ab-initio Web-based Augustus,GlimmerHMM,SNAP
2018 VIRULIGN VIRULIGN: fast codon-correct alignment and annotation of viral genomes. Pieter J K Libin, Koen Deforche, Ana B Abecasis, Kristof Theys. Bioinformatics, Volume 35, Issue 10, 15 May 2019, Pages 1763–1765, https://doi.org/10.1093/bioinformatics/bty851 Similarity Virus
2019 GAAP Jinhwa Kong, Sun Huh, Jung-Im Won, Jeehee Yoon, Baeksop Kim, and Kiyong Kim. GAAP: A Genome Assembly + Annotation Pipeline. BioMed Research International, Volume 2019, Article ID 4767354, 12 pages pipeline Augustus,EVM,MAKER,PASA Genome Assembly + Annotation Pipeline
2019 VAPiD (Viral Annotation Pipeline and iDentification) Ryan C. Shean, Negar Makhsous, Graham D. Stoddard, Michelle J. Lin & Alexander L. Greninger. VAPiD: a lightweight cross-platform viral annotation pipeline and identification tool to facilitate virus genome submissions to NCBI GenBank. BMC Bioinformatics volume 20, Article number: 48 (2019) pipeline Virus
2019 Vgas (Viral Genome Annotation System) Kai-Yue Zhang, Yi-Zhou Gao, Meng-Ze Du, Shuo Liu, Chuan Dong, and Feng-Biao Guo. Vgas: A Viral Genome Annotation System. Front Microbiol. 2019; 10: 184. abinitio + similarity-based ZCURVE_V + BLASTp Virus In their paper they say: When combining Vgas with GeneMarkS and Prodigal, better prediction results could be obtained than with each of the three individual programs.
2020 VADR VADR: validation and annotation of virus sequence submissions to GenBank. Alejandro A. Schäffer, Eneida L. Hatcher, Linda Yankie, Lara Shonkwiler, J. Rodney Brister, Ilene Karsch-Mizrachi & Eric P. Nawrocki. BMC Bioinformatics 21, 211 (2020). https://doi.org/10.1186/s12859-020-3537-3 HMM + similarity Virus
2021 MOSGA Martin, R., Hackl, T., Hattab, G., Fischer, M. G., & Heider, D. (2021). MOSGA: Modular Open-Source Genome Annotator. Bioinformatics, 36(22–23), 5514–5515. https://doi.org/10.1093/bioinformatics/btaa1003 Ab initio, Hybrid, Pipeline Framework Eukaryote Web interface, RNA-Seq/Proteins/Orthology based prediction possible + validation GFF3 + Sequin
2021 TSEBRA Gabriel, L., Hoff, K. J., Brůna, T., Borodovsky, M., & Stanke, M. (2021). TSEBRA: transcript selector for BRAKER. BMC Bioinformatics, 22(1), 566. https://doi.org/10.1186/s12859-021-04482-0 Hybrid Eukaryote RNA-Seq + Proteins GFF3

Legend:
Hybrid = ab initio combined with evidence-based prediction, i.e. an HMM-based gene prediction tool that uses extrinsic evidence
Comparative = genome sequence comparison

CHMM: class HMM
CRF: conditional random field
DBN: dynamic Bayesian network
DP: dynamic programming
EHMM: evolutionary HMM
GHMM: generalized HMM
GPHMM: generalized pair HMM
HMM: hidden Markov model
IMM: Interpolated Markov model
LDA: Linear Discriminant Analysis
MDD: maximal dependence decomposition
ML: maximum likelihood
MM: Markov Model
NN: Neural Networks
PHMM: pair HMM
phyloHMM: phylogenetic HMM
RBFN: Radial Basis Function Network
SVM: support vector machine
WAM: weight array matrix
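
Many of the methods abbreviated above (HMM, GHMM, DP) rest on the same idea: label each base with a hidden state and use dynamic programming to find the most probable labelling. The sketch below is only an illustration of that idea, not the model of any tool in the table. The two states, the probabilities, and the function name `viterbi` are all assumptions invented for the example.

```python
import math

# Toy two-state HMM. Every probability here is made up for illustration
# and does not come from any tool listed in the table.
STATES = ("coding", "noncoding")
START = {"coding": 0.5, "noncoding": 0.5}
TRANS = {
    "coding": {"coding": 0.9, "noncoding": 0.1},
    "noncoding": {"coding": 0.1, "noncoding": 0.9},
}
EMIT = {
    "coding": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "noncoding": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
}


def viterbi(sequence):
    """Return the most probable state path for a DNA string (log-space DP)."""
    if not sequence:
        return []
    # scores[s] = best log-probability of any path that ends in state s
    scores = {
        s: math.log(START[s]) + math.log(EMIT[s][sequence[0]]) for s in STATES
    }
    backpointers = []
    for base in sequence[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for landing in s at this position
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Calling `viterbi("ATATATATGCGCGCGCGC")` labels the AT-rich prefix as noncoding and the GC-rich suffix as coding under these toy parameters. Real gene finders use far richer state sets (exons, introns, splice signals, both strands) and trained parameters.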


Interesting publications

Rogic, S., Mackworth, A. K., & Ouellette, F. B. (2001). Evaluation of gene-finding programs on mammalian sequences. Genome research, 11(5), 817-32.
Goodswen, S. J., Kennedy, P. J., & Ellis, J. T. (2012). Evaluating high-throughput ab initio gene finders to discover proteins encoded in eukaryotic pathogen genomes missed by laboratory techniques. PloS one, 7(11), e50609.
Chowdhury, B., Garai, A., & Garai, G. (2017). An optimized approach for annotation of large eukaryotic genomic sequences using genetic algorithm. BMC bioinformatics, 18(1), 460. doi:10.1186/s12859-017-1874-7
Joel Armstrong, Ian T. Fiddes, Mark Diekhans and Benedict Paten. Whole-Genome Alignment and Comparative Annotation. Annu Rev Anim Biosci. 2019 Feb 15; 7: 41–64.
Alice Carolyn McHardy, Andreas Kloetgen. Finding Genes in Genome Sequence. Bioinformatics pp 271-291
Bączkowski, K., Mackiewicz, K., Kowalczuk, M., Banaszak, J. and Cebrat, S., “Od sekwencji do funkcji – poszukiwanie genów i ich adnotacje” [From sequence to function: searching for genes and their annotations], Biotechnologia 3(70), 22–44 (2005)
Pirovano, W., Boetzer, M., Derks, M. F. L., & Smit, S. (2017). NCBI-compliant genome submissions: Tips and tricks to save time and money. Briefings in Bioinformatics, 18(2), 179–182. https://doi.org/10.1093/bib/bbv104

Interesting books
Principles of Gene Manipulation and Genomics. By Sandy B. Primrose and Richard Twyman