List of genome annotation tools

See here for the list of plastome annotation tools
See here for the list of mitome annotation tools
See here for the list of plasmidome annotation tools

Year	Tool name	Publication	Type	Method	Organism	No. of citations (PubMed, 2016)	Comments	Output format
1991	GRAIL	E. C. Uberbacher and R. J. Mural (1991), "Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach", Proc. Natl. Acad. Sci. USA, Vol. 88, pp. 11261–11265. R. J. Mural, J. R. Einstein, X. Guan, R. C. Mann and E. C. Uberbacher (1992), "An Artificial Intelligence Approach to DNA Sequence Feature Recognition", Trends in Biotechnology, 10, pp. 66–69.	Ab initio (sensors + Neural network)				No longer supported
1991	NetGene	Brunak, S., Engelbrecht, J., and Knudsen, S. (1991). Prediction of human mRNA donor and acceptor sites from the DNA sequence. J. Mol. Biol. 220, 49–65.	Ab initio
1992	GeneID	Guigo, R., Knudsen, S., Drake, N., and Smith, T. (1992). Prediction of gene structure. J. Mol. Biol. 226, 141–157.	Ab initio	WAM, HMM, PD, AD, NN
1992	GeneID+	Guigo, R., Knudsen, S., Drake, N., and Smith, T. (1992). Prediction of gene structure. J. Mol. Biol. 226, 141–157.	Hybrid	WAM, HMM, PD, AD, NN			Uses information from protein sequence database searches
1992	SORFIND	Hutchinson, G. B., and Hayden, M. R. (1992) Nucleic Acids Res. 20, 3453–3462.	Ab initio
1993	GeneMark	Borodovsky and McIninch	Ab initio
1993	Geneparser	Snyder, E.E. and Stormo, G.D. 1993. Identification of coding regions in genomic DNA sequences: an application of dynamic programming and neural networks. Nucleic Acids Res. 21: 607-613.	Ab initio	DP combined with a neural network program
1994	GRAIL-II	Recognizing exons in genomic sequence using GRAIL II. Xu Y, Mural R, Shah M, Uberbacher E. Genet Eng (N Y). 1994; 16():241-53.	Ab initio
1994	Xpound	Thomas, A. and Skolnick, M.H. (1994) A probabilistic model for detecting coding regions in DNA sequences. IMA J. Math. Appl. Med. Biol., 11, 149–160.	Ab initio
1994	EcoParse		Ab initio	HMM	Prokaryote	393
1994	GeneLang / GenLang	Dong, S. and Searls, D.B. 1994. Gene structure prediction by linguistic methods. Genomics 23: 540-551.	Ab initio	Linguistic method HMM, PD, WAM	Eukaryote
1995	Fgeneh (Find gene in human) / GeneFinder	Solovyev VV, Salamov AA, Lawrence CB (1995) Identification of human gene structure using linear discriminant functions and dynamic programming. Proceedings/International Conference on Intelligent Systems for Molecular Biology; ISMB International Conference on Intelligent Systems for Molecular Biology 3: 367–375	Ab initio	HMM, DP, LDA	Human		Finds single exon only
1995	Geneparser2	Snyder EE, Stormo GD. J Mol Biol. 1995 Apr 21; 248(1):1-18.	Ab initio	DP combined with a neural network program
1995	Geneparser3	Snyder EE, Stormo GD. J Mol Biol. 1995 Apr 21; 248(1):1-18.	Hybrid	DP combined with a neural network program
1996	GeneHacker	Yada, T., Hirosawa, M. DNA Res., 3, 335–361 (1996). Syst. Mol. Biol. pp. 252–260 (1996). Syst. Mol. Biol. pp. 354–357 (1997).	Ab initio	Markov model	Prokaryote
1996	Genie	Kulp, D.; Haussler, D.; Reese, M. G.; and Eeckman, F. H. 1996. A generalized hidden Markov model for the recognition of human genes in DNA. In D.J. States et al., ed., Proc. Conf. on Intelligent Systems in Molecular Biology, 134–142. Menlo Park, CA: AAAI Press.	Hybrid	GHMM + neural networks
1996	Procrustes	Gene recognition via spliced sequence alignment. Gelfand MS, Mironov AA, Pevzner PA. Proc Natl Acad Sci U S A. 1996 Aug 20; 93(17):9061-6.	Evidence based
1997	Fgenes / GeneFinder	Solovyev	Ab initio	HMM, DP, LDA	Human
1997	GenScan	Burge, C. (1997). Identification of genes in human genomic DNA. Ph.D. thesis, Stanford University; Burge, C. & Karlin, S. (1997). Prediction of complete gene structures in genomic DNA. Journal of Molecular Biology, 268, 78–94	Ab initio	GHMM			GENSCAN++ is a reimplementation of GENSCAN in C++ (~2001)
1997	MZEF	Identification of protein coding regions in the human genome by quadratic discriminant analysis. Zhang MQ. Proc Natl Acad Sci U S A. 1997 Jan 21; 94(2):565-8.		Quadratic discriminant analysis
1997	HMMGene	Krogh A. Two methods for improving performance of an HMM and their application for gene finding. In: Gaasterland T, Karp P, Karplus K, Ouzounis C, Sander C, Valencia A, editors. The Fifth International Conference on Intelligent Systems for Molecular Biology. Menlo Park, CA: AAAI Press; 1997. pp. 179–186.	Ab initio	CHMM	Vertebrate and C. elegans		No downloadable version; available as a web server only.
1997	GeneWise (from Wise2 distribution)	Unpublished. Birney, E. and Durbin, R. 1997. Wise2. http://www.sanger.ac.uk/Software/Wise2.	Evidence based
1997	AAT (Analysis and Annotation Tool)	Huang et al.	Evidence based				Includes two pairs of programs: DPS/NAP and DDS/GAP
1998	Orpheus	Frishman D, Mironov A, Mewes HW, Gelfand M (1998) Combining diverse evidence for gene recognition in completely sequenced bacterial genomes. Nucleic Acids Res 26:2941–2947	Ab initio + evidence	Seed and extend	Prokaryote / Archaea
1998	SIM4	A computer program for aligning a cDNA sequence with a genomic DNA sequence. Florea L, Hartzell G, Zhang Z, Rubin GM, Miller W. Genome Res. 1998 Sep; 8(9):967-74.
1998	GIN	Y. Cai and P. Bork, “Homology-based gene prediction using neural nets,” Analytical Biochemistry, vol. 265, no. 2, pp. 269–274, 1998.	Hybrid	NN + homology	Vertebrate
1998	GAIA	GAIA: framework annotation of genomic sequence. Bailey LC Jr, Fischer S, Schug J, Crabtree J, Gibson M, Overton GC. Genome Res. 1998 Mar; 8(3):234-50.	homology-based
1998	MORGAN (Multi-frame Optimal Rule-based Gene ANalyzer)	Salzberg S, Delcher AL, Fasman KH, Henderson J. J Comput Biol. 1998 Winter; 5(4):667-80.	Ab initio	DP algorithm in combination with a decision tree program			Hybrid tool combining decision trees with dynamic programming and signal sensor algorithms
1998	GeneMark.hmm	Lukashin, A. V. & Borodovsky, M. GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res. 26, 1107–1115 (1998).	Ab initio	HMM. Iteratively trains and improves the model in an unsupervised manner	Prokaryote / Archaea	1334	Self training
1998	Glimmer	Salzberg, S., Delcher, A., Kasif, S., and White, O. (1998b). Microbial gene identification using interpolated Markov models. Nucleic Acids Res. 26(2), 544–548.	Ab initio	IMM	Prokaryote + Archaea
1999	Fgenesh	Solovyev and Salamov	Ab initio	HMM			Versions with organism-specific parameters for human, Drosophila, plants, yeast and nematode
1999	GlimmerM	Salzberg, S.L., Pertea, M., Delcher, A.L., Gardner, M.J. and Tettelin, H. (1999) Interpolated Markov models for eukaryotic gene finding. Genomics, 59, 24–31.	Ab initio	IMM	Small eukaryote		Developed to find genes in the malaria parasite Plasmodium falciparum.
1999	Veil (the Viterbi Exon-Intron Locator)	Finding Genes in Human DNA with a Hidden Markov Model. J. Henderson, S.L. Salzberg, and K. Fasman. This describes the VEIL system for finding genes. Journal of Computational Biology 4:2 (1997), 127-141.	HMM		Eukaryote
1999	CRITICA (Coding Region Identification Tool Invoking Comparative Analysis)	Badger and Olsen. Molecular Biology and Evolution, 16(4):512-524. 1999.	Comparative		Prokaryote / Archaea		Comparative analysis is based on amino acid sequence similarity to other species
2000	Fgenesh+	Salamov AA, Solovyev VV. Genome Res. 2000 Apr; 10(4):516-22. Solovyev V.V. (2007) Statistical approaches in eukaryotic gene prediction. In Handbook of Statistical Genetics (eds. Balding D., Cannings C., Bishop M.), Wiley-Interscience; 3rd edition, 1616 p.	Hybrid	HMM plus similar-protein-based gene prediction			Fgenesh+ is a variant of Fgenesh that also takes into account information about similar proteins
2000	Rosetta	Batzoglou et al., 2000	Comparative genomics				Two genomes. Uses pairwise genomic alignments to find regions of homology; incorporates a splice junction and exon length model.
2000	CEM	Bafna & Huson, 2000	Comparative genomics				Two genomes
2001	GenomeScan	Computational inference of homologous gene structures in the human genome. Yeh RF, Lim LP, Burge CB, Genome Res. 2001 May; 11(5):803-16.	Comparative
2001	Eugene		Hybrid	Semi-Markov Conditional Random Fields / IMM, DP	Plant		Can be seen as a combiner, because information about splice sites and start codons (ATG) has to be collected outside the program.
2001	Twinscan	Ian Korf, Paul Flicek, Daniel Duan, Michael R. Brent. Bioinformatics, Volume 17, Issue suppl_1, June 2001, Pages S140–S148, https://doi.org/10.1093/bioinformatics/17.suppl_1.S140	comparative-genomics-based				Two genomes. Uses local alignments between a target genome and a reference (informant) genome to identify regions of conservation
2001	GeneHacker Plus	Yada, T., Totoki, Y., Takagi, T. and Nakai, K. (2001) A novel bacterial gene-finding system with improved accuracy in locating start codons. DNA Res., 8, 97–106	Ab initio	HMM	Prokaryote	50
2001	GeneMarkS	GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Besemer J, Lomsadze A, Borodovsky M. Nucleic Acids Res. 2001 Jun 15; 29(12):2607-18.	Ab initio	HMM	Prokaryote	742	Self training
2001	SGP-1 (Syntenic Gene Prediction)	SGP-1: prediction and validation of homologous genes based on sequence alignments. Wiehe T, Gebauer-Jung S, Mitchell-Olds T, Guigó R. Genome Res. 2001 Sep; 11(9):1574-83.	Comparative		vertebrates and plants		Dual genomes. Uses pairwise genomic alignments to find syntenic loci; evaluates a coding and splice model in these loci.
2001	Spidey	Spidey: a tool for mRNA-to-genomic alignments. Wheelan SJ, Church DM, Ostell JM. Genome Res. 2001 Nov; 11(11):1952-7.
2002	DOUBLESCAN	Meyer IM, Durbin R. 2002. Comparative ab initio prediction of gene structures using pair HMMs. Bioinformatics 18:1309–18	comparative	PHMM			Uses a pair HMM to simultaneously predict gene structures and conservation in two aligned sequences
2002	AGenDA (Alignment-based Gene-Detection Algorithm)	Oliver Rinner and Burkhard Morgenstern. AGenDA: Gene Prediction by Comparative Sequence Analysis. In Silico Biology, 2:4673-4680, 2002.	comparative		Eukaryote		Based on pairwise alignments created by CHAOS and DIALIGN
2002	GAZE	Howe, K. L. et al. GAZE: A Generic Framework for the Integration of Gene-Prediction Data by Dynamic Programming. 1418–1427 (2002). doi:10.1101/gr.149502	Comparative / combiner
2002	BDGF	Shibuya T, Rigoutsos I (2002) Dictionary-driven prokaryotic gene finding. Nucleic Acids Res 30:2710–2725	Evidence based		Prokaryote / Archaea		Classifications based on universal CDS-specific usage of short amino acid “seqlets”
2003	EvoGene	Pedersen JS, Hein J. 2003. Gene finding with a hidden Markov model of genome structure and evolution. Bioinformatics 19:219–27	Comparative / evolutionary	Evolutionary Hidden Markov Model (EHMM)			Phylogenetic HMM that performs ab initio prediction of genes across a multiple-sequence alignment (more than two genomes), making use of phylogenetic information
2003	GeneMarkS (virus version)	Mills R, Rozanov M, Lomsadze A, Tatusova T, Borodovsky M. Improving gene annotation of complete viral genomes. Nucleic Acids Res. 2003;31(23):7041–7055. doi: 10.1093/nar/gkg878.	Ab initio	HMM	Virus	742	Self training
2003	GeneComber	Shah SP, McVicker GP, Mackworth AK, Rogic S, Ouellette BF. 2003. GeneComber: combining outputs of gene prediction programs for improved results. Bioinformatics 19:1296–97	Combiner	EUI, GI and EUI frame algorithms			It runs Genscan and HMMgene and combines results
2003	AUGUSTUS	Stanke, M. & Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19 Suppl 2, ii215–ii225 (2003).	Ab initio	HMM	Eukaryote
2003	SLAM	M. Alexandersson, S. Cawley, and L. Pachter. 2003. SLAM: Cross-species gene finding and alignment with a generalized pair hidden Markov model. Genome Res., 13:496-502.	Comparative	GPHMM (Generalized pair HMM)	Eukaryote	187	Dual genome. Treats two alignments in a symmetric way, predicting pairs of transcripts
2003	SGP2	G. Parra, P. Agarwal, J.F. Abril, T. Wiehe, J.W. Fickett, and R. Guigo. 2003. Comparative gene prediction in human and mouse. Genome Res., 13:108-117	comparative		Eukaryote		Dual genome. It integrates the sequence similarity search program TBLASTX (WU-BLAST) and the ab initio gene finder GeneID. Used by the Mouse Genome Sequencing Consortium in 2002 to annotate the mouse genome. Uses pairwise genomic alignments to find syntenic loci; evaluates a coding and splice model in these loci.
2003	PASA (Program to Assemble Spliced Alignments)	Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr., et al. 2003. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res	pipeline - combiner - evidence-based				Uses alignments of cDNA, EST, or RNA-seq data to predict gene structures, including alternative splicing events. Can run GMAP and BLAT to perform the alignments, or use an external GFF3 file.
2003	EasyGene	EasyGene--a prokaryotic gene finder that ranks ORFs by statistical significance. Larsen TS, Krogh A. BMC Bioinformatics. 2003 Jun 3; 4():21.	Ab initio	HMM, H	Prokaryote / Archaea	153
2003	AMIGene (Annotation of MIcrobial Genes)	Bocs, S., Cruveiller, S., Vallenet, D., Nuel, G., Medigue, C. 2003 AMIGene: Annotation of MIcrobial Genes Nucleic Acids Res. 31 3723 –3726	Ab initio	HMM	Prokaryote
2003	ETOPE	Anton Nekrutenko, Wen-Yu Chung, Wen-Hsiung Li. Nucleic Acids Research, Volume 31, Issue 13, 1 July 2003, Pages 3564–3567, https://doi.org/10.1093/nar/gkg597	Comparative / evolutionary	Based on the ratio of non-synonymous to synonymous substitution rates between sequences from different genomes	Eukaryote	20	Based on Genscan output. It does not predict exons but rather validates exons predicted by other tools.
2003	CRASA	A complexity reduction algorithm for analysis and annotation of large genomic sequences. Chuang TJ, Lin WC, Lee HC, Wang CW, Hsiao KL, Wang ZH, Shieh D, Lin SC, Ch'ang LY. Genome Res. 2003 Feb; 13(2):313-22.
2003	YACOP	Tech M, Merkl R (2003) YACOP: enhanced gene prediction obtained by a combination of existing methods. In Silico Biol 3:441–451	combiner: ab initio + evidence	Utilizes Glimmer, Critica and ZCURVE	Prokaryote / Archaea
2003	ZCurve	Guo FB, Ou HY, Zhang CT (2003) ZCURVE: a new system for recognizing protein-coding genes in bacterial and archaeal genomes. Nucleic Acids Res 31:1780–1789	ab initio	Z curve; correlation of dinucleotides	Prokaryote / Archaea		Uses the “Z-transform” of DNA as the information source for classification
2003	Eugene'Hom	Foissac S, Bardou P, Moisan A, Cros M, Schiex T. EuGene'Hom: a generic similarity-based gene finder using multiple homologous sequences. Nucleic Acids Res 2003; 31: 3742-3745.	Evidence-based		Eukaryote
2004	GeneWise	GeneWise and Genomewise. Birney E, Clamp M, Durbin R. Genome Res. 2004 May; 14(5):988-95.	Hybrid				HMM-based gene prediction tool using extrinsic evidence
2004	Ensembl		Pipeline Evidence based				Pipeline
2004	RescueNet	Mahony S, McInerney JO, Smith TJ, Golden A (2004) Gene prediction using the Self-Organizing Map: automatic generation of multiple gene models. BMC Bioinformatics 5:23	Ab initio, evidence		Prokaryote, Archaea		Unsupervised discovery of multiple gene classes using a self-organizing map. No exact start/stop prediction
2004	Reganor	McHardy AC, Goesmann A, Puhler A, Meyer F (2004) Development of joint application strategies for two microbial gene finders. Bioinformatics 20:1622–1631	combiner: ab initio + evidence	Uses Glimmer and Critica	Prokaryote / Archaea
2004	Combiner	Allen, J.E., et al. 2004. Computational gene prediction using multiple sources of evidence. Genome Res. 14, 142–148	combiner	Linear Combiner that uses a voting function; statistical scoring method that uses decision trees			Three different algorithms for combining evidence were implemented in Combiner
2004	GlimmerHMM	Majoros, W.H., Pertea, M., and Salzberg, S.L. TigrScan and GlimmerHMM: two open-source ab initio eukaryotic gene-finders. Bioinformatics 2004 2878-2879.	Ab initio	GHMM	eukaryote
2004	GeneZilla (formerly "TIGRscan")	Majoros, W.H., Pertea, M., and Salzberg, S.L. TigrScan and GlimmerHMM: two open-source ab initio eukaryotic gene-finders. Bioinformatics 2004 2878-2879.	Ab initio	GHMM	eukaryote		No longer supported
2004	SNAP (Semi-HMM-based Nucleic Acid Parser)	Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004)	Ab initio	semi-HMM
2004	Projector	Meyer IM, Durbin R. 2004. Gene structure conservation aids similarity based gene prediction. Nucleic Acids Res. 32:776–83	comparative	PHMM			Similar to DOUBLESCAN but extends the model to make use of annotation information on one sequence to inform the other
2004	ExoniPhy	Siepel A, Haussler D. 2004. Computational identification of evolutionarily conserved exons. In Proceedings of the Eighth Annual International Conference on Research in Computational Molecular Biology, ed.Gusfield D, Bourne P, Istrail S, Pevzner P, Waterman M, pp. 177–86. New York: Assoc. Comput. Mach.	Comparative / evolutionary	phylo-HMM			Phylogenetic HMM that performs ab initio predictions across a multiple-sequence alignment
2005	ExonHunter	Bronislava Brejova. Evidence Combination in Hidden Markov Models for Gene Prediction. PhD thesis, University of Waterloo, 2005. Brona Brejova, Daniel G. Brown, Ming Li, and Tomas Vinar. ExonHunter: a comprehensive approach to gene finding. Bioinformatics, 21 Suppl. 1:i57–i65, 2005.	Comparative + evidence driven	GHMM			Uses genomic sequences, expressed sequence tags and protein databases of related species
2005	JIGSAW	Jonathan E. Allen and Steven L. Salzberg. JIGSAW: Integration of Multiple Sources of Evidence for Gene Prediction. Bioinformatics, 21:3596–3603, 2005.	Combiner	GHMM-like algorithm		137	Selects the prediction whose structure best represents the consensus
2005	AIR	Florea L, Di Francesco V, Miller J, Turner R, Yao A, et al. 2005. Gene and alternative splicing annotation with AIR. Genome Res.	evidence				Integrates multiple forms of extrinsic evidence to perform alternative splice junction prediction
2005	GeneMark-ES	Lomsadze, A. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 33, 6494–6506 (2005) ; Ter-Hovhannisyan, V., Lomsadze, A., Chernoff, Y. O. & Borodovsky, M. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res. 18, 1979–1990 (2008).	Ab initio		Eukaryote	243 / 200
2005	BGF (Beijing Gene Finder)	Li, H. et al. Test data sets and evaluation of gene prediction programs on the rice genome. J Comp Sci Tech 20, 446–453 (2005).	Ab initio	semi HMM	Plant (Eukaryote in general?)
2005	TWAIN	Majoros WH, Pertea M, Salzberg SL. Efficient implementation of a generalized pair hidden Markov model for comparative gene finding. Bioinformatics. 2005;21(9):1782–1788.	comparative	GPHMM			Dual genome
2005	GenomeThreader	G. Gremme, V. Brendel, M.E. Sparks, and S. Kurtz. Engineering a software tool for gene structure prediction in higher organisms. Information and Software Technology, 47(15):965-978, 2005	Evidence based	Similarity	All		The gene structure predictions are calculated using a similarity-based approach where additional cDNA/EST and/or protein sequences are used to predict gene structures via spliced alignments
2006	MaGe	Vallenet D, Labarre L, Rouy Z, Barbe V, Bocs S, Cruveiller S, Lajus A, Pascal G, Scarpelli C, Medigue C: MaGe: a microbial genome annotation system supported by synteny results. Nucleic Acids Res. 2006, 34 (1): 53-65. 10.1093/nar/gkj406.	Pipeline		Bacteria		AMIGene for protein-coding genes, RBSfinder for ribosome binding sites, tRNAscan-SE for tRNAs, Rfam for small RNAs and riboswitches, etc.	GFF3-like (not fully compliant: defines only gene and CDS features; genes have no ID attribute and CDSs have no Parent attribute, but they share a locus_tag attribute)
2006	DOGFISH (for ‘detection of genomic features in sequence homologies’)	Carter D, Durbin R. 2006. Vertebrate gene finding from multiple-species alignments using a two-level strategy. Genome Biol. 7(Suppl. 1):S6.1–12	Comparative	HMM	vertebrate		Two-step program that combines a classifier that scores potential splice sites using a multiple-sequence alignment and an ab initio gene predictor that makes use of the scores from the classifier to predict gene structures. More than two genomes possible.
2006	AUGUSTUS+	Stanke M, Schoffmann O, Morgenstern B, Waack S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics. 2006;7:62.	Hybrid	GHMM or CRF
2006	N-SCAN (a.k.a. TWINSCAN 3.0)	Annual International Conference on Research in Computational Molecular Biology RECOMB 2005: Research in Computational Molecular Biology pp 374-388 ; Gross SS, Brent MR. 2006. Using multiple alignments to improve gene prediction. J. Comput. Biol. 13:379–93.	Comparative				Can use more than 2 genomes (Extends the TWINSCAN model to N genomes)
2006	ZCURVE_V	ZCURVE_V: a new self-training system for recognizing protein-coding genes in viral and phage genomes.	ab initio	Z curve	Virus		self-training
2006	TWINSCAN_EST	Wei C, Brent MR: Using ESTs to improve the accuracy of de novo gene prediction. BMC Bioinformatics. 2006, 7: 327. doi:10.1186/1471-2105-7-327.	Comparative + Evidence				Two genomes
2006	N_Scan_EST	Wei C, Brent MR: Using ESTs to improve the accuracy of de novo gene prediction. BMC Bioinformatics. 2006, 7: 327. doi:10.1186/1471-2105-7-327.	Comparative + Evidence	HMM			HMM-based gene prediction tool that makes use of EST and genomic alignments, incorporating phylogenetic information
2006	Metagene	Noguchi, H., Park, J. & Takagi, T. MetaGene: prokaryotic gene finding from environmental genome shotgun sequences. Nucleic Acids Res. 34, 5623–5630 (2006)	ab initio		Metagenomic	294
2006	TiCo	An unsupervised classification scheme for improving predictions of prokaryotic TIS. Tech M, Meinicke P. BMC Bioinformatics. 2006 Mar 9; 7():121.			Prokaryote		Clustering algorithm for completely unsupervised scoring of potential translation initiation sites (TIS), based on positionally smoothed probability matrices. The algorithm requires an initial gene prediction and the genomic sequence of the organism to perform the reannotation
2006	FGENESH++	Solovyev V, Kosarev P, Seledsov I, Vorobyev D. (2006) Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol. 2006;7 Suppl 1:S10.1-12.	Pipeline	hybrid: HMM + extrinsic evidence			Automated version of FGENESH+
2007	Conrad	DeCaprio, D. et al. Conrad: gene prediction using conditional random fields. Genome Res. 17, 1389–1398 (2007).	comparative	semi-Markov conditional random fields (SMCRFs)			First comparative gene predictor based on SMCRFs. Can use more than 2 genomes
2007	Contrast	Gross SS, Do CB, Sirota M, Batzoglou S. 2007. CONTRAST: a discriminative, phylogeny-free approach to multiple informant de novo gene prediction. Genome Biol. 8:R269.	Comparative	CRF,SVM. Combines local classifiers with the global gene structure model.		90	Can also incorporate information from EST alignment. Can use more than 2 genomes. Uses a combination of SVM and CRF predictors, providing a big boost over traditional HMMs
2007	GISMO	Krause L, McHardy AC, Nattkemper TW, Pühler A, Stoye J, Meyer F (2007) GISMO—gene identification using a support vector machine for ORF classification. Nucleic Acids Res 35:540–549	Abinitio + evidence	SVMs	Prokaryote		Uses SVMs. Model training is based on “reliable” genes found with PFAM protein domain HMMs.	GFF2
2007	Genomix	Coghlan, A. & Durbin, R. Genomix: a method for combining gene-finders' predictions, which uses evolutionary conservation of sequence and intron–exon structure. Bioinformatics 23, 1468–1475 (2007).	combiner	DP	eukaryote		Uses dynamic programming to select the best-conserved (top-scoring) predicted exons in the query region and combines them into a gene structure
2007	GLEAN	Elsik, C.G. et al. Creating a honey bee consensus gene set. Genome Biol. 8, R13 (2007).	combiner	HMM	Eukaryote		Uses an unsupervised learning method
2007	FLAN	Bao Y, Bolotov P, Dernovoy D, Kiryutin B, Tatusova T. FLAN: a web server for influenza virus genome annotation. Nucleic Acids Res. 2007. pp. W280–284.	similarity-based		Influenza virus
2007	transMap	Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Stanke M, Diekhans M, Baertsch R, Haussler D. Bioinformatics. 2008 Mar 1; 24(5):637-44. Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D. 2007. Comparative genomics search for losses of long-established genes on the human lineage. PLOS Comput. Biol. 3:e247.	Evidence		Eukaryote		Uses whole-genome alignments to project existing annotations from one genome to one or more other genomes. first developed in conjunction with improvements to AUGUSTUS to model extrinsic information
2007	GLIMMER3	Delcher AL, et al. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics, 2007, vol. 23 (pg. 673-679)	Ab initio	IMM	bacteria, archæa and viruses		Integrates ribosome binding site evidence directly into the gene-finding algorithm and distinguishes host from endosymbiont DNA.
2008	SCGPred	SCGPred: a score-based method for gene structure prediction by combining multiple sources of evidence. Li X, Ren Q, Weng Y, Cai H, Zhu Y, Zhang Y. Genomics Proteomics Bioinformatics. 2008 Dec; 6(3-4):175-85.	Combiner		Eukaryote		Automated eukaryotic gene structure annotator that computes a weighted consensus gene structure based on multiple sources of available evidence
2008	RAST	Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, et al. The RAST Server: rapid annotations using subsystems technology, BMC Genomics , 2008, vol. 9 pg. 75	pipeline		bacterial and archaeal		Online service that identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user.
2008	Maker	Cantarel, B. L. et al. Maker. Genome Res. 18, 188–96 (2008).	Combiner			306	It uses proteins, transcripts ... Ab initio: AUGUSTUS, Fgenesh, GeneMark, SNAP
2008	Evigan	Liu, Q., Mackey, A. J., Roos, D. S. & Pereira, F. C. N. Evigan: A hidden variable model for integrating gene evidence for eukaryotic gene prediction. Bioinformatics 24, 597–605 (2008).	Combiner	Dynamic Bayes networks (DBNs)	Eukaryote	52	Chooses the best possible set of exons and combines them into a gene model, weighting the different evidence sources. Unsupervised learning method
2008		Y. Zhou, Y. Liang, C. Hu, L. Wang, X. Shi, An artificial neural network method for combining gene prediction based on equitable weights, NeuroComputing 71 (2007) 538–543	combiner	RBFN
2008	Evidence Modeler (EVM)	Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).	Combiner				Chooses the best possible set of exons and combines them into a gene model, weighting the different evidence sources. Evidence-based chooser.
2008	Chemgenome2.0	Poonam Singhal, B. Jayaram, Surjit B. Dixit and David L. Beveridge. Prokaryotic Gene Finding based on Physicochemical Characteristics of Codons Calculated from Molecular Dynamics Simulations. Biophysical Journal, 2008, Volume 94, Issue 11, 4173-4183	Ab initio		Prokaryote		Prokaryotic Gene Finding Based on Physicochemical Characteristics of Codons Calculated from Molecular Dynamics Simulations
2008	MetaGeneAnnotator (MGA)	Noguchi H, Taniguchi T, Itoh T (2008) MetaGeneAnnotator: detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes. DNA Res 15:387–396.	ab initio		Prokaryote		MGA is a self-training gene prediction tool for all kinds of prokaryotic genes, including atypical genes such as horizontally transferred and prophage-encoded genes
2009	mGene	Schweikert, G. et al. mGene: accurate SVM-based gene finding with an application to nematode genomes. Genome Res. 19, 2133–43 (2009).	Ab initio	Structural HMM combined with discriminative training techniques similar to SVMs		66	No longer supported
2009	Orphelia		Ab initio	Neural network	Metagenomic	78
2009	MiGAP (Microbial Genome Annotation Pipeline)	Sugawara H. et al. (2009) Microbial genome annotation pipeline (MiGAP) for diverse users. In: Proceedings of the 20th International Conference on Genome Informatics, Yokohama, Japan, S–001–1–2.	Pipeline: MetaGeneAnnotator + tRNAscan-SE + rRNA db		Prokaryote
2009	DAWGPAWS	Estill, J. C. & Bennetzen, J. L. The DAWGPAWS pipeline for the annotation of genes and transposable elements in plant genomes. Plant Methods 5, 1–11 (2009).			Eukaryote / Plant		pipeline for the annotation of genes and transposable elements in plant genomes
2010	MetaGeneMark	Zhu, W., Lomsadze, A. & Borodovsky, M. Ab initio gene identification in metagenomic sequences. Nucleic Acids Res. 38, 1–15 (2010).	Ab initio	HMM	Metagenome	220	Self training
2010	Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm)	Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010)	Ab initio	dynamic programming + HMM. Log-likelihood coding statistics trained from data.	Prokaryote, Metagenome		Self training
2010	GenePRIMP	Pati A., Ivanova N.N., Mikhailova N., Ovchinnikova G., Hooper S.D., Lykidis A., Kyrpides N.C. GenePRIMP: a gene prediction improvement pipeline for prokaryotic genomes. Nat. Methods 2010; 7: 455–457	-	-	Prokaryote		Evidence-based evaluation
2010	VIGOR (Viral Genome ORF Reader)	Wang S, Sundaram JP, Spiro D. 2010. VIGOR, an annotation program for small viral genomes. BMC Bioinformatics 11:451. http://dx.doi.org/10.1186/1471-2105-11-451.	Evidence		Virus: influenza virus, rotavirus, rhinovirus and coronavirus subtypes		Web application tool
2010	FragGeneScan	Rho M, Tang H, Ye Y (2010) FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res 38:e191	ab initio	HMM	Metagenome		HMM-based. Combines sequencing error models with codon usage
2010	MetaGeneMark	Zhu W, Lomsadze A, Borodovsky M (2010) Ab initio gene identification in metagenomic sequences. Nucleic Acids Res 38:e132	ab initio		Metagenome		Update of GeneMark.hmm with improved model parameters for metagenomic samples
2010	Gnomon	Souvorov, A. et al. Gnomon — the NCBI eukaryotic gene prediction tool. National Center for Biotechnology Information, (2010).	Ab initio	HMM; translational and splice signals are described using WMM and WAM models			Following the GENSCAN logic, Gnomon models coding exons and introns on both strands, as well as intergenic sequence, as HMM states
2011	MAKER2	Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12, 491 (2011).	Pipeline / combiner	Evidence-based, ab initio, or evidence-driven ab initio	Eukaryote / Prokaryote	184	It uses proteins, transcripts ... Ab initio: AUGUSTUS, Fgenesh, GeneMark, SNAP
2011	GenSAS	Lee T, Peace C, Jung S, Zheng P, Main D, Cho I (2011) GenSAS: an online integrated genome sequence annotation pipeline. In: 4th International conference on biomedical engineering and informatics (BMEI), Shanghai, 2011, pp. 1967–1973. doi: 10.1109/BMEI.2011.6098712	pipeline				An online integrated genome sequence annotation pipeline
2011	VMGAP (The Viral MetaGenome Annotation Pipeline)	Lorenzi, H. A. et al. The Viral MetaGenome Annotation Pipeline (VMGAP): an automated tool for the functional annotation of viral metagenomic shotgun sequencing data. Stand. Genomic Sci. 4, 418–429 (2011).	Pipeline		Viruses
2012	eCRAIG (ensemble CRAIG)	Bernal A, Crammer K, Pereira F: Automated gene-model curation using global discriminative learning. Bioinformatics. 2012, 28 (12): 1571-1578. 10.1093/bioinformatics/bts176.	combiner	CRF-based		4
2012	MOCAT	Kultima JR, Sunagawa S, Li J, Chen W, Chen H, Mende DR, Arumugam M, Pan Q, Liu B, Qin J (2012) MOCAT: a metagenomics assembly and gene prediction toolkit. PLoS One 7:e47656	pipeline	Uses Prodigal or MetaGeneMark	Metagenome
2013	GIIRA (Gene Identification Incorporating RNA-Seq data and Ambiguous reads)	Zickmann F, Lindner MS, Renard BY (2013) GIIRA–RNA-Seq driven gene finding incorporating ambiguous reads. Bioinformatics 30:606–613	ab initio, evidence-driven	maximum-flow approach	Eukaryote, Prokaryote		Based on the observed mapping coverage, GIIRA identifies candidate genes that are refined in further validating steps.
2013	Eugene-P	Next-generation Annotation of Prokaryotic Genomes with EuGene-P: Application to Sinorhizobium meliloti 2011. E. Sallet et al. DNA Res. 2013			Prokaryote
2013	MetaGUN	Liu Y, Guo J, Hu G, Zhu H (2013) Gene prediction in metagenomic fragments based on the SVM algorithm. BMC Bioinformatics 14:S12	abinitio	SVM-based. Phylogenetic binning and assignment of protein sequences to each bin	Metagenome
2014	ZUPLS	Song, K., Tong, T., and Wu, F. (2014). Predicting essential genes in prokaryotic genomes using a linear method: ZUPLS. Integr. Biol. 6, 460–469. doi: 10.1039/c3ib40241j	ab initio	Z-curve	Prokaryote
2014	OMIGA (Optimized Maker-Based Insect Genome Annotation)	Liu J. Xiao H. Huang S. Li F. OMIGA: Optimized Maker-Based Insect Genome Annotation Mol. Genet. Genomics 2014 289 567 573	pipeline (MAKER)	Augustus,Snap,GeneMark	Insect
2014	GeneMark-ET	Lomsadze, A., Burns, P. D. & Borodovsky, M. Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm. 42, 1–8 (2014).	Ab initio	HMM	Eukaryote	10	Self training
2014	Prokka	Seemann T., Prokka: rapid prokaryotic genome annotation. Bioinformatics 2014 Jul 15;30(14):2068-9. PMID:24642063	pipeline	Ab initio + evidence-based for functional annotation	prokaryote		https://github.com/tseemann/prokka Do structural and functional annotation	.gff, .gbk, .fna, .faa, .ffn, .sqn, .fsa, .tbl, .err, .log, .txt, .tsv
2014	DFAST (DDBJ Fast Annotation and Submission Tool)	Seemann T. (2014) Prokka: rapid prokaryotic genome annotation. Bioinformatics , 30, 2068–2069. DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication. Yasuhiro Tanizawa, Takatomo Fujisawa, Yasukazu Nakamura. Bioinformatics, Volume 34, Issue 6, 15 March 2018, Pages 1037–1039	Pipeline		Prokaryote		The original version of DFAST employs the lightweight command-line program Prokka as an annotation engine. Now DFAST uses MetaGeneAnnotator (MGA) by default to predict CDSs and GHOSTX as a default aligner. Standalone and web versions.	Data ready to submit to DDBJ
2015	IPred	Zickmann, F. & Renard, B. Y. IPred - integrating ab initio and evidence based gene predictions to improve prediction accuracy. BMC Genomics 16, 134 (2015).	Combiner (evidence-based)				Chooses the best set of exons and combines them into gene models (evidence-based selection). Can also build gene models from evidence alone.
2015	GASS (Genome Annotation based on Species Similarity)	GASS: genome structural annotation for Eukaryotes based on species similarity. Wang Y, Chen L, Song N, Lei X. BMC Genomics. 2015 Mar 4; 16():150.	Comparative	Shortest path model and DP
2016	BRAKER1	Hoff, K. J., Lange, S., Lomsadze, A., Borodovsky, M. & Stanke, M. BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS. Bioinformatics 32, 767–769 (2016).	Hybrid / Pipeline			1	Pipeline: GeneMark-ET + Augustus
2016	Companion	Steinbiss, S. et al. Companion: a web server for annotation and analysis of parasite genomes. Nucleic Acids Res. 44, W29–W34 (2016)					Pipeline for automatic eukaryotic parasite annotation
2016	Gmove	Dubarry, M. et al. Gmove a tool for eukaryotic gene predictions using various evidences. F1000Research 34, 2011 (2016).			Eukaryote
2016	AugustusCGP	König S, Romoth LW, Gerischer L, Stanke M. Simultaneous gene finding in multiple genomes. Bioinformatics. 2016 Nov 15; 32(22):3388-3395.	Comparative		Eukaryote		Multiple genomes
2016	CAT (Comparative Annotation Toolkit)	Fiddes, I. T. et al. Comparative Annotation Toolkit (CAT)—simultaneous clade and personal genome annotation. Genome Res. https://doi.org/10.1101/gr.233460.117 (2018).	Pipeline	Evidence-based, comparative ab initio (AugustusCGP)			Takes a HAL-format multiple whole-genome alignment as input.	GFF3 + many plots
2016	CESAR	Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation. Sharma V, Elghafari A, Hiller M. Nucleic Acids Res. 2016 Jun 20; 44(11):e103.	Comparative				Uses an HMM to adjust splice sites in whole-genome alignments, improving transcript projections
2016	PGAP	Tatusova T. et al. (2016) NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res., 44, 6614–6624.	Pipeline	GeneMarkS+, Glimmer, extrinsic data	Prokaryote		This is the NCBI annotation service incorporated into its submission system; it is available only to GenBank submitters.
2017	GeMoMa	Keilwagen, J., Hartung, F., Paulini, M., Twardziok, S. O. & Grau, J. Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi. (2017).					Homology-based gene prediction program
2017	funannotate	https://doi.org/10.5281/zenodo.2576527	Pipeline	EVidenceModeler + Augustus + GeneMark-ES/ET + PASA + evidence	Built specifically for fungi, but also works with higher eukaryotes
2017	GAWN	Unpublished; https://github.com/enormandeau/gawn	Pipeline (evidence-based only)	GMAP to create gene models; Cufflinks and TransDecoder to add UTRs	Eukaryote
2018	FunGAP	Min B, Grigoriev IV, Choi IG. FunGAP: Fungal Genome Annotation Pipeline using evidence-based gene model evaluation. Bioinformatics (Oxford, England). 2017;33(18):2936–7.	pipeline
2018	BRAKER2	Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M., & Borodovsky, M. (2021). BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genomics and Bioinformatics, 3(1), 1–11. https://doi.org/10.1093/nargab/lqaa108	Hybrid		Eukaryote
2018	G-OnRamp		Ab initio, web-based	Augustus, GlimmerHMM, SNAP
2018	VIRULIGN	VIRULIGN: fast codon-correct alignment and annotation of viral genomes. Pieter J K Libin, Koen Deforche, Ana B Abecasis, Kristof Theys. Bioinformatics, Volume 35, Issue 10, 15 May 2019, Pages 1763–1765, https://doi.org/10.1093/bioinformatics/bty851	Similarity		Virus
2019	GAAP	Jinhwa Kong, Sun Huh, Jung-Im Won, Jeehee Yoon, Baeksop Kim, and Kiyong Kim. GAAP: A Genome Assembly + Annotation Pipeline. BioMed Research International, Volume 2019, Article ID 4767354, 12 pages	Pipeline	Augustus, EVM, MAKER, PASA			Genome Assembly + Annotation Pipeline
2019	VAPiD (Viral Annotation Pipeline and iDentification)	Ryan C. Shean, Negar Makhsous, Graham D. Stoddard, Michelle J. Lin & Alexander L. Greninger. VAPiD: a lightweight cross-platform viral annotation pipeline and identification tool to facilitate virus genome submissions to NCBI GenBank. BMC Bioinformatics volume 20, Article number: 48 (2019)	Pipeline		Virus
2019	Vgas (Viral Genome Annotation System)	Kai-Yue Zhang, Yi-Zhou Gao, Meng-Ze Du, Shuo Liu, Chuan Dong, and Feng-Biao Guo. Vgas: A Viral Genome Annotation System. Front Microbiol. 2019; 10: 184.	Ab initio + similarity-based	ZCURVE_V + BLASTp	Virus		The authors report that combining Vgas with GeneMarkS and Prodigal gives better predictions than any of the three programs alone.
2020	VADR	VADR: validation and annotation of virus sequence submissions to GenBank. Alejandro A. Schäffer, Eneida L. Hatcher, Linda Yankie, Lara Shonkwiler, J. Rodney Brister, Ilene Karsch-Mizrachi & Eric P. Nawrocki. BMC Bioinformatics 21, 211 (2020). https://doi.org/10.1186/s12859-020-3537-3	HMM + similarity		Virus
2021	MOSGA	Martin, R., Hackl, T., Hattab, G., Fischer, M. G., & Heider, D. (2021). MOSGA: Modular Open-Source Genome Annotator. Bioinformatics, 36(22–23), 5514–5515. https://doi.org/10.1093/bioinformatics/btaa1003	Ab initio, Hybrid	Pipeline framework	Eukaryote		Web interface; supports RNA-Seq-, protein-, and orthology-based prediction, plus validation	GFF3 + Sequin
2021	TSEBRA	Gabriel, L., Hoff, K. J., Brůna, T., Borodovsky, M., & Stanke, M. (2021). TSEBRA: transcript selector for BRAKER. BMC Bioinformatics, 22(1), 566. https://doi.org/10.1186/s12859-021-04482-0	Hybrid		Eukaryote		RNA-Seq + Proteins	GFF3

Legend:
Hybrid = combines ab initio and evidence-based prediction (typically an HMM-based gene predictor that uses extrinsic evidence)
Comparative = based on genome sequence comparison

CHMM: class HMM
CRF: conditional random field
DBN: Dynamic Bayes network
DP: dynamic programming
EHMM: evolutionary HMM
GHMM: generalized HMM
GPHMM: generalized pair HMM
HMM: hidden Markov model
IMM: interpolated Markov model
LDA: linear discriminant analysis
MDD: maximal dependence decomposition
ML: maximum likelihood
MM: Markov model
NN: neural network
PHMM: pair HMM
phyloHMM: phylogenetic HMM
RBFN: radial basis function network
SVM: support vector machine
WAM: weight array matrix
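
To make the WAM entry concrete (Gnomon's row, for example, mentions WAM models for splice signals), here is a minimal Python sketch. A weight array matrix extends a position weight matrix by conditioning each position of a signal on the nucleotide at the previous position, so a site x_1 to x_L is scored as P(x_1) multiplied by P_i(x_i | x_(i-1)) for i = 2 to L, usually as a log-odds ratio against a background model. The function names, the pseudocount of 1, the uniform 0.25 background, and the toy training sites are illustrative assumptions; this is not the parameterization used by GeneID, Gnomon, or any other tool listed above.

```python
import math

BASES = "ACGT"


def train_wam(sites, pseudocount=1.0):
    """Estimate a weight array matrix (WAM) from aligned, equal-length sites.

    Returns (first, cond), with 0-based positions:
      first[b]        = P(base 0 is b)
      cond[k][(a, b)] = P(base k+1 is b | base k is a)
    The pseudocount (an illustrative assumption) avoids zero probabilities.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("training sites must be aligned to the same length")

    # Position 0: plain nucleotide frequencies.
    first_counts = {b: pseudocount for b in BASES}
    for s in sites:
        first_counts[s[0]] += 1
    total = sum(first_counts.values())
    first = {b: c / total for b, c in first_counts.items()}

    # Later positions: adjacent-pair counts, normalized per preceding base.
    cond = []
    for k in range(length - 1):
        counts = {(a, b): pseudocount for a in BASES for b in BASES}
        for s in sites:
            counts[(s[k], s[k + 1])] += 1
        table = {}
        for a in BASES:
            row_total = sum(counts[(a, b)] for b in BASES)
            for b in BASES:
                table[(a, b)] = counts[(a, b)] / row_total
        cond.append(table)
    return first, cond


def wam_log_odds(seq, first, cond, background=0.25):
    """Log-odds score of seq under the WAM vs. an assumed uniform i.i.d. background."""
    if len(seq) != len(cond) + 1:
        raise ValueError("sequence length must match the trained site length")
    score = math.log(first[seq[0]] / background)
    for k in range(len(cond)):
        score += math.log(cond[k][(seq[k], seq[k + 1])] / background)
    return score


# Hypothetical toy donor-like sites, not real training data.
sites = ["CAGGTAAG", "AAGGTGAG", "CAGGTAAG", "GAGGTAAG"]
first, cond = train_wam(sites)
print(wam_log_odds("CAGGTAAG", first, cond))
print(wam_log_odds("TTTTTTTT", first, cond))
```

With these toy sites, `CAGGTAAG` receives a positive score and the poly-T string a negative one, because only the former reuses the adjacent-nucleotide pairs seen in training. Conditioning on the previous base is what distinguishes a WAM from a position weight matrix, at the cost of 16 parameters per position instead of 4.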

Interesting publications

Rogic, S., Mackworth, A. K., & Ouellette, F. B. (2001). Evaluation of gene-finding programs on mammalian sequences. Genome research, 11(5), 817-32.
Goodswen, S. J., Kennedy, P. J., & Ellis, J. T. (2012). Evaluating high-throughput ab initio gene finders to discover proteins encoded in eukaryotic pathogen genomes missed by laboratory techniques. PloS one, 7(11), e50609.
Chowdhury, B., Garai, A., & Garai, G. (2017). An optimized approach for annotation of large eukaryotic genomic sequences using genetic algorithm. BMC bioinformatics, 18(1), 460. doi:10.1186/s12859-017-1874-7
Armstrong, J., Fiddes, I. T., Diekhans, M., & Paten, B. (2019). Whole-genome alignment and comparative annotation. Annual Review of Animal Biosciences, 7, 41–64.
McHardy, A. C., & Kloetgen, A. Finding genes in genome sequence. In Bioinformatics (pp. 271–291).
Bączkowski, K., Mackiewicz, K., Kowalczuk, M., Banaszak, J., & Cebrat, S. (2005). Od sekwencji do funkcji – poszukiwanie genów i ich adnotacje [From sequence to function: the search for genes and their annotation]. Biotechnologia, 3(70), 22–44.
Pirovano, W., Boetzer, M., Derks, M. F. L., & Smit, S. (2017). NCBI-compliant genome submissions: Tips and tricks to save time and money. Briefings in Bioinformatics, 18(2), 179–182. https://doi.org/10.1093/bib/bbv104

Interesting books
Principles of Gene Manipulation and Genomics, by Sandy B. Primrose and Richard Twyman
