Cantù Lab @ UC Davis - Annotation Pipeline - EVM based

This Git repository contains the complete pipeline used to generate the structural and functional gene annotation of a grape genome. Genome assembly is a prerequisite and is not covered here.

Overview

Pipeline

Requirements

Tools

The following tools are required. Some options and compatibilities might depend on the software version. We successfully ran the pipeline using the versions described below.

Folder structure

Most of the tools are assumed to be available on the PATH. For those that are not, the absolute path to the Tools directory is given.
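Before starting a run, it can help to check which tools are actually reachable on the PATH. The snippet below is a minimal sketch of such a check and is not part of the pipeline itself. The executable names in `TOOLS` are assumptions based on the tools cited in the References and may differ from the names used by your installation.

```python
import shutil

# Hypothetical executable names, guessed from the tools cited in the
# References; adjust them to match your own installation.
TOOLS = ["samtools", "bedtools", "hisat2", "stringtie", "augustus", "blat", "gmap"]


def missing_tools(names):
    """Return the names that cannot be resolved on the current PATH."""
    return [name for name in names if shutil.which(name) is None]
```

Calling `missing_tools(TOOLS)` returns the subset of names that were not found, which are the tools that would need an absolute path.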

2-Annotation
├── 2_0-External_evidences
│   ├── 2_0_1-Proteins
│   └── 2_0_2-mRNAs
│       ├── 2_0_2_1-External_databases
│       ├── 2_0_2_2-RNAseq
│       │   ├── 2_0_2_2_1-RNAseq_reads
│       │   └── 2_0_2_2_2-RNAseq_assembly
│       └── 2_0_2_3-IsoSeq
│           ├── 2_0_2_3_1-IsoSeq_reads
│           └── 2_0_2_3_2-IsoSeq_polishing
├── 2_1-Training
│   ├── 2_1_1-Training_set
│   │   └── pasa_run.log.dir
│   └── 2_1_2-Predictor_training
│       ├── 2_1_2_1-Augustus
│       ├── 2_1_2_2-Genemark
│       └── 2_1_2_3-SNAP
└── 2_2-Prediction
    ├── 2_2_1-Repeats
    ├── 2_2_2-
    ├── 2_2_3-Prediction
    │   ├── 2_2_3_1-BUSCO
    │   ├── 2_2_3_2-Augustus
    │   ├── 2_2_3_3-Genemark
    │   ├── 2_2_3_4-SNAP
    │   └── 2_2_3_5-PASA
    ├── 2_2_4-Transcript_mapping
    ├── 2_2_5-Protein_mapping
    ├── 2_2_6-EVM
    ├── 2_2_7-Annotation_polishing
    ├── 2_2_8-Filtering
    └── 2_2_9-Functional_annotation
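
The layout above can be created in advance. The following is a minimal sketch that assumes the tree is rooted in a `2-Annotation` directory under a location of your choice; the function name `create_layout` is hypothetical. The `2_2_2-` entry is skipped because its full name is not given, and `pasa_run.log.dir` is skipped on the assumption that it is produced by the PASA run rather than created by hand.

```python
from pathlib import Path

# Leaf directories of the tree shown above; parent directories are
# created implicitly. "2_2_2-" (name truncated in the tree) and
# "pasa_run.log.dir" (assumed to be tool output) are left out.
LEAF_DIRS = [
    "2_0-External_evidences/2_0_1-Proteins",
    "2_0-External_evidences/2_0_2-mRNAs/2_0_2_1-External_databases",
    "2_0-External_evidences/2_0_2-mRNAs/2_0_2_2-RNAseq/2_0_2_2_1-RNAseq_reads",
    "2_0-External_evidences/2_0_2-mRNAs/2_0_2_2-RNAseq/2_0_2_2_2-RNAseq_assembly",
    "2_0-External_evidences/2_0_2-mRNAs/2_0_2_3-IsoSeq/2_0_2_3_1-IsoSeq_reads",
    "2_0-External_evidences/2_0_2-mRNAs/2_0_2_3-IsoSeq/2_0_2_3_2-IsoSeq_polishing",
    "2_1-Training/2_1_1-Training_set",
    "2_1-Training/2_1_2-Predictor_training/2_1_2_1-Augustus",
    "2_1-Training/2_1_2-Predictor_training/2_1_2_2-Genemark",
    "2_1-Training/2_1_2-Predictor_training/2_1_2_3-SNAP",
    "2_2-Prediction/2_2_1-Repeats",
    "2_2-Prediction/2_2_3-Prediction/2_2_3_1-BUSCO",
    "2_2-Prediction/2_2_3-Prediction/2_2_3_2-Augustus",
    "2_2-Prediction/2_2_3-Prediction/2_2_3_3-Genemark",
    "2_2-Prediction/2_2_3-Prediction/2_2_3_4-SNAP",
    "2_2-Prediction/2_2_3-Prediction/2_2_3_5-PASA",
    "2_2-Prediction/2_2_4-Transcript_mapping",
    "2_2-Prediction/2_2_5-Protein_mapping",
    "2_2-Prediction/2_2_6-EVM",
    "2_2-Prediction/2_2_7-Annotation_polishing",
    "2_2-Prediction/2_2_8-Filtering",
    "2_2-Prediction/2_2_9-Functional_annotation",
]


def create_layout(root):
    """Create the folder skeleton under `root`/2-Annotation and return the leaf paths."""
    base = Path(root) / "2-Annotation"
    created = []
    for rel in LEAF_DIRS:
        path = base / rel
        # exist_ok=True makes the call safe to repeat on an existing tree.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

For example, `create_layout(".")` builds the skeleton in the current working directory and leaves any existing folders untouched.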

References

Andrews S (2014) FastQC: A Quality Control tool for High Throughput Sequence Data.
Au KF, Underwood JG, Lee L, Wong WH (2012) Improving PacBio Long Read Accuracy by Short Read Alignment. PLoS ONE 7: e46679
Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114–2120
Boratyn GM, Thierry-Mieg J, Thierry-Mieg D, Busby B, Madden TL (2019) Magic-BLAST, an accurate RNA-seq aligner for long and short reads. BMC Bioinformatics 20: 405
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics. doi: 10.1186/1471-2105-10-421
Cantarel BL, Korf I, Robb SMC, Parra G, Ross E, Moore B, Holt C, Sanchez Alvarado A, Yandell M (2007) MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Research 18: 188–196
Gotz S, Garcia-Gomez JM, Terol J, Williams TD, Nagaraj SH, Nueda MJ, Robles M, Talon M, Dopazo J, Conesa A (2008) High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Research 36: 3420–3435
Haas BJ (2003) Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Research 31: 5654–5666
Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, Couger MB, Eccles D, Li B, Lieber M, Macmanes MD, Ott M, Orvis J, Pochet N, Strozzi F, Weeks N, Westerman R, William T, Dewey CN, Henschel R, Leduc RD, Friedman N, Regev A (2013) De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols 8: 1494–512
Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, White O, Buell CR, Wortman JR (2008) Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol 9: R7
Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong S-Y, Lopez R, Hunter S (2014) InterProScan 5: genome-scale protein function classification. Bioinformatics 30: 1236–1240
Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–60
Korf I (2004) Gene finding in novel genomes. BMC Bioinformatics. doi: 10.1186/1471-2105-5-59
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079
Lomsadze A (2005) Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Research 33: 6494–6506
Pertea M, Pertea GM, Antonescu CM, Chang T-C, Mendell JT, Salzberg SL (2015) StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol 33: 290–295
Quinlan AR, Hall IM (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26: 841–842
Seppey M, Manni M, Zdobnov EM (2019) BUSCO: Assessing Genome Assembly and Annotation Completeness. In M Kollmar, ed, Gene Prediction: Methods and Protocols. Springer New York, New York, NY, pp 227–245
Slater G, Birney E (2005) Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics. doi: 10.1186/1471-2105-6-31
Smit AFA, Hubley R, Green P (2013) RepeatMasker Open-4.0.
Stanke M, Keller O, Gunduz I, Hayes A, Waack S, Morgenstern B (2006) AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Research 34: W435–W439
Tange O (2011) GNU Parallel: The Command-Line Power Tool. ;login: The USENIX Magazine 36: 42–47
Kent WJ (2002) BLAT: The BLAST-Like Alignment Tool. Genome Res 12: 656–664
Wu TD, Watanabe CK (2005) GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21: 1859–1875
