GitHub - DessimozLab/nf-oma-browser-build: Nextflow pipeline to convert OMA to OMA browser

Introduction

dessimozlab/nf-oma-browser-build is a pipeline for building an OMA Browser instance from an OMA (Orthologous MAtrix) analysis. The pipeline converts the output of either a production OMA run or a FastOMA run into the HDF5 files needed to run a omabrowser webserver. The pipeline integrates a lot of additional data, i.e. GO annotations, domain annotations and cross-references to uniprot and refseq.

Pipeline summary

First part of the pipeline is dependend on input, i.e. production / FastOMA. The later steps are common to both input types.

From production OMA pipeline:

extract genomes in dataset from Matrix file
extract from genome dbs relevant data such as proteins, locus, etc
convert Matrix, extract splicing information

From FastOMA:

TODO: implement and write...

Common part

convert HOGs, sequences into HDF5 database, build suffix index and kmer-lookup table (in subworkflow IMPORT_HDF5)
import domain annotations if available
import cross-references from UniProt and RefSeq (subworkflow GENERATE_XREFS)
import GO annotations and Ontology

The pipeline produces in the end in the outputDir (default results/) the necessary files to be loaded into a docker-compose managed omabrowser instance.

Parameters

All parameters are described by running --help of the workflow

nextflow run . --help

Parameter	Description	Type	Default
`hog_orthoxml`	Hierarchcial orthologous groups (HOGs) in orthoxml format	`string`
`matrix_file`	OMA Groups file	`string`
`pairwise_orthologs_folder`	Pairwise Orthologs (only by Standard OMA pipeline)	`string`
`genomes_dir`	Folder containing genomes	`string`
`known_domains`	Folder containing known domain assignments files	`string`
`cath_names_path`	File containing CATH domain descriptions	`string`	http://download.cathdb.info/cath/releases/latest-release/cath-classification-data/cath-names.txt
`pfam_names_path`	File containing Pfam descriptions	`string`	https://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.clans.tsv.gz
`xref_uniprot_swissprot`	UniProtKB/SwissProt annotation in text format	`string`	https://ftp.ebi.ac.uk/pub/databases/uniprot/knowledgebase/uniprot_sprot.dat.gz
`xref_uniprot_trembl`	UniProtKB/TrEMBL annotations in text format	`string`	/dev/null
`xref_refseq`	Folder containing RefSeq gbff files.	`string`
`go_obo`	Gene Ontology OBO file	`string`	http://purl.obolibrary.org/obo/go/go-basic.obo
`go_gaf`	Gene Ontology annotations (GAF format)	`string`	https://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/goa_uniprot_all.gaf.gz

Corresponding files from OMA Production Run

Parameter	Path / Value from OMA analysis
`hog_orthoxml`	HOG in orthoxml format from `$DARWIN_OMADATA_PATH/HOGs/`, usually latest one
`matrix_file`	OMA Groups Matrix file, same run as HOG, located in `$DARWIN_OMADATA_PATH/Matrix`; use the `_merged` version
`genomes_dir`	corresponds to `$DARWIN_GENOMES_PATH`. Must contain Summaries.drw, and all the databases in the subfolders`
`pairwise_orthologs_folder`	Base directory where the `.orth.tsv.gz` files have been created. Usually corresponds to `$DARWIN_OMA_SCRATCH_PATH/Phase4/`. If not specified, no VPairs will be imported and dotplot won't work
`known_domains`	folder with processed known domain annotations. this will likely change in the future. For now, these files have to be generated outside the pipeline

Name		Name	Last commit message	Last commit date
Latest commit History 124 Commits
.github/workflows		.github/workflows
assets		assets
config		config
containers		containers
modules/local		modules/local
subworkflows/local		subworkflows/local
testdata		testdata
workflows		workflows
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
main.nf		main.nf
nextflow.config		nextflow.config
nextflow_schema.json		nextflow_schema.json
test-params.json		test-params.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Introduction

Pipeline summary

From production OMA pipeline:

From FastOMA:

Common part

Parameters

Corresponding files from OMA Production Run

About

Releases 1

Packages

Languages

License

DessimozLab/nf-oma-browser-build

Folders and files

Latest commit

History

Repository files navigation

Introduction

Pipeline summary

From production OMA pipeline:

From FastOMA:

Common part

Parameters

Corresponding files from OMA Production Run

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages