
twaddlac/bota_docker


BOTA

BOTA: Predicting Bacteria-Originated T-cell Antigens

One-line pitch

BOTA inspects all possible peptides in a genomic or metagenomic sequence and predicts how likely each one is to be presented to host T-cells.

Install

Docker

To build the image, clone the repository and run:

docker build -t bota .

To run the Docker container:

docker run -v `pwd`:/usr/local/src --rm -it bota /bin/bash

To install from source instead, use git:

git clone https://bitbucket.org/luo-chengwei/BOTA

Or use hg:

hg clone https://bitbucket.org/luo-chengwei/BOTA

You can also download a zip archive from the Bitbucket repository:

https://bitbucket.org/luo-chengwei/BOTA

Next, download the latest Pfam-A database; the file you need is "Pfam-A.hmm.gz" at:

ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release

Then decompress it:

gunzip Pfam-A.hmm.gz

Finally, prepare it for hmmscan by running hmmpress:

hmmpress Pfam-A.hmm
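To confirm the database is ready before running BOTA, a small check like the one below can help. It is a sketch, not part of BOTA: it assumes the four binary index files that HMMER 3's hmmpress writes next to the database (Pfam-A.hmm.h3f, .h3i, .h3m and .h3p), and the helper name missing_pfam_indexes is hypothetical.

```python
import os
import sys

# hmmpress (HMMER 3) writes four binary index files alongside the .hmm flatfile.
HMMPRESS_SUFFIXES = (".h3f", ".h3i", ".h3m", ".h3p")


def missing_pfam_indexes(hmm_path):
    """Return the expected hmmpress index files that do not exist yet."""
    return [hmm_path + suffix for suffix in HMMPRESS_SUFFIXES
            if not os.path.exists(hmm_path + suffix)]


if __name__ == "__main__":
    # Hypothetical usage: python check_pfam.py /path/to/Pfam-A.hmm
    path = sys.argv[1] if len(sys.argv) > 1 else "Pfam-A.hmm"
    missing = missing_pfam_indexes(path)
    if missing:
        print("run 'hmmpress %s' first; missing: %s" % (path, ", ".join(missing)))
    else:
        print("%s is pressed and ready for hmmscan" % path)
```

An empty result means hmmpress has already been run on that file.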

Once the dependencies below are installed, you are ready to go!

Dependencies

BOTA (Bacteria Originated T-Cell Antigen Predictor) is distributed, in part, under and subject to the provisions of the licenses of its dependencies:

  • Python 2.7 or above
  • Python libraries:
      • BioPython
      • NetworkX
  • Third-party pipelines:
      • standalone PSORTb v3.0+ (http://www.psort.org/)
      • standalone HMMTOP (http://www.enzim.hu/hmmtop)
      • standalone HMMER v3.1b2 (http://www.hmmer.org)
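A quick way to confirm these dependencies are visible is sketched below. The executable names (hmmscan, hmmpress, hmmtop, psort) are assumptions taken from the defaults in the configuration file and the Pfam step above; the helper names are hypothetical. Bio and networkx are the import names of BioPython and NetworkX. Run it with the same interpreter you will use for BOTA.py, since the module check only sees that interpreter's libraries.

```python
import importlib

try:
    from shutil import which  # Python 3.3+
except ImportError:  # Python 2.7 fallback
    from distutils.spawn import find_executable as which

# Assumed executable names: the config-file defaults, plus hmmpress.
REQUIRED_TOOLS = ["hmmscan", "hmmpress", "hmmtop", "psort"]
# Import names of BioPython and NetworkX.
REQUIRED_MODULES = ["Bio", "networkx"]


def missing_tools(tools):
    """Return the tools that cannot be found on PATH (or at the given path)."""
    return [tool for tool in tools if which(tool) is None]


def missing_modules(modules):
    """Return the modules that the current interpreter cannot import."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("missing executables: %s" % (missing_tools(REQUIRED_TOOLS) or "none"))
    print("missing Python modules: %s" % (missing_modules(REQUIRED_MODULES) or "none"))
```

If a tool is installed but not on PATH, you can instead give its full path in the config file.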

There is no installation step: just call BOTA.py from wherever you put the folder.

All rights reserved.

Input

The input files are the genome sequence in FASTA format (.fa) and, optionally, its annotations in GFF3 format (.gff). If no GFF file is supplied, BOTA predicts protein-coding genes with Prodigal.

You can run multiple genomes in one project; just specify each of them in the config file.
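Before writing a config, it can be worth sanity-checking the inputs. The sketch below is a hypothetical helper, not something BOTA ships, and it only inspects first lines: a FASTA file should start with a '>' header, and the GFF3 specification requires '##gff-version 3' as the first line.

```python
def check_inputs(fasta_path, gff_path=None):
    """Return a list of problems found in a genome FASTA and an optional GFF3 file."""
    problems = []
    with open(fasta_path) as handle:
        first = handle.readline()
    if not first.startswith(">"):
        problems.append("%s: first line is not a FASTA header ('>')" % fasta_path)
    if gff_path is not None:
        with open(gff_path) as handle:
            first = handle.readline()
        # GFF3 files must declare their version on the first line.
        if not first.startswith("##gff-version 3"):
            problems.append("%s: missing '##gff-version 3' header" % gff_path)
    return problems
```

An empty list means both files passed these minimal checks; it does not validate the full file contents.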

Usage

The basic BOTA analysis runs as below:

python BOTA.py -c <config_file> -o <output_directory> [options]

Below is a detailed usage of BOTA.py:

Usage: BOTA.py -c/--config <config_file> -o/--outdir <output directory> [options]

BOTA: Bacteria-Origin T-cell Antigen predictor

The configuration file has the following format:

# this is a comment line
hmmscan='hmmscan' # if already in the PATH, just leave it as 'hmmscan'; if not, specify the path
hmmtop='hmmtop' # the same as hmmscan
psort='psort'	# the same as hmmscan
[genome_name] # the square brackets are required; put whatever you want to call the genome inside them
fna	/path/to/genomic/fna/file  # this is a compulsory field
gff	/path/to/gff/file  # optional. if not supplied, we will predict protein-coding genes with prodigal
hmmtop /path/to/hmmtop/file # optional. if not supplied, we will run the hmmtop calculation.
hmmscan /path/to/text-hmmscan/file # optional
psort /path/to/psort/file # optional
alleles list_of_alleles_separated_by_comma # Optional. you can also supply human or mouse to select all available alleles
          # if you don't specify, defaults to all alleles.
gram # Optional. specify whether the organism is 'P', gram-positive, 'N', gram-negative, or 'A', archaea; if not specified, BOTA
		# will try to determine it.
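To make the format concrete, here is a minimal parser for it. This is an illustrative reading of the format above, not BOTA's own parser. It assumes that global settings are key='value' lines before the first [genome_name] header, that per-genome fields are a key followed by whitespace and a value, and that everything after '#' is a comment (so paths containing '#' would break it). The function name parse_bota_config is hypothetical.

```python
import re


def parse_bota_config(text):
    """Parse a BOTA-style config into (global_settings, {genome_name: fields})."""
    settings, genomes, current = {}, {}, None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        header = re.match(r"^\[(.+)\]$", line)
        if header:
            # A [genome_name] line starts a new genome section.
            current = header.group(1).strip()
            genomes[current] = {}
        elif current is None:
            # Before any section: global tool settings such as hmmscan='hmmscan'.
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip().strip("'\"")
        else:
            # Inside a section: "key<whitespace>value"; a bare key gets "".
            parts = line.split(None, 1)
            genomes[current][parts[0]] = parts[1] if len(parts) > 1 else ""
    for name, fields in genomes.items():
        if "fna" not in fields:
            raise ValueError("genome [%s] is missing the compulsory 'fna' field" % name)
    return settings, genomes
```

Each genome section becomes its own dictionary, which is how one config file can describe several genomes.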

Add --help to see the full list of required and optional arguments to run BOTA.

Additional information can also be found at:

https://bitbucket.org/luo-chengwei/bota/wiki

Options:

--version show program's version number and exit

-h, --help            show this help message and exit

Required options:

These options are required to run BOTA, and may be supplied in any order.

-c FILE, --config=FILE   The configuration file to define a project.
-o DIR, --outdir=DIR  The output directory of BOTA. If it doesn't exist, BOTA will create it.

Optional parameters:

These options are optional, and may be supplied in any order.

-t INT, --nproc=INT    Number of processors for BOTA to use [default: 1; set 0 if you want to use all CPUs available].
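The help text above resembles Python optparse output, so the sketch below re-creates the documented options with optparse and shows how the -t convention (0 means all CPUs) could map onto a worker count. It is a hypothetical mirror of the interface, not BOTA's source; build_parser and resolve_nproc are illustrative names.

```python
import multiprocessing
import optparse


def build_parser():
    """Hypothetical re-creation of the documented BOTA options (not BOTA's source)."""
    parser = optparse.OptionParser(
        usage="%prog -c/--config <config_file> -o/--outdir <output directory> [options]")
    parser.add_option("-c", "--config", metavar="FILE",
                      help="The configuration file to define a project.")
    parser.add_option("-o", "--outdir", metavar="DIR",
                      help="The output directory of BOTA.")
    parser.add_option("-t", "--nproc", type="int", default=1, metavar="INT",
                      help="Number of processors [default: 1; 0 = all CPUs].")
    return parser


def resolve_nproc(nproc):
    """Translate the documented -t convention into a worker count (0 = all CPUs)."""
    if nproc < 0:
        raise ValueError("--nproc must be 0 or a positive integer")
    return multiprocessing.cpu_count() if nproc == 0 else nproc
```

For example, `-t 0` on a machine where multiprocessing.cpu_count() reports 8 would resolve to 8 worker processes, while any positive value is used as given.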

Below is the list of alleles that BOTA supports (human HLA and mouse H-2):

HLA-DRB10101
HLA-DRB10301
HLA-DRB10401
HLA-DRB10404
HLA-DRB10405
HLA-DRB10701
HLA-DRB10802
HLA-DRB10901
HLA-DRB11101
HLA-DRB11302
HLA-DRB11501
HLA-DRB30101
HLA-DRB40101
HLA-DRB50101
HLA-DPA10103-DPB10401
HLA-DPA10103-DPB10201
HLA-DPA10201-DPB10101
HLA-DPA10201-DPB10501
HLA-DPA10103
HLA-DPB10301_DPB10401
HLA-DPA10301-DPB10402
HLA-DQA10101-DQB10501
HLA-DQA10102-DQB10602
HLA-DQA10301-DQB10302
HLA-DQA10401-DQB10402
HLA-DQA10501-DQB10201
HLA-DQA10501-DQB10301
H-2-IAb
H-2-IAd
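The alleles field in the config accepts a comma-separated list or the keywords human and mouse. The sketch below shows one way that field could be expanded against the list above. The function name and the rule that human selects the HLA-* entries and mouse the H-2-* entries are assumptions, not taken from BOTA's code.

```python
# The supported alleles, copied verbatim from the list above.
SUPPORTED_ALLELES = [
    "HLA-DRB10101", "HLA-DRB10301", "HLA-DRB10401", "HLA-DRB10404", "HLA-DRB10405",
    "HLA-DRB10701", "HLA-DRB10802", "HLA-DRB10901", "HLA-DRB11101", "HLA-DRB11302",
    "HLA-DRB11501", "HLA-DRB30101", "HLA-DRB40101", "HLA-DRB50101",
    "HLA-DPA10103-DPB10401", "HLA-DPA10103-DPB10201", "HLA-DPA10201-DPB10101",
    "HLA-DPA10201-DPB10501", "HLA-DPA10103", "HLA-DPB10301_DPB10401",
    "HLA-DPA10301-DPB10402", "HLA-DQA10101-DQB10501", "HLA-DQA10102-DQB10602",
    "HLA-DQA10301-DQB10302", "HLA-DQA10401-DQB10402", "HLA-DQA10501-DQB10201",
    "HLA-DQA10501-DQB10301", "H-2-IAb", "H-2-IAd",
]


def expand_alleles(field=None):
    """Turn a config 'alleles' value into a concrete list of supported alleles."""
    if not field:
        return list(SUPPORTED_ALLELES)  # unspecified: default to all alleles
    requested = [a.strip() for a in field.split(",") if a.strip()]
    if requested == ["human"]:
        return [a for a in SUPPORTED_ALLELES if a.startswith("HLA-")]
    if requested == ["mouse"]:
        return [a for a in SUPPORTED_ALLELES if a.startswith("H-2-")]
    unknown = [a for a in requested if a not in SUPPORTED_ALLELES]
    if unknown:
        raise ValueError("unsupported allele(s): %s" % ", ".join(unknown))
    return requested
```

Rejecting unknown names early avoids a long run that silently skips a misspelled allele.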

For instance, suppose you have the genome of the EGD-e strain of Listeria monocytogenes in FASTA format ("L.monocytogenes_EGD-e.fa") and want to find the epitopes that could be presented by the mouse allele "H-2-IAb". After describing the genome in a config file (with alleles set to H-2-IAb), you could run:

BOTA.py -c Listeria_monocytogenes_EGD-e/config -o Listeria_monocytogenes_EGD-e/BOTA/ -t 2

The predicted epitopes would be written to the file: test/Listeria.BOTA/L.monocytogenes_EGD-e.epitopes.out
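The example command reads its project definition from Listeria_monocytogenes_EGD-e/config. One way to generate that file is sketched below. The FASTA path and the helper name write_config are hypothetical; the fields written are the compulsory fna plus alleles set to H-2-IAb, following the configuration format described earlier.

```python
import os


def write_config(project_dir, genome_name, fna_path, alleles, tools=None):
    """Write a minimal BOTA-style config for one genome and return its path."""
    tools = tools or {"hmmscan": "hmmscan", "hmmtop": "hmmtop", "psort": "psort"}
    lines = ["# generated config (illustrative)"]
    # Global tool settings use the key='value' form.
    lines += ["%s='%s'" % (key, value) for key, value in sorted(tools.items())]
    # The genome section uses tab-separated "key value" fields.
    lines += ["[%s]" % genome_name, "fna\t%s" % fna_path, "alleles\t%s" % alleles]
    if not os.path.isdir(project_dir):
        os.makedirs(project_dir)
    path = os.path.join(project_dir, "config")
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return path


if __name__ == "__main__":
    # Hypothetical paths for the Listeria example.
    write_config("Listeria_monocytogenes_EGD-e", "L.monocytogenes_EGD-e",
                 "/data/L.monocytogenes_EGD-e.fa", "H-2-IAb")
```

Optional fields such as gff or gram can be appended to the genome section in the same "key value" form.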

Interpret output

The output is a tab-separated table with one predicted peptide per row. An excerpt looks like this:

#peptide	gene_name	chr_acc	gene_start	gene_stop	strand	pep_start	pep_stop	score
FSSATLNSA	TA05_25490	JXUN01000256.1	6793	9210	-	1111	1140	0.695987
ELGALSLSA	TA05_25490	JXUN01000256.1	6793	9210	-	1195	1224	0.758599
WPAGGLASA	TA05_19735	JXUN01000201.1	15072	17129	+	1399	1428	0.697193
ISLALAAPSYAAEA	TA05_13960	JXUN01000151.1	17693	19804	+	43	87	0.758729
FSVAAAMES	TA05_09645	JXUN01000116.1	12753	14144	-	889	918	0.826708
FSVAYASQA	TA05_03005	JXUN01000059.1	24619	25944	-	412	441	0.957796
AQGVVTAPAQNSTVAVA	nlpD	JXUN01000196.1	6459	7586	-	544	597	0.918355
VTAPVTAPAVSTT	nlpD	JXUN01000196.1	6459	7586	-	682	723	0.961591
ASKPTITYS	nlpD	JXUN01000196.1	6459	7586	-	592	621	0.646375
TTEPTASST	nlpD	JXUN01000196.1	6459	7586	-	715	744	0.860312
......
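To work with this table programmatically, the sketch below reads it as tab-separated values, skips the truncated "......" marker, and sorts rows so the highest-scoring peptides come first. Column names come from the header line above. The helper name read_epitopes and the min_score cutoff are illustrative assumptions; this README does not say what score threshold is meaningful.

```python
import csv


def read_epitopes(path, min_score=0.0):
    """Read a BOTA *.epitopes.out table; return rows sorted by score, highest first."""
    with open(path) as handle:
        # The header line starts with '#', e.g. "#peptide\tgene_name\t...\tscore".
        header = handle.readline().lstrip("#").rstrip("\r\n").split("\t")
        reader = csv.DictReader(handle, fieldnames=header, delimiter="\t")
        rows = []
        for row in reader:
            if not row["peptide"] or row["peptide"].startswith("."):
                continue  # skip blank or truncated lines such as "......"
            row["score"] = float(row["score"])
            if row["score"] >= min_score:
                rows.append(row)
    return sorted(rows, key=lambda r: r["score"], reverse=True)
```

Each returned row is a dictionary keyed by column name, so for example row["gene_name"] and row["pep_start"] can be used to group peptides by gene.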

Tutorial

Below is an example to walk you through what a real BOTA analysis should look like.

[TBD]

Citation

Graham DB*, Luo C*, Abelin JG, Matar CG, Conway KL, Lefkovith A, Jasso GJ, Causer K, Mani DR, Carr SA, and Xavier RJ, Antigen discovery by MHCII peptidomics reveals biochemical features of immunodominance. [In Review]
