Menace

This software bundle is a basic implementation of the algorithm for extracting peak-to-trough ratios (PTRs) from metagenomic data, as first described in (Korem et al., Science, 2015).

Installation:

Pip

Make sure that "pip" is the pip command belonging to your python2 installation, then:

pip install menace

Git

git clone git@github.com:zertan/Menace.git
cd Menace
python setup.py install

This should install the Python dependencies listed below. The other dependencies have to be installed manually (if you have questions about this, consult your cluster IT help desk).

The software has been tested on the "hebbe" cluster at C3SE, which uses the "slurm" system for resource management. Slurm is therefore the only queueing system currently supported.

Dependencies:

Python2:
numpy
scipy
pandas
biopython
matplotlib
xmltodict
configparser
lmfit
newick
Jinja2
doric
-e git+https://github.com/PathoScope/PathoScope.git#egg=pathoscope

samtools

bamtools

bowtie2

Pathoscope 2.0 (should be installed by the above pip command, but make sure 'pathoscope ID' is accessible in the shell, i.e. is on the system path)

parallel

DoriC is a database of chromosome origin locations (OriCs) and is an optional but recommended dependency for the pipeline. Please visit the DoriC website and enter your e-mail to download it.
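To confirm that the external tools listed above are on the system path before starting a run, a small check along these lines may help. This is a minimal sketch and not part of menace: the executable names are taken from the dependency list, and the helper names `find_on_path` and `missing_tools` are made up for illustration.

```python
import os

# Executable names assumed from the dependency list above.
EXTERNAL_TOOLS = ["samtools", "bamtools", "bowtie2", "pathoscope", "parallel"]


def find_on_path(name):
    """Return the full path of an executable found on PATH, or None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(names=EXTERNAL_TOOLS):
    """Return the names that could not be found on PATH."""
    return [name for name in names if find_on_path(name) is None]


if __name__ == "__main__":
    for tool in missing_tools():
        print("not found on PATH: " + tool)
```

The sketch only uses the standard library, so it should run under both python2 and python3. On a cluster, run it after loading the relevant modules.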

Usage

You can get an overview of the menace functionality by running menace -h.

Initialize a project in the current directory by running menace init. Identify a set of NCBI genome reference accession numbers and put them in "./searchStrings" (or use the default one, which includes a minimal set of references to bacteria common in the human gut).
Identify a metagenomic cohort of interest (download manually or add URLs as described below) and add to the Data folder. Supported input: raw/gzipped/bzipped ".fastq" files.
Add information to the project.conf file.
Edit loadmodules.sh to include the python2 module of the cluster (or comment out the lines if python2 is accessible by default).
Run menace full (use "nohup {cmd} &" to keep alive after logout if on a cluster login node).
Wait for the job to complete, then run menace collect in the project directory.
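Before running menace full, it can be worth checking that the Data folder only contains supported input. The following is a hedged sketch, not part of menace: it assumes that raw, gzipped and bzipped files end in `.fastq`, `.fastq.gz` and `.fastq.bz2` respectively, and the function names are hypothetical.

```python
import os

# Assumed spellings of the supported raw/gzipped/bzipped extensions.
SUPPORTED_SUFFIXES = (".fastq", ".fastq.gz", ".fastq.bz2")


def split_supported(filenames):
    """Split file names into (supported, rejected), each sorted."""
    supported, rejected = [], []
    for name in filenames:
        if name.lower().endswith(SUPPORTED_SUFFIXES):
            supported.append(name)
        else:
            rejected.append(name)
    return sorted(supported), sorted(rejected)


def scan_data_dir(data_path):
    """Walk the Data folder and classify every file found below it."""
    found = []
    for root, _dirs, files in os.walk(data_path):
        found.extend(os.path.join(root, name) for name in files)
    return split_supported(found)
```

Calling `scan_data_dir("Data")` from the project directory returns the two lists, so any files in the second list can be moved or converted before the job is submitted.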

Notes

The menace script is a common utility for all parts of the pipeline, including downloading references and metagenomic data, building a reference index, setting up the necessary file structure and submitting to slurm. Hence, all configuration is intended to be set up in project.conf (please see bin/project.conf.example for an example).

The default 'searchStrings' is only an example and will most probably not fit your purposes. A more comprehensive reference library will yield higher coverage and more accurate values. A more comprehensive list of human gut bacteria is available at 'extra/referenceACClong.txt'.

Directory structure (example)

With the above usage example, the directory structure will look something like this:

$DATA_PATH
  ├ "Sample01"                       (e.g. ERR525688)
  .  ├ {sample01_1.fastq.gz}
  .  └ {sample01_2.fastq.gz}         paired metagenomic reads
  .

$REF_PATH
  ├ Index
  |  └ {REF_NAME.*.bt2l}             bowtie2 index files
  ├ Fasta
  |  └ {accession.fasta}
  ├ Headers
  |  └ {accession.xml}               xml files containing extra genome reference info
  └ taxIDs.txt

$DORIC_PATH
  ├ bacteria_record.dat
  └ bacteria_seq.fas

$OUTPUT_PATH
  ├ "Sample01"
  .  ├ depth
  .  |  └ {accession.depth}          coverage files for each reference
  .  ├ log
  .  |  └ {accession.log}            output logs from piecewiseFit
  .  ├ npy
  .  |  └ {accession_OriC_TerC.npy}  numpy files with origin/terminus locations and relative C periods
  .  ├ png
  .  |  └ {accession_fit.png}        images of piecewise fit of the smoothed coverage
  .  └ accession-sam-report.tsv      Pathoscope2 reassignment report
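For inspecting one of the depth files by hand, the per-base coverage can be averaged into fixed-size bins. The sketch below is not taken from menace. It assumes the files use the three tab-separated columns written by `samtools depth` (reference name, 1-based position, depth); the function name and the default bin size are arbitrary choices for illustration.

```python
def bin_depth(lines, bin_size=10000):
    """Average per-base depth into fixed-size bins.

    Assumes rows of the form "<reference>\t<1-based position>\t<depth>".
    Positions missing from the input count as zero depth, and the last
    bin is also divided by bin_size even if the genome ends inside it.
    """
    sums = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        position, depth = int(fields[1]), float(fields[2])
        index = (position - 1) // bin_size
        sums[index] = sums.get(index, 0.0) + depth
    if not sums:
        return []
    return [sums.get(i, 0.0) / bin_size for i in range(max(sums) + 1)]
```

A file can be passed directly, for example `bin_depth(open("accession.depth"))`, since the function only iterates over lines.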

piecewiseFit.py

Implements the piecewise linear fit, together with prior checks on the generated depth files that select the instances in which enough data was generated to produce a reliable coverage signal for estimating replication origins. Once the origins have been estimated using the full cohort, this data can be used to produce PTR values for each sample.

input: {reference.depth}

output: {reference_OriC.npy}, {reference_TerC.npy}, {reference_coverage.png}, {reference_fit.log}
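To make the idea of a piecewise linear fit concrete, here is a deliberately simplified sketch; it is not the code in piecewiseFit.py. It assumes a circular chromosome, log2 coverage already averaged into equal-sized bins, and a terminus lying exactly opposite the origin, so the model is a single peak that falls off linearly with circular distance from the origin. The function name is hypothetical, and a plain grid search with `numpy.polyfit` stands in for whatever optimiser the real script uses.

```python
import numpy as np


def fit_symmetric_ptr(log2_coverage):
    """Grid-search the origin bin of a symmetric two-segment linear model.

    Returns (origin_bin, terminus_bin, ptr). Simplifying assumption:
    the terminus sits half a chromosome away from the origin.
    """
    y = np.asarray(log2_coverage, dtype=float)
    n = len(y)
    index = np.arange(n)
    best = None
    for origin in range(n):
        distance = np.abs(index - origin)
        distance = np.minimum(distance, n - distance)  # circular distance in bins
        slope, intercept = np.polyfit(distance, y, 1)  # y ~ intercept + slope * distance
        if slope > 0:
            continue  # coverage must decrease away from the origin
        residual = np.sum((intercept + slope * distance - y) ** 2)
        if best is None or residual < best[0]:
            best = (residual, origin, slope)
    if best is None:
        raise ValueError("no candidate origin with decreasing coverage")
    _, origin, slope = best
    terminus = (origin + n // 2) % n
    # Peak minus trough in log2 units is -slope * (n // 2).
    ptr = 2.0 ** (-slope * (n // 2))
    return origin, terminus, ptr
```

Menace lists lmfit among its dependencies, so its own fit is presumably set up differently, and it does not need the fixed-terminus assumption made here. The sketch is only meant to show how a peak-to-trough ratio follows from the fitted slope.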

fetchSeq.py

This utility can be used to download '.fasta' reference files from the NCBI servers.

input: searchStrings.txt

output: {reference.fasta}, {reference.xml}, taxIDs.txt
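How fetchSeq.py talks to NCBI is not described here. As a rough illustration of what such a download involves, the sketch below builds request URLs for the public NCBI E-utilities EFetch endpoint. It assumes the search strings file holds one accession number per line; the helper names are hypothetical, and NC_000913.3 is used purely as an example accession.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_accessions(lines):
    """Assumption: one accession per line; blank lines are ignored."""
    return [line.strip() for line in lines if line.strip()]


def efetch_fasta_url(accession):
    """Build the EFetch URL that returns one nucleotide record as FASTA text."""
    query = [("db", "nuccore"), ("id", accession),
             ("rettype", "fasta"), ("retmode", "text")]
    return EFETCH + "?" + urlencode(query)


for accession in read_accessions(["NC_000913.3\n", "\n"]):
    print(efetch_fasta_url(accession))
```

The sketch stops at building the URL so that it can be tried without network access. An actual download would also need to respect NCBI's request rate limits.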
