MicrobiologyETHZ/PathoScope_Py3

PathoScope 2.0

Pathoscope: Species identification and strain attribution with unassembled sequencing data

install with bioconda

Quick Start

If you would like to get started using PathoScope, check out the following tutorial:

Introduction

PathoScope 2.0 consists of four core and two optional analysis modules for sequencing-based metagenomic profiling. The PathoLib module extracts genome reference libraries (target or host/filter) from all available sequences in the NCBI Nucleotide database that belong to a user-defined taxonomic clade. The PathoMap module aligns the reads to the target reference library and removes any reads that have sequence similarity with the host or filter genomes. The PathoID module resolves read ambiguity, identifies which of the target genomes are present in the sample, and estimates the proportion of reads originating from each genome. The PathoReport module provides two report files: 1) a summary report (.tsv) that contains the numbers and proportions of reads aligned to each genome identified in the sample, and 2) a detailed report (.xml) that includes read coverage, read assignments, and contiguous sequences generated by combining the reads. PathoDB is an optional module that provides additional annotation (organism taxonomic lineage, gene loci, protein products) for all sequences identified in the sample. PathoQC is an optional module that can be used to preprocess the reads prior to alignment with PathoMap.
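To make the idea behind PathoID more concrete, the following is a minimal sketch of how ambiguous reads can be reassigned with a simple expectation-maximization loop. It is an illustration only and is not PathoScope's actual implementation: the function name `estimate_proportions`, the input layout (one dictionary of genome-to-alignment-weight per read), and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def estimate_proportions(read_weights, n_iter=50):
    """Toy EM estimate of genome proportions (hypothetical helper).

    read_weights: list of dicts, one per read, mapping a genome name to a
    positive alignment weight for that read.
    Returns a dict mapping each genome to its estimated share of the reads.
    """
    genomes = sorted({g for weights in read_weights for g in weights})
    # Start from a uniform guess over all genomes that received any hit.
    pi = {g: 1.0 / len(genomes) for g in genomes}

    for _ in range(n_iter):
        totals = defaultdict(float)
        for weights in read_weights:
            # E-step: split each read among its candidate genomes in
            # proportion to (current genome share) x (alignment weight).
            scaled = {g: pi[g] * w for g, w in weights.items()}
            norm = sum(scaled.values())
            if norm == 0.0:
                continue  # read carries no usable information
            for g, value in scaled.items():
                totals[g] += value / norm
        # M-step: the new share of a genome is its fraction of all reads.
        pi = {g: totals[g] / len(read_weights) for g in genomes}
    return pi


# Toy data: two reads unique to genome A, one unique to genome B,
# and one read that aligns equally well to both.
reads = [
    {"A": 1.0},
    {"A": 1.0},
    {"A": 1.0, "B": 1.0},
    {"B": 1.0},
]
proportions = estimate_proportions(reads)
for genome, share in proportions.items():
    print(f"{genome}\t{share:.3f}")
```

In this toy case the ambiguous read is pulled mostly toward genome A, because A already has more uniquely aligned reads. This is the general intuition behind resolving read ambiguity, not a description of the exact model PathoID uses.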

Support and Contact

Pathoscope is developed at the Johnson Lab at Boston University and the Crandall Lab at George Washington University.
For any issues or concerns, please contact us at pathoscope@googlegroups.com.

W. Evan Johnson, Ph.D.
Division of Computational Biomedicine
Boston University School of Medicine
72 E. Concord St., E-645
Boston, MA 02118

Keith A. Crandall, Ph.D.
Computational Biology Institute, Milken Institute School of Public Health
The George Washington University
800 22nd Street, NW, Science and Engineering Hall, Suite 7000
Washington, DC 20052

Developers:

Solaiappan Manimaran (Johnson)
Changjin Hong (Johnson)
Eduardo Castro-Nallar (Crandall)
Matthew Bendall (Crandall)

References

  1. Owen E. Francis, Matthew Bendall, Solaiappan Manimaran, Changjin Hong, Nathan L. Clement, Eduardo Castro-Nallar, Quinn Snell, G. Bruce Schaalje, Mark J. Clement, Keith A. Crandall and W. Evan Johnson "Pathoscope: Species identification and strain attribution with unassembled sequencing data." Genome Research 23.10 (2013): 1721-1729. PMID: 23843222
  2. Changjin Hong, Solaiappan Manimaran, Ying Shen, Joseph F Perez-Rogers, Allyson L Byrd, Eduardo Castro-Nallar, Keith A Crandall and William Evan Johnson "PathoScope 2.0: a complete computational framework for strain identification in environmental or clinical sequencing samples." Microbiome 2.1 (2014): 1-15. PMID: 25225611
  3. Allyson L Byrd, Joseph F Perez-Rogers, Solaiappan Manimaran, Eduardo Castro-Nallar, Ian Toma, Tim McCaffrey, Marc Siegel, Gary Benson, Keith A Crandall and William Evan Johnson "Clinical PathoScope: rapid alignment and filtration for accurate pathogen identification in clinical samples using unassembled sequencing data." BMC Bioinformatics 15.1 (2014): 262. PMID: 25091138
  4. Changjin Hong, Solaiappan Manimaran and William Evan Johnson "PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets." Cancer Informatics 2014:Suppl. 1 167-176. PMID: 25983538

About

Pathoscope compatible with Python 3
