This blog post explains the difference between binning and profiling, which can help you decide whether FOCUS is right for you: FOCUS is a profiling tool.
- Python 3.6
- Setuptools 36.0.1
- Jellyfish >= 2.2.6 (on macOS, install it via Bioconda)
- NumPy 1.12.1
- SciPy 0.19.0
- unzip/curl
# pip3 also installs NumPy and SciPy
pip3 install metagenomics-focus
You can now easily install FOCUS using conda via the Bioconda channel. It is as easy as:
# bioconda should handle all the dependencies
conda create -n focus -c bioconda focus
source activate focus
This will create a conda environment called `focus` (as specified by the `-n` argument) and install FOCUS along with all its dependencies. The second line activates the newly created `focus` conda environment.
# these steps should install Numpy and Scipy
# clone focus
git clone git@github.com:metageni/FOCUS.git
# unzip database and move it to folder
unzip FOCUS/focus_app/db.zip && mv db FOCUS/focus_app/
# install focus
cd FOCUS && python setup.py install
focus [-h] [-v] -q QUERY -o OUTPUT_DIRECTORY [-k KMER_SIZE]
[-b ALTERNATE_DIRECTORY] [-p OUTPUT_PREFIX] [-t THREADS]
[--list_output] [-l LOG]
FOCUS: An Agile Profiler for Metagenomic Data
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-q QUERY, --query QUERY
Path to directory with FAST(A/Q) files
-o OUTPUT_DIRECTORY, --output_directory OUTPUT_DIRECTORY
Path to output files
-k KMER_SIZE, --kmer_size KMER_SIZE
K-mer size (6 or 7) (Default: 6)
-b ALTERNATE_DIRECTORY, --alternate_directory ALTERNATE_DIRECTORY
Alternate directory for your databases
-p OUTPUT_PREFIX, --output_prefix OUTPUT_PREFIX
Output prefix (Default: output)
-t THREADS, --threads THREADS
Number Threads used in the k-mer counting (Default: 4)
--list_output Output results as a list
-l LOG, --log LOG Path to log file (Default: STDOUT).
example > focus -q samples_directory -o output_directory
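When FOCUS is called from a pipeline, it can help to assemble the command line programmatically. The sketch below is only an illustration built from the flags listed in the help text above; the function name `build_focus_command` is hypothetical and not part of FOCUS, and the defaults simply mirror the documented ones.

```python
def build_focus_command(queries, output_directory, kmer_size=6, threads=4, prefix="output"):
    """Return the argument list for a focus run (hypothetical helper, not part of FOCUS)."""
    # The help text documents only 6 and 7 as valid k-mer sizes.
    if kmer_size not in (6, 7):
        raise ValueError("kmer_size must be 6 or 7")
    command = ["focus"]
    # -q is repeated once per input file or directory.
    for query in queries:
        command += ["-q", str(query)]
    command += [
        "-o", str(output_directory),
        "-k", str(kmer_size),
        "-t", str(threads),
        "-p", prefix,
    ]
    return command


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run(..., check=True) to execute it.
    print(" ".join(build_focus_command(["samples_directory"], "output_directory")))
```

Returning a list rather than a single string avoids shell quoting problems when paths contain spaces.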
The query can be one or more FASTA or FASTQ files, or a directory containing those files. We only process files ending in `.fasta`, `.fastq`, or `.fna`, so please ensure that every file you want processed has one of those extensions.
You can provide a mixture of input files or directories, and we will filter the files as appropriate.
For example:
focus -q fastq1.fastq -q fastq2.fastq -q directory/ -o output
will process the two fastq files `fastq1.fastq` and `fastq2.fastq`, as well as any fasta or fastq files in `directory`, and put the output in `output`.
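To preview which files a given set of `-q` arguments would select, the filtering rule can be approximated in a few lines of Python. This is a minimal sketch and not FOCUS's own code: the function name `collect_query_files` is hypothetical, and it assumes case-sensitive extension matching and a non-recursive directory scan, neither of which is confirmed here.

```python
from pathlib import Path

# Extensions named in this README; case-sensitive matching is an assumption.
ACCEPTED_SUFFIXES = {".fasta", ".fastq", ".fna"}


def collect_query_files(queries):
    """Expand a mix of files and directories into a list of sequence files."""
    selected = []
    for query in map(Path, queries):
        # Directories are scanned one level deep (assumption); files are checked directly.
        candidates = sorted(query.iterdir()) if query.is_dir() else [query]
        # Paths that do not exist or lack an accepted extension are skipped.
        selected.extend(
            path for path in candidates
            if path.is_file() and path.suffix in ACCEPTED_SUFFIXES
        )
    return selected


if __name__ == "__main__":
    for path in collect_query_files(["fastq1.fastq", "fastq2.fastq", "directory/"]):
        print(path)
```

Running it before a long job is a cheap way to catch files with an unexpected extension such as `.fa` or `.fq`.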
FOCUS generates one tabular output per taxonomic level (`Kingdom`, `Phylum`, `Class`, `Order`, `Family`, `Genus`, `Species`, and `Strain`) plus one with all levels, which can be used as input to STAMP for statistical analysis.
New genomes can be added to the database with the `focus_database_utils` command. It only requires a tab-separated (`\t`) file as input (`-g`), with one genome per row, where the columns hold the genome's metadata: `Kingdom`, `Phylum`, `Class`, `Order`, `Family`, `Genus`, `Species`, `Strain`, and the path to the genome's FASTA file.
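As an illustration of the expected layout, the following sketch writes such a table with Python's standard `csv` module. The helper `write_genomes_table`, the file names, and the genome path are hypothetical, and writing the file without a header line is an assumption; adjust it if your version of `focus_database_utils` expects one.

```python
import csv

# Column order as described above: eight taxonomic levels, then the FASTA path.
COLUMNS = ["Kingdom", "Phylum", "Class", "Order", "Family",
           "Genus", "Species", "Strain", "Path"]


def write_genomes_table(rows, destination):
    """Write one genome per row as tab-separated values (no header; an assumption)."""
    with open(destination, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for row in rows:
            # Reject rows that do not have exactly one value per column.
            if len(row) != len(COLUMNS):
                raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}")
            writer.writerow(row)


if __name__ == "__main__":
    example = [
        ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
         "Enterobacteriaceae", "Escherichia", "Escherichia coli",
         "Escherichia coli str. K-12 substr. MG1655",
         "genomes/ecoli_k12.fna"],  # hypothetical path
    ]
    write_genomes_table(example, "genomes.tsv")
```

The resulting `genomes.tsv` is the file you would pass with `-g`.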
usage: focus_database_utils [-h] [-v] -g GENOMES [-t THREADS] [-l LOG]
FOCUS Database Utils
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-g GENOMES, --genomes GENOMES
Path to directory with FAST(A/Q) files
-t THREADS, --threads THREADS
Number Threads used in the k-mer counting (Default: 4)
-l LOG, --log LOG Path to log file (Default: STDOUT).
example > focus_database_utils -g GENOMES_TABULAR_FILE
FOCUS was written by Genivaldo G. Z. Silva. Feel free to contact me.
If you use FOCUS, please cite it:
Silva, G. G. Z., D. A. Cuevas, B. E. Dutilh, and R. A. Edwards, 2014:
FOCUS: An alignment-free model to identify organisms in metagenomes
using non-negative least squares. PeerJ.