A Snakemake/Python-based metagenomics pipeline, better than the first.
It requires snakemake and conda to run (conda is not strictly necessary, but without it setup is going to be a pain).
You will need to fill out a config file similar to the JSON file included in the sample_configs folder. It gives paths to a set of CSV files (samples of these are also in the sample_configs folder). The CSV files describe the libraries, assemblies, binnings, and bin-sets respectively.
Things to avoid in your config files: the - symbol and duplicate names.
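A quick pre-flight check for these two rules could look like the sketch below. It is not part of the pipeline: the function name `check_names` is hypothetical, and it assumes you have already collected the names from your CSV files into a plain list (the column layout of those files is not described here).

```python
from collections import Counter


def check_names(names):
    """Return a list of human-readable problems found in a list of entry names.

    Hypothetical helper, not part of the pipeline: it only checks the two
    rules stated above (no '-' symbol, no duplicate names).
    """
    problems = []
    # Rule 1: names should not contain the '-' symbol.
    for name in names:
        if "-" in name:
            problems.append(f"'{name}' contains '-'")
    # Rule 2: every name should appear only once.
    for name, count in Counter(names).items():
        if count > 1:
            problems.append(f"'{name}' appears {count} times")
    return problems


if __name__ == "__main__":
    # Made-up example names, for illustration only.
    for problem in check_names(["lib_A", "lib-B", "lib_A"]):
        print(problem)
```

An empty list means the names pass both checks.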
The pipeline is then simply run as:
snakemake --use-conda --configfile YOUR_CONFIG_FILE --cores NB_THREADS
This will run all the bin-sets.
If you want to run only all libraries, all assemblies, or all binnings, run respectively:
snakemake --use-conda --configfile YOUR_CONFIG_FILE --cores NB_THREADS --until all_libs
snakemake --use-conda --configfile YOUR_CONFIG_FILE --cores NB_THREADS --until all_asses
snakemake --use-conda --configfile YOUR_CONFIG_FILE --cores NB_THREADS --until all_binnings
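If you launch these runs from a script, the commands above can be assembled programmatically. The following is only a sketch: `snakemake_command` and the stage labels used as dictionary keys are made-up names, while the rule names (`all_libs`, `all_asses`, `all_binnings`) and flags are the ones shown in the commands above.

```python
import shlex

# Stage label (hypothetical) -> rule name passed to --until (from the commands above).
# None means no --until, which runs all the bin-sets.
STAGE_RULES = {
    "libraries": "all_libs",
    "assemblies": "all_asses",
    "binnings": "all_binnings",
    "binsets": None,
}


def snakemake_command(config_file, cores, stage="binsets"):
    """Build the snakemake argument list for one stage of the pipeline."""
    cmd = ["snakemake", "--use-conda", "--configfile", config_file, "--cores", str(cores)]
    rule = STAGE_RULES[stage]
    if rule is not None:
        cmd += ["--until", rule]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; the config file name is a placeholder.
    print(shlex.join(snakemake_command("YOUR_CONFIG_FILE", 8, "assemblies")))
```

The returned list can be handed directly to `subprocess.run` if you want to execute it rather than print it.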
All options of snakemake are of course available, for example:
#shows the jobs to compute
snakemake --use-conda --configfile YOUR_CONFIG_FILE -n --quiet
#prints the job graph (DAG) in Graphviz dot format, which can be rendered into a diagram
snakemake --use-conda --configfile YOUR_CONFIG_FILE --dag
#run it on your SLURM cluster
snakemake --use-conda --configfile YOUR_CONFIG_FILE -j MAX_NB_OF_SUBMITTED_JOBS --local-cores NB_OF_THREADS --cluster "sbatch -D `pwd` -A YOUR_ACCOUNT -t '7-00:00:00' -n 20"
Specific files can also be generated by directly "making" them, for example:
#make the MYLIB library
snakemake --use-conda --configfile YOUR_CONFIG_FILE --cores THREADS ROOT_PATH/libraries/MYLIB/MYLIB_fwd.fastq.gz
#make the MYBINNING binning
snakemake --use-conda --configfile YOUR_CONFIG_FILE --cores THREADS ROOT_PATH/binnings/MYBINNING/binned_assembly.fna
#make the MYASS assembly
snakemake --use-conda --configfile YOUR_CONFIG_FILE --cores THREADS ROOT_PATH/assemblies/MYASS/assembly.fna
#make the MYBINSET binset
snakemake --use-conda --configfile YOUR_CONFIG_FILE --cores THREADS ROOT_PATH/binsets/MYBINSET/MYBINSET.fna
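The four target paths above follow a regular pattern, so they can be derived from a name. The sketch below simply mirrors the example paths. `target_path` and its `kind` labels are hypothetical, and the pipeline may well produce other files that are not covered here.

```python
from pathlib import PurePosixPath


def target_path(root, kind, name):
    """Return the output file to request from snakemake for one named entry.

    The patterns are copied from the example commands above; `kind` is one of
    the made-up labels 'library', 'assembly', 'binning' or 'binset'.
    """
    root = PurePosixPath(root)
    patterns = {
        "library": root / "libraries" / name / f"{name}_fwd.fastq.gz",
        "assembly": root / "assemblies" / name / "assembly.fna",
        "binning": root / "binnings" / name / "binned_assembly.fna",
        "binset": root / "binsets" / name / f"{name}.fna",
    }
    return str(patterns[kind])


if __name__ == "__main__":
    # ROOT_PATH and MYBINSET are placeholders, as in the commands above.
    print(target_path("ROOT_PATH", "binset", "MYBINSET"))
```

The resulting string is what goes at the end of the snakemake command line in place of the explicit path.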
A few additional scripts are available for various purposes. They might need some Python libraries, so do check your error messages. They also probably only run if you are in the folder where the workflow is.
To check your config file, run:
python $metasssnake2_path/workflow/scripts/utils.py validate_descriptor YOUR_CONFIG_FILE
If your libraries are in the folder structure the script expects, the following command generates a set of CSV files describing one pair of reads per library, single-sample assemblies and binnings, and one big bin-set containing everything, as well as a config file:
python $metasssnake2_path/workflow/scripts/utils.py csv_generator ABSOLUTE_PATH_TO_THE_FOLDER ABSOLUTE_OUT_FILE_PREFIX
Some editing of the generated JSON is still necessary, though.