This tool extends DBGWAS (Jaillard et al., 2018) to the metagenomic scale. It finds variants significantly associated with a given phenotype of interest and outputs its findings as a web page. That web page shows the selected unitigs with their surrounding graph component, optionally annotated with a user-provided annotation.
You can find the internship report for this project in the report/ folder.
Here is an example of the output of Metadbgwas: significant components are shown in preview, with annotation if provided, and clicking on one takes you to a page with full information and an interactive graph.
- GCC >= 9.4
- CMAKE > 3.10.0
- zlib
- build-essential
- pthreads
- BLAST (the BLAST suite must be in your PATH)
- R >= 3.2.0
- Boost
- Use the following command to download the repository :
git clone --recursive https://github.com/Louis-MG/Metadbgwas.git
- Complete the installation :
cd Metadbgwas
sh install.sh
* General
NOTE: paths should be absolute.
--files <path> path to the directory containing the read files.
--output <path> path to the output folder. Default set to ./ .
--threads <int> number of threads to use. !! Default set to 4 !!
--verbose <int> level of verbosity, 0 or 1. Defaults to 1; 0 is equivalent to --quiet.
--clean removes intermediary files to save storage space.
--skip1 skips the Lighter correction step. Corrected files are expected to already be in the output folder.
--skip2 skips the Lighter and Bcalm2 steps. Corrected files and the unitigs folder are expected to already be in the output folder.
--skip3 skips the Lighter, Bcalm2 and REINDEER steps. Corrected files, the unitigs folder and the matrix folder are expected to already be in the output folder.
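Since paths should be absolute, relative paths can be resolved before calling the tool. The sketch below is one way to do it in plain shell. The abs_dir helper and the directory names are illustrative and are not part of Metadbgwas.

```shell
# abs_dir is a hypothetical helper, not part of Metadbgwas.
# It prints the absolute path of an existing directory.
abs_dir() {
    (cd "$1" && pwd)
}

# Illustrative directories, created here only so the sketch runs as-is.
mkdir -p ./input ./output

input_dir=$(abs_dir ./input)
output_dir=$(abs_dir ./output)

# Print the resolved options that would be passed to metadbgwas.sh.
echo "--files $input_dir --output $output_dir"
```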
* Lighter
NOTE: if your dataset contains bacterial genomes of very different sizes, it is better to use the --k option and provide the pick-rate (noted alpha).
--K <int> <int> kmer length and approximate genome size (in bases). Recommended is 17 G, where G is the genome size.
or
--k <int> <int> <float> kmer length, genome size (in bases) and alpha (probability of sampling a kmer). Recommended is 17 G alpha, with alpha best chosen as 7/coverage.
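The alpha = 7/coverage rule can be computed directly in the shell. This is a minimal sketch: the coverage and genome size values are placeholders, and awk is used only because plain shell arithmetic is integer-only.

```shell
# Placeholder values: replace with the figures for your dataset.
coverage=70          # assumed average sequencing coverage
genome_size=6000000  # assumed genome size, in bases

# alpha = 7 / coverage; LC_ALL=C keeps the decimal separator a dot.
alpha=$(LC_ALL=C awk -v c="$coverage" 'BEGIN { printf "%.3f", 7 / c }')

# Print the resulting Lighter options for metadbgwas.sh.
echo "--k 17 $genome_size $alpha"
```

With a coverage of 70, this gives alpha = 0.100.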
* Bcalm2
--kmer <int> kmer length used to build the unitigs. Default to 31.
--abundance-min <int> minimum number of occurrences of a kmer to keep it in the union DBG. Default to 5, highly recommended to change to the 2.5% quantile of the Poisson law with lambda = coverage.
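The recommended --abundance-min value, the 2.5% quantile of a Poisson law with lambda = coverage, can be approximated without leaving the shell. The poisson_quantile function below is a hypothetical helper, not part of Metadbgwas. It accumulates the Poisson probabilities until the requested level is reached, which should match what R's qpois returns for moderate coverage values.

```shell
# poisson_quantile is a hypothetical helper, not part of Metadbgwas.
# Usage: poisson_quantile <lambda> <probability>
# Prints the smallest k such that P(X <= k) >= probability.
# Note: exp(-lambda) underflows for very large lambda (roughly above 700).
poisson_quantile() {
    LC_ALL=C awk -v lambda="$1" -v p="$2" 'BEGIN {
        term = exp(-lambda)      # P(X = 0)
        cdf = term
        k = 0
        while (cdf < p) {
            k++
            term *= lambda / k   # P(X = k) derived from P(X = k - 1)
            cdf += term
        }
        print k
    }'
}

coverage=30   # placeholder: assumed average coverage
abundance_min=$(poisson_quantile "$coverage" 0.025)
echo "--abundance-min $abundance_min"   # prints: --abundance-min 20
```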
* Bifrost
Bifrost uses kmer, threads, and output parameters. No others need to be specified.
* DBGWAS
--strains A text file describing the strains containing 3 columns: 1) ID of the strain; 2) Phenotype (a real number or NA); 3) Path to a multi-fasta file containing the sequences of the strain. This fil>
--newick Optional path to a newick tree file. If (and only if) a newick tree file is provided, the lineage effect analysis is computed and PCs figures are generated.
--nc-db A comma-separated list of Fasta files containing annotations in a nucleotide alphabet format (e.g.: --nc-db path/to/file_1.fa,path/to/file_2.fa,etc). You can customize these files to work better with DBGWAS (see https://gitlab.com/leoisl/dbgwas/tree/master#customizing-annotation-databases).
--pt-db A comma-separated list of Fasta files containing annotations in a protein alphabet format (e.g.: --pt-db path/to/file_1.fa,path/to/file_2.fa,etc). You can customize these files to work better with DBGWAS (see https://gitlab.com/leoisl/dbgwas/tree/master#customizing-annotation-databases).
--threshold maximum value for which phenotype will be considered to be 0.
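The three-column layout expected by --strains can be sketched as follows. The sample names, phenotype values and paths are made up. The tab separator and the header line are assumptions based on DBGWAS conventions, so check them against the DBGWAS documentation before use.

```shell
# Hypothetical locations and sample names, for illustration only.
fasta_dir=/data/project/input
strains_file="${TMPDIR:-/tmp}/strains"

# Assumed format: tab-separated, one header line, then one line per sample
# with ID, phenotype (a real number or NA) and path to a multi-fasta file.
{
    printf 'ID\tPhenotype\tPath\n'
    printf '%s\t%s\t%s\n' sample1 0  "$fasta_dir/sample1.fasta"
    printf '%s\t%s\t%s\n' sample2 1  "$fasta_dir/sample2.fasta"
    printf '%s\t%s\t%s\n' sample3 NA "$fasta_dir/sample3.fasta"
} > "$strains_file"

# Sanity check: every line must have exactly three tab-separated fields.
if awk -F '\t' 'NF != 3 { bad = 1 } END { exit bad + 0 }' "$strains_file"; then
    echo "strains file looks well-formed: $strains_file"
fi
```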
* Miscellaneous
--license prints the license text in standard output.
--help displays help.
bash metadbgwas.sh --files ./input --output ./output --K 17 6000000 --strains ./strains --threads 4
An image is hosted on Docker Hub. You can also build it locally using the Dockerfile located in /docker. You might have to add sudo if you did not run the post-installation steps of Docker.
docker pull 007ptar007/metadbgwas:latest
docker run -v 'path/to/input/folder:/input' 007ptar007/metadbgwas --files ./input --strains ./input/strains --threads 40 --output ./output --K 17 G
You can also run the docker image with singularity:
singularity pull docker://007ptar007/metadbgwas
singularity run -H /path/to/input metadbgwas_latest.sif --files ./input --strains ./input/strains --threads 40 --output ./output --K 17 G
The output folder contains:
- the corrected fasta files.
- a unitigs folder with the bcalm2 output, sample-wise and dataset-wise.
- step1, step2 and step3, which contain the internal files of the modified DBGWAS.
- visualisation, which contains the visualisation files.
- command_line.txt with the parameters used for the execution.
Please cite this tool as :
Metadbgwas, Louis-Mael Gueguen, 2022.
You can post issues in the issue section of the GitHub repository. You can also email me at lm<dot>gueguen<at>orange<dot>fr. I will do my best to resolve them.
The work is available under the zlib license.