
Run multiple DA tools on datasets using Docker Image

Dhwani Desai edited this page Jul 5, 2021 · 17 revisions

Welcome to the Comparison_of_DA_microbiome_methods wiki!

This page describes how to use a Docker image that bundles multiple microbiome differential abundance (DA) tools. If you already have a working Docker installation, skip ahead to "Running Docker Image".

Installing Docker

Docker can be installed on Linux, macOS and Windows. The official installation guides for each operating system are:

For Installation on Ubuntu Linux

https://docs.docker.com/engine/install/ubuntu/

For Installation on Windows

https://docs.docker.com/docker-for-windows/install/

For Installation on macOS

https://docs.docker.com/docker-for-mac/install/

Tips for setting up Docker on Ubuntu

=== Tip to change the Docker storage directory (where images and containers are kept) from the default location on / ===

On Ubuntu, Docker stores its images, containers and volumes under /var/lib/docker by default, which lives on the root ("/") partition. Building or pulling Docker images takes up a sizable amount of hard-drive space, so this can fill up the root partition. The following protocol moves Docker's storage to a folder on a partition with sufficient space, using a systemd drop-in file to override how the Docker service is started.

Create a docker.service.d folder in /etc/systemd/system:

sudo mkdir -p /etc/systemd/system/docker.service.d

Create a configuration file for the Docker service by opening a new file in an editor such as nano:

sudo nano /etc/systemd/system/docker.service.d/docker-storage.conf

Paste the following lines into docker-storage.conf, replacing the --data-root path with the folder you want Docker to use (preferably a path without spaces):

[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// --data-root=/path/to/new/docker_storage_folder
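The drop-in above can also be generated by a small shell function and written with sudo tee instead of an editor. This is a minimal sketch, not part of the original protocol: the function name docker_storage_dropin and the example path /data/docker_storage are hypothetical, and it assumes a data path without spaces.

```shell
# Hypothetical helper (not part of the original protocol): prints the
# docker-storage.conf drop-in for the data directory given as $1.
# Assumes the path contains no spaces.
docker_storage_dropin() {
  local data_root="$1"
  if [ -z "$data_root" ]; then
    echo "usage: docker_storage_dropin /path/to/new/docker_storage_folder" >&2
    return 1
  fi
  # The empty ExecStart= clears the packaged start command, so the second
  # ExecStart= replaces it rather than adding another one.
  printf '%s\n' \
    '[Service]' \
    'ExecStart=' \
    "ExecStart=/usr/bin/dockerd -H fd:// --data-root=$data_root"
}

# Example (path is illustrative):
# sudo mkdir -p /etc/systemd/system/docker.service.d
# docker_storage_dropin /data/docker_storage | sudo tee /etc/systemd/system/docker.service.d/docker-storage.conf
```

Piping into sudo tee keeps the function itself unprivileged; only the write to /etc needs root.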

Reload systemd and restart the Docker service:

sudo systemctl daemon-reload

sudo systemctl restart docker

Check that the change took effect; the reported Docker Root Dir should now be the folder you set with --data-root:

docker info | grep "Docker Root Dir"
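To check the directory in a script rather than by eye, the field can be extracted from the docker info output. A minimal sketch under assumptions: docker_root_dir and check_docker_root_dir are hypothetical helpers that read docker info output on stdin, and /data/docker_storage is only an example path.

```shell
# Hypothetical helpers: both read `docker info` output on stdin.
# docker_root_dir prints the value of the first "Docker Root Dir" field.
docker_root_dir() {
  awk -F': ' '/Docker Root Dir/ && !done { print $2; done = 1 }'
}

# check_docker_root_dir succeeds only if that value equals $1.
check_docker_root_dir() {
  local expected="$1" actual
  actual="$(docker_root_dir)"
  if [ "$actual" = "$expected" ]; then
    echo "OK: Docker Root Dir is $actual"
  else
    echo "Mismatch: expected '$expected', got '${actual:-nothing}'" >&2
    return 1
  fi
}

# Example (path is illustrative):
# docker info | check_docker_root_dir /data/docker_storage
```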

=== Tip to run Docker without sudo every time ===

Create a docker group (skip this step if the group already exists):

sudo groupadd docker

Add the current user to this group

sudo gpasswd -a $USER docker

Activate the new group membership in the current shell, or log out and log back in:

newgrp docker
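Because gpasswd updates the group database but not the groups of shells that are already running, it can be useful to check both. A minimal sketch, assuming a standard id command; in_group is a hypothetical helper, not part of Docker.

```shell
# Hypothetical helper: succeeds if group $1 is among the groups reported by id.
# With no second argument it checks the current shell's groups; with a user
# name it checks that user's membership as recorded in the group database.
in_group() {
  local group="$1"
  shift
  id -nG "$@" | tr ' ' '\n' | grep -qxF -- "$group"
}

if in_group docker; then
  echo "docker group is active in this shell"
elif in_group docker "$(id -un)"; then
  echo "user is in the docker group; run 'newgrp docker' or log in again"
else
  echo "user is not in the docker group yet"
fi
```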

Running Docker Image

Pull the Docker image from Docker Hub:

docker pull dockerdkd/hackathon2021

This image can be used in three different ways:

  1. Running all the DA tools on any given individual dataset
  2. Running all the DA tools on all the datasets available at https://figshare.com/articles/dataset/16S_rRNA_Microbiome_Datasets/14531724
  3. Running all the analysis scripts mentioned in Nearing et al., 2021 https://www.biorxiv.org/content/10.1101/2021.05.10.443486v1

Running all the DA tools on any given individual dataset

The command below assumes that the input directory contains at least the following three files for the dataset, each named with the dataset tag ($DATASETTAG) as a prefix:

  • $DATASETTAG_genus_table.tsv - the ASV table file at the genus level
  • $DATASETTAG_meta.tsv - the metadata table
  • $DATASETTAG_genus_table_rare.tsv - the rarefied ASV table at the genus level
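Before starting the container, it can help to confirm that these files are in place. A minimal sketch; check_da_inputs is a hypothetical helper, not part of the Docker image. Note that in a shell script the prefix must be written as ${DATASETTAG}_ rather than $DATASETTAG_, otherwise the shell looks for a variable named DATASETTAG_genus_table.

```shell
# Hypothetical pre-flight check (not part of the Docker image): verifies that
# directory $1 holds the three non-empty files expected for dataset tag $2.
check_da_inputs() {
  local input_dir="$1" tag="$2" missing=0 suffix
  for suffix in genus_table.tsv meta.tsv genus_table_rare.tsv; do
    # Braces are required: without them, $tag_ would be parsed as a variable named tag_
    if [ ! -s "$input_dir/${tag}_${suffix}" ]; then
      echo "missing or empty: $input_dir/${tag}_${suffix}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
# check_da_inputs "$PWD/Hackathon/Studies/ArcticFreshwaters" ArcticFreshwaters
```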

Create an output folder, then start the container:

mkdir DAtools_output

docker run --user root -it -e "DATASETTAG=ArcticFreshwaters" -e "DEPTH=2000" -e "FILTER=0.1" -v "$PWD/Hackathon/Studies/ArcticFreshwaters/:/home/hackathonuser/Input_data" -v "$PWD/DAtools_output/:/home/hackathonuser/output" dockerdkd/hackathon2021:latest

In the example above, the container runs all the DA tools on the ArcticFreshwaters dataset with a rarefaction depth of 2000 (DEPTH) and an occurrence filter of 0.1 percent (FILTER), which removes taxa contributing 0.1 percent or less of the total.
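To run other datasets with the same settings, the command can be wrapped in a small function. A sketch under assumptions: run_da_tools is hypothetical, its default depth and filter simply mirror the ArcticFreshwaters example, and setting DRY_RUN=1 prints the command instead of running it.

```shell
# Hypothetical wrapper around the docker run command shown above.
# Arguments: dataset tag, input dir, output dir, [depth], [filter].
# Depth and filter default to the values from the ArcticFreshwaters example.
# Set DRY_RUN=1 to print the command instead of running it.
run_da_tools() {
  local tag="$1" input_dir="$2" output_dir="$3"
  local depth="${4:-2000}" filter="${5:-0.1}"
  # Build the argument list in the positional parameters so quoting survives
  set -- docker run --user root -it \
    -e "DATASETTAG=$tag" -e "DEPTH=$depth" -e "FILTER=$filter" \
    -v "$input_dir/:/home/hackathonuser/Input_data" \
    -v "$output_dir/:/home/hackathonuser/output" \
    dockerdkd/hackathon2021:latest
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    mkdir -p "$output_dir" && "$@"
  fi
}

# Example:
# run_da_tools ArcticFreshwaters "$PWD/Hackathon/Studies/ArcticFreshwaters" "$PWD/DAtools_output"
```

Storing the command in the positional parameters instead of a single string keeps paths with spaces intact and works in plain sh as well as bash.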

Because the container runs as root (--user root), the generated output files are owned by root. Change the ownership of the output folder back to your user (this requires sudo):

sudo chown -R "$USER" DAtools_output

sudo chgrp -R "$(id -gn)" DAtools_output
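To confirm that the ownership change worked, any files still not owned by you can be listed. A minimal sketch using the standard find -user test; not_owned_by_me is a hypothetical helper and DAtools_output is assumed to be the output folder from the example above.

```shell
# Hypothetical check: lists every file or folder under $1 (default
# DAtools_output) that is still not owned by the current user.
# Empty output means the ownership change worked.
not_owned_by_me() {
  find "${1:-DAtools_output}" ! -user "$(id -un)" -print
}

# Example:
# not_owned_by_me DAtools_output
```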