This repository is a distributed data parallel version of dcgan.pytoch examples
The example implements the paper Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
The implementation is very close to the Torch implementation dcgan.torch
After every 100 training iterations, the files real_samples*.png
and fake_samples*.png
are written to disk
with the samples from the generative model.
First download and extract the desired dataset. In this example we'll download and use CIFAR-10 dataset, which can be obtained using the commands below:
mkdir cifar10 && cd cifar10
wget --no-check-certificate https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
tar -xvf cifar-10-python.tar.gz
cd ..
This will download and extract cifar-10 dataset to ./cifar10/
- Build the docker image from this repository using the command below:
docker build -t dist_dcgan .
- Let's start 2 terminals to test the network using data distributed parallelism. On first terminal run:
docker run --env OMP_NUM_THREADS=1 --rm --network=host -v=$(pwd):/root dist_dcgan:latest python -m torch.distributed.launch --nproc_per_node=1 --nnodes=2 --node_rank=0 --master_addr="172.17.0.1" --master_port=1234 dist_dcgan.py --dataset cifar10 --dataroot ./cifar10
And in the second terminal, run:
docker run --env OMP_NUM_THREADS=1 --rm --network=host -v=$(pwd):/root dist_dcgan:latest python -m torch.distributed.launch --nproc_per_node=1 --nnodes=2 --node_rank=1 --master_addr="172.17.0.1" --master_port=1234 dist_dcgan.py --dataset cifar10 --dataroot ./cifar10
- The
-v
parameter is mapping the current directory to docker's/root
directory, where data and the scripts resides. - The
--nproc_per_node
is the number of processes to start in each node. - The
--nnodes
parameter defines the number total of nodes (world_size). - The
--node_rank
parameter defines the rank for this particular node. Each node must have an unique node_rank. - The
--master_addr
and-master_port
parameters is the address and port to the master node (rank 0), respectively. Check if docker's IP is visible otherwise the processing will be stuck and will never starts. - The
--dataset
and--dataroot
parameters indicates the dataset and where it is located, respectively. - Use
--gpus=all
to docker command and--cuda
todist_dcgan.py
to enable the use of GPUs. For multiple gpus in a single node, it is recommended to set--nproc_per_node=1
(one process per gpu) and each for each process, use the--local_rank
parameter to assign it a a specific GPU (or using CUDA_VISIBLE_DEVICES environment variable).