This directory contains several models and tools used to run Fluid benchmarks for local and distributed training.
To start, run the following command to get the full help message:
python fluid_benchmark.py --help
Currently supported --model arguments include:
- mnist
- resnet
  - you can choose a different dataset using --data_set cifar10 or --data_set flowers.
- vgg
- stacked_dynamic_lstm
- machine_translation
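To benchmark every model listed above in one go, you can generate the command lines programmatically. The following is a minimal Python sketch and is not part of this directory. The helper name benchmark_commands is hypothetical, and the sketch assumes the --data_set choices apply only to resnet, as listed above.

```python
import sys

# Models listed in this README; dataset choices are only documented for resnet.
SUPPORTED_MODELS = ["mnist", "resnet", "vgg", "stacked_dynamic_lstm", "machine_translation"]
RESNET_DATASETS = ["cifar10", "flowers"]


def benchmark_commands(device="GPU", parallel=1):
    """Hypothetical helper: one fluid_benchmark.py argv per model (per dataset for resnet)."""
    base = [sys.executable, "fluid_benchmark.py", "--device", device, "--parallel", str(parallel)]
    commands = []
    for model in SUPPORTED_MODELS:
        if model == "resnet":
            # Expand resnet into one run per documented dataset.
            for data_set in RESNET_DATASETS:
                commands.append(base + ["--model", model, "--data_set", data_set])
        else:
            commands.append(base + ["--model", model])
    return commands
```

You can then run each argv from this directory, for example with subprocess.run(argv, check=True).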
- Run the following command to start a benchmark job locally:
python fluid_benchmark.py --model mnist --parallel 1 --device GPU --with_test
You can choose GPU or CPU training. With GPU training, you can specify --parallel 1 to run multi-GPU training.
- Run distributed training with parameter servers:
  - start parameter servers:
PADDLE_TRAINING_ROLE=PSERVER PADDLE_PSERVER_PORT=7164 PADDLE_PSERVER_IPS=127.0.0.1 PADDLE_TRAINERS=1 PADDLE_CURRENT_IP=127.0.0.1 PADDLE_TRAINER_ID=0 python fluid_benchmark.py --model mnist --parallel 0 --device GPU --update_method pserver
  - start trainers:
PADDLE_TRAINING_ROLE=TRAINER PADDLE_PSERVER_PORT=7164 PADDLE_PSERVER_IPS=127.0.0.1 PADDLE_TRAINERS=1 PADDLE_CURRENT_IP=127.0.0.1 PADDLE_TRAINER_ID=0 python fluid_benchmark.py --model mnist --parallel 0 --device GPU --update_method pserver
- Run distributed training using NCCL2:
PADDLE_PSERVER_PORT=7164 PADDLE_TRAINER_IPS=192.168.0.2,192.168.0.3 PADDLE_CURRENT_IP=127.0.0.1 PADDLE_TRAINER_ID=0 python fluid_benchmark.py --model mnist --parallel 0 --device GPU --update_method nccl2
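The distributed launch commands above differ only in their environment variables and the --update_method flag. The Python sketch below builds those variables and starts fluid_benchmark.py with them. The helper names (pserver_mode_env, nccl2_env, launch) are hypothetical, and the default values simply mirror the single-machine examples above; adjust them for a real cluster.

```python
import os
import subprocess
import sys

# Values copied from the single-machine parameter-server example above.
COMMON_PSERVER_ENV = {
    "PADDLE_PSERVER_PORT": "7164",
    "PADDLE_PSERVER_IPS": "127.0.0.1",
    "PADDLE_TRAINERS": "1",
    "PADDLE_CURRENT_IP": "127.0.0.1",
    "PADDLE_TRAINER_ID": "0",
}
# Benchmark arguments shared by the pserver and NCCL2 examples above.
BENCH_ARGS = ["fluid_benchmark.py", "--model", "mnist", "--parallel", "0", "--device", "GPU"]


def pserver_mode_env(role):
    """Hypothetical helper: environment for one process in pserver mode."""
    if role not in ("PSERVER", "TRAINER"):
        raise ValueError("role must be PSERVER or TRAINER")
    # dict(...) copies, so COMMON_PSERVER_ENV itself is never modified.
    return dict(COMMON_PSERVER_ENV, PADDLE_TRAINING_ROLE=role)


def nccl2_env(trainer_ips, current_ip, trainer_id, port="7164"):
    """Hypothetical helper: environment for one trainer in nccl2 mode."""
    return {
        "PADDLE_PSERVER_PORT": port,
        "PADDLE_TRAINER_IPS": ",".join(trainer_ips),
        "PADDLE_CURRENT_IP": current_ip,
        "PADDLE_TRAINER_ID": str(trainer_id),
    }


def launch(extra_env, update_method):
    """Start fluid_benchmark.py with extra_env merged into the current environment."""
    env = dict(os.environ, **extra_env)
    argv = [sys.executable] + BENCH_ARGS + ["--update_method", update_method]
    return subprocess.Popen(argv, env=env)
```

To reproduce the parameter-server example on one machine, start the parameter server with launch(pserver_mode_env("PSERVER"), "pserver") and then the trainer with launch(pserver_mode_env("TRAINER"), "pserver"). For NCCL2, call launch(nccl2_env(...), "nccl2") on each trainer node with the PADDLE_CURRENT_IP and PADDLE_TRAINER_ID appropriate to that node.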
We provide a script kube_gen_job.py to generate Kubernetes YAML files for submitting distributed benchmark jobs to your cluster. To generate a job YAML, just run:
python kube_gen_job.py --jobname myjob --pscpu 4 --cpu 8 --gpu 8 --psmemory 20 --memory 40 --pservers 4 --trainers 4 --entry "python fluid_benchmark.py --model mnist --parallel 1 --device GPU --update_method pserver --with_test" --disttype pserver
The YAML files are then generated under the directory myjob. To submit them, run:
kubectl create -f myjob/
The job should then start.
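If you script job submission, building the command as a Python list keeps the whole --entry value as one argument, so its spaces never split it. The sketch below builds the kube_gen_job.py command line from the example above and then submits the generated directory with kubectl. The function names are hypothetical, and the default resource values only mirror that example, in whatever units kube_gen_job.py expects.

```python
import shlex
import subprocess
import sys


def kube_gen_job_argv(jobname, entry_args, pservers=4, trainers=4, gpu=8, cpu=8,
                      pscpu=4, memory=40, psmemory=20, disttype="pserver"):
    """Hypothetical helper: build the kube_gen_job.py command line shown above.

    entry_args are the fluid_benchmark.py arguments the job should run; they
    are joined into one shell-quoted string for --entry.
    """
    entry = shlex.join(["python", "fluid_benchmark.py"] + list(entry_args))
    return [
        sys.executable, "kube_gen_job.py",
        "--jobname", jobname,
        "--pscpu", str(pscpu), "--cpu", str(cpu), "--gpu", str(gpu),
        "--psmemory", str(psmemory), "--memory", str(memory),
        "--pservers", str(pservers), "--trainers", str(trainers),
        "--entry", entry,
        "--disttype", disttype,
    ]


def generate_and_submit(jobname, entry_args, **resources):
    """Hypothetical helper: generate the YAML under ./<jobname>/ and submit it."""
    subprocess.run(kube_gen_job_argv(jobname, entry_args, **resources), check=True)
    subprocess.run(["kubectl", "create", "-f", jobname + "/"], check=True)
```

For example, shlex.join(kube_gen_job_argv("myjob", ["--model", "mnist", "--parallel", "1", "--device", "GPU", "--update_method", "pserver", "--with_test"])) yields the equivalent of the command above, with the --entry value single-quoted. shlex.join requires Python 3.8 or later.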