Benchmark/Integrate benchmark scripts #10707
Conversation
… integrate_benchmark_scripts
… integrate_benchmark_scripts
benchmark/fluid/fluid_benchmark.py
Outdated
    default=0.001,
    help='The minibatch size.')
# TODO(wuyi): add this option back.
# parser.add_argument(
remove these
def append_nccl2_prepare():
    if os.getenv("PADDLE_TRAINER_ID", None) != None:
        # append gen_nccl_id at the end of startup program
        trainer_id = int(os.getenv("PADDLE_TRAINER_ID"))
What if the env does not exist?
If the user passes --update_method nccl2 but does not provide PADDLE_TRAINER_ID, the script raises an error; if the user does not pass --update_method at all, it defaults to local training.
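For illustration, a minimal sketch of that check; the helper name resolve_update_method is hypothetical and not part of the PR's code.

```python
# Sketch only: fail fast when nccl2 is requested but the required environment
# variable is missing, otherwise fall back to local training.
import os


def resolve_update_method(update_method):
    if update_method == "nccl2":
        trainer_id = os.getenv("PADDLE_TRAINER_ID")
        if trainer_id is None:
            raise RuntimeError(
                "--update_method nccl2 requires PADDLE_TRAINER_ID to be set.")
        return "nccl2", int(trainer_id)
    # No --update_method given: run as plain local training.
    return "local", None
```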
benchmark/fluid/README.md
Outdated
@@ -0,0 +1,60 @@
# Fluid Benchmark

This directory contains several models and tools that used to run
several models => several model configurations.
    exit(0)
return loss, inference_program, adam, train_reader, test_reader, batch_acc

# iters, num_samples, start_time = 0, 0, time.time()
Please delete this unused code.
Done.
benchmark/fluid/fluid_benchmark.py
Outdated
    '--with_test',
    action='store_true',
    help='If set, test the testset during training.')
parser.add_argument(
Maybe this should be enabled by default?
benchmark/fluid/fluid_benchmark.py
Outdated
if args.parallel == 0:
    # NOTE: parallel executor uses profiler internally
    if args.use_nvprof and args.device == 'GPU':
        with profiler.cuda_profiler("cuda_profiler.txt", 'csv') as nvprof:
Actually, the cuda_profiler is not used anymore; I will file a PR to delete it.
benchmark/fluid/fluid_benchmark.py
Outdated
        raise Exception(
            "Must configure correct environments to run dist train.")
    train_args.extend([train_prog, startup_prog])
    if args.parallel == 1 and os.getenv(
parallel == 1 makes the meaning of the thread count confusing; how about changing it to another name?
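One possible way to act on this suggestion, sketched with argparse; the flag name --use_parallel_executor is only an illustration, not necessarily what the PR adopts.

```python
# Illustrative only: replace the integer --parallel switch with an explicit
# boolean flag so it cannot be mistaken for a thread count.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    '--use_parallel_executor',
    action='store_true',
    help='If set, run with ParallelExecutor instead of the plain Executor.')
args = parser.parse_args()
print(args.use_parallel_executor)
```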
benchmark/fluid/kube_gen_job.py
Outdated
# to let container set rlimit
"securityContext": {
    "privileged": True
    # "capabilities": {
Commented-out code; please remove it.
benchmark/fluid/kube_gen_job.py
Outdated
import random
import os

pserver = {
The distributed jobs share a lot of the same configuration, but right now these templates and environment variables are tangled up inside the job-generation script. We could put the shared settings in a template YAML configuration file and override the default values with arguments in the submit scripts.
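A rough sketch of that suggestion; the template file name, the flag names, and the PyYAML dependency are all assumptions made for illustration.

```python
# Sketch: keep the shared Kubernetes spec in a template file and apply only the
# per-job overrides from command-line arguments in the submit script.
import argparse

import yaml  # PyYAML; assumed available


def load_job_spec(template_path, name, replicas):
    with open(template_path) as f:
        spec = yaml.safe_load(f)       # shared defaults live in the template
    spec["metadata"]["name"] = name    # per-job values come from the arguments
    spec["spec"]["replicas"] = replicas
    return spec


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--jobname", default="fluid-benchmark")
    parser.add_argument("--pservers", type=int, default=2)
    args = parser.parse_args()
    pserver = load_job_spec("pserver_template.yaml",
                            args.jobname + "-pserver", args.pservers)
    print(yaml.safe_dump(pserver))
```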
benchmark/fluid/kube_gen_job.py
Outdated
import os

pserver = {
    "apiVersion": "extensions/v1beta1",
Also, in my view, the YAML format is more concise than JSON. Which one do you prefer?
Comments are all addressed. I kept using JSON so that we can use it directly as in-memory data.
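To illustrate the in-memory-data point, a minimal example; the apiVersion value is taken from the quoted spec, while the other fields and names are placeholders.

```python
# The JSON-style dict is already Python data, so the generator can deep-copy it,
# patch a few fields, and dump a manifest without any template-parsing step.
import copy
import json

base_spec = {
    "apiVersion": "extensions/v1beta1",   # value shown in the quoted diff
    "metadata": {"name": ""},
    "spec": {"replicas": 1},
}

pserver = copy.deepcopy(base_spec)
pserver["metadata"]["name"] = "fluid-benchmark-pserver"   # placeholder name
pserver["spec"]["replicas"] = 4

print(json.dumps(pserver, indent=2))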
… integrate_benchmark_scripts
Great job. We can launch the distributed benchmark scripts on CE now.
    batch_acc)
print(", Test Accuracy: %f" % pass_test_acc)
print("\n")
# TODO(wuyi): add warmup passes to get better perf data.
The skip_batch_num argument does the trick. In my experiments on a local machine, skipping 5-10 batches is enough.
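For context, a sketch of how such a warm-up skip typically works; the loop below is illustrative and is not the code in fluid_benchmark.py.

```python
# Skip the first few batches before starting the clock so start-up cost does
# not skew the reported throughput.
import time


def run_benchmark(train_reader, run_step, skip_batch_num=5):
    num_samples, start_time = 0, time.time()
    for iters, data in enumerate(train_reader()):
        if iters == skip_batch_num:
            # Reset counters once the warm-up batches are done.
            num_samples, start_time = 0, time.time()
        run_step(data)
        num_samples += len(data)
    elapsed = time.time() - start_time
    print("throughput: %.2f samples/sec" % (num_samples / elapsed))
```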
Integrate all the benchmark Python programs into one, so that a single command can start either local CPU/GPU benchmarking or distributed multi-GPU benchmarking.
In distributed mode, the corresponding environment variables must be set so that each worker knows which role its node plays.
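As a rough illustration of the role-from-environment idea: PADDLE_TRAINER_ID appears in the diffs above, while PADDLE_TRAINING_ROLE is an assumed name that may not match the script's actual variables.

```python
# Decide how to run based on environment variables set by the launcher.
import os


def detect_mode():
    role = os.getenv("PADDLE_TRAINING_ROLE")     # e.g. "PSERVER"/"TRAINER" (assumed name)
    trainer_id = os.getenv("PADDLE_TRAINER_ID")  # per-trainer id, as in the diffs above
    if role is None and trainer_id is None:
        return "local"                           # nothing set: local CPU/GPU benchmark
    return "distributed:%s" % (role or "TRAINER")


print(detect_mode())
```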