
[Speed] feature/ParallelExecutor #8891

Closed

Conversation


@tonyyang-svail (Author) commented Mar 8, 2018:

Fixes #8592.

Profiling results:
Script: the example in this PR.
Commands:

CUDA_VISIBLE_DEVICES=0       nvprof -f -o one.nvvp  python parallel_executor_example.py --batch_size=32
CUDA_VISIBLE_DEVICES=0,1,2,3 nvprof -f -o four.nvvp python parallel_executor_example.py --batch_size=32
| Setting | copy weights | forward and backward | merge gradient | apply gradient |
|---------|--------------|----------------------|----------------|----------------|
| 1 GPU   | with nccl    | 250                  | on bp          | 5              |
| 4 GPUs  | with nccl    | 750 (AllReduce takes about 63%) | on bp | 5    |

Save Model (to be implemented)

In the current implementation, ParallelExecutor's constructor creates a base scope and n sub scopes (where n equals the number of GPUs), and the model is replicated in each sub scope. The save-model function cannot access the sub scopes created by the ParallelExecutor.

Proposed Solution

ParallelExecutor's constructor creates only n-1 sub scopes. ParallelExecutor.run takes a scope parameter, which is attached as the remaining sub scope. This way, the user can create a scope, pass it to ParallelExecutor.run, and also use it to save the model (see the sketch below).
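A minimal sketch of that flow, assuming the proposed `scope=` argument on `ParallelExecutor.run`; `user_scope` and the save block at the end are illustrative assumptions, not part of this PR's diff:

```python
import paddle.fluid as fluid

user_scope = fluid.core.Scope()                # scope owned by the user

# The constructor creates n-1 internal sub scopes.
exe = fluid.ParallelExecutor(gpu_list=[0, 1])

# The user's scope is attached as the remaining sub scope, so one full
# model replica ends up living in user_scope.
exe.run(fluid.default_startup_program(), scope=user_scope)
exe.run(fluid.default_main_program(), scope=user_scope)

# Saving can then go through the regular single-device path against the
# same scope (shown here only as an assumption of how it would be wired).
with fluid.scope_guard(user_scope):
    fluid.io.save_persistables(fluid.Executor(fluid.CUDAPlace(0)), "./model_dir")
```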

conv4 = conv_block(conv3, 512, 3, [0.4, 0.4, 0])
conv5 = conv_block(conv4, 512, 3, [0.4, 0.4, 0])

# drop = fluid.layers.dropout(x=conv5, dropout_prob=0.5)

Contributor:
Could you remove the commented code?

print_lock = Lock()


def save_print(*args, **kwargs):

Contributor:
Are save_print and pretty_id_indent necessary? They are not used in this PR.

variables. For example

1. NCCL communicator
1. Data reader(?)

Contributor:
Do we need the "?"? If so, it probably needs to be explained.

@tonyyang-svail changed the title from "[WIP] Parallel executor" to "[Speed] feature/ParallelExecutor" on Mar 12, 2018
```python
cost = your_neural_network()

opt = fluid.optimizer.SGDOptimizer(..., append_all_reduce=True)
```

@panyx0718 (Contributor) commented Mar 13, 2018:
What if append_all_reduce=True is not set? Will it use another way to send gradients between devices, or just raise an exception?

Author:
Currently, it's not handled.
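For illustration of the "raise an exception" option only, not anything in this PR (the constructor signature and the check below are assumptions), a guard could fail fast like this:

```python
class ParallelExecutor(object):
    def __init__(self, gpu_list, optimizer=None):
        # Hypothetical check: refuse to run multi-GPU training with an
        # optimizer that never merges gradients across devices.
        if optimizer is not None and not getattr(optimizer, "append_all_reduce", False):
            raise ValueError(
                "ParallelExecutor needs an optimizer created with "
                "append_all_reduce=True so gradients are all-reduced across GPUs.")
        self.gpu_list = gpu_list
```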

-    def __init__(self, learning_rate, regularization=None):
+    def __init__(self,
+                 learning_rate,
+                 global_step=None,

Contributor:
You might want to put global_step after the existing arguments to avoid breaking existing users?
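For illustration (this is not the ordering in the PR's diff), appending the new arguments after the pre-existing ones keeps positional callers such as `Optimizer(lr, my_regularizer)` working:

```python
def __init__(self,
             learning_rate,
             regularization=None,   # pre-existing arguments keep their positions
             global_step=None,      # new arguments are appended afterwards
             append_all_reduce=False):
    ...
```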

@@ -35,7 +36,11 @@ class Optimizer(object):
     but need to use one of it's implementation.
     """
 
-    def __init__(self, learning_rate, regularization=None):
+    def __init__(self,

Contributor:
Could you add comments for the arguments?
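One possible shape for those comments, purely as a sketch (the argument descriptions are assumptions based on this PR's description, not text from the diff):

```python
def __init__(self,
             learning_rate,
             global_step=None,
             regularization=None,
             append_all_reduce=False):
    """Base class for optimizers.

    Args:
        learning_rate: a float value or a Variable used as the learning rate.
        global_step: optional Variable incremented once per optimization step.
        regularization: optional regularizer applied to the parameters.
        append_all_reduce: if True, append NCCL AllReduce ops so that
            gradients are merged across GPUs before being applied.
    """
```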

@@ -53,6 +58,7 @@ def __init__(self, learning_rate, regularization=None):
         # {accum_name : { paramter_name : accumulator_for_parameter, ...}, ...}
         self._accumulators = defaultdict(lambda: dict())
         self.helper = None
+        self.append_all_reduce = append_all_reduce

Contributor:
why is this a public attribute?

Author:
It should be private.
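The follow-up would presumably just rename the attribute, e.g.:

```python
# Leading underscore marks the flag as an internal detail of the optimizer.
self._append_all_reduce = append_all_reduce
```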

        core.init_nccl_com(self.scope, gpu_list)

    def run(self,
            program=None,

Contributor:
Is it possible to run part of the program on multiple GPUs and keep other parts on a single GPU or the CPU?

Author:
I don't think so.


    def run(self,
            program=None,
            feed=None,

@panyx0718 (Contributor) commented Mar 13, 2018:
In the design doc, you said feed won't be exposed?

                 learning_rate,
                 global_step=None,
                 regularization=None,
                 append_all_reduce=False):

Contributor:
This feels like an implementation detail. Can we avoid exposing it to the user?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.
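One way to keep the flag internal, sketched here as an assumption rather than a change in this PR: let the multi-GPU entry point build the optimization pass itself, so user code never passes `append_all_reduce` (the `parallel_minimize` helper below is hypothetical).

```python
import paddle.fluid as fluid

def parallel_minimize(cost, learning_rate):
    # Hypothetical helper owned by the ParallelExecutor machinery; the
    # flag is set here instead of by user code.
    opt = fluid.optimizer.SGDOptimizer(learning_rate=learning_rate,
                                       append_all_reduce=True)
    return opt.minimize(cost)
```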

```python
exe = fluid.ParallelExecutor(gpu_list=[0, 1])
```

## Design

Contributor:
Could you briefly describe the plan for model save/restore?

        q = Queue(maxsize=len(self.executors))
        for idx, exe in enumerate(self.executors):
            cur_scope = self.scopes[idx]
            t = Thread(

Contributor:
I'm a little worried about doing multi-threading in Python for such important computation, but maybe I'm wrong.

Contributor:
I think we need an Executor written in C++ and wrapped in Python, so that the threads can use multiple CPU cores.

Author:
@typhoonzero Right. A design for a C++ ParallelExecutor will be submitted in a separate PR.
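For context, the per-device dispatch under discussion looks roughly like the simplified sketch below (not the PR's exact code; `executors` and `scopes` are assumed to be per-device `fluid.Executor` and `Scope` lists, and the scaling concern is whether the C++ run call holds the GIL):

```python
from threading import Thread

def run_replicas(executors, scopes, program):
    # One Python thread per device; the heavy lifting happens inside the
    # C++ executor that each thread calls into.
    results = [None] * len(executors)

    def worker(i):
        results[i] = executors[i].run(program, scope=scopes[i])

    threads = [Thread(target=worker, args=(i,)) for i in range(len(executors))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```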

    gpu_list=range(fluid.core.get_cuda_device_count()))

# Parameter initialization
exe.run(fluid.default_startup_program())

@panyx0718 (Contributor) commented Mar 13, 2018:
So, parameter initialization and other startup computations are also done in parallel N times?

Author:
Yes.

@tonyyang-svail (Author):

A C++ implementation of ParallelExecutor will be submitted at #9035, because it would be relatively easier to adopt multi-stream execution at the C++ level.

This PR will be closed.

@chengduoZH added the parallel_exe (parallel executor) label on Apr 6, 2018
Successfully merging this pull request may close #8592 ("The problem of improving the performance of Parallel_Do").