[Feature] Enable multi gpu distributed training of fluid #9746

typhoonzero · 2018-04-08T10:37:54Z

Resolves #8139

Sample code for running multi-GPU distributed training:

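import numpy as np
import paddle.fluid as fluid
import paddle.fluid.core as core

def train_loop_parallel(use_gpu, trainer_prog, trainer_id=0, bcast=False):
    # The imports above are assumed; the PR shows only this function.
    # avg_cost, images, label, train_reader and args come from the
    # model-building code, which is not shown here.
    place = core.CPUPlace() if not use_gpu else core.CUDAPlace(0)
    # Initialize parameters once with a single-device executor.
    startup_exe = fluid.Executor(place)
    startup_exe.run(fluid.default_startup_program())
    exe = fluid.ParallelExecutor(use_gpu, avg_cost.name)

    feeder = fluid.DataFeeder(place=place, feed_list=[images, label])

    for pass_id in range(args.num_passes):
        for batch_id, data in enumerate(train_reader()):
            print("before run one...")
            loss, = exe.run(
                [avg_cost.name],
                feed_dict=feeder.feed(data))
            if bcast:
                exe.bcast_params()
            print("Pass %d, batch %d, loss %s" % (pass_id, batch_id, np.array(loss)))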
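
For readers unfamiliar with the pattern, here is a conceptual sketch in plain NumPy of what a loop like this does. It is not Paddle's implementation and uses no Paddle APIs. It assumes that `bcast_params` copies the first device's parameters to every other device, which this PR does not spell out. The "devices" are simulated as list entries, and the least-squares model, `train_step`, `shards`, and the learning rate are all hypothetical names chosen for illustration.

```python
import numpy as np


def bcast_params(replicas, root=0):
    """Copy the root replica's parameters onto every other replica."""
    for i in range(len(replicas)):
        if i != root:
            replicas[i] = replicas[root].copy()


def train_step(replicas, shards, lr=0.1):
    """One synchronous data-parallel step for least-squares regression."""
    grads = []
    for w, (x, y) in zip(replicas, shards):
        # Gradient of mean((x @ w - y) ** 2) with respect to w.
        grads.append(2.0 * x.T @ (x @ w - y) / len(y))
    avg_grad = np.mean(grads, axis=0)  # stand-in for an all-reduce
    for i in range(len(replicas)):
        replicas[i] = replicas[i] - lr * avg_grad


rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
x = rng.normal(size=(64, 2))
y = x @ true_w
shards = [(x[:32], y[:32]), (x[32:], y[32:])]  # one shard per simulated device

replicas = [np.zeros(2), np.ones(2)]  # deliberately out of sync
bcast_params(replicas)                # every replica now matches replica 0
for _ in range(200):
    train_step(replicas, shards)
```

Because every replica applies the same averaged gradient, the copies stay identical after the initial broadcast. A further broadcast only matters when one copy is changed from outside the step, for example by parameters received from another node.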

luotao1 and others added 11 commits April 4, 2018 11:21

Merge pull request PaddlePaddle#9600 from luotao1/sync_with_cpp: refine sync_with_cpp when remove ops or remove vars

063868f

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into multigpumultinode

d2760bd

first wip commit

01c6618

wip

baea2cf

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into multigpumultinode

22f03a1

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into multigpumultinode

b9c28df

wip testing

0bf799a

have stream removed error

ce08dc8

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into multigpumultinode

ec69768

finish

16a9dfe

fix ci

d1e63a1

typhoonzero changed the title ~~[WIP] [Feature] Enable multi gpu distributed training of fluid~~ [Feature] Enable multi gpu distributed training of fluid Apr 11, 2018

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into multigpumultinode

dfc6025

typhoonzero requested review from reyoung and Yancey1989 April 11, 2018 09:24

reyoung approved these changes Apr 11, 2018

typhoonzero merged commit 652cf43 into PaddlePaddle:develop Apr 11, 2018

typhoonzero deleted the multigpumultinode branch April 11, 2018 10:58
