Implement fluid API using python with guard. #6508

Closed · 3 of 4 tasks
typhoonzero opened this issue Dec 12, 2017 · 6 comments

typhoonzero commented Dec 12, 2017

  • Change the current implementation to listen_and_serv, send, and recv op implementations.
  • Implement a Python API with a guard (with statement) for listen_and_serv.
  • Build a sample program using the Python with-statement APIs.
  • Update the documentation accordingly.

According to https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/concurrent_programming.md#the-worker-program, we need to implement an API that looks like the following:

Server side:

loss = define_model()
server = fluid.listen_and_serv()
with server.do():
    opt = fluid.optimizer.Adam()
    opt.minimize(loss)
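
To clarify what the guard means in Python terms, here is a minimal, self-contained sketch of the idea; ListenAndServ, _current_block and append_op here are hypothetical stand-ins, not the actual Fluid API. Entering the with block redirects subsequently declared ops into a server-side sub-program, and leaving it restores the enclosing one.

import contextlib

# Minimal sketch with hypothetical names (not the real Fluid API): ops appended
# inside `with server.do():` land in the server-side block, not the main block.
_current_block = []                 # stand-in for the "current program/block"

class ListenAndServ(object):
    def __init__(self):
        self.server_block = []      # stand-in for the server-side sub-program

    @contextlib.contextmanager
    def do(self):
        global _current_block
        outer, _current_block = _current_block, self.server_block
        try:
            yield
        finally:
            _current_block = outer  # restore the enclosing block on exit

def append_op(op):                  # stand-in for a layer call, e.g. opt.minimize(loss)
    _current_block.append(op)

server = ListenAndServ()
with server.do():
    append_op("sgd_update")         # recorded into server.server_block

A real implementation would switch Fluid's current Program/Block rather than a Python list, but the control flow of the guard is the same.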

Worker side:

loss = define_model()
params, grads = fluid.append_backward(loss)
splited = layers.split(params, grads)
with fluid.parallel_for(len(splited)) as iter:
    layers.send(splited["grad"][iter.idx])
with fluid.parallel_for(len(splited)) as iter:
    layers.recv(splited["param"][iter.idx])
layers.concat(splited["param"])
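
As a rough illustration of the intended parallel_for semantics (and of the ordering question raised below), here is a plain-Python analogue using threads; send_grad and recv_param are stand-ins for layers.send and layers.recv, which would really run as ops inside the Fluid program:

import threading

splited = {"grad": ["g0", "g1"], "param": ["p0", "p1"]}   # toy stand-in for the split vars

def run_parallel(fn, n):
    # Launch one worker per slice and wait for all of them: a rough analogue
    # of `with fluid.parallel_for(n) as iter:` over iter.idx in [0, n).
    threads = [threading.Thread(target=fn, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def send_grad(idx):
    print("send", splited["grad"][idx])        # stand-in for layers.send(...)

def recv_param(idx):
    print("recv", splited["param"][idx])       # stand-in for layers.recv(...)

run_parallel(send_grad, len(splited["grad"]))    # first: send every gradient slice
run_parallel(recv_param, len(splited["param"]))  # then: receive every updated parameter

Joining all the send workers before starting any recv worker gives the "recv after all sends" ordering discussed later in this thread.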

If we use the CSP model, the server side may look like:

loss = define_model()
params, grads = fluid.append_backward(loss)
param_ch = fluid.make_chan()
param_recved_ch = fluid.make_chan()
grad_ch = fluid.make_chan()
layers.split_to_chan(params, param_ch)
layers.split_to_chan(grads, grad_ch)

with fluid.go():
    layers.send(grad_ch)
with fluid.go():
    updated_param = layers.recv(param_ch)
    param_recved_ch.push(updated_param)
layers.concat(param_recved_ch)
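
For illustration only, the CSP-style primitives map naturally onto queues and background threads. The sketch below uses queue.Queue as a stand-in for fluid.make_chan and a thread as a stand-in for fluid.go; everything on the Fluid side remains the proposed design above, not an existing API.

import queue
import threading

grad_ch = queue.Queue()          # ~ fluid.make_chan()
param_ch = queue.Queue()
param_recved_ch = queue.Queue()

def go(fn):
    # ~ fluid.go(): run fn concurrently, goroutine-style
    t = threading.Thread(target=fn)
    t.start()
    return t

def send_grads():
    while True:
        grad = grad_ch.get()
        if grad is None:                     # sentinel: no more gradients to send
            break
        param_ch.put("updated_" + grad)      # stand-in for the server round-trip

def recv_params():
    for _ in range(2):
        param_recved_ch.put(param_ch.get())

workers = [go(send_grads), go(recv_params)]
for g in ["g0", "g1", None]:
    grad_ch.put(g)
for t in workers:
    t.join()
print([param_recved_ch.get() for _ in range(2)])   # the concat step would merge these
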
gongweibao self-assigned this on Jan 11, 2018
typhoonzero changed the title from "Use while op inside recv_op as the event loop." to "Implement fluid API using python with guard." on Jan 23, 2018

Yancey1989 commented Jan 23, 2018

For the trainer side, I think the order of send/recv should be:

with fluid.parallel_for(len(splited)) as iter:
    layers.send(splited["grad"][iter.idx])
with fluid.parallel_for(len(splited)) as iter:
    layers.recv(splited["param"][iter.idx])

We need to execute Recv after all variables are sent.

On the other hand, I saw that #7706 also lists the send/recv Op; shall we execute the Send/Recv Op in a goroutine?

typhoonzero self-assigned this on Jan 23, 2018
This was referenced on Jan 23, 2018

helinwang commented:
"If we are using CSP model, the server side may look like:", do you mean the worker side?

I think layers.send(grad_ch) should be something like layers.send(grad_ch.recv()); this way the send is still sending a variable, not a channel.

layers.recv(param_ch) could be param_ch.send(layers.recv())
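
Applying that suggestion to the earlier snippet, the channel-based part might read roughly like this (still design pseudocode, not a committed API):

with fluid.go():
    layers.send(grad_ch.recv())       # send a variable taken from the channel
with fluid.go():
    param_ch.send(layers.recv())      # put the received variable into the channel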

helinwang commented:
Btw, is the Python code for illustration only? I don't think we should expose send/recv OP to the user.

typhoonzero commented Jan 24, 2018

> I think layers.send(grad_ch) should be something like: layers.send(grad_ch.recv()), in this way the send is still sending a variable, not a channel.

Good point, thank you.

> Btw, is the Python code for illustration only? I don't think we should expose send/recv OP to the user.

I think we should expose send/recv/listen_and_serv as layers to users, so that fluid can be a "real" programming language.

For example, @dzhwinter mentioned yesterday that we may need a single server to merge all the trainers' evaluation results; this could be done by using these ops as layers.
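
For illustration, such a metric-merging setup might be sketched with the same proposed layers (all names and signatures here are speculative, following the pseudocode above, not an existing API):

# Trainer side: ship the locally computed metric to the evaluation server.
layers.send(local_metric)

# Evaluation-server side: collect the metrics from all trainers and merge them.
server = fluid.listen_and_serv()
with server.do():
    merged_metric = layers.mean(recved_metrics)   # recved_metrics: speculative name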

helinwang commented Jan 24, 2018

I see, thanks. Does the user really care about merging all the trainers' evaluation (maybe trainer-id==0's local evaluation suffices) and about writing the Python code manually? I am a little worried that the Python binding part becomes something that no one actually uses, but that's just my 2 cents.

typhoonzero commented:
People may need to define their own distributed network, like in #7671. And, yes, we'd like users to use transpilers for simplicity in most cases. If we have those layers, we can also use them in the transpiler to simplify the transpiler's implementation.
