Implement fluid API using python with guard. #6508
For the trainer side, I think the order of send/recv is:

    with fluid.parallel_for(len(splited)) as iter:
        layers.send(splited["grad"][iter.idx])

    with fluid.parallel_for(len(splited)) as iter:
        layers.recv(splited["param"][iter.idx])

We need to execute Recv only after all variables have been sent. On the other hand, I saw #7706 also lists the send/recv Op; shall we execute the Send/Recv Op in a goroutine?
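On the goroutine question: the constraint is just "all sends finish before any recv starts". As a plain-Python analogy (not fluid code; `do_send`/`do_recv` are stand-ins for the real Send/Recv ops), the scheduling we want is roughly:

```python
import threading

def do_send(grad_block):
    # stand-in for the Send op: ship one gradient split to its pserver
    pass

def do_recv(param_block):
    # stand-in for the Recv op: pull one updated parameter split back
    pass

def trainer_step(splited):
    # launch every send concurrently, goroutine-style
    senders = [threading.Thread(target=do_send, args=(g,))
               for g in splited["grad"]]
    for t in senders:
        t.start()
    # barrier: all gradients must be on the wire before we start pulling
    for t in senders:
        t.join()
    # only now fetch the updated parameters
    for p in splited["param"]:
        do_recv(p)
```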
"If we are using CSP model, the server side may look like:", do you mean the worker side? I think
|
Btw, is the Python code for illustration only? I don't think we should expose the send/recv OP to the user.
Good point, thank you.
I think we should expose send/recv/listen_and_serv as layers to users, so that fluid can be a "real" programming language. For example, @dzhwinter mentioned yesterday that we may need a single server to merge all the trainers' evaluation results; this could be done by using these ops as layers.
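A rough sketch of that evaluation-merging idea, assuming send/recv/listen_and_serv are exposed as layers (the wrapper names and signatures below are made up for illustration, not the current fluid API):

```python
# Hypothetical layer wrappers; names and signatures are assumptions.

# Trainer program: push the locally computed metric to one
# dedicated evaluation server.
layers.send(local_accuracy, endpoint="eval-server:6174")

# Evaluation-server program: collect one metric per trainer and
# average them inside the served block.
with layers.listen_and_serv(endpoint="eval-server:6174"):
    metrics = [layers.recv(shape=[1], dtype="float32")
               for _ in range(num_trainers)]
    merged_accuracy = layers.mean(layers.concat(metrics))
```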
I see, thanks. Does the user really care to merge all the trainers' evaluation (maybe
People may need to define their own distributed network, like in #7671. And yes, we'd like users to use transpilers for simplicity in most cases. If we have those layers, we can also use them in the transpiler to simplify the transpiler's implementation.
listen_and_serv, send, recv op implementation.
According to https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/concurrent_programming.md#the-worker-program, we need to implement a similar API that looks like:
Server side:
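A minimal sketch of what the server-side program could look like, assuming a ListenAndServ-style guard plus layers.recv/layers.scale/layers.elementwise_add; the guard name, endpoint, and the `param_shard`/`learning_rate`/`num_trainers` placeholders are illustrative only:

```python
# Server-side sketch; the guard and op names are assumptions.
serv = layers.ListenAndServ(endpoint="0.0.0.0:6174", fan_in=num_trainers)
with serv.do():
    # receive one gradient split from a trainer
    grad = layers.recv(shape=[8192], dtype="float32")
    # apply a plain SGD update to the parameter shard this server owns
    step = layers.scale(grad, scale=-learning_rate)
    layers.assign(layers.elementwise_add(param_shard, step), param_shard)
```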
Worker side:
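And a matching worker-side sketch, with the same caveat that layers.send/layers.recv and the endpoint/split variables are placeholders:

```python
# Worker-side sketch; op names and variables are placeholders.
# After the backward pass, push each gradient split to the pserver
# that owns it, then pull the updated parameter splits back.
for grad, ep in zip(grad_splits, pserver_endpoints):
    layers.send(grad, endpoint=ep)
for param, ep in zip(param_splits, pserver_endpoints):
    layers.recv(param, endpoint=ep)
```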
If we are using CSP model, the server side may look like:
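Roughly, under the assumption that fluid grows CSP primitives along the lines of the concurrent_programming design doc (make_channel, channel_send, channel_recv, and a Go block; none of these names are final):

```python
# CSP-style server sketch; every primitive name here is an assumption.
grads = fluid.make_channel(dtype="float32", capacity=num_trainers)

with fluid.Go():
    # a goroutine-like block that accepts gradients from trainers
    # and queues them on the channel
    g = layers.recv(shape=[8192], dtype="float32")
    fluid.channel_send(grads, g)

# main routine: take one gradient per trainer and update the shard
for _ in range(num_trainers):
    g = fluid.channel_recv(grads)
    step = layers.scale(g, scale=-learning_rate)
    layers.assign(layers.elementwise_add(param_shard, step), param_shard)
```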