question on backward computation #12
Comments
What's the difference between alltoall_forward and alltoall_backward?
Hi @Young768, alltoall_forward and alltoall_backward are not the same: the send/recv tables in the two operations are reversed. For example, if gpu[n] sends N elements in the forward pass, it needs to receive N elements in the backward pass.
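To make the "reversed send/recv table" idea concrete, here is a minimal sketch (names and the count-matrix representation are illustrative, not HugeCTR's actual API): if `fwd_counts[i][j]` is the number of elements gpu[i] sends to gpu[j] in the forward all-to-all, then the backward all-to-all simply uses the transposed table, so every element sent forward comes back as a gradient.

```python
def backward_counts(fwd_counts):
    """Reverse the send/recv table: what gpu[i] sent to gpu[j] in the
    forward pass, gpu[j] sends back to gpu[i] in the backward pass."""
    n = len(fwd_counts)
    # bwd[i][j] = elements gpu[i] sends to gpu[j] in backward = fwd[j][i]
    return [[fwd_counts[j][i] for j in range(n)] for i in range(n)]

# Hypothetical 3-GPU example: row i lists what gpu[i] sends to each peer.
fwd = [
    [0, 5, 2],  # gpu0 sends 5 elements to gpu1, 2 to gpu2
    [3, 0, 7],  # gpu1 sends 3 to gpu0, 7 to gpu2
    [4, 1, 0],  # gpu2 sends 4 to gpu0, 1 to gpu1
]
bwd = backward_counts(fwd)

# gpu0 sent 7 elements total in forward, so it receives 7 gradients
# in backward (the column sum of bwd for gpu0):
sent_fwd = sum(fwd[0])
recv_bwd = sum(row[0] for row in bwd)
print(sent_fwd, recv_bwd)  # both 7
```

The key invariant is `bwd[i][j] == fwd[j][i]`: no new routing information is needed for backward, only the forward table with send and receive roles swapped.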
@wl1136 Thanks for your answer, I think I got it. One more question: are these N elements identical to each other across different GPUs? I emailed you this question several days ago but didn't receive a reply. I'd like to understand how the dense layers' tensors differ between GPUs. Any chance of an answer?
@Young768 Sorry for the late reply. For the question "are these N elements identical to each other across different GPUs": not always. It depends on how the embedding table is distributed across GPUs. For example, with 8 GPUs and 26 slots, each slot holding one feature, gpu[0] and gpu[1] will each get 4 slots while the other GPUs get 3 slots. In the all-to-all forward, gpu[0] and gpu[1] therefore send 4 slots' worth of elements to the other GPUs, so the element counts differ per GPU.
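The 26-slots-over-8-GPUs example above follows from a simple round-robin split, where the first `num_slots % num_gpus` GPUs each take one extra slot. A small sketch (the helper name is illustrative, not HugeCTR's API):

```python
def slots_per_gpu(num_slots, num_gpus):
    """Round-robin slot distribution: the first (num_slots % num_gpus)
    GPUs each hold one extra slot."""
    base, extra = divmod(num_slots, num_gpus)
    return [base + (1 if i < extra else 0) for i in range(num_gpus)]

# 26 slots over 8 GPUs: gpu0 and gpu1 get 4 slots, the rest get 3.
print(slots_per_gpu(26, 8))  # [4, 4, 3, 3, 3, 3, 3, 3]
```

Because gpu[0] and gpu[1] hold more slots, they contribute more elements per sample to the all-to-all, which is why the N elements exchanged are not identical across GPUs.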
Hi Hugectr experts,
I have a question on backward computation. Take the localized slot embedding as an example: I notice that HugeCTR performs an all-to-all after the forward propagation, and then performs an all-to-all again before the backward propagation. Why are there two all-to-all operations between the forward and backward passes?