question on backward computation #12

Closed
Young768 opened this issue Jun 9, 2020 · 4 comments

Young768 commented Jun 9, 2020

Hi HugeCTR experts,

I have a question about the backward computation. Take the localized slot embedding as an example: I notice that HugeCTR performs an all-to-all after the forward propagation, and in the backward pass it performs another all-to-all before the backward propagation. Why are there two all-to-all operations between the forward and backward passes?
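For readers following the data flow being asked about, here is a minimal sketch, simulated with plain NumPy rather than HugeCTR's actual CUDA/NCCL code (all names and sizes are illustrative assumptions): each GPU owns a subset of slots, so the forward all-to-all regroups the locally looked-up embeddings by batch shard before the dense layers, and the backward all-to-all routes the dense-layer gradients back to the slot-owning GPUs.

```python
# Minimal sketch, not HugeCTR code: simulate a localized-slot embedding across
# "GPUs" with NumPy to show why one all-to-all appears in forward and one in
# backward. Sizes and names are illustrative only.
import numpy as np

num_gpus, num_slots, emb_dim, batch = 4, 8, 2, 4
rng = np.random.default_rng(0)

# Localized-slot layout: each GPU owns a group of slots.
slots_per_gpu = np.array_split(np.arange(num_slots), num_gpus)

# Forward, step 1: each GPU looks up embeddings for its own slots, but for the
# whole batch -> one (batch, local_slots, emb_dim) tensor per GPU.
local_lookup = [rng.normal(size=(batch, len(s), emb_dim)) for s in slots_per_gpu]

# Forward, step 2 (all-to-all): regroup so each GPU holds all slots, but only
# for its own shard of the batch, which is what the dense layers consume.
batch_shards = np.array_split(np.arange(batch), num_gpus)
after_a2a = [np.concatenate([local_lookup[src][batch_shards[dst]]
                             for src in range(num_gpus)], axis=1)
             for dst in range(num_gpus)]        # (local_batch, num_slots, emb_dim)

# Backward (second all-to-all): dense-layer gradients have the layout of
# `after_a2a`, but each slot's gradient must travel back to the GPU that owns
# that slot before the embedding table can be updated.
grads = [np.ones_like(x) for x in after_a2a]
slot_offsets = np.cumsum([0] + [len(s) for s in slots_per_gpu])
grad_back = [np.concatenate([grads[dst][:, slot_offsets[src]:slot_offsets[src + 1]]
                             for dst in range(num_gpus)], axis=0)
             for src in range(num_gpus)]        # (batch, local_slots, emb_dim)

assert all(g.shape == x.shape for g, x in zip(grad_back, local_lookup))
```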

Young768 commented Jun 9, 2020

What's the difference between alltoall_forward and alltoall_backward?
@wl1136 could you please help me with this question?

wl1136 commented Jun 17, 2020

Hi @Young768, alltoall_forward and alltoall_backward are not the same: the send/recv tables in these two operations are reversed. For example, if gpu[n] sends N elements in the forward pass, it needs to receive N elements in the backward pass.
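A minimal sketch of this reversal, using plain NumPy rather than HugeCTR's actual NCCL calls (all names and sizes below are illustrative assumptions): the backward all-to-all uses the transpose of the forward send/recv table, so whatever gpu[n] sends out in forward it receives back in backward.

```python
# Minimal sketch with hypothetical sizes: the backward all-to-all's send table
# is the transpose of the forward one, so gpu[n] receives in backward exactly
# what it sent in forward.
import numpy as np

num_gpus = 8
# Example distribution from this thread: with 26 slots, gpu[0] and gpu[1] own
# 4 slots each, the other GPUs own 3.
slots_owned = np.array([4, 4, 3, 3, 3, 3, 3, 3])
local_batch = 16                                  # samples per GPU, toy value

# forward_send[i, j] = number of embedding vectors gpu[i] sends to gpu[j] in the
# forward all-to-all (its own slots, looked up for gpu[j]'s batch shard).
forward_send = np.tile(slots_owned[:, None] * local_batch, (1, num_gpus))

# In backward the table is reversed: gpu[i] sends back to gpu[j] what it
# received from gpu[j] in forward, i.e. the transpose.
backward_send = forward_send.T

# What gpu[n] sent in forward equals what gpu[n] receives in backward.
sent_forward = forward_send.sum(axis=1)           # per-GPU total sent (forward)
recv_backward = backward_send.sum(axis=0)         # per-GPU total received (backward)
assert np.array_equal(sent_forward, recv_backward)
```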

Young768 commented

@wl1136 Thanks for your answer. I think I got it.

One more question: are these N elements identical to each other across different GPUs? I emailed you this question several days ago but didn't receive your answer; I just want to understand how the dense layers' tensors differ between GPUs. Any chance you could answer here?

wl1136 commented Jul 7, 2020

@Young768 Sorry for the late reply. For the question "are these N elements identical to each other across different GPUs", the answer is not always; it depends on the distribution of the embedding table across GPUs. For example, if there are 8 GPUs with 26 slots and each slot has one feature, then gpu[0] and gpu[1] will each have 4 slots while the other GPUs will have 3 slots. When doing the all-to-all forward, gpu[0] and gpu[1] will send 4 slots' worth of elements to the other GPUs.
By the way, it seems that I have not received your email.
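A quick sketch of the arithmetic behind this distribution (the exact slot-to-GPU assignment rule inside HugeCTR may differ; this only reproduces the counts in the example above):

```python
# Sketch of the slot distribution described above (26 slots over 8 GPUs); the
# actual assignment policy in HugeCTR may differ, but the per-GPU counts match.
num_slots, num_gpus = 26, 8
base, rem = divmod(num_slots, num_gpus)           # base = 3, rem = 2
slots_per_gpu = [base + 1 if gpu < rem else base for gpu in range(num_gpus)]
print(slots_per_gpu)                              # [4, 4, 3, 3, 3, 3, 3, 3]
# So in the forward all-to-all, gpu[0] and gpu[1] send 4 slots' worth of
# embedding vectors per sample, while the other GPUs send 3.
```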
