question on backward computation #12
Comments
What's the difference between alltoall_forward and alltoall_backward?
Hi @Young768, alltoall_forward and alltoall_backward are not the same: the send/recv tables in the two operations are reversed. For example, if gpu[n] sends N elements in the forward pass, it needs to receive N elements in the backward pass.
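To make the "reversed send/recv table" idea concrete, here is a minimal sketch (names and the count-matrix representation are illustrative, not HugeCTR's actual API): if `fwd_counts[i][j]` is the number of elements gpu[i] sends to gpu[j] in the forward all-to-all, then the backward all-to-all simply uses the transposed table, so every element sent forward comes back as a gradient.

```python
def backward_counts(fwd_counts):
    """Reverse the send/recv table: what gpu[i] sent to gpu[j] in the
    forward pass, gpu[j] sends back to gpu[i] in the backward pass."""
    n = len(fwd_counts)
    # bwd[i][j] = elements gpu[i] sends to gpu[j] in backward = fwd[j][i]
    return [[fwd_counts[j][i] for j in range(n)] for i in range(n)]

# Hypothetical 3-GPU example: row i lists what gpu[i] sends to each peer.
fwd = [
    [0, 5, 2],  # gpu0 sends 5 elements to gpu1, 2 to gpu2
    [3, 0, 7],  # gpu1 sends 3 to gpu0, 7 to gpu2
    [4, 1, 0],  # gpu2 sends 4 to gpu0, 1 to gpu1
]
bwd = backward_counts(fwd)

# gpu0 sent 7 elements total in forward, so it receives 7 gradients
# in backward (the column sum of bwd for gpu0):
sent_fwd = sum(fwd[0])
recv_bwd = sum(row[0] for row in bwd)
print(sent_fwd, recv_bwd)  # both 7
```

The key invariant is `bwd[i][j] == fwd[j][i]`: no new routing information is needed for backward, only the forward table with send and receive roles swapped.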
@wl1136 Thanks for your answer, I think I got it. One more question: are these N elements identical to each other across different GPUs? I emailed you this question several days ago but didn't receive a reply. I'd like to understand how the dense layers' tensors differ between GPUs. Any chance of an answer?
@Young768 Sorry for the late reply. For the question "are these N elements identical to each other across different GPUs": not always. It depends on how the embedding table is distributed across GPUs. For example, with 8 GPUs and 26 slots, each slot holding one feature, gpu[0] and gpu[1] will each get 4 slots while the other GPUs get 3 slots. In the all-to-all forward, gpu[0] and gpu[1] therefore send 4 slots' worth of elements to the other GPUs, so the element counts differ per GPU.
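The 26-slots-over-8-GPUs example above follows from a simple round-robin split, where the first `num_slots % num_gpus` GPUs each take one extra slot. A small sketch (the helper name is illustrative, not HugeCTR's API):

```python
def slots_per_gpu(num_slots, num_gpus):
    """Round-robin slot distribution: the first (num_slots % num_gpus)
    GPUs each hold one extra slot."""
    base, extra = divmod(num_slots, num_gpus)
    return [base + (1 if i < extra else 0) for i in range(num_gpus)]

# 26 slots over 8 GPUs: gpu0 and gpu1 get 4 slots, the rest get 3.
print(slots_per_gpu(26, 8))  # [4, 4, 3, 3, 3, 3, 3, 3]
```

Because gpu[0] and gpu[1] hold more slots, they contribute more elements per sample to the all-to-all, which is why the N elements exchanged are not identical across GPUs.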
Hi Hugectr experts,
I have a question on backward computation. Take the localized slot embedding as an example: I notice that HugeCTR performs an all-to-all after the forward propagation, and then performs an all-to-all again before the backward propagation. Why are there two all-to-all operations between the forward and backward passes?